Press "Enter" to skip to content

Agentic Workflow on limdem.io: how eight AI specialists and a human editor co‑create deep popularization articles


1. Why Even the Most Advanced Language Models Struggle with “Superficial” Texts

“Where does the model’s ability end and human intervention begin?” – this question resonates with anyone who has encountered the long‑form outputs of generative models that brag about “understanding” the world yet still slip into clichés and vague statements. Accuracy, coherence, and stylistic purity are often only promises that fall apart in practice when the model tackles complex subjects such as AI ethics, quantum physics, or the philosophy of the future.

A typical hallucination example is a paragraph on AI ethics that cites a non‑existent study claiming 70 % of the public supports fully autonomous vehicle driving—a statistic that has never been published. This fabricated figure is easy to spot with fact‑checking, but without that step it can mislead readers.

The cause is not a lack of compute power but the architecture itself: a single large model is forced to handle every task at once—from gathering facts to writing introductory paragraphs and spell‑checking. This universality creates a “cognitive bottleneck,” where the model spends all its capacity on word generation and loses the ability to critically evaluate its own output.

Consequently, developers and editors are turning to a multi‑agent approach—splitting the work among specialized components. The idea is simple: if a human editorial team includes an editor, a copy‑editor, a subject‑matter expert, and a designer, why shouldn’t AI have the same division of labor? Below we dissect the concrete implementation of this principle that limdem.io uses. It isn’t a commercial product but an experimental workflow whose results you are reading right now.

2. Architecture Overview: Eight AI Specialists and a Human Editor

“Who exactly forms the chain that eventually turns an idea into a finished article?” – limdem.io employs eight specialized agents, complemented by a single human editor who provides final oversight and runs the infrastructure. Each specialist has a clearly defined role, much like a traditional magazine newsroom; because some agents act in more than one phase, the workflow comprises eleven steps in total. The table below outlines them:

| # | Name | Primary Task | Key Output Form |
|---|------|--------------|-----------------|
| 1 | Planner | Creates an analytical framework, identifies key topics, maps terminology, sets section structure, and flags potential factual risks | Analytical reasoning (no article text) |
| 2 | Proofreader (plan review) | Checks the plan, adds missing perspectives, proposes verification queries, validates terminology | Editorial note on the plan |
| 3 | Writer – first draft | Using the plan and notes, writes a complete first version of the article while adhering to style requirements (no Anglicisms, authentic opening, open‑ended conclusion) | Draft in Markdown |
| 4 | Writer – self‑critique | Assesses its own draft from four angles (depth, structure, language, factual risks) and produces a prioritized list of edits | Draft analysis |
| 5 | Writer – revision | Executes targeted changes to weak spots while preserving strong passages unchanged | Second draft |
| 6 | Checker A – technical facts | Looks for incorrect or unverified technical claims (numbers, technology names, procedures) | Technical error report |
| 7 | Checker B – logical consistency | Identifies contradictory arguments, missing logical connectors, deduction flaws | Logical defect report |
| 8 | Checker C – terminology and style | Spots Anglicisms, literal translations, terminology inconsistencies, stylistic clichés | Language deficiency report |
| 9 | Proofreader (fact‑check analysis) | Provides the Supervisor with a perspective on the previous fact‑checking reports | Editorial note on fact‑checks |
| 10 | Supervisor | Highest authority: evaluates all reports, decides which findings to fix, which to reject, and performs the final rewrite | Final article version |
| 11 | Translator (optional) | Turns the Czech text into fluent American English while preserving terminology and structure | English version |

In addition to these specialists, a human editor runs the infrastructure, monitors the status of each task, and intervenes when technical problems arise (e.g., API failures). Human oversight is also essential for ethical assessment—no fully automated chain can replace human judgment on sensitive topics.
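As a mental model, the chain above behaves like a simple sequential pipeline in which each step reads everything produced so far. A toy sketch in Python (the stage names, signatures, and lambda bodies are illustrative, not the production workflow script):

```python
# Hypothetical sketch of the workflow as an ordered pipeline of stages.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[str], str]  # takes accumulated context, returns new context

def run_pipeline(stages: list[Stage], topic: str) -> str:
    context = topic
    for stage in stages:
        context = stage.run(context)  # each stage sees all prior output
    return context

# Toy stages standing in for the real specialists:
stages = [
    Stage("Planner", lambda ctx: ctx + "\n[plan]"),
    Stage("Proofreader", lambda ctx: ctx + "\n[plan notes]"),
    Stage("Writer", lambda ctx: ctx + "\n[draft]"),
]
print(run_pipeline(stages, "Agentic workflows"))
```

In the real chain each `run` is an API call to a language model; the sequential shape, however, is the essence of the design.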

3. Why We Need a Multi‑Agent Approach

3.1 Technical Limits of a Single Model

Two core problems are hallucination (fabrication) and lack of contextual consistency. Hallucination occurs when the model produces information unsupported by any source—e.g., a bogus statistic or an incorrect date. Contextual inconsistency shows up when a single piece of text flips between contradictory claims or repeatedly shifts tone and style.

Another limitation is the fixed context window. Older or smaller commercial LLMs operated with windows of 8 k–32 k tokens; today’s models typically offer 128 k–200 k tokens, while frontier models can handle up to a million tokens. Even with larger windows, extensive texts can cause the model to lose coherence—the nominal window length does not guarantee effective use of the entire context. When generating long articles, earlier portions can fall out of the window, and the model loses track of the overall structure.

If the model is simultaneously asked to search for facts, build an argument, ensure linguistic accuracy, and adapt style for a specific audience, its capacity becomes overloaded, leading to the errors described above.

3.2 Benefits of Task Division

Dividing work among multiple specialists lets each focus on a narrow, well‑defined input‑output pair. One specialist may handle information analysis, another language editing, another fact‑checking, and so on. This reduces the risk that a single component produces unchecked output. Moreover, each specialist can receive specific input templates and memory instructions that eliminate repeated mistakes and promote a consistent style.

4. Technical Infrastructure: OpenWebUI as Orchestrator

“How are the individual AI components connected and where does the whole system live?” – the workflow’s core runs on OpenWebUI, an open‑source interface that serves as both a graphical and programmatic environment for communicating with LLMs via various providers’ APIs. OpenWebUI enables:

  1. Model and prompt‑template management – users can tweak prompts in real time, set output length, temperature, and other parameters, which is crucial for distinct tasks (planning vs. copy‑editing).
  2. Parallel request orchestration – thanks to asynchronous calls, factual checks (Checker A, B, C) can run simultaneously, saving time and making efficient use of API limits.
  3. Logging and contextual memory – OpenWebUI stores conversation histories and can pass memory tokens (e.g., a list of false positives that the supervisor injects into subsequent iterations).
  4. Extension with external (open‑weight) models – besides proprietary APIs, open‑weight models (e.g., Llama, Mistral) can be attached, increasing transparency and reducing reliance on a single platform.

OpenWebUI also provides a dashboard for monitoring workflow execution – the editor can see which specialists are active, how many iterations have occurred, and the current status of fact‑checking. This is essential because the process is designed to be fully automated yet allows manual intervention when a failure or unexpected error occurs.
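The parallel orchestration mentioned in point 2 can be sketched with Python's asyncio; the `call_model` helper below is an assumption standing in for the real provider API calls:

```python
# Minimal sketch of running the three fact-checkers concurrently.
import asyncio

async def call_model(role: str, draft: str) -> str:
    await asyncio.sleep(0)  # stands in for an HTTP request to a provider API
    return f"{role}: report on {len(draft)} chars"

async def run_checkers(draft: str) -> list[str]:
    # Checker A, B, and C are independent, so their requests can overlap.
    return await asyncio.gather(
        call_model("Checker A", draft),
        call_model("Checker B", draft),
        call_model("Checker C", draft),
    )

reports = asyncio.run(run_checkers("first revised draft"))
```

Because the three checks do not depend on each other, the wall-clock cost of the fact-check phase is roughly that of the slowest single call rather than the sum of all three.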

5. User Input and Data Flow: From Idea to Plan

“How does a simple topic become a structured work chain?” – the process starts with a user prompt. A reader (or editor) writes a topic, for example “The Future of Agentic Workflows in AI,” and can attach optional details separated by a special delimiter. These details may include desired tone, target audience, requirement to include specific examples, or preferred length.

Example block:

REQUIREMENTS:

  • tone: analytical
  • length: ≥ 6000 words
  • forbid: “in the final analysis”, “imagine”

All of this information is wrapped in a block labeled “binding editorial brief” and automatically injected into the input templates for the Planner, Writer, and Proofreader. This guarantees that key instructions (e.g., “avoid Anglicisms”, “preserve an authentic opening”) are never lost at any stage.
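The injection mechanics amount to plain string templating; the template text and field names below are assumptions for illustration, not the production prompts:

```python
# Sketch: wrap user requirements into a binding brief and inject it
# into a specialist's prompt template.
def build_brief(topic: str, requirements: str) -> str:
    return (
        "BINDING EDITORIAL BRIEF\n"
        f"Topic: {topic}\n"
        f"Requirements:\n{requirements.strip()}"
    )

PLANNER_TEMPLATE = "You are the Planner.\n{brief}\nProduce an analytical framework."

def render_prompt(template: str, brief: str) -> str:
    return template.format(brief=brief)

prompt = render_prompt(PLANNER_TEMPLATE, build_brief(
    "The Future of Agentic Workflows in AI",
    "tone: analytical\nlength: >= 6000 words",
))
```

Because the same brief string is spliced into every specialist's template, no stage can "forget" a requirement that was present at submission time.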

Once the input block is submitted, OpenWebUI launches the Planner, which produces analytical reasoning. The output of this step becomes the skeletal backbone of the article—identifying the most interesting angles, pinpointing information gaps, and flagging terminological challenges. Next, the Proofreader reviews the plan, adds missing perspectives, and creates editorial notes. Only after these two inputs are ready does the Writer begin drafting.

This approach ensures that each step has a clearly defined input and output, eliminating the risk that the requester’s requirements “dissolve” into an ambiguous context.

6. Phases 1‑3: Preparation and First Draft

6.1 Planner – Analytical Reasoning

“What is the core of preparation and why isn’t the plan itself the article?” – the Planner does not produce any article text; it generates a structured framework. In this step it:

  • Identifies information gaps – areas with few published sources but high potential for new insight.
  • Maps target-language terminology to avoid Anglicisms (relevant for the Czech version: e.g., ‘trénovací data’ instead of ‘training data’).
  • Flags factual risk zones – spots where numbers, model names, or citations will likely need verification.
  • Proposes structure – at least eight sections, each delivering a unique informational value, to meet the required length (over 6 000 words).

The output is analytical reasoning – a list of points, not article prose. This document serves as the guiding thread for all subsequent stages.

6.2 Proofreader – Plan Review

“How do we ensure the plan is not only well‑structured but also factually reliable?” – the Proofreader acts as the first editor. It verifies that the plan covers all important angles (technical, philosophical, societal) and flags potential hallucinations. If web searching is permitted, it performs a quick verification of numbers and citations and attaches the results to its notes.

Key outputs:

  • Coverage comments – added perspectives (e.g., ethical implications of multi‑agent workflows).
  • Fact‑risk alerts – concrete questions that must be answered in Phase 4.
  • Terminology suggestions – e.g., replace “deep dive” with “in‑depth analysis”.

6.3 Writer – First Draft and Self‑Critique

“How does a plan become readable text and how is its quality evaluated afterward?” – the Writer receives three inputs: the plan, editorial notes, and the user’s original request. Based on these, it creates a complete first draft in Markdown, adhering to:

  • Forbidden phrasing – no Anglicisms, no introductory definitions, no clichés like “in the final analysis”.
  • Authentic opening – a question or contradiction that pulls the reader in (e.g., “What if a model tells you AI already runs 80 % of the world’s energy, yet no such statistic exists?”).
  • Conclusion that opens a new perspective – not merely a summary, but a call to further thought.

After the draft is finished, the Writer critiques its own text from four angles (depth, structure, language, factual risk) and compiles a prioritized edit list (3–5 top issues). This step does not produce new text but supplies concrete guidance for revision. The Writer's final step is to produce a revised draft based on its own guidelines.

7. Iterative Feedback Loop: Fact‑Checking and Correction

“Why isn’t a single check enough and how do we achieve a ‘clean’ article?” – after the first revision, an iterative loop (Phases 4‑6) kicks in. Its goal is to gradually eliminate all hallucinations and linguistic shortcomings while preserving strong passages.

7.1 Fact‑check – Three Specialists

  • Checker A – technical facts – verifies numbers, model names, procedures.
  • Checker B – logical consistency – ensures the argumentation contains no contradictions.
  • Checker C – terminology and style – hunts for prohibited Anglicisms, literal translations, and stylistic clichés.

Each checker works independently, but their outputs are structured (problem type, citation, description, suggested fix, severity, confidence).
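That structured report format can be captured in a small data type; the field names are our assumption based on the description above, not the exact production schema:

```python
# Sketch of one structured finding as emitted by a checker.
from dataclasses import dataclass

@dataclass
class Finding:
    problem_type: str   # e.g., "technical fact", "logic", "terminology"
    citation: str       # the exact passage being challenged
    description: str
    suggested_fix: str
    severity: str       # "high" | "medium" | "low"
    confidence: float   # the checker's own certainty, 0.0 to 1.0

finding = Finding(
    problem_type="technical fact",
    citation="70 % of the public supports fully autonomous driving",
    description="No published source supports this statistic.",
    suggested_fix="Remove the figure or cite a verifiable survey.",
    severity="high",
    confidence=0.9,
)
```

A fixed schema like this is what lets three independent checkers feed one downstream decision-maker without any report-parsing ambiguity.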

7.2 Proofreader – detection phase and conditional analysis of fact-checks

After receiving reports from all three checkers, the Proofreader does not assess accuracy in the first step, but merely detects the presence of findings. If at least one finding of high or medium severity is identified, a full analysis of the fact-checkers’ output by the proofreader and a rewriting process by the supervisor are triggered. If all findings are low‑severity or absent, the loop ends and the text is deemed “clean”.
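The detection step reduces to a simple severity gate, sketched here with findings as plain dictionaries (the real report format may differ):

```python
# Sketch: trigger the full analysis and rewrite only when at least one
# high- or medium-severity finding exists.
def needs_rewrite(findings: list[dict]) -> bool:
    return any(f["severity"] in ("high", "medium") for f in findings)

assert needs_rewrite([{"severity": "low"}, {"severity": "medium"}])
assert not needs_rewrite([{"severity": "low"}])
assert not needs_rewrite([])  # no findings: the text is deemed "clean"
```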

7.3 Supervisor – Decision and Rewrite

The Supervisor is the top authority. It receives two inputs: the raw checker reports and the Proofreader’s analysis. Based on these, it decides which findings are:

  • High priority – must be fixed.
  • Medium priority – fix if they don’t threaten structure.
  • Uncertain – decide based on judgment.
  • False positive – explicitly mark as not to be corrected.

For each fix, the Supervisor creates a structured plan (location, description, source, scope of edit) and then carries out the rewrite.

7.4 False‑Positive Memory

To avoid repeatedly flagging the same issues, the Supervisor generates a section called “FC_MEMORY_DOES_NOT_FIX” containing all rejected findings with explanations (e.g., “false alarm – acceptable simplification”). This section is passed to all checkers from the second iteration onward as a concrete instruction not to raise those items again. This dramatically cuts redundant alerts and speeds up reaching a clean state.
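Serializing the rejected findings into that section can be sketched as follows; the section name comes from the article, while its internal layout is assumed:

```python
# Sketch: build the FC_MEMORY_DOES_NOT_FIX block passed to later checker runs.
def build_fc_memory(rejected: list[dict]) -> str:
    lines = ["FC_MEMORY_DOES_NOT_FIX"]
    for item in rejected:
        lines.append(f"- {item['citation']}: {item['reason']}")
    return "\n".join(lines)

memory = build_fc_memory([
    {"citation": "'8k token window'",
     "reason": "false alarm - acceptable simplification"},
])
```

From the second iteration onward, this block is prepended to each checker's input, turning a past Supervisor decision into a standing instruction.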

7.5 Loop Termination

The loop ends either when the Proofreader’s detection step finds no issues or after reaching a maximum number of iterations (usually ten). In limdem.io’s experiment, a clean state was achieved after four iterations on the test topic AI ethics, suggesting the process can be viable.
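The whole loop, including its iteration cap, can be condensed into a few lines; the helper callables below are stand-ins for the agents described above, not the production code:

```python
# Sketch of the fact-check loop with a hard cap on iterations.
MAX_ITERATIONS = 10

def fact_check_loop(draft, run_checkers, is_clean, rewrite):
    for iteration in range(1, MAX_ITERATIONS + 1):
        findings = run_checkers(draft)
        if is_clean(findings):
            return draft, iteration       # detection step found nothing serious
        draft = rewrite(draft, findings)  # Supervisor performs the rewrite
    return draft, MAX_ITERATIONS          # cap reached; ship the best effort

# Toy run: pretend the text becomes clean on the fourth pass.
calls = {"n": 0}
def fake_checkers(d):
    calls["n"] += 1
    return [] if calls["n"] >= 4 else [{"severity": "high"}]

text, rounds = fact_check_loop("draft", fake_checkers,
                               lambda f: not f,
                               lambda d, f: d + "*")
```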

8. Hierarchy of Authority and Protective Mechanisms

“Why is it crucial that the Supervisor has absolute authority and what other safeguards protect the output from degradation?” – the hierarchy is designed to minimize democratic failure—a scenario where decision‑making is spread across too many voices, leading to inconsistent results. By giving the Supervisor the sole power to rewrite text, the final version aligns with both the requester’s specifications and editorial standards.

Additional safeguards include:

  1. Ban on placeholder text – the prompt explicitly forbids expressions like “SECTION_UNCHANGED”. Every specialist must copy content in full, eliminating the possibility of skipping problematic passages.
  2. False‑positive memory – prevents repeated reporting of the same items.
  3. Iteration cap – stops endless cycles and forces focus on the most critical problems.
  4. Transparent logging – OpenWebUI retains the complete history of prompts and responses, enabling audits and post‑mortems.

Together, these principles make the workflow act like a self‑sustaining editorial chain that can eliminate typical LLM weaknesses without losing human oversight.
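The first safeguard, the placeholder ban, is the easiest to enforce mechanically. A minimal sketch, with an illustrative marker list (only "SECTION_UNCHANGED" comes from the article; the other markers are assumptions):

```python
# Sketch: reject any specialist output that tries to skip content
# with a placeholder instead of copying it in full.
FORBIDDEN_MARKERS = ("SECTION_UNCHANGED", "[UNCHANGED]", "...omitted...")

def reject_placeholders(output: str) -> str:
    for marker in FORBIDDEN_MARKERS:
        if marker in output:
            raise ValueError(f"placeholder detected: {marker}")
    return output
```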

9. Translation and Transcreation: From Czech to English

“How does Czech text change when it’s meant for English‑speaking readers?” – when an English version is needed, the Translator enters the chain. This specialist does not perform a literal translation but a transcreation—adapting idioms, rhythm, and terminology so the text feels like an original American‑English piece.

Key traits of transcreation:

  • Idiom localization – e.g., Czech “mít oči na stopkách” becomes “to keep a close eye on”.
  • Structure preservation – headings, lists, and links stay in the same hierarchy.
  • Technical term retention – expressions like “LLM”, “API”, and “OpenWebUI” remain unchanged because they are standard in English contexts.

As with previous stages, the Translator is also bound by the no‑placeholder rule and must fully rewrite each section.

10. Human Editor and the Experimental Nature of the Workflow

“Where does automation end and human responsibility begin?” – despite the high degree of automation, the human editor remains indispensable. Their duties include:

  • Infrastructure monitoring – maintaining OpenWebUI, managing API keys, handling outages.
  • Ethical review – ensuring the article contains no inappropriate or sensitive material.
  • Final copy‑edit – after all automated checks, the editor performs a last read‑through to confirm the text matches the portal’s stylistic voice.

It is important to stress that the entire workflow is still experimental. Limdem.io has not yet conducted systematic studies quantifying improvement over traditional single‑LLM writing. What can be said, however, is that the article you are reading right now was produced by this exact chain; if it feels accurate and stylistically balanced, that is the first practical proof that the approach works.

11. What This Means for Readers: Practical Impact and Future Outlook

“What benefit does this technical experiment bring to the audience?” – for limdem.io’s readers, the workflow offers a guarantee of higher quality thanks to:

  • Deep planning – every article starts with a robust skeleton that ensures no key angle is omitted.
  • Multiple fact‑checks – the likelihood of an incorrect figure or unverified quote appearing is significantly reduced; research on multi‑agent systems suggests that adding independent verification layers can lower error rates, though exact results depend on implementation and content type.
  • Linguistic cleanliness – prohibited Anglicisms and literal translations are systematically removed, yielding cleaner prose.

Each published piece includes a footer listing the LLMs used and a brief workflow description (Planner, Writer, Fact‑check, etc.). This footer acts as a transparent disclosure for readers who want to understand how the text was generated and possibly experiment with AI writing themselves.

Looking ahead, we can anticipate:

  • Additional validation layers – e.g., automatic citation pulling from scholarly databases or enhanced ethical screening.
  • Adaptive iteration counts – the system could decide dynamically when quality is sufficient instead of being capped at a fixed number of cycles.
  • Multilingual model integration – enabling single‑workflow publishing in multiple languages, aligning with limdem.io’s multilingual vision.

12. Conclusion: An Open Call for Reflection

“If the specialists became readers, what would they say about their own work?” – this question pushes us toward a deeper contemplation of the reciprocal relationship between creator and publication. In our case, the specialists are tools, and we measure their success not only by error count but also by how readers perceive the depth and accuracy of the argumentation.

The article you are reading is living evidence that task division among eight AI specialists and a human can produce a text whose structure and linguistic level meet demanding popular‑science standards. Yet the open question remains: how far can automation go before we hit new limits—such as ethical dilemmas where decisions about what is appropriate to publish must be made?

We therefore invite every reader to go beyond passive consumption and critically examine the creation process. If the style, structure, or linguistic purity impressed you, scroll to the footer for a detailed list of LLMs and the workflow description. Your feedback could be the next catalyst for improving this experiment and building even more robust tools that serve not only science communication but any form of human discourse.

Imagine a future where every article follows this layered model: automatic planning, independent fact‑checking, and a human ethical gate. Where could such an editorial structure thrive beyond web portals? What impact would it have on educational materials, journalism, or scholarly publishing? These are the questions we can start tackling together.


Content Transparency & AI Assistance

How this article was created:
This article was generated with artificial intelligence assistance. Specifically, we employed an agentic workflow composed of eight language models running in the OpenWebUI application. Our editorial team established the topic, research direction, and primary sources; the AI then generated the initial structure and draft text.

Want to learn more about the process?

Read our article:
Agentic Workflow on limdem.io: how eight AI specialists and a human editor co‑create deep popularization articles

Editorial review and fact-checking:

  • ✓ The text was editorially reviewed
  • Fact-checking: All key claims and data were verified
  • Fact corrections and enhancement: Our editorial team corrected factual inaccuracies and added subject matter expertise

AI model limitations (important disclaimer):
Language models can generate plausible-sounding but inaccurate or misleading information (known as “hallucinations”). We therefore strongly recommend:

  • Verifying critical facts in primary sources (official documentation, peer-reviewed research, subject matter authorities)
  • Not relying on AI-generated content as your sole information source for decision-making
  • Applying critical thinking when reading

Used language models:

| Role | Model | License |
|------|-------|---------|
| 🧠 Planner | deepseek-ai/DeepSeek-R1 | MIT License |
| 🔍 Proofreader | zai‑org/glm-5:thinking | MIT License |
| ✍️ Writer | openai/gpt‑oss-120b | Apache 2.0 |
| 🔍 Fact‑checker A | deepseek/deepseek‑v3.2 | MIT License |
| 🧠 Fact‑checker B | moonshotai/kimi‑k2.5:thinking | Modified MIT License |
| 📝 Fact‑checker C | qwen/qwen3.5‑397b‑a17b‑thinking | Apache 2.0 |
| 👔 Supervisor | nousresearch/hermes-4-405b | Llama 3.1 Community License |
| 🌍 Translator | openai/gpt‑oss-120b | Apache 2.0 |

Source code of the workflow used:
limdemioarticlewriterprov25frontier.py
