Most GEO advice treats optimisation as a universal intervention. Add statistics. Improve structure. Make content more authoritative. Apply across your site. A March 2026 preprint from researchers at Virginia Tech and the Chinese Academy of Sciences challenges that premise directly: for long-tail, specialist content, generic optimisation methods can make citation rates worse, not better.
That is the most important finding in this paper. Not the performance numbers — the direction of harm.
Evidence tier: Preprint finding — directional, not yet peer-reviewed. Tian, Zhihua, Yuhan Chen, Yao Tang, Jian Liu, and Ruoxi Jia (2026), "Diagnosing and Repairing Citation Failures in Generative Engine Optimization," arXiv:2603.09296. Submitted March 10, 2026. Single research group; the AgentGEO system has not been independently replicated.
What the earlier research assumed — and what this paper challenges
The original GEO research from Aggarwal et al. 2024 — the peer-reviewed study that established statistics, source citations, and readability as citation predictors — applied a set of optimisation methods to documents and measured the aggregate outcome. The implicit assumption in that design, and in most practitioner GEO advice that followed it, is that the same methods work in the same direction across content types. Optimise for the features that predict citation. Apply them everywhere.
The Tian et al. preprint tests that assumption. Their finding: generic methods apply uniform rewriting rules regardless of what is actually causing a specific document to fail. For some documents, that mismatch is benign. For others — particularly long-tail, niche content with specific structural or relevance problems — applying the wrong fix to the wrong failure mode reduces citation rates.
The paper distinguishes between two measurements: contribution (how much a document influenced a response) and citation (whether the document is attributed by name). Most practitioner GEO tools and most academic benchmarks measure contribution. The paper argues that optimising for the wrong measurement produces the wrong outcomes for content types where the two diverge.
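A toy calculation makes the divergence concrete. This is a minimal sketch, not the paper's metric definitions: the `ResponseRecord` fields, the 0 to 1 contribution score, and every number below are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResponseRecord:
    """One generated answer, scored against a single target document."""
    contribution: float  # hypothetical 0..1 share of the answer traceable to the document
    cited: bool          # True if the answer attributes the document by name


def mean_contribution(records):
    """Average contribution score across all answers."""
    return sum(r.contribution for r in records) / len(records)


def citation_rate(records):
    """Fraction of answers that attribute the document by name."""
    return sum(r.cited for r in records) / len(records)


# Made-up records for one document across four queries (not data from the paper).
records = [
    ResponseRecord(contribution=0.50, cited=False),
    ResponseRecord(contribution=0.25, cited=False),
    ResponseRecord(contribution=0.75, cited=True),
    ResponseRecord(contribution=0.50, cited=False),
]

print(f"mean contribution: {mean_contribution(records):.2f}")
print(f"citation rate:     {citation_rate(records):.2f}")
```

In this invented case the document shapes half of each answer on average but is named in only one answer out of four. A tool that reports only the first number would rate the page as healthy.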
The citation failure mode taxonomy
The paper's diagnostic foundation is a taxonomy of citation failure modes — a classification of the distinct reasons a document fails to be cited by a generative engine. Before any fix is applied, the AgentGEO system identifies which failure mode a document faces. The repair is targeted to the diagnosis.
This is the structural argument against generic optimisation. A document that fails because its content is not retrieved has a different problem from a document that is retrieved but not attributed. A document with an authority deficit has a different problem from one with a relevance mismatch. A single set of rewriting rules can match at most one of those four cases. It leaves the other three unaddressed, and it may worsen the underlying problem if the changes shift the document further from the specific properties a generative engine needs to cite it.
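The mismatch argument can be sketched in a few lines. The `FailureMode` labels are illustrative names for the four examples in the paragraph above, not the paper's actual taxonomy categories, and the assumption that a generic rule set behaves as if every page had one fixed problem is ours.

```python
from enum import Enum, auto


class FailureMode(Enum):
    """Illustrative labels only, taken from the four examples in the text."""
    NOT_RETRIEVED = auto()
    RETRIEVED_NOT_ATTRIBUTED = auto()
    AUTHORITY_DEFICIT = auto()
    RELEVANCE_MISMATCH = auto()


# Assumption: a generic rule set behaves as if every page had this one problem.
GENERIC_TARGET = FailureMode.AUTHORITY_DEFICIT


def generic_fix(diagnosis: FailureMode) -> FailureMode:
    """Return the failure mode the fix addresses; the diagnosis is ignored."""
    return GENERIC_TARGET


def targeted_fix(diagnosis: FailureMode) -> FailureMode:
    """Return the failure mode the fix addresses; it follows the diagnosis."""
    return diagnosis


def matched(fix, diagnoses) -> int:
    """Count documents whose applied fix addresses their diagnosed failure."""
    return sum(fix(d) is d for d in diagnoses)


# One hypothetical document per failure mode.
diagnoses = list(FailureMode)

print("generic :", matched(generic_fix, diagnoses), "of", len(diagnoses))
print("targeted:", matched(targeted_fix, diagnoses), "of", len(diagnoses))
```

With one document per mode, the generic strategy matches a single case and the targeted strategy matches all four. The sketch says nothing about how large the harm is when the fix misses; it only shows why a uniform rule set cannot fit every diagnosis.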
The failure mode taxonomy itself is new. It was introduced by the paper's authors and has not been externally validated. The logic is sound and consistent with what we know about how generative engines process content — but the taxonomy categories are the authors' framework, not a received field standard. That distinction matters when evaluating how confidently to act on it.
What the performance data shows
The benchmark is MIMIQ — a document-centric evaluation with real LLM engines, not a simulated retrieval proxy. On this benchmark, generic GEO methods produced approximately 25% average improvement in citation rates across the method set. AgentGEO, applying targeted repairs after diagnosing the specific failure mode, produced 40%+ improvement while modifying only 5% of content.
Those figures come from the same study on the same benchmark, so the comparison is direct. The 40%+ result is a strong claim from a system the authors designed, and it fully warrants the preprint caveat. But the 40% vs 25% gap is not the finding we think matters most for independent niche operators.
The finding that matters is this: generic methods produce their approximately 25% average across the full document set. That average includes documents where generic optimisation works and documents where it causes harm. For long-tail content, the paper establishes that some documents face challenges generic rewriting cannot address — and applying it anyway moves the citation rate in the wrong direction.
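A short worked example shows how a positive average can sit on top of documents that got worse. The per-document figures below are invented so that the mean lands on 25; they are not results from the paper, which does not report this breakdown here.

```python
from statistics import mean

# Made-up per-document changes in citation rate (percent).
# These are NOT figures from the paper.
changes = [40, 40, 40, 40, 40, 40, 40, 40, -35, -35]

average = mean(changes)
harmed = [c for c in changes if c < 0]

print(f"average change: {average:+.0f}%")
print(f"documents harmed: {len(harmed)} of {len(changes)}")
```

Eight documents gaining 40% and two losing 35% average out to +25%. Anyone who sees only the aggregate has no way to tell that a fifth of the set moved in the wrong direction.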
Why this matters specifically for niche content
Generic GEO optimisation is designed around the content types that dominate GEO research benchmarks: news articles, general informational content, broad-query responses. The interventions that work for those content types (adding statistics, increasing readability, improving structural clarity) are calibrated to a different content profile from that of specialist, long-tail pages covering narrow topics for informed audiences.
Niche content has properties that generic optimisation does not account for. Specialist terminology that looks like jargon to a general-purpose optimisation system is load-bearing evidence for an audience that knows the field. Dense, precise paragraphs that fail a readability heuristic may be exactly what a generative engine needs to cite the page accurately. The specificity that makes long-tail content valuable to the right reader is also what makes it vulnerable to blanket rewriting.
The Tian et al. harm finding is stated at the system level. The paper does not identify which specific GEO tactics are most harmful for which long-tail content types — that granularity is not in the current data. What the finding establishes is that the harm exists and is measurable. For independent operators producing specialist content, the takeaway is concrete: applying generic GEO advice without understanding what is actually causing your citation failures is not a neutral action.
What this study does not tell us
The paper is a preprint submitted in March 2026. It has not completed peer review. The AgentGEO system was designed by the same researchers who evaluated it — independent replication does not yet exist. The 40%+ improvement figure is a strong claim from a single study; treat it as a direction and an order of magnitude, not a deliverable outcome.
The failure mode taxonomy is the paper's own framework. Whether those specific categories map cleanly onto the full range of content types practitioners actually publish has not been tested outside this research group's work. The taxonomy's logic is persuasive; its external validity is an open question.
The harm finding — generic optimisation reduces citation rates for some long-tail content — is the most epistemically solid result in the paper, because it is consistent with theoretical predictions about what happens when a mismatched fix is applied to a specific failure mode. That consistency makes it credible even at preprint status. The specific magnitude of harm, and which content types are most exposed, remain open.
The Psytable tools this finding validates
The AgentGEO argument is that diagnosis must precede repair. Apply the fix that matches the failure, not the fix that works on average. Two Psytable tools operate on the same principle — surfacing what is happening with a specific page rather than recommending uniform changes across your content.
The Absorption Analyser measures a page's content dimensions against the absorption factors identified in the Zhang et al. 2026 dataset — statistics presence, definitional language, comparative language, structural properties. The output is page-specific. It tells you what your page has and what it lacks relative to the content properties associated with higher absorption depth. It does not recommend the same changes to every page.
The Platform Variance tool surfaces how a piece of content is likely to perform differently across citation environments based on its current content profile. Platform behaviour diverges significantly — what works for ChatGPT's depth-first citation pattern is not the same as what works for Perplexity's breadth-first pattern (see our E.04 post on Zhang et al. 2026). Neither tool is an AgentGEO-equivalent diagnostic system. Both surface page-specific factors rather than generic uplift recommendations — which is where the Tian et al. paper says the useful work lives.
References
Tian, Zhihua, Yuhan Chen, Yao Tang, Jian Liu, and Ruoxi Jia (2026), "Diagnosing and Repairing Citation Failures in Generative Engine Optimization," preprint. arXiv: 2603.09296
Aggarwal, Pranjal et al. (2024), "GEO: Generative Engine Optimization," ACM SIGKDD 2024. arXiv: 2311.09735
Zhang, Kai, He Xinyue, and Yao Jingang (2026), "From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms," preprint. arXiv: 2604.25707
Diagnose before you optimise.
The Absorption Analyser scores the content factors associated with higher citation depth on a page-by-page basis — so you know what your specific page is missing before applying any fix.