GEO Optimization Is Conditional — Three Preprints, One Argument

The advice has been consistent: add statistics, improve structure, use authoritative language. Three preprints published between March and April 2026 did not overturn that advice. They qualified it — and the qualifications change how the advice applies to your specific content.

Taken together, the three papers shift the operative question.

Not: "Should I optimise?" That question is settled. Optimisation affects citation outcomes. The field established that.

The question is: "Optimise what, for what type of content, serving which audience?"

Evidence tier (applies to all three source papers): all three papers cited in this synthesis are preprints: Yu et al. (arXiv:2603.29979), Tian et al. (arXiv:2603.09296), and Liu & Xu (arXiv:2604.19113). Their findings are directional and not yet peer-reviewed, and none has been independently replicated. Treat figures as directional preprint evidence, not confirmed benchmarks. The foundational study, Aggarwal et al. 2024 (arXiv:2311.09735), is peer-reviewed.

Where the research started: structure as an independent lever

The original GEO finding from Aggarwal et al. 2024 — peer-reviewed, published at KDD — established that a bundle of content modifications improves citation visibility: adding statistics, adding source citations, improving readability and authority. That study measured the combined effect. It could not isolate any single contributor.

Yu et al. 2026 — the GEO-SFE preprint — addressed that directly. By holding semantic content constant and varying only structural treatment, the study isolated structure as an independent lever. Same words. Same claims. Same facts. Different architecture. The result across six generative engines: a mean 17.3% improvement in citation rates from structural changes alone.

The three-tier hierarchy the paper introduces — macro-structure (document architecture), meso-structure (information chunking), micro-structure (visual emphasis) — gave the field its first controlled decomposition of structure as a GEO variable.

That 17.3% figure measures structural contribution in isolation: structure as the only variable, content held constant. It does not measure what structure adds to a page that is already content-optimised. That is a different question, and Liu & Xu's FeatGEO preprint, discussed next, arrives at a much smaller number for it.

The foundational question at this stage: "What works?" Answer: structural features produce measurable citation lift, independently of content. That held for the six engines Yu et al. tested.

The first conditionality: structure versus content in a complete system

Liu & Xu 2026 — the FeatGEO preprint — ran an ablation study inside a fully optimised multi-feature system. Within that system, individual feature types were removed to measure their marginal contribution. Content features dominated. Statistics density and citation density produced the largest individual gains. Structural features contributed consistently — but their marginal contribution within the fully optimised system was below 0.35% absolute deviation.

That figure requires a precise reading. It is not a contradiction of Yu et al. It is a measurement from a different experimental context.

Yu et al.'s 17.3% measures the gain from structure when structure is the only lever. FeatGEO's <0.35% measures the marginal contribution of structure when content features are already present and dominant. These are not the same quantity and cannot be compared as magnitudes. The picture they describe together: structure provides a reliable baseline contribution to citation visibility. Content features — statistics density, citation density — produce the larger gains in a fully optimised system. Structure is the floor. Content determines how high above the floor you reach.
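A toy sketch can make the "different quantities" point concrete. Everything in it is an assumption made for illustration: the saturating `toy_visibility` function, the input values, and the function names come from neither paper, and the outputs are not estimates of any real effect. It only shows how the same structural change can read as a large relative lift on weak content and as a tiny absolute drop when removed from a content-rich page.

```python
import math


def toy_visibility(content: float, structure: float) -> float:
    """Hypothetical saturating visibility score in [0, 1).

    This is NOT the metric used by Yu et al. or FeatGEO; it is a toy curve
    in which extra input matters less as the score approaches its ceiling.
    """
    return 1.0 - math.exp(-(content + structure))


def relative_lift_from_structure(content: float, structure: float) -> float:
    """Isolation-style reading: content held constant, structure added.

    Returns the relative change in visibility (0.25 means +25%).
    """
    before = toy_visibility(content, 0.0)
    after = toy_visibility(content, structure)
    return (after - before) / before


def absolute_ablation_drop(content: float, structure: float) -> float:
    """Ablation-style reading: structure removed from the full system.

    Returns the absolute change in visibility on the 0-1 scale.
    """
    full = toy_visibility(content, structure)
    ablated = toy_visibility(content, 0.0)
    return full - ablated


if __name__ == "__main__":
    structure_gain = 0.2  # arbitrary illustrative value

    # Weak content, structure is the only lever being pulled.
    isolated = relative_lift_from_structure(content=0.5, structure=structure_gain)

    # Strong content already in place, structure removed last.
    marginal = absolute_ablation_drop(content=5.0, structure=structure_gain)

    print(f"relative lift on weak content:   {isolated:.1%}")
    print(f"absolute drop on strong content: {marginal:.4f}")
```

The two printed numbers differ in both context (weak versus strong content) and scale (relative versus absolute), which is why lining them up as magnitudes says nothing about which study is "right".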

The conditionality here: the value of a structural intervention depends on whether content features are already in place. Optimise in the right order.

The second conditionality: what substrate are you optimising?

The FeatGEO preprint introduced a second conditionality that has nothing to do with structural tier.

The paper tested token-level GEO heuristics — statistics addition, quotation insertion, authoritative language — on LLM-generated advertiser pages. Pages produced by a language model, already fluent, already coherent, already surface-optimised. The result: citation visibility degraded by 14–19% relative across three real LLM engines (GPT-4o-mini, Gemini-2.5-flash, Qwen-plus).

The same heuristics, applied to human-written competitor pages in the same dataset, raised mean visibility from 18.72% to 19.71%, a gain of 0.99 percentage points that is consistent in direction with Aggarwal et al.
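The gain on human-written pages and the degradation on LLM-generated pages are easy to misread because one is reported as a movement between two percentages and the other as a relative change. The short sketch below works through the arithmetic for the 18.72% → 19.71% figures only. It assumes the +0.99 is a percentage-point difference, which is what the two quoted values imply; the function names are illustrative.

```python
def absolute_change_pp(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change_pct(before_pct: float, after_pct: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return (after_pct - before_pct) / before_pct * 100.0


before, after = 18.72, 19.71  # mean visibility on human-written pages

print(f"absolute: {absolute_change_pp(before, after):+.2f} percentage points")
print(f"relative: {relative_change_pct(before, after):+.2f}%")
# absolute: +0.99 percentage points
# relative: +5.29%
```

On the relative scale the human-written gain is roughly 5%, which is the figure to hold next to the 14–19% relative degradation if the two are compared at all.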

The correct framing, which Cass's statistical review confirmed: token-level GEO heuristics improve citation visibility on human-written content. The same heuristics degrade citation visibility when applied to already-fluent LLM-generated content. These are not contradictory findings about the same thing. They describe the same tactics applied to different content substrates.

The Aggarwal et al. 2024 finding — that the best-performing methods produce up to 40% improvement in position-adjusted visibility on human-written web content — and the FeatGEO degradation of 14–19% on LLM-generated content are not two estimates of the same effect. They measure different interventions on different content types using different metrics. The directional divergence is real and explained by substrate difference, not by a contradiction in the research.

The conditionality: the same tactic that helps human-written content hurts AI-generated content. The question is not whether to add statistics. The question is what you are adding them to.

The third conditionality: which audience does your content serve?

Tian et al. 2026 — the AgentGEO preprint — approached the problem from a different angle. The paper's central claim: most GEO methods apply uniform rewriting rules to all documents regardless of what is actually causing each document to fail. For documents where the applied fix matches the actual failure mode, generic optimisation works. For documents where it does not — the paper identifies long-tail, niche content as the at-risk category — it can actively reduce citation rates.

The benchmark comparison from AgentGEO's own data: generic GEO methods produced approximately 25% average improvement in citation rates across the method set on MIMIQ, a document-centric benchmark using real LLM engines. AgentGEO's diagnostic-repair approach — which diagnoses the specific failure mode before applying any fix — produced 40%+ improvement while modifying only 5% of content on the same benchmark.

Those figures come from the same study measuring the same construct on the same benchmark. They are a direct comparison. The Aggarwal et al. finding of up to 40% improvement in position-adjusted visibility and AgentGEO's ~25% generic baseline are from different studies measuring different constructs on different benchmarks — they cannot be treated as a range or presented as two data points around the same central tendency.

The conditionality: the audience your content serves determines whether generic optimisation helps or harms. Niche, specialist content with a specific failure mode does not respond to generic fixes the same way general-purpose content does. For some long-tail pages, applying the wrong repair to the wrong diagnosis moves citation rates in the wrong direction.

The three questions the progression leaves you with

The field moved across three separate conditionalities in the space of about two months. The findings are not a cascade of contradictions. They are a progressive decomposition of the same broad question into more precise ones.

The first question the research now asks about your content: Which structural tier are you optimising, and have you established a structural baseline before adding content features? Yu et al. established that structure contributes independently. FeatGEO established that in a full system, structure is the floor. The sequence matters.

The second question: What is the authorship substrate of the content you are optimising? Human-written content with structural and stylistic gaps responds to token-level heuristics positively. Already-fluent LLM-generated content does not. Applying the same tactics to both substrates is a category error: the degradation on LLM-generated content is not a rounding error; it is a direction reversal.

The third question: Which audience does your content serve — and does your content face a failure mode that generic optimisation can address? Long-tail, niche content may face failure modes that generic rewriting makes worse. Diagnosis precedes repair.

None of these questions replaces the foundational question — "does my content have enough evidence density, structural clarity, and authorial presence to be worth citing?" That question still applies. It just no longer covers the full decision space.
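The three questions can be written down as a small pre-optimisation checklist. The sketch below is one possible encoding, not a tool described in any of the papers: the `PageContext` fields, the `Substrate` enum, and the wording of the notes are all hypothetical, and the answers have to be supplied by whoever knows the page.

```python
from dataclasses import dataclass
from enum import Enum


class Substrate(Enum):
    HUMAN_WRITTEN = "human-written"
    LLM_GENERATED = "llm-generated"


@dataclass(frozen=True)
class PageContext:
    """Hypothetical answers to the three questions for one page."""
    substrate: Substrate
    has_structural_baseline: bool   # headings, chunking, emphasis in place
    is_long_tail: bool              # niche or specialist content
    failure_mode_diagnosed: bool    # a specific cause has been identified


def pre_optimisation_notes(page: PageContext) -> list[str]:
    """Return one note per question, in the order the article asks them."""
    notes = []

    # Question 1: structural tier and sequencing.
    if page.has_structural_baseline:
        notes.append("Structure: baseline in place; content features are "
                     "where the larger gains were reported.")
    else:
        notes.append("Structure: establish a structural baseline before "
                     "adding content features.")

    # Question 2: authorship substrate.
    if page.substrate is Substrate.LLM_GENERATED:
        notes.append("Substrate: LLM-generated; token-level heuristics "
                     "may reduce rather than improve visibility.")
    else:
        notes.append("Substrate: human-written; token-level heuristics "
                     "moved visibility in the positive direction.")

    # Question 3: audience and failure mode.
    if page.is_long_tail and not page.failure_mode_diagnosed:
        notes.append("Audience: long-tail content; diagnose the failure "
                     "mode before applying generic rewriting.")
    else:
        notes.append("Audience: no undiagnosed long-tail risk flagged.")

    return notes


if __name__ == "__main__":
    page = PageContext(
        substrate=Substrate.LLM_GENERATED,
        has_structural_baseline=False,
        is_long_tail=True,
        failure_mode_diagnosed=False,
    )
    for note in pre_optimisation_notes(page):
        print("-", note)
```

The function deliberately returns notes rather than a score or a go/no-go decision, since the preprints support directional guidance and not thresholds.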

What the Psytable tools surface — and what they do not

The conditionality research changed how we read what our own tools return.

The Absorption Analyser measures a page against the content dimensions associated with citation absorption depth — statistics presence, definitional language, comparative language, structural properties. The FeatGEO substrate conditionality applies directly here: the absorption dimensions the tool measures are grounded in research on human-written content. If your page was produced by a language model, the tool surfaces the right dimensions, but the conditionality means that increasing those scores through token-level additions may reduce rather than improve your citation outcomes.

The Heading Visualiser maps the heading structure of any page — document architecture and micro-structure simultaneously. The Yu et al. finding provides the structural basis for what the tool surfaces. The tool tells you what your structural tier looks like. It does not tell you whether your page already has the content features that make structural improvements the right next step.

The tools surface diagnostic dimensions. They do not determine whether your content type or audience segment is the one being helped or harmed by a given intervention. That determination requires reading the three conditionalities against your specific situation — substrate, structural tier, and audience type together.

What the preprint status means for acting on this research

All three papers are preprints. None has completed peer review. None has been independently replicated.

The structural baseline finding from Yu et al. (17.3% mean citation improvement from structural changes alone) is the most controlled of the three — a clean isolation design across six engines. The preprint caveat applies. The direction is credible and consistent with the field's theoretical understanding of how generative engines process structural content properties.

The substrate conditionality from FeatGEO (a 14–19% relative decline on LLM-generated content; a mean gain of 0.99 percentage points on human-written content) comes from a single research group working on a specific corpus. The directional principle, that fluent, surface-optimised content does not respond to the same levers as content with gaps to fill, is theoretically coherent. The specific magnitudes warrant the preprint caveat fully.

The harm-to-long-tail finding from AgentGEO is the most epistemically robust of the three at preprint stage. It is consistent with the theoretical prediction about mismatched fixes and specific failure modes. The direction is credible even before peer review. The specific magnitude and the mechanism remain open.

Our read: the three conditionalities are worth acting on now at the directional level. Structural tier sequencing, substrate checking before applying token-level heuristics, and failure mode diagnosis before generic rewriting are all low-cost orientation changes. You do not need peer-reviewed confirmation to ask better questions before optimising.

References

Yu, Junwei, Yang MuFeng, Yepeng Ding, and Hiroyuki Sato (2026), "Structural Feature Engineering for Generative Engine Optimization: How Content Structure Shapes Citation Behavior," preprint. arXiv: 2603.29979

Tian, Zhihua, Yuhan Chen, Yao Tang, Jian Liu, and Ruoxi Jia (2026), "Diagnosing and Repairing Citation Failures in Generative Engine Optimization," preprint. arXiv: 2603.09296

Liu, Zikang and Peilan Xu (2026), "Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility," preprint. arXiv: 2604.19113

Aggarwal, Pranjal et al. (2024), "GEO: Generative Engine Optimization," ACM SIGKDD 2024. arXiv: 2311.09735

Three questions before you optimise.

The Absorption Analyser surfaces the content dimensions associated with higher citation depth — page by page, not as a blanket recommendation. Check your substrate, your structural tier, and your audience type before applying any fix.
