Standard GEO Advice Degrades AI Citation Rates on LLM-Generated Content

Preprint finding — directional, not yet peer-reviewed. Liu, Zikang and Peilan Xu (2026), "Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility," Nanjing University of Information Science and Technology. arXiv: 2604.19113. April 21, 2026. Single research group. Not independently replicated. Treat figures as directional preprint evidence.

You use AI to help write your content. You have also been following GEO advice: add statistics, add quotations, use authoritative language. A preprint from April 2026 provides directional evidence that these two things are in conflict. Token-level GEO heuristics, the foundational advice from Aggarwal et al. 2024, reduced citation visibility by 14–19% (relative) when applied to already-fluent LLM-generated content. The same heuristics improved visibility on human-written pages.

The question is no longer whether GEO works. It is whether GEO works on your specific content type.

What the original research established — and what this paper tests

The Aggarwal et al. 2024 study — peer-reviewed, published at KDD — is the source of GEO's foundational advice. Add statistics. Add source citations. Use authoritative language. That study measured the effect of these modifications on human-written web content and found that the best-performing methods produced up to 40% improvement in position-adjusted visibility. Those results became the basis for practitioner GEO recommendations that have since spread widely.

The FeatGEO preprint tests those same token-level heuristics on a different content substrate: LLM-generated advertiser pages — pages produced by a language model, already optimised for fluency and coherence at the surface level.

The result is the opposite. Not a smaller gain. The opposite direction.

The substrate-conditional finding

The paper tested token-level heuristics across three real LLM engines: GPT-4o-mini, Gemini-2.5-flash, and Qwen-plus. On LLM-generated content, the same statistics-and-quotations approach that drives gains on human-written pages reduced citation visibility by 14–19% relative on all three engines (Table 2).

The mechanism the paper proposes: LLM-generated pages are already fluent, coherent, and structurally optimised at the surface level. They are content-saturated. Adding statistics or quotations into a saturated page does not fill a gap. It introduces redundancy into content that has no gaps to fill, and that redundancy degrades the citation signal rather than improving it.

The paper resolves the apparent contradiction with its own Table 4. When the same token-level heuristics are applied to human-written competitor pages in the same dataset, average visibility rises from 18.72% to 19.71%: a gain of 0.99 percentage points (about 5% relative), in the same direction Aggarwal et al. reported.

The correct framing: token-level GEO heuristics improve visibility on human-written content [Aggarwal et al., up to +40%; FeatGEO Table 4, +0.99 percentage points on average] but degrade visibility when applied to already-fluent LLM-generated content [FeatGEO Table 2, −14% to −19% relative on three engines]. These are not contradictory findings about the same thing. They describe the same tactics applied to different content types.
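The two tables report change in different units. The Table 4 gain is a difference between two percentages (18.72% to 19.71%), measured in percentage points, while the Table 2 figures are relative changes. The short Python sketch below makes the conversion explicit. The 20% starting visibility used for the Table 2 range is a hypothetical baseline chosen for illustration, not a figure from the paper.

```python
def absolute_change(before: float, after: float) -> float:
    """Change in percentage points (pp) between two visibility percentages."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change expressed as a percentage of the starting value."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before * 100


def apply_relative(before: float, rel_pct: float) -> float:
    """Visibility after a relative change of rel_pct percent."""
    return before * (1 + rel_pct / 100)


# Table 4 figures as quoted in the text: human-written pages, 18.72% -> 19.71%.
pp = absolute_change(18.72, 19.71)
rel = relative_change(18.72, 19.71)
print(f"Table 4: {pp:+.2f} pp, {rel:+.1f}% relative")

# HYPOTHETICAL 20% baseline (not from the paper) under the Table 2 range.
for drop in (-14, -19):
    after = apply_relative(20.0, drop)
    print(f"{drop}% relative: 20.00% -> {after:.2f}% ({absolute_change(20.0, after):+.2f} pp)")
```

Under that hypothetical baseline, the reported relative drop corresponds to roughly 2.8 to 3.8 percentage points, which is why the Table 2 and Table 4 numbers should not be compared digit for digit.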

What FeatGEO proposes instead

The paper's core argument is that token-level optimisation is the wrong unit of analysis for LLM-generated content. The FeatGEO system operates at the feature level — it abstracts pages into interpretable structural, content, and linguistic properties, then optimises the feature configuration rather than rewriting text at the token level.
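To make "feature level" concrete, here is a minimal Python sketch of the abstraction step only: turning a page into a handful of interpretable structural, content, and linguistic properties. The feature names and every proxy (percent signs as "statistics", `[1]`-style markers as "citations", alphabetic tokens as "words") are hypothetical stand-ins chosen for illustration; they are not FeatGEO's actual feature definitions.

```python
import re
from dataclasses import dataclass


@dataclass
class PageFeatures:
    headings: int                    # structural (hypothetical proxy)
    list_items: int                  # structural (hypothetical proxy)
    stats_per_100_words: float       # content (hypothetical proxy)
    citations_per_100_words: float   # content (hypothetical proxy)
    words_per_sentence: float        # linguistic (hypothetical proxy)


def extract_features(page: str) -> PageFeatures:
    """Abstract a Markdown-style page into a few interpretable features.

    Every proxy here is deliberately crude and hypothetical.
    """
    lines = page.splitlines()
    headings = sum(1 for ln in lines if ln.lstrip().startswith("#"))
    list_items = sum(1 for ln in lines if re.match(r"\s*(?:[-*]|\d+\.)\s+", ln))

    n_words = max(len(re.findall(r"[A-Za-z]+", page)), 1)  # alphabetic tokens only
    stats = re.findall(r"\d+(?:\.\d+)?%", page)            # percentages as "statistics"
    citations = re.findall(r"\[\d+\]", page)               # numeric markers like [1]
    # Split on sentence-ending punctuation followed by whitespace or end of text,
    # so decimals such as 99.9% are not treated as sentence breaks.
    sentences = [s for s in re.split(r"[.!?](?:\s+|$)", page) if s.strip()]

    return PageFeatures(
        headings=headings,
        list_items=list_items,
        stats_per_100_words=100 * len(stats) / n_words,
        citations_per_100_words=100 * len(citations) / n_words,
        words_per_sentence=n_words / max(len(sentences), 1),
    )


page = """# Pricing
Plans start at 9 dollars.
- Uptime was 99.9% last year [1].
- Support replies within 2 hours.
"""
print(extract_features(page))
```

This sketch covers only the abstraction step. The article does not describe FeatGEO's optimisation procedure in enough detail to reproduce it here.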

Applied to LLM-generated pages, the FeatGEO method produced citation visibility improvements of 37–96% across the same three engines where token-level heuristics failed (Table 2). The range is wide. The 37% lower bound and the 96% upper bound come from different engines. The paper does not report the mechanism behind engine-level variation in this range.

These figures are from a preprint by a single research group, testing a system they designed on a corpus they assembled. They are the right direction to understand. They are not a guaranteed outcome.

The feature hierarchy: what matters most in a full system

The FeatGEO ablation study (Figure 4) identifies the relative contribution of different feature types within the full optimised system. Content features dominate. Statistics density and citation density produce the largest individual gains. Structural features contribute consistently across all pages — but their marginal contribution within a fully optimised multi-feature system is small (below 0.35% absolute deviation for structural features in the ablation).

That figure — less than 0.35% — is worth understanding precisely. It is not the gain you get from adding structure to unoptimised content. It is the marginal gain from structural features when content features are already present and dominant in the system.

The H1 Research Dispatch (Yu et al., arXiv:2603.29979) reported that structural changes alone, without any semantic content change, produced a mean 17.3% improvement in citation rates across six engines. That 17.3% measures structural contribution in isolation. FeatGEO's <0.35% measures the marginal contribution of structural features when content features are already doing most of the work. These are not the same quantity and cannot be compared as magnitudes.

The picture they describe together: structural features provide a reliable baseline contribution to citation visibility — they matter when they are the primary lever. Content features — statistics density, citation density — produce the larger gains in a fully optimised system. Structure is not irrelevant. It is the floor, not the ceiling.
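The gap between an isolated contribution and a marginal (ablation) contribution can be shown with a toy model. The Python sketch below assumes a saturating visibility function and invented lift values; none of the numbers come from FeatGEO or Yu et al. It only illustrates how the same feature can look large when added to a bare page and small when removed from a system where other features already push visibility toward a ceiling.

```python
import math

# HYPOTHETICAL per-feature lifts and model constants: illustrative only,
# not taken from FeatGEO or Yu et al.
LIFTS = {"structure": 0.2, "statistics": 1.0, "citations": 1.0}
BASE = 10.0       # hypothetical visibility (%) with no features
HEADROOM = 10.0   # hypothetical maximum additional visibility (points)


def visibility(features: set[str]) -> float:
    """Toy saturating model: each feature adds less once others are present."""
    total_lift = sum(LIFTS[f] for f in features)
    return BASE + HEADROOM * (1 - math.exp(-total_lift))


def isolated_effect(feature: str) -> float:
    """Gain from adding one feature to an otherwise unoptimised page."""
    return visibility({feature}) - visibility(set())


def marginal_effect(feature: str) -> float:
    """Ablation-style gain: full system minus the system without this feature."""
    full = set(LIFTS)
    return visibility(full) - visibility(full - {feature})


for name in LIFTS:
    print(f"{name:>10}: isolated {isolated_effect(name):+.2f} pts, "
          f"marginal {marginal_effect(name):+.2f} pts")
```

With these invented numbers, structure's isolated gain is several times its marginal gain, even though the feature itself is identical in both measurements; only the comparison baseline changes.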

What this means if you write with AI assistance

The conditionality finding has a direct implication for any content operation that uses AI writing tools.

If your pages are produced or substantially shaped by a language model, applying token-level GEO heuristics — adding statistics blocks, inserting quotations, adding authoritative source mentions — may reduce your citation rates. Not by a rounding error. By 14–19% on the available evidence.

The FeatGEO preprint does not specify what degree of AI involvement triggers the conditionality. Fully AI-generated content and lightly AI-edited human drafts likely behave differently — the paper does not establish the threshold. What it does establish is the directional principle: fluent, coherent, surface-optimised content does not respond to the same levers as content with stylistic and structural gaps to fill.

The practical question for any content producer using AI tools: what substrate are you actually optimising? Answering that question changes which advice is relevant and which is counterproductive.

What this paper does not tell us

The degradation figures (−14% to −19%) and the FeatGEO improvement range (+37% to +96%) come from a single study by the same research group on a specific LLM-generated advertiser page corpus. Neither set of figures has been independently replicated.

The paper does not establish what percentage of AI involvement triggers the conditionality — where the transition from "human-written with AI assistance" to "LLM-generated" occurs for the purpose of citation outcomes. Fully AI-generated and lightly AI-edited may sit at different points on that spectrum, but the paper does not resolve this.

The upper bound of the +37% to +96% FeatGEO improvement range is roughly 2.6 times its lower bound. The engine-level mechanism driving that spread is not reported in the available data. Both bounds are from the same preprint; neither can be treated as the expected outcome for an arbitrary deployment of FeatGEO methods.

This is a preprint. The methodology and the conditionality finding have not yet been independently validated. The content-type conditionality is directionally consistent with the broader understanding of LLM output saturation, which is why we treat it as directional evidence worth acting on now. It is not a confirmed fact about how citation systems behave.

About the tools and what the conditionality means for them

The Evidence Density Score measures a page's statistics and evidence properties, based on the Aggarwal et al. 2024 finding that these predictors improve citation visibility on human-written content. FeatGEO adds a qualification: the score reflects what predicts citation visibility on human-written pages. For AI-generated or heavily AI-assisted content, the substrate conditionality finding suggests that high evidence density scores may not produce the same outcomes, and that increasing evidence density through token-level additions may actively reduce visibility.

This is not a revision to the Evidence Density Score's underlying research. It is a scope condition that was not part of the original 2024 study. We are noting it here because it is directly relevant to how you interpret your score if your content was produced with AI assistance.

The Absorption Analyser measures a page against the absorption dimensions identified in the Zhang et al. 2026 dataset — statistics presence, definitional language, comparative language, structural properties. The substrate conditionality in FeatGEO is a parallel finding: the same content properties perform differently depending on the authorship context. Both tools surface content dimensions with directional evidence behind them; neither removes the need to understand what your specific content substrate is.

Both tools are free, browser-based, and require no login.

References

Liu, Zikang and Peilan Xu (2026), "Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility," preprint. arXiv: 2604.19113

Aggarwal, Pranjal et al. (2024), "GEO: Generative Engine Optimization," ACM SIGKDD 2024. arXiv: 2311.09735

Yu, Junwei, Yang MuFeng, Yepeng Ding, and Hiroyuki Sato (2026), "Structural Feature Engineering for Generative Engine Optimization: How Content Structure Shapes Citation Behavior," preprint. arXiv: 2603.29979

Check your evidence density — and read the scope condition.

The Evidence Density Score scores the citation predictors from Aggarwal et al. 2024. The substrate conditionality finding in FeatGEO adds a qualification: scores are most relevant for human-written pages. Read this post alongside your score.
