New to this topic? Start with Why AI Search and Google Are Pulling from Different Pools — the "Learn with the research" post that teaches the mechanism first.
Only 6.82% of the pages ChatGPT cites overlap with Google's top-10 results for the same queries.
That figure comes from a large-scale preprint (more on its status below) that compared source selection by six LLM search engines and two traditional search engines over 55,936 queries. The study is not measuring which pages rank differently; it is measuring which pages are selected at all. And the selection pools are almost entirely separate.
This is the prior question the Psytable series had not yet answered directly: are AI search and Google-style search selecting on different properties, or just ordering the same pages differently? According to this research, they are selecting from largely different pools — and the properties LLMs prefer explain why.
The study
The research is Chalkidis et al. 2024 [1]. It is a preprint that has not completed peer review, and the specific figures in this post should be treated as directional signals from a large but not yet independently validated dataset.
The study compared which pages six LLM-based search engines cited — ChatGPT, Perplexity, Copilot, You.com, Andi, and Brave — against which pages two traditional search engines returned — Google and Bing — across 55,936 queries. The scale is a genuine strength: this is not a small-sample analysis. But preprint status means the methodology has not cleared formal peer review, and findings may be qualified once it does. Apply the same standard you would to any large preprint: useful directional evidence, not a settled empirical record.
The overlap finding
6.82% overlap between ChatGPT results and Google's top 10.
This is not a modest divergence. It means that roughly 93 of every 100 pages ChatGPT cites for a given query are not in Google's top 10 for that query. The two engines are not re-ordering the same competition — they are drawing from largely separate source pools.
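To make the arithmetic behind that figure concrete, here is a minimal Python sketch of one way an overlap share could be computed for a single query. It assumes that overlap means the share of cited pages that also appear in the top-10 list, and that URLs are compared after a simple normalisation. The function names and example URLs are hypothetical, and the study's own matching procedure may differ.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal.

    Assumption for this sketch: scheme, a leading "www.", letter case and a
    trailing slash are ignored. The study may match pages differently.
    """
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def overlap_share(cited: list[str], top10: list[str]) -> float:
    """Fraction of cited pages that also appear in the top-10 list."""
    cited_set = {normalise(u) for u in cited}
    top_set = {normalise(u) for u in top10}
    if not cited_set:
        return 0.0
    return len(cited_set & top_set) / len(cited_set)


# Hypothetical data: four cited pages, one of which is also in the top 10.
cited_pages = [
    "https://www.example.org/guide/",
    "https://smallblog.example/post-1",
    "https://docs.example.net/faq",
    "https://another.example/notes",
]
google_top10 = [
    "https://example.org/guide",
    "https://bigsite.example/article",
]

print(overlap_share(cited_pages, google_top10))  # 0.25
```

Read this way, a 6.82% overlap leaves 100 - 6.82 = 93.18% of cited pages outside the top 10, which is where the "roughly 93 of every 100" reading comes from.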
The study reports this figure for ChatGPT specifically. Across the other LLM search engines the directional conclusion is the same: LLM-based and traditional search engines select pages on substantially different criteria. For a practitioner allocating content optimisation effort between Google-SEO and AI-SEO, this matters directly. If the selection criteria differ, the optimisation work differs too, and the rest of this post sets out what those criteria are.
The properties LLMs prefer
The Chalkidis et al. analysis did not stop at measuring that the overlap was low; it also measured how the selected pages differ. The pages LLM search engines selected differed systematically from traditionally ranked pages on four measurable dimensions.
Structured and hierarchical HTML. LLM-cited pages exhibited more structured, hierarchically organised markup compared to pages preferred by traditional search. Three independent studies converge on this signal:
- Aggarwal et al. 2024 (peer-reviewed, KDD): structured content properties were associated with higher AI citation probability.
- Zhang et al. 2026 (preprint, not yet peer-reviewed): heading density was among the properties most strongly associated with high-influence pages.
- Chalkidis et al. 2024 (preprint, not yet peer-reviewed): structured HTML is associated with LLM selection across a 55,936-query dataset.
Multi-study convergence on the same signal, even under preprint caveats, is analytically meaningful.
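As a rough way to inspect the heading hierarchy of a page you control, the following Python sketch uses the standard library's `html.parser` to list heading levels and flag skipped levels. The three metrics it reports (heading count, `h1` count, level skips) are illustrative proxies chosen for this example. None of the three studies is claimed to measure structure this way.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect heading levels (h1 to h6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_report(html: str) -> dict:
    """Summarise the heading outline of an HTML string."""
    parser = HeadingOutline()
    parser.feed(html)
    levels = parser.levels
    # A "skip" is a jump of more than one level deeper, e.g. h2 then h4.
    skips = sum(1 for a, b in zip(levels, levels[1:]) if b - a > 1)
    return {
        "headings": len(levels),
        "h1_count": levels.count(1),
        "level_skips": skips,
    }


# Hypothetical page fragment with one skipped level (h2 followed by h4).
sample = "<h1>Title</h1><h2>Part</h2><h4>Detail</h4><h2>Next</h2>"
print(heading_report(sample))
```

A report with a single `h1` and no level skips is one simple indication that the markup is hierarchically organised; it says nothing about whether the headings are meaningful.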
Easier-to-read text. LLM-cited pages tended toward lower textual complexity — content that is more accessible and readable. This is also convergent. Aggarwal et al. 2024 found that content at Flesch-Kincaid grade 8–10 was associated with higher AI extraction rates. The Chalkidis finding runs the same direction independently: readability is a selection signal, not a retrieval penalty. For practitioners trained on the expectation that complexity signals authority, both studies indicate the reverse holds for AI selection.
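For readers who want to check a draft against that grade range, here is a minimal Python sketch of the standard Flesch-Kincaid grade formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a crude vowel-group heuristic assumed for this example, so results are approximate and will not match dedicated readability tools exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent "e" ("page"), but keep "-le" ("readable").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a text (0.0 for empty input)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Hypothetical draft paragraph; the printed grade is only an estimate.
draft = (
    "Structured pages are easier to scan. "
    "Short sentences help readers find the answer quickly."
)
print(round(fk_grade(draft), 1))
```

Because the formula depends only on sentence length and syllables per word, shortening sentences and preferring plainer words are the two levers that move the score.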
Lower domain popularity. LLM-cited pages tended to come from domains with lower overall popularity compared to the pages traditional search returns. This is a departure from the traditional search model, where domain authority and popularity are established positive ranking signals. LLMs appear less anchored to domain-level reputation and more responsive to page-level content properties. For small and specialist publishers, this is structural good news — explored further below.
More outlinks to reputable sources. Pages that linked out to more reputable external sources were more likely to be cited by LLMs. This finding does not appear in Aggarwal or Zhang. It is the Chalkidis study's primary new contribution to the Psytable series, and it favours a practice that traditional SEO sometimes discourages.
The outlinks finding
Linking out to reputable external sources is positively associated with LLM citation probability.
Traditional SEO has long carried a debate about outbound linking. Sending traffic away from your site via external links is sometimes characterised as leaking authority or diluting engagement. The concern is simple: every click to an external source is a reader who leaves your page.
The Chalkidis finding points in the opposite direction for AI search. Pages that cite more reputable external sources are the pages LLMs are more likely to select. The directional logic is consistent with what LLMs are built to do: they are constructing answers that require sourced, evidenced claims. A page that already models that behaviour — citing external authority, attributing claims to sources — appears to be structurally more useful to an LLM extracting from it.
The practical implication is concrete. Linking to authoritative external sources — primary research, named organisations, official publications — is not a trade-off you make against AI citation probability. Based on the Chalkidis study's directional evidence, it is a positive signal.
This is a preprint finding without cross-study corroboration; the caveat applies. But it is directionally consistent with the broader picture the study paints: LLMs select for pages that exhibit the epistemic properties of a well-evidenced answer — structured, readable, cited, and sourced.
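To see which external domains a page currently links to, a small audit along the following lines may help. This Python sketch uses only the standard library; the class and function names and the sample markup are hypothetical. It lists outbound domains but does not judge whether a source is reputable, since the post does not describe how the study scored reputability. That judgement stays with the author.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def bare_host(url: str) -> str:
    """Lower-cased hostname without a leading "www." ("" if none)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def external_domains(html: str, page_url: str) -> set[str]:
    """Domains linked from the page, excluding the page's own host."""
    collector = LinkCollector()
    collector.feed(html)
    own = bare_host(page_url)
    found = set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlsplit(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        host = bare_host(absolute)
        if host and host != own:
            found.add(host)
    return found


# Hypothetical page: one internal link, one external link, one mailto link.
page = (
    '<a href="/about">About</a>'
    '<a href="https://www.example.org/paper">Primary research</a>'
    '<a href="mailto:hello@mysite.example">Contact</a>'
)
print(sorted(external_domains(page, "https://mysite.example/post")))
```

Consistent with the limits noted in this post, the output is a descriptive inventory, not a prediction that adding links will change citation outcomes.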
The domain diversity finding
37% of the domains cited by LLM search engines in the Chalkidis dataset are unique to LLM search engines: they do not appear in traditional search results at all.
That is a structural finding about market access, not just ranking position. Traditional search concentrates citation share toward high-authority, high-popularity domains. LLM search engines select from a substantially broader set of domains, including domains that are entirely absent from traditional search results.
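The unique-domain share is simple set arithmetic, sketched below in Python. The sketch assumes the figure is computed over pooled sets of domains (everything cited by LLM search engines versus everything returned by traditional ones); the study may define the denominator differently, and the domain names here are hypothetical.

```python
def unique_domain_share(
    llm_domains: set[str], traditional_domains: set[str]
) -> float:
    """Share of LLM-cited domains absent from traditional results."""
    if not llm_domains:
        return 0.0
    only_llm = llm_domains - traditional_domains  # set difference
    return len(only_llm) / len(llm_domains)


# Hypothetical pooled domain sets: three of four LLM-cited domains are unique.
llm_cited = {"a.example", "b.example", "c.example", "d.example"}
traditional = {"a.example", "big.example"}

print(unique_domain_share(llm_cited, traditional))  # 0.75
```

Note that the denominator is the set of LLM-cited domains, so the figure describes how much of the LLM pool is new, not how much of the traditional pool is missing.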
For the Psytable reader (content practitioners, often microentrepreneurs and specialists without the domain authority profile of large publishers), this is the study's most directly relevant finding. In traditional search, domain authority creates a ceiling for new or small-domain publishers. In LLM search, 37% of cited domains did not appear in the study's traditional search results at all. The ceiling is structurally different.
This does not mean content quality is irrelevant — the other findings in this post are about the specific content properties LLMs select for. But it means the domain-authority constraint that limits traditional search performance is substantially weaker in LLM search. A small-domain publisher whose content is well-structured, readable, evidence-rich, and appropriately sourced is competing in a different environment than traditional search rankings suggest.
About the tools referenced in this post
The Evidence Density Score and the Absorption Analyser are Psytable tools that measure the structural and linguistic properties of a piece of content. The Evidence Density Score measures statistics presence, source attribution, readability, and structural properties in a single 0–100 score. The Absorption Analyser scores content against structural and linguistic dimensions drawn from the Zhang et al. analysis. Both tools surface the content properties that the Psytable series' research — including the Chalkidis convergent findings on structured HTML and readability — has identified as associated with higher AI citation and absorption probability. References to these tools in the section below refer to these specific Psytable outputs.
What this means in practice
The Chalkidis study's directional evidence supports three concrete content practices.
Maintain outbound links to reputable sources. This is the finding most likely to conflict with existing practice. If your content currently avoids external linking on the basis that it sends traffic away from your site, the Chalkidis evidence points in the opposite direction for AI search. Link to primary research, named publications, and authoritative sources where your argument warrants it. Name the source explicitly — consistent with the Aggarwal finding on attribution as a citation signal.
Prioritise structured HTML and readability. Both properties are now convergent across Aggarwal (peer-reviewed), Zhang (preprint), and Chalkidis (preprint). Three independent studies pointing in the same direction strengthen confidence in the signal, even under preprint caveats. Use structural heading markup to create logical breaks in your content. Keep readability in the accessible range: target Flesch-Kincaid grade 8–10, the range the Aggarwal evidence supports. The Evidence Density Score and Absorption Analyser surface both signals.
Understand that small-domain content competes differently in LLM search. The domain diversity finding is not a structural optimisation — there is nothing to implement on this point. But it is a strategic framing correction. If your content optimisation has been anchored to the assumption that domain authority is the primary ceiling, LLM search suggests a different ceiling applies. Content-level properties matter more in that environment.
One question this post does not answer: whether AI citations in chat interfaces produce referral traffic. Traditional SEO drives clicks; AI citations in summarised responses may provide brand exposure without a navigable link. The business value of AI citation optimisation depends partly on how each platform surfaces cited sources — a question worth monitoring as AI search product features evolve.
The limits
This is a preprint. The 6.82% overlap figure, the 37% unique domains finding, and the property-level comparisons are directional signals from a dataset that has not yet cleared peer review. The scale — 55,936 queries across eight engines — is a genuine strength, but scale does not substitute for independent methodological validation.
The study measures which pages LLMs select, not why the mechanisms produce those preferences. The outlinks finding, the domain popularity finding, and the structured HTML finding are correlational. The study establishes that LLMs systematically select pages with these properties; it does not establish that adding these properties to a page will mechanically produce a selection uplift. Correlation, not causation — applied consistently.
The overlap figure (6.82%) is measured specifically for ChatGPT vs. Google. The Chalkidis study analysed six LLM search engines in total; overlap figures for Perplexity, Copilot, You.com, Andi, and Brave are covered in the full paper and may differ from this benchmark. The directional conclusion, that LLM and traditional search engines select from substantially different source pools, is consistent across the study's broader findings, but practitioners working with platforms other than ChatGPT should treat the precise figure as indicative rather than platform-universal.
This post does not compare query volumes across ChatGPT and Google — the traffic opportunity each platform represents varies significantly by audience and content type, and is a separate consideration when allocating optimisation effort between AI-specific and traditional SEO.
AI search engine behaviour evolves. The study reflects selection patterns as measured at the time of data collection. Treat the findings as current directional evidence, not permanent algorithmic rules.
Measure your content's structural properties.
The Evidence Density Score applies the evidence from Aggarwal, Zhang, and the Chalkidis convergent signals — measuring statistics, attribution, readability, and structure in a single 0–100 score.
References
1 Chalkidis, Søgaard, and Simonsen (2024), "Source Coverage and Citation Bias in LLM-based vs. Traditional Search Engines," submitted December 2024. arXiv: 2512.09483