1. The Concept, Plainly
When AI tools answer a question, they draw from a list of sources. When Google answers the same question, it draws from a different list. Those two lists barely overlap. A large-scale study found that only 6.82% of the pages ChatGPT cites are in Google's top 10 for the same queries. Put another way, roughly 93 of every 100 pages ChatGPT cites are not on Google's first page at all. AI tools and search engines are running separate competitions with largely separate candidates.
2. Why This Matters Right Now
Posts L1 through L5 in this series taught you how to improve content properties: evidence density, structure, readability, source attribution, and content type. That work still applies.
But there is a prior question underneath all of it: are you in the right pool to begin with?
If AI tools are not drawing from your content, improving its properties improves your standing in a competition you have not entered. The 6.82% overlap figure is the reason this series exists. The properties L1–L5 covered are the ones that get you into the pool AI tools draw from. Understanding that there are two separate pools is the frame that makes those earlier lessons coherent.
3. The Mechanism
The source for this post is Chalkidis, Søgaard, and Simonsen (2024) — a preprint, not yet peer-reviewed. The study compared which pages six LLM-based search engines cited against which pages two traditional search engines returned, across 55,936 queries. The scale is a genuine strength; preprint status means the figures are directional signals, not certified benchmarks. Treat them as such.
The overlap finding.
Only 6.82% of the pages ChatGPT cited appeared in Google's top 10 for the same queries. That figure is specific to ChatGPT versus Google; other LLM platforms vary, though the directional conclusion, that LLM and traditional search draw from largely separate pools, holds across the study's broader findings.
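To make the overlap figure concrete, here is a minimal sketch of how such a share could be computed for a single query. The function name, the URLs, and the choice to divide by the number of LLM-cited pages are all illustrative assumptions; the study's exact aggregation across queries is not described in this post.

```python
def overlap_share(llm_cited, search_top10):
    """Share of LLM-cited pages that also appear in the search top 10."""
    cited = set(llm_cited)
    if not cited:
        return 0.0
    return len(cited & set(search_top10)) / len(cited)


# Hypothetical URLs for one query, invented for illustration only.
llm_cited = [
    "https://a.example/guide",
    "https://b.example/study",
    "https://c.example/faq",
    "https://d.example/report",
]
search_top10 = [
    "https://a.example/guide",
    "https://x.example/home",
    "https://y.example/blog",
]

share = overlap_share(llm_cited, search_top10)
print(f"{share:.2%} of cited pages are in the top 10")  # 25.00% for this toy data
```

On this toy data one of four cited pages is in the top 10. A figure like 6.82% means that, on the same kind of calculation, fewer than 7 in 100 cited pages would match.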
The 37% unique-domain finding.
37% of the domains that LLM search engines cite in the Chalkidis dataset do not appear in the traditional search results at all. This is the finding with the most direct relevance for smaller and specialist publishers. In traditional search, domain authority (the accumulated reputation of your whole site) acts as a ceiling: large, established domains dominate, and new or small-domain publishers compete from behind them. In LLM search, more than a third of cited domains never surfaced in the traditional results the study collected, so that ceiling is structurally different. Domain authority is a weaker constraint in LLM search than in traditional search.
This does not mean any content gets cited. The properties that get content into the AI pool are measurable — and three are already covered by earlier posts in this series.
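The unique-domain figure is, in effect, a set difference over domains instead of pages. The sketch below shows one way to compute it, assuming you have two lists of URLs. The helper names, the sample URLs, and the decision to treat "www." as the same site are illustrative choices, not the study's method.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Lowercased host of a URL, with a leading 'www.' dropped."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def unique_domain_share(llm_urls, search_urls):
    """Share of LLM-cited domains absent from the search results."""
    llm_domains = {domain_of(u) for u in llm_urls} - {""}
    search_domains = {domain_of(u) for u in search_urls}
    if not llm_domains:
        return 0.0
    return len(llm_domains - search_domains) / len(llm_domains)


# Hypothetical URLs, invented for illustration only.
llm_urls = [
    "https://www.a.example/x",
    "https://b.example/y",
    "https://c.example/z",
]
search_urls = [
    "https://a.example/other",
    "https://q.example/",
]

print(f"{unique_domain_share(llm_urls, search_urls):.0%} of cited domains are unique")
```

Here two of the three cited domains (b.example and c.example) never appear in the search list, so the share is about 67%.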
The four properties LLMs prefer.
The Chalkidis analysis measured not just that the overlap was low, but what distinguished the pages LLMs selected. Four properties differentiated LLM-cited pages from traditionally ranked pages:
Structured, hierarchical HTML. LLM-cited pages used more organised markup — headings, logical content breaks, clear hierarchy. This finding is convergent: Aggarwal et al. 2024 (peer-reviewed) and Zhang et al. 2026 (preprint, not yet peer-reviewed) point in the same direction independently.
Easier-to-read text. LLM-cited pages tended toward lower textual complexity. Aggarwal et al. found that content at Flesch-Kincaid grade 8–10 was associated with higher AI citation rates — the Chalkidis finding runs the same direction.
Lower domain popularity. LLMs are less anchored to domain-level reputation than traditional search engines are. Page-level content properties carry more weight relative to whole-site authority.
More outbound links to reputable sources. Pages that linked out to more reputable external sources (primary research, named organisations, authoritative publications) were more likely to be cited by LLMs. This is the Chalkidis study's primary new contribution; it does not appear in Aggarwal or Zhang. It also runs counter to a habit traditional SEO sometimes encourages: avoiding external links out of concern that they send traffic away from your page.
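Two of these properties, heading structure and outbound links, can be checked roughly with Python's standard library. The sketch below is an illustrative audit, not a tool from the study: the class name, the sample HTML, and the rule for what counts as "outbound" are assumptions. It cannot judge whether a linked source is reputable; that still needs a human read.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageAudit(HTMLParser):
    """Collects heading levels and outbound link hosts from an HTML page."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host.lower()
        self.heading_levels = []  # heading levels in document order
        self.outbound_hosts = []  # hosts of links that leave own_host

    def _is_internal(self, host):
        # Treat the site itself and its subdomains as internal.
        return host == self.own_host or host.endswith("." + self.own_host)

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlparse(href).hostname  # None for relative or mailto links
            if host and not self._is_internal(host.lower()):
                self.outbound_hosts.append(host.lower())


# Hypothetical page fragment, invented for illustration only.
sample = """
<h1>Title</h1>
<h2>Evidence</h2>
<p>See <a href="https://doi.org/10.1000/example">the study</a> and
<a href="/about">our about page</a>.</p>
<h2>Method</h2>
<a href="https://www.mysite.example/contact">Contact</a>
"""

audit = PageAudit("mysite.example")
audit.feed(sample)
print(audit.heading_levels)  # [1, 2, 2]
print(audit.outbound_hosts)  # ['doi.org']
```

A page with an empty outbound list, or with headings that skip levels, is a candidate for the kind of revision these findings point toward.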
The property findings are correlational. The study shows that LLMs select pages that have these properties, not that adding them will mechanically produce a citation. But the direction of the evidence is clear, and for three of the four properties it is convergent across multiple independent studies.
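The readability property can also be estimated directly. The sketch below applies the standard Flesch-Kincaid grade formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a rough vowel-group heuristic chosen for illustration, so scores will differ somewhat from dedicated readability tools; treat the result as a directional check, not a precise grade.

```python
import re


def count_syllables(word):
    """Rough syllable count: vowel groups, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text):
    """Approximate Flesch-Kincaid grade level of a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Hypothetical passages, invented for illustration only.
plain = "The cat sat on the mat. The dog ran to the park."
dense = (
    "Organisational heterogeneity necessitates comprehensive "
    "methodological considerations."
)

print(round(fk_grade(plain), 2))
print(round(fk_grade(dense), 2))
```

The short, plain passage scores far lower than the dense one. A passage scoring well above grade 10 is worth simplifying if you are aiming for the range Aggarwal et al. associated with higher citation rates.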
4. Try It Now
No dedicated pool-audit tool exists on Psytable for this step — the prompt below is the most practical immediate action. The outbound link finding is the one most likely to conflict with existing practice: if your content avoids external links, the Chalkidis evidence points in the opposite direction for AI search.
Paste this prompt:
"Please audit the following passage for outbound linking behaviour. Do three things. First: identify whether the passage contains named, external links to reputable sources — primary research, named institutions, named organisations, or official publications. Second: for any claim that warrants attribution — statistics, study findings, specific data points — assess whether it is accompanied by a named external link or citation. Third: a 2024 preprint (Chalkidis et al.) found that pages with more outbound links to reputable external sources were more likely to be cited by LLM search engines. Based on your audit, does this passage model that behaviour, or are there specific claims that would benefit from a named external source? Note: this is correlational evidence from a preprint; the audit is directional, not a guarantee of AI citation.
[Paste your passage here — 200 to 400 words works well]"
What to look for in the output: The AI will identify whether your passage links to named external sources and flag any claims made without attribution. A passage the AI characterises as well-attributed — with named links to primary research or authoritative sources — exhibits the outbound linking behaviour Chalkidis et al. found to be positively associated with LLM citation selection. A passage flagged for unattributed claims gives you a concrete list of places where adding a named source strengthens the epistemic quality of your content and aligns it with what the AI pool selects for.
5. The One Thing to Remember
AI tools and search engines draw from largely separate source pools — and the content properties that get you into the AI pool are measurable and actionable, starting with structure, readability, and linking out to reputable sources.
6. Go Deeper
The Field Notes post on the Chalkidis study covers the full methodology, the confidence-level caveats on each property finding, and the domain diversity result in detail: 93% of ChatGPT's Sources Are Not in Google's Top 10.
Measure your content's structural properties.
The Evidence Density Score measures statistics presence, source attribution, readability, and structural richness — the content properties the Psytable research series has identified as associated with higher AI citation probability.