Why AI Search and Google Are Pulling from Different Pools

1. The Concept, Plainly

When AI tools answer a question, they draw from a list of sources. When Google answers the same question, it draws from a different list. Those two lists barely overlap. A large-scale study found that only 6.82% of the pages ChatGPT cites are in Google's top 10 for the same queries — roughly 93 of every 100 pages an AI tool cites are not on Google's first page at all. AI tools and search engines are running separate competitions with largely separate candidates.

2. Why This Matters Right Now

Posts L1 through L5 in this series taught you how to improve five content properties: evidence density, structure, readability, source attribution, and content type. That work still applies.

But there is a prior question underneath all of it: are you in the right pool to begin with?

If AI tools are not drawing on your content, improving its properties improves your standing in a competition you have not entered. The 6.82% overlap figure is the reason this series exists. The properties L1–L5 covered are the ones associated with being in the pool AI tools draw from. Understanding that there are two separate pools is the frame that makes those earlier lessons coherent.

3. The Mechanism

The source for this post is Chalkidis, Søgaard, and Simonsen (2024) — a preprint, not yet peer-reviewed. The study compared which pages six LLM-based search engines cited against which pages two traditional search engines returned, across 55,936 queries. The scale is a genuine strength; preprint status means the figures are directional signals, not certified benchmarks. Treat them as such.

The overlap finding.

The study reports a 6.82% overlap between the pages ChatGPT cites and Google's top 10 for the same queries. That figure is specific to ChatGPT vs. Google; other LLM platforms vary, but the directional conclusion, that LLM and traditional search draw from largely separate pools, holds across the study's broader findings.
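To make the overlap figure concrete, here is a minimal Python sketch of how such a share could be computed from your own query logs. It is an illustration, not the study's method: the function names, the URL normalisation rule, and the sample URLs are all assumptions made for this example.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal.

    Assumption: scheme, a leading "www.", and a trailing slash are ignored.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def citation_overlap(results: dict) -> float:
    """Share of LLM-cited pages that also sit in the search top 10.

    `results` maps each query to a (cited_urls, top10_urls) pair.
    """
    cited_total = 0
    shared_total = 0
    for cited_urls, top10_urls in results.values():
        cited = {normalize(u) for u in cited_urls}
        top10 = {normalize(u) for u in top10_urls}
        cited_total += len(cited)
        shared_total += len(cited & top10)
    return shared_total / cited_total if cited_total else 0.0


# Made-up example data, not taken from the study.
sample = {
    "best crm for startups": (
        ["https://example.org/crm-guide", "https://blog.example.net/crm/"],
        ["https://www.example.org/crm-guide/", "https://example.com/top-crm"],
    ),
    "how to brew cold coffee": (
        ["https://example.edu/cold-brew", "https://example.io/coffee"],
        ["https://example.com/cold-brew-recipe"],
    ),
}

overlap = citation_overlap(sample)
print(f"{overlap:.1%} of cited pages are in the top 10")
```

In this toy sample one of the four cited pages is also in a top 10, so the share is 25%. The study's figure of 6.82% is the same kind of ratio measured across its full query set.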

The 37% unique-domain finding.

37% of the domains that LLM search engines cite in the Chalkidis dataset do not appear in the traditional search results at all. This is the finding with the most direct relevance for smaller and specialist publishers. In traditional search, domain authority (the accumulated reputation of your whole site) acts as a ceiling: large, established domains dominate, and new or small publishers compete from beneath it. In LLM search, more than a third of cited domains are absent from the study's traditional results, which suggests that domain authority is a weaker constraint there than in traditional search.
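The unique-domain share is a set difference at the domain level rather than the page level. The sketch below shows one way to compute it, assuming full URLs as input and treating each hostname as its own domain (a simplification; the study's own grouping rule is not described here). The function names and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def domain(url: str) -> str:
    """Hostname of a full URL, lowercased, without a leading "www."."""
    host = urlsplit(url.strip()).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def unique_domain_share(llm_urls, search_urls) -> float:
    """Share of LLM-cited domains that never appear in the search results."""
    llm_domains = {domain(u) for u in llm_urls}
    search_domains = {domain(u) for u in search_urls}
    if not llm_domains:
        return 0.0
    return len(llm_domains - search_domains) / len(llm_domains)


# Made-up example data, not taken from the study.
llm_cited = [
    "https://example.org/a",
    "https://niche.example.net/b",
    "https://www.example.com/c",
]
search_results = ["https://example.com/x", "https://example.org/y"]

share = unique_domain_share(llm_cited, search_results)
print(f"{share:.0%} of cited domains are absent from search results")
```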

This does not mean any content gets cited. The properties that get content into the AI pool are measurable — and three are already covered by earlier posts in this series.

The four properties LLMs prefer.

The Chalkidis analysis measured not just that the overlap was low, but what distinguished the pages LLMs selected. Four properties differentiated LLM-cited pages from traditionally ranked pages:

Structured, hierarchical HTML. LLM-cited pages used more organised markup — headings, logical content breaks, clear hierarchy. This finding is convergent: Aggarwal et al. 2024 (peer-reviewed) and Zhang et al. 2026 (preprint, not yet peer-reviewed) point in the same direction independently.
Easier-to-read text. LLM-cited pages tended toward lower textual complexity. Aggarwal et al. found that content at Flesch-Kincaid grade 8–10 was associated with higher AI citation rates — the Chalkidis finding runs the same direction.
Lower domain popularity. LLMs are less anchored to domain-level reputation than traditional search engines are. Page-level content properties carry more weight relative to whole-site authority.
More outbound links to reputable sources. Pages that linked out to more reputable external sources (primary research, named organisations, authoritative publications) were more likely to be cited by LLMs. This is the Chalkidis study's primary new contribution; it does not appear in Aggarwal or Zhang. It also runs counter to advice sometimes given in traditional SEO, which discourages external links out of concern that they send traffic away from your page.
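Three of these properties can be roughly checked on your own pages. The following sketch uses only the Python standard library and rests on several assumptions: the class and function names are invented for this example, the syllable counter is a crude vowel-group heuristic, and the link check only counts external hosts. It cannot judge whether a source is reputable; that still needs a human read.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit


class PageAudit(HTMLParser):
    """Collect heading levels and outbound link hosts from an HTML page."""

    def __init__(self, own_host: str):
        super().__init__()
        self.own_host = own_host.lower()
        self.heading_levels = []      # e.g. [1, 2, 2, 3]
        self.outbound_hosts = set()   # hosts other than own_host

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag):
            self.heading_levels.append(int(tag[1]))
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlsplit(href).netloc.lower()
            if host and host != self.own_host:
                self.outbound_hosts.add(host)


def skipped_levels(levels) -> int:
    """Count jumps such as h1 -> h3 that break the heading hierarchy."""
    return sum(1 for prev, cur in zip(levels, levels[1:]) if cur - prev > 1)


def count_syllables(word: str) -> int:
    """Heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level using the standard formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)


# Made-up page for illustration.
html = """
<h1>Cold brew guide</h1>
<p>Steep the grounds overnight. See <a href="https://example.edu/study">the study</a>.</p>
<h3>Ratios</h3>
<p>Use one part coffee to eight parts water. <a href="/about">About us</a></p>
"""

audit = PageAudit(own_host="example.com")
audit.feed(html)
print("heading levels:", audit.heading_levels)
print("skipped levels:", skipped_levels(audit.heading_levels))
print("outbound hosts:", sorted(audit.outbound_hosts))
print("FK grade:", round(fk_grade("Steep the grounds overnight. Use cold water."), 1))
```

Because the syllable count is approximate, treat the grade as a rough indicator rather than an exact score; a dedicated readability library would be more reliable for real audits.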

The property findings are correlational: the study shows that LLMs select pages with these properties, not that adding them will mechanically produce a citation. Still, the direction of the evidence is clear, and for three of the four properties it is convergent across multiple independent studies.

4. Try It Now

No dedicated pool-audit tool exists on Psytable for this step — the prompt below is the most practical immediate action. The outbound link finding is the one most likely to conflict with existing practice: if your content avoids external links, the Chalkidis evidence points in the opposite direction for AI search.

5. The One Thing to Remember

AI tools and search engines draw from largely separate source pools — and the content properties that get you into the AI pool are measurable and actionable, starting with structure, readability, and linking out to reputable sources.

6. Go Deeper

The Field Notes post on the Chalkidis study covers the full methodology, the confidence-level caveats on each property finding, and the domain diversity result in detail: 93% of ChatGPT's Sources Are Not in Google's Top 10.

Measure your content's structural properties.

The Evidence Density Score measures statistics presence, source attribution, readability, and structural richness — the content properties the Psytable research series has identified as associated with higher AI citation probability.
