In Branded-Query Data, Content Type Correlates with Citation Share

New to this topic? Start with "Your Content Format Is a Citation Signal", the "Learn with the research" post that teaches the mechanism first.

You have been optimising the structural properties of your content — statistics density, source attribution, readability, heading structure. That work is real and the evidence behind it is solid. But there is a prior question most practitioners skip: is the content type you are producing competitive for citations in the query context you are targeting?

A 2025 practitioner study by Omniscient Digital analysed 23,387 unique sources drawn from 240 branded-query prompts — each containing a specific brand name — queried across five AI platforms. The dataset is large and the methodology is stated. It is not peer-reviewed. Treat what follows as directional.

The finding: citation share is sharply unequal across content types in branded queries. The implication for how you sequence your optimisation work is worth examining, even though the data's scope does not transfer directly to informational content.

What the data shows — and what it cannot

Across the 23,387 sources cited in response to branded-query prompts, the distribution was far from even:

Reviews, listicles, forums, and case studies accounted for approximately 57% of citations
Directory sites accounted for approximately 17%
Product pages accounted for approximately 12%
Thought leadership content accounted for approximately 5.4% of citations
Video and news/press releases were the least-cited types
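
To make the shares above concrete, here is a minimal Python sketch that converts them into rough source counts and works out what is left for the types reported without a figure. It assumes the categories are mutually exclusive and exhaustive, which the study summary does not confirm, so treat the remainder as an estimate only. The function names are illustrative and not part of any tool.

```python
# Approximate shares reported for the branded-query dataset (percent of cited sources).
REPORTED_SHARES = {
    "reviews, listicles, forums, case studies": 57.0,
    "directory sites": 17.0,
    "product pages": 12.0,
    "thought leadership": 5.4,
}
TOTAL_SOURCES = 23_387  # unique sources in the dataset


def remaining_share(shares: dict[str, float]) -> float:
    """Percent left over for types reported without a figure (video, news/press).

    Assumption: categories are mutually exclusive and exhaustive.
    """
    return round(100.0 - sum(shares.values()), 1)


def approximate_count(share_pct: float, total: int = TOTAL_SOURCES) -> int:
    """Convert a percentage share into an approximate number of sources."""
    return round(total * share_pct / 100.0)


if __name__ == "__main__":
    for content_type, pct in REPORTED_SHARES.items():
        print(f"{content_type}: ~{approximate_count(pct):,} sources")
    print(f"video + news/press (remainder): ~{remaining_share(REPORTED_SHARES)}%")
```

Under that assumption, thought leadership's 5.4% corresponds to roughly 1,263 of the 23,387 sources, and video plus news/press would share the remaining 8.6% or so.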

A companion study from the same research programme extended the analysis to 43,000+ citations and added an intent-stage dimension:

Educational and thought leadership content dominated Problem Unaware queries — approximately 86% of citations at this stage
Social proof — reviews, comparisons, community content — dominated Solution Aware queries, accounting for approximately 51% of citations at that stage

These are directional signals from a practitioner dataset. Not certified benchmarks. The figures will shift depending on platform, prompt selection, and niche.

What the data cannot show, stated plainly:

Every prompt in the Omniscient dataset contained a brand name. This is branded-query citation behaviour — vendor research, product evaluation, brand comparison. Informational content practitioners produce content for query contexts where no brand name appears. That is a structurally different environment, and the distribution percentages do not transfer directly to it.

The mechanism is not tested. Whether this distribution reflects an algorithmic preference for certain content types, a composition effect in the AI source pool — there is simply more review-format content available to be cited in branded-query contexts — or something specific to the branded-query environment is an open question. The data describes averages, not constraints.

Whether 5.4% thought leadership share is disproportionately low relative to thought leadership's share of available pages is also not established. It could reflect AI preference against thought leadership, or it could reflect proportionate availability. The data does not distinguish these.
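
The base-rate question can be sketched as a simple ratio: citation share divided by the type's share of the available source pool. The pool shares below are hypothetical values invented for illustration; the study does not report them, which is exactly why the question stays open.

```python
def representation_ratio(citation_share: float, pool_share: float) -> float:
    """Citation share divided by share of the available source pool.

    Near 1.0: cited about as often as it is available.
    Below 1.0: under-cited relative to availability. Above 1.0: over-cited.
    """
    if pool_share <= 0:
        raise ValueError("pool_share must be positive")
    return citation_share / pool_share


# Reported: thought leadership's approximate share of citations (branded queries).
CITATION_SHARE = 0.054

# Hypothetical pool shares, NOT from the study.
HYPOTHETICAL_POOL_SHARES = (0.03, 0.054, 0.20)

for pool_share in HYPOTHETICAL_POOL_SHARES:
    ratio = representation_ratio(CITATION_SHARE, pool_share)
    print(f"pool share {pool_share:.1%} -> ratio {ratio:.2f}")
```

The same 5.4% yields a ratio of about 1.8 if thought leadership is 3% of the pool (over-cited), 1.0 if it is 5.4% (proportionate), and about 0.27 if it is 20% (under-cited). Without the pool share, the data cannot say which of these holds.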

Layer 1 before Layer 2

The Aggarwal et al. (2024) and Zhang et al. (2026) findings operate at what we are calling Layer 2: they identify structural and evidential properties within a piece of content — statistics presence, source attribution, heading density, readability, definitional and comparative language — that correlate with citation probability. That evidence is peer-reviewed and actionable.

The Omniscient data introduces a prior question. Call it Layer 1: given the query context you are targeting, is your content type competing for a meaningful share of citations at all?

The practical implication from the distribution pattern: if thought leadership averages 5.4% of citations in branded-query contexts, averaged across pieces of every structural quality, then Layer 2 optimisation is solving a real problem inside a constraint that content type may set. Optimising a thought leadership post's structural quality may improve its odds against other thought leadership, but it does not change which content type it is or where that type sits in the distribution.

This is a prioritisation heuristic, not a confirmed mechanism. The Layer 1 / Layer 2 framework is a synthesis introduced here, not an established industry model. But the heuristic is still worth following: ask the Layer 1 question before investing heavily in Layer 2. The data is branded-query and directional — but the implication for optimisation sequencing is clear enough to act on.

The verdict, stated plainly: if your content type is not competitive for citations in your target query context, structural optimisation is building on the wrong foundation.

The intent-stage finding from the companion study adds specificity. Thought leadership is not uniformly disadvantaged. At the awareness stage — Problem Unaware queries — educational and thought leadership content took approximately 86% of citations. That is the reverse of the vendor-evaluation picture. The same content type can be dominant in one query context and marginalised in another. The Layer 1 question is not "is thought leadership worth producing" — it is "which query contexts is thought leadership competitive in, and am I targeting those?"
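
One way to picture the Layer 1 question is as a lookup keyed by intent stage and content type. The sketch below holds only the two stage-level figures quoted in this post; every other combination is treated as unknown rather than guessed. The `min_share` threshold and the labels are hypothetical choices made for illustration, not values from the research.

```python
from typing import Optional

# Approximate citation shares by (intent stage, content type), as quoted in this post.
# Cells the post does not report are deliberately absent.
STAGE_SHARES = {
    ("problem_unaware", "educational_thought_leadership"): 0.86,
    ("solution_aware", "social_proof"): 0.51,
}


def layer1_check(stage: str, content_type: str, min_share: float = 0.25) -> str:
    """Return a rough Layer 1 verdict for a content type at an intent stage.

    min_share is a hypothetical cut-off chosen for illustration only.
    """
    share: Optional[float] = STAGE_SHARES.get((stage, content_type))
    if share is None:
        return "unknown"  # no reported figure; do not infer one
    return "competitive" if share >= min_share else "marginal"


if __name__ == "__main__":
    print(layer1_check("problem_unaware", "educational_thought_leadership"))
    print(layer1_check("solution_aware", "educational_thought_leadership"))
```

The first call returns "competitive" and the second returns "unknown", because the post gives no thought leadership figure for Solution Aware queries. In practice the table would need figures for your own niche and platforms before any verdict carries weight.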

The earned media problem

The vendor evaluation picture from the companion study is the most actionable data point in this research programme. When users were evaluating vendors, earned media — reviews, comparisons, third-party coverage — accounted for approximately 82% of citations.

The directive that follows from this is obvious: close the earned media gap if you have one.

But it requires a clarification. Reviews, forum discussions, and third-party comparisons are earned media. They are written by third parties and are not under your direct editorial control. You cannot produce them the way you produce thought leadership.

A Layer 1 audit that reveals an earned media gap is identifying a different type of problem than a content production gap. The response is a PR or outreach strategy (seeding reviews, building relationships with comparison sites, generating coverage), not a publishing calendar decision. "Produce more earned media" is therefore not a content production directive, and it should not be treated as one.

About the tools referenced in this post

The Evidence Density Score is a Psytable tool that measures the structural and linguistic properties of a piece of content — statistics presence, source attribution, readability, and structural richness — in a single 0–100 score. It operates at Layer 2: it measures content quality within a piece. It does not determine which content type to produce, and it does not surface a Layer 1 verdict. Use it once the Layer 1 question is answered. It is most actionable for content types that are already well-positioned for your query context — where Layer 1 is solved and Layer 2 is what determines whether a given piece rises or falls within the citation pool.

The Evidence Density Score at Layer 2

Once the Layer 1 question is settled — once you have identified which content types are competitive for your target query contexts — the Evidence Density Score tells you where a given piece stands on the structural dimensions associated with citation probability.

What the research establishes

The branded-query citation distribution suggests the content type question comes before the structural optimisation question — not instead of it.

The Aggarwal and Zhang findings still apply. They apply to Layer 2. The Omniscient finding adds the prior layer. A practitioner who uses both has a more complete picture of where optimisation effort has leverage than one who uses only one.

The specific percentages do not transfer directly to informational content — the branded-query scope is a genuine constraint on that. But the prioritisation heuristic transfers. Ask the Layer 1 question before investing heavily in Layer 2. Use the data to ask better questions about your content type mix; do not use it to set specific citation targets.

References

Aggarwal, Pranjal et al. (2024), "GEO: Generative Engine Optimization," ACM SIGKDD 2024. arXiv: 2311.09735

Zhang, Kai, Xian He, and Jiaxin Yao (2026), "From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms," preprint. arXiv: 2604.25707

Omniscient Digital (2025), practitioner research dataset — 240 prompts, 23,387 unique sources across 5 AI platforms. Not peer-reviewed. No public citation available at time of writing.
