How AI Chooses Which Businesses to Mention

ChatGPT, Claude, Perplexity, and Gemini each pick businesses differently. We unpack the source-selection logic behind every AI engine's recommendations.

By Annette Thompson · Updated May 9, 2026 · 18 min read

A Boulder homeowner asks ChatGPT for “the best plumbers in North Boulder.” ChatGPT returns five names, three of which the homeowner has never heard of. The other two she knows by reputation. She picks one of the names she didn’t know. Two hours later, that plumber gets a call. The plumber has no idea why.

This is happening now, at scale, in every category where buyers ask AI before they ask anyone else: local home services, medical, legal, SaaS, and B2B. AI engines are shaping awareness, consideration, and conversion at a layer beneath the one most businesses track. Most businesses don’t know they’ve been recommended (or skipped) until the leads stop matching the rankings.

We’ve spent the last six months pulling apart the source-selection logic of ChatGPT, Claude, Perplexity, and Gemini. The systems are different in their internals, but the surface logic of “which businesses get mentioned” is more knowable than people think. This is the long version of what we’ve learned, including where each engine pulls from, what signals matter, and how a business goes from invisible to consistently mentioned.

The shift from search rankings to AI mentions

Traditional Google rankings are a sorted list. Position #1 wins, position #11 loses. The business case is straightforward: rank higher, get more clicks, convert more leads.

AI mentions don’t work that way. When ChatGPT generates an answer to “best plumbers in Boulder,” it doesn’t return a ranked list of every plumber. It returns 3-7 names that satisfy a different optimization. The model is trying to give the user a useful answer, drawn from a retrieval set, weighted by signals the model has learned to trust. The “winner” of the AI mention isn’t the business with the most backlinks. It’s the business whose information is structured in a way the model can extract cleanly.

The 5W AI Platform Citation Source Index, released in May 2026, analyzed 680 million citations across the major AI engines. The headline finding: Reddit accounts for roughly 40% of citations across all major models, displacing traditional news outlets as the primary source. The deeper finding: each engine has its own citation distribution, and the differences are large enough that “AI search optimization” isn’t really one thing. It’s four overlapping disciplines.

Where each AI engine actually pulls from

Here’s the source-share data from the 5W Index, aggregated with our own audit work and Yext’s parallel research:

| Engine | Top sources (by citation share) | Citation freshness |
| --- | --- | --- |
| ChatGPT | Wikipedia (47.9%), Reddit, Forbes, Business Insider, official .gov / .edu | 56% from past year |
| Claude | NYT, Atlantic, New Yorker, Economist, Wikipedia, Reddit | 36% from past year |
| Perplexity | Reddit (46.7%), NIH/PubMed, niche B2B authorities, primary research | 60%+ from past year |
| Gemini / Google AI Overviews | Mirrors Google’s organic top 10 with additional weight on official sources | Varies; tracks Google index |

The differences matter. A business that’s optimized to show up on Wikipedia will get cited heavily by ChatGPT. A business that’s earned coverage in legacy journalism will show up in Claude. A business with a strong Reddit presence will dominate Perplexity. A business that ranks well in Google’s top 10 will tend to show up in Gemini and AI Overviews.
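For readers who want to run the same kind of tally on their own audit data, here is a minimal Python sketch of how a source share and a freshness share can be computed from a log of observed citations. The record shape (`domain`, `published`), the sample rows, and the 365-day window are our assumptions for illustration; this is not the 5W Index methodology.

```python
from collections import Counter
from datetime import date, timedelta


def source_shares(citations):
    """Return each domain's share of all citations, as a fraction of 1."""
    counts = Counter(c["domain"] for c in citations)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


def freshness_share(citations, as_of, window_days=365):
    """Return the fraction of citations published within window_days of as_of."""
    if not citations:
        return 0.0
    cutoff = as_of - timedelta(days=window_days)
    fresh = sum(1 for c in citations if c["published"] >= cutoff)
    return fresh / len(citations)


# Made-up sample log: four citations observed for one engine (hypothetical data).
sample = [
    {"domain": "reddit.com", "published": date(2026, 3, 1)},
    {"domain": "reddit.com", "published": date(2025, 11, 15)},
    {"domain": "wikipedia.org", "published": date(2023, 6, 10)},
    {"domain": "example-news.com", "published": date(2024, 1, 5)},
]

shares = source_shares(sample)
fresh = freshness_share(sample, as_of=date(2026, 5, 9))
print(shares["reddit.com"])  # 0.5
print(fresh)                 # 0.5
```

Running the tally separately per engine is what makes the differences visible; a single blended number hides them.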

That’s the strategic insight most “GEO guides” miss. Optimizing for AI mentions isn’t one move. It’s four overlapping moves, with shared underlying mechanics but different surface tactics.

The Three-Layer Source Hierarchy

In our audits, every AI engine, regardless of internal architecture, appears to weigh sources along three layers. We call this the Three-Layer Source Hierarchy, and it’s the framework we use when we audit a business for AI mention probability.

┌─────────────────────────────────────────────────┐
│  Layer 1: Authoritative public sources          │
│  Wikipedia, .gov, .edu, top 50 publications     │
│  → Used as ground truth                         │
├─────────────────────────────────────────────────┤
│  Layer 2: Reputation aggregators                │
│  Reddit, Quora, YouTube, niche forums, reviews  │
│  → Used to triangulate sentiment                │
├─────────────────────────────────────────────────┤
│  Layer 3: First-party authority                 │
│  Business website, structured data, owned media │
│  → Used to verify and enrich                    │
└─────────────────────────────────────────────────┘

A business that’s strong on Layer 1 alone will get cited generically (“Companies like X serve this market”). A business strong on Layer 2 alone will get cited contextually (“Reddit users frequently recommend X”). A business strong on Layer 3 alone will get cited cleanly when a user’s query lands on the business’s own site through retrieval.

The mentions happen most consistently when a business has presence on all three layers. The AI engine can triangulate: Layer 1 says you exist and are real, Layer 2 says people talk about you favorably, Layer 3 confirms the specific facts the model needs to render an answer.

This is the framework most AI mention strategies skip. They optimize one layer (usually Layer 3, because it’s the layer the business directly controls) and wonder why they’re still invisible. The triangulation is the point.
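One way to make the triangulation concrete is to record, layer by layer, where your business and a mentioned competitor each appear, then list the differences. The sketch below is a bookkeeping aid under our own assumptions: the layer keys, the source labels, and both sample records are hypothetical, and it does not model how any engine actually scores sources.

```python
LAYERS = ("layer_1", "layer_2", "layer_3")


def layer_gaps(yours, competitor):
    """For each layer, list source types the competitor has that you lack."""
    gaps = {}
    for layer in LAYERS:
        missing = sorted(set(competitor.get(layer, [])) - set(yours.get(layer, [])))
        if missing:
            gaps[layer] = missing
    return gaps


# Hypothetical audit notes for two businesses.
yours = {
    "layer_1": [],
    "layer_2": ["google_reviews"],
    "layer_3": ["website", "schema_markup"],
}
competitor = {
    "layer_1": ["chamber_of_commerce", "state_directory"],
    "layer_2": ["google_reviews", "reddit_threads"],
    "layer_3": ["website"],
}

gaps = layer_gaps(yours, competitor)
print(gaps)
# {'layer_1': ['chamber_of_commerce', 'state_directory'], 'layer_2': ['reddit_threads']}
```

In this made-up example the business is ahead on Layer 3 and behind on Layers 1 and 2, which is the pattern described above.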

Layer 1: Authoritative public sources

This is the layer where AI engines anchor their factual claims. If your business doesn’t appear on a Wikipedia page, doesn’t show up in any .gov or .edu domain, hasn’t been mentioned in a top-50 publication, then the AI engine has no high-trust source to anchor a claim about you.

For most small businesses, Wikipedia is functionally out of reach. The notability bar is high, and Wikipedia’s editor community aggressively prunes promotional content. But adjacent strategies work: getting cited on a city’s official tourism site, getting included in a state-level industry directory, earning a mention in an academic paper or government report, showing up in a chamber of commerce or BBB profile that the engines trust.

Here’s the principle: Layer 1 isn’t about getting your own page on Wikipedia. It’s about getting your name to appear, in passing, on pages the AI engines already trust.

Mini case study: Bone Voyage Dog Rescue (Annette’s prior venture) earned mentions on the official tourism sites of Lake Chapala municipalities, in Mexico’s federal animal welfare reporting, and in academic papers about international companion animal placement. None of those mentions was pursued for SEO. They were the byproduct of doing real work that institutions noticed. The byproduct, ten years later, is that ChatGPT and Claude both reliably mention Bone Voyage when asked about international dog rescue placement, even though the rescue itself closed in 2024.

Layer 2: Reputation aggregators

Layer 2 is where Reddit’s dominance becomes structurally important. AI engines have learned that Reddit threads contain real opinions from real users at scale, in a format the model can read directly. A subreddit thread titled “Best plumbers in North Boulder?” with 30 comments naming specific businesses is exactly the kind of content the engines learn to trust.

Other Layer 2 sources: Quora answers, YouTube video descriptions and comments, industry forums (HVAC-Talk, AVS Forum, etc.), Yelp reviews, Google reviews, Trustpilot, niche aggregator sites in your category.

The mistake most businesses make on Layer 2 is treating it as a public relations exercise. It isn’t. It’s a presence-and-sentiment exercise. The AI engine isn’t reading Reddit looking for press releases. It’s reading Reddit looking for what real users actually say about your category, and which businesses they name.

Two practical tactics for Layer 2:

1. Be the answer in your category’s subreddit, in your own voice. Not promotionally. Not as a “marketing channel.” As a participant who happens to run a relevant business. Show up consistently, answer questions in detail, identify yourself when relevant. Over time, your business name appears repeatedly in threads the AI engines crawl.

2. Earn legitimate reviews on the platforms your category cares about. Google Business Profile reviews matter for local. G2 and Capterra matter for SaaS. Yelp matters for hospitality. Industry-specific aggregators matter for industry-specific queries. Volume and recency both matter (more on this below).

Layer 3: First-party authority

Layer 3 is your own website, structured data, owned media, and email-list-driven content. It’s the layer you fully control, and it’s the layer where most businesses overspend relative to the leverage it provides on AI mentions.

The function of Layer 3 in AI source selection isn’t ranking. It’s verification. When an AI engine has a candidate business pulled from Layers 1 and 2, it checks Layer 3 to confirm the facts: name, location, phone, services, hours, prices, specialties. If your website doesn’t make those facts easy to retrieve (in plain text, not buried in JavaScript, ideally with schema markup), the engine has reduced confidence in the candidate, and the candidate gets dropped from the response.
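A quick way to test the “plain text, not buried in JavaScript” point is to parse a page’s raw HTML without executing scripts and see which facts are still readable. The sketch below uses Python’s standard `html.parser`; the sample page, the business name, and the fact list are hypothetical. It deliberately ignores everything inside `<script>` tags, so JSON-LD would need a separate check. We are assuming a crawler that does not execute JavaScript, which may not describe every engine.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the page's non-script text with whitespace collapsed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def missing_facts(html, facts):
    """Return the facts that do not appear in the page's visible text."""
    haystack = visible_text(html).casefold()
    return [fact for fact in facts if fact.casefold() not in haystack]


# Hypothetical page: the phone number is only written into the DOM by a script.
page = """<html><body>
<h1>Example Plumbing Co.</h1>
<p>Serving Boulder and Longmont.</p>
<span id="phone"></span>
<script>document.getElementById("phone").textContent = "303-555-0100";</script>
</body></html>"""

missing = missing_facts(page, ["Example Plumbing Co.", "Boulder", "303-555-0100"])
print(missing)  # ['303-555-0100']
```

Any fact that lands in the missing list is one a non-rendering crawler would not see in the page body.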

This is why technical SEO still matters in the AI era, even though “ranking” matters less. The engine isn’t ranking your site against competitors. It’s checking whether you exist, whether your facts match what it learned from Layers 1 and 2, and whether it can quote you reliably.

Schema markup is high-leverage on Layer 3. A LocalBusiness schema block with proper @id linking, sameAs references to your social profiles, and complete service area declarations gives the engine a clean structured fact-set to verify against.
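As an illustration, here is a minimal Python sketch that assembles such a block as JSON-LD. The property names (`@id`, `sameAs`, `areaServed`, `PostalAddress`) come from the schema.org vocabulary; the business name, URLs, phone number, and address are placeholders, and the `local_business_jsonld` helper is our own invention, not part of any library.

```python
import json


def local_business_jsonld(name, url, phone, street, city, region, postal_code,
                          profiles, areas):
    """Build a schema.org LocalBusiness description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        # A stable @id lets other markup on the site point at the same entity.
        "@id": url.rstrip("/") + "/#business",
        "name": name,
        "url": url,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
        # sameAs links the entity to its profiles on other sites.
        "sameAs": list(profiles),
        # Name each served city explicitly rather than a vague region.
        "areaServed": [{"@type": "City", "name": area} for area in areas],
    }


# Placeholder values for a hypothetical business.
data = local_business_jsonld(
    name="Example Plumbing Co.",
    url="https://www.example.com/",
    phone="+1-303-555-0100",
    street="100 Main St",
    city="Boulder",
    region="CO",
    postal_code="80302",
    profiles=["https://www.facebook.com/exampleplumbing"],
    areas=["Boulder", "Longmont", "Louisville", "Lafayette"],
)

snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(data, indent=2)
    + "\n</script>"
)
print(snippet)
```

The printed `<script>` element is what would go in the page’s HTML. Keeping the values in one place and generating the markup from them also helps the facts stay consistent with what appears elsewhere on the site.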

What AI engines specifically look for in a business

Beyond the three-layer hierarchy, six specific signals show up repeatedly in our analysis of which businesses get mentioned and which get skipped:

1. Entity clarity. The business needs to be a recognizable entity, not just a string. Consistent name across the web. Unique enough to disambiguate from other businesses with similar names. Specific enough that a model can pin a knowledge graph node to it.

2. Location specificity. For local businesses, the engine needs to know exactly where you operate. Not “the Front Range” but “Boulder, Longmont, Louisville, Lafayette.” The engine matches user queries against location strings, and vague service-area declarations get skipped.

3. Service or product specificity. The engine needs to know exactly what you do. “Marketing services” loses to “SEO and content strategy for medical aesthetics practices in the Mountain West.” The more specific the description, the more queries the engine can confidently match you to.

4. Recency. Businesses that have published, been mentioned, or earned reviews recently get weighted higher. The 5W Index data shows that 56% of ChatGPT’s citations come from sources published in the past year. Stale presence loses ground to fresh presence.

5. Cross-source consistency. When the engine cross-references your business across Layers 1, 2, and 3, the facts should match. NAP (name, address, phone) consistency matters, but so does service description consistency, founder name consistency, and brand-language consistency. Conflicts trigger reduced confidence.

6. Explicit recommendations from real humans. This is the Reddit factor in concentrated form. When real people in real threads write the words “I use [business name] and they’re great because [specific reason],” the engine treats that as a high-quality citation signal. Generic positive reviews matter less. Specific recommendations from named users in real conversations matter most.
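Signal 5 is the easiest of the six to check mechanically. The sketch below normalizes name, address, and phone across a few listings and reports any field where the sources disagree. The listing data is made up, the normalization rules are deliberately simple assumptions (US-style phone numbers, punctuation stripped, no abbreviation handling), and nothing here reflects how an engine performs its own matching.

```python
import re


def normalize_phone(phone):
    """Keep digits only and drop a leading US country code."""
    digits = re.sub(r"\D", "", phone)
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


def normalize_text(value):
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", value.casefold())
    return " ".join(cleaned.split())


NORMALIZERS = {
    "name": normalize_text,
    "address": normalize_text,
    "phone": normalize_phone,
}


def nap_conflicts(listings):
    """Return {field: {normalized_value: [sources]}} for fields that disagree."""
    conflicts = {}
    for field, normalize in NORMALIZERS.items():
        seen = {}
        for source, record in listings.items():
            seen.setdefault(normalize(record[field]), []).append(source)
        if len(seen) > 1:
            conflicts[field] = seen
    return conflicts


# Hypothetical listings for one business on three sites.
listings = {
    "website": {"name": "Example Plumbing Co.",
                "address": "100 Main St, Boulder, CO",
                "phone": "(303) 555-0100"},
    "google_business_profile": {"name": "Example Plumbing Co",
                                "address": "100 Main St., Boulder, CO",
                                "phone": "+1 303-555-0100"},
    "yelp": {"name": "Example Plumbing Co.",
             "address": "100 Main Street, Boulder, CO",
             "phone": "303-555-0199"},
}

conflicts = nap_conflicts(listings)
print(sorted(conflicts))  # ['address', 'phone']
```

Here the Yelp listing is flagged twice: once for a different phone number and once because this simple normalizer treats “Street” and “St” as different strings. Whether an engine would treat that second case as a conflict is unknown, so a flag is a prompt for review rather than proof of a problem.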

Engine-specific patterns we’ve observed

The six signals are universal, but each engine weights them differently. Here’s what we’ve seen in side-by-side testing across hundreds of business-related queries:

ChatGPT

ChatGPT leans heavily on Wikipedia and Bing’s real-time index. For business queries, the engine triangulates: Wikipedia and large publications for the entity itself, then Bing search results for current details, then Reddit for sentiment. Businesses that show up well in Bing organic results (which substantially overlap with Google but aren’t identical) get an edge in ChatGPT mentions.

ChatGPT also has a specific bias toward .gov and .edu domains. If your business is mentioned on any official government or academic page, that mention gets disproportionate weight in ChatGPT’s responses. Local economic development agencies, state licensing boards, and university extension programs are all high-value Layer 1 placements specifically for ChatGPT.

Claude

Claude’s training data has a strong skew toward legacy journalism and long-form quality writing. The platform leans toward the New York Times, Atlantic, New Yorker, and Economist for context. Businesses that have been profiled in legacy outlets, even briefly, get an outsized presence in Claude responses.

Claude’s training cutoff also matters. Until the next major training run, Claude’s knowledge of newer businesses is limited unless those businesses get mentioned in real-time retrieval (which Claude supports through tools, but with weaker emphasis than Perplexity). Businesses founded recently need to invest more heavily in Layer 1 and Layer 2 presence to show up in Claude reliably.

Perplexity

Perplexity is the engine most receptive to recent web content. Its citation distribution (46.7% Reddit, plus heavy weight on primary research and niche B2B authorities) makes it the most accessible for smaller businesses with strong Layer 2 presence. A business that gets named in Reddit threads, mentioned in industry-specific forums, and covered in niche B2B publications can dominate Perplexity citations even without Wikipedia or NYT presence.

Perplexity also weights primary research heavily. NIH, PubMed, official industry studies, and academic publications carry strong weight. For service categories where research backing is relevant (medical, legal, financial), citing peer-reviewed sources on your own site, and getting your work cited in others’ research, matters more in Perplexity than in ChatGPT.

Gemini and Google AI Overviews

Gemini and AI Overviews track Google’s organic index more closely than the other engines. A page that ranks in Google’s top 10 has a substantially higher probability of being cited in AI Overviews than a page that doesn’t. Seer Interactive’s late-2025 data showed 99% of AI Overview citations came from the organic top 10.

This is the engine where traditional SEO investment pays off most directly. Pages that win the SEO fight on Google tend to win the AI Overview citation fight too. The relationship isn’t perfect (some top-10 pages don’t get cited, some lower-ranking pages do), but the correlation is high enough that “rank well in Google” remains the cleanest path to Gemini mentions.

What changes when you actually do this work

We’ve worked with businesses that went from zero AI mentions to consistent mentions across two or more engines. The pattern is repeatable but slow. Here’s the realistic timeline:

| Phase | Timeline | What happens |
| --- | --- | --- |
| Foundation | Months 1-2 | Layer 3 cleanup: schema markup, entity consistency, NAP across web |
| Reputation seeding | Months 2-4 | Layer 2 work: legitimate Reddit presence, review acquisition, niche forum participation |
| Authority building | Months 4-9 | Layer 1 work: chamber/directory placements, local press, industry mentions |
| First mentions appear | Months 6-12 | Engines start mentioning the business in relevant queries |
| Consistent mentions | Months 12-18 | Mentions become reliable across most relevant query variations |

The honest read: AI mentions are a slower asset than search rankings. The trade-off is that once you have them, they’re stickier. Search rankings can drop overnight on an algorithm update. AI mentions, anchored across all three layers, tend to compound and persist.

Common mistakes when trying to get mentioned by AI

The first mistake is investing only in Layer 3. The business builds a beautiful site with perfect schema markup and waits to be discovered. The engines never have enough Layer 1 or Layer 2 signal to triangulate, and the site sits ignored. Fixing your own site is necessary but not sufficient.

The second mistake is treating Reddit and Quora as channels for promotional posting. The engines have learned to discount obviously promotional content. The accounts that get cited are the ones that participate genuinely over months, with most of their activity being non-promotional, and occasional posts where their business is contextually relevant.

The third mistake is chasing AI mentions while neglecting traditional SEO. For Gemini and AI Overviews specifically, traditional ranking remains the dominant signal. Investing in AI mentions while letting Google rankings slide is trading certain pain for hypothetical gain.

The fourth mistake is expecting fast results. AI engines are slow to learn about new businesses, especially businesses that haven’t earned Layer 1 placements. The 6-12 month timeline isn’t artificial. It reflects how long it takes for genuine authority signals to register across all three layers.

What this means for businesses today

The market is in an unusual moment. The AI mention game is being shaped now, and most businesses haven’t started playing it. The business that invests in the three layers in 2026 will have a structural advantage in 2027 and 2028 that’s hard to displace, because the engines will have learned to mention them, and changing that learned behavior takes years.

The business that waits until AI mentions are obviously the dominant referral channel will be 2-3 years behind, trying to catch up to competitors with established Layer 1 and Layer 2 presence.

This is the same dynamic that played out in early Google SEO between 2003 and 2008. The businesses that built authoritative content and earned legitimate links during those years still rank today. The businesses that ignored search until 2012 are still trying to catch up. AI mentions are following the same pattern, on a faster timeline.

Frequently asked questions

How does ChatGPT decide which businesses to recommend?

ChatGPT pulls from Wikipedia (47.9% of its citations), Bing’s real-time index, Reddit, and major publications like Forbes and Business Insider. For business recommendations, the model triangulates across these sources, weights .gov and .edu mentions heavily, and prefers businesses with consistent entity information across the web. Businesses without Wikipedia presence can still get mentioned, but they need strong showing across other authoritative sources.

Why does Perplexity cite Reddit so heavily?

Perplexity’s design favors recent, conversational, real-user content, and Reddit produces that at scale in a structure the engine can read cleanly. About 46.7% of Perplexity’s citations come from Reddit threads, where real users name specific businesses, products, and experiences. For local and consumer businesses, a strong Reddit presence (in your category’s subreddits, contributing genuinely) is the highest-leverage Perplexity tactic.

Can a small business get mentioned by AI without press coverage?

Yes, but the path is different. Press coverage is only one form of Layer 1 authority. Without it, the business needs to compensate through non-press Layer 1 placements (official directories, government and tourism sites), strong Layer 2 (Reddit, reviews, niche forums), and Layer 3 (clean website, structured data, consistent entity information). Bone Voyage Dog Rescue earned consistent AI mentions despite never being covered by major US press, because it built deep Layer 2 presence and was mentioned on Layer 1 sources tied to the cities where it operated.

Is there a way to know exactly why an AI mentioned a competitor and not me?

Not exactly, but close. Tools like Profound, ZipTie, and Peec AI track AI citations across engines and let you compare which sources each cited business is anchored to. Manual auditing across the three layers (search for your competitor on Wikipedia, search Reddit for their name, audit their schema markup) usually surfaces the gap. The gap is almost always Layer 1 or Layer 2 presence, not Layer 3.

Do AI engines update their knowledge of businesses in real time?

It depends on the engine. Perplexity and Gemini use real-time retrieval, so changes to your web presence can affect mentions within days. ChatGPT (with browsing enabled) uses a hybrid model that pulls real-time data alongside training-cutoff knowledge. Claude can retrieve current information through tools but leans more heavily on its training data, so newer business information becomes fully available only after the next major training run. For real-time competitive moves, focus on Perplexity and Gemini.

Does paying for sponsored content help with AI mentions?

It depends on the source. Sponsored content on Layer 1 sources (top publications) often counts as a real mention, because the engines can’t always distinguish sponsored from editorial coverage. Sponsored content on lower-quality sources doesn’t help and can hurt by associating your business with content the engines have learned to discount. We don’t recommend paid placement as a primary AI mention strategy. The work doesn’t compound.

How often should I check my AI mention status?

Monthly is reasonable for most businesses. Weekly or daily monitoring becomes useful only when you’re running active campaigns or testing new strategies. The signal moves slowly enough that monthly snapshots capture the trends, and more frequent snapshots mostly add noise. Use a tracker (Profound, ZipTie, Peec AI) for monitoring, and supplement with manual checks across the engines you care about most.
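For the manual checks, a spreadsheet is enough, but the arithmetic can also be scripted. The sketch below assumes you have saved the text of each engine response for a fixed set of queries, grouped by month; the snapshot structure, the month labels, the business names, and the sample responses are all hypothetical, and the case-insensitive substring match is a simplification.

```python
def mention_rate(responses, business):
    """Fraction of saved responses whose text mentions the business."""
    if not responses:
        return 0.0
    needle = business.casefold()
    hits = sum(1 for text in responses if needle in text.casefold())
    return hits / len(responses)


def monthly_trend(snapshots, business):
    """Map each month label to that month's mention rate."""
    return {month: mention_rate(texts, business) for month, texts in snapshots.items()}


# Hypothetical saved responses for the same four queries in two months.
snapshots = {
    "2026-03": [
        "Popular options include Acme Rooter and Front Range Drains.",
        "Reddit users often recommend Acme Rooter.",
        "Consider Front Range Drains for emergency calls.",
        "Acme Rooter and Summit Pipeworks are frequently named.",
    ],
    "2026-04": [
        "Popular options include Acme Rooter and Example Plumbing Co.",
        "Reddit users often recommend Example Plumbing Co.",
        "Consider Front Range Drains for emergency calls.",
        "Acme Rooter and Summit Pipeworks are frequently named.",
    ],
}

trend = monthly_trend(snapshots, "Example Plumbing Co.")
print(trend)  # {'2026-03': 0.0, '2026-04': 0.5}
```

Holding the query set constant from month to month is what makes the rates comparable.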

Building durable AI presence

The honest read: AI mention strategy is the most important new SEO discipline of the last decade, and the playbook is still being written. We’re learning alongside our clients, updating our framework as the engines update their behavior, and treating each audit as an opportunity to refine the model.

The Three-Layer Source Hierarchy is the most concrete framework we’ve found for thinking about AI mention work. It’s not a magic formula. It’s a way of asking “where is my business actually represented in the data the AI engines consume?” and then closing the gaps systematically.

If you want help auditing your current presence across the three layers, identifying which gaps are worth closing, and building a 12-month plan for consistent AI mentions, that’s the work we do. The mentions compound. The earlier you start, the harder you are to displace later.


