How does llemmy report AI visibility?

As a rate, not a yes or no. Visibility is the share of AI answers (on unbranded prompts) that mention your brand, measured over your recent answers and shown with the sample size (n) and a 95% Wilson confidence interval, so you can see how solid the number is. A single scan that says "appeared" or "not appeared" is a coin flip dressed up as a fact, because LLMs are non-deterministic.

Does a confidence interval mean the number is accurate?

No, and we are explicit about this. A confidence interval quantifies sampling noise (how stable the rate is at a given n). It does not fix bias. If the measurement surface differs from what a given user sees, a tighter interval just makes a biased number look more confident. So we also tell you how we query each engine and which biases we do not eliminate: API versus the logged-in app, personalization, model drift, and geography beyond what you configure.

Does llemmy query the real ChatGPT app or the API?

For ChatGPT, Claude, Gemini and Perplexity, llemmy queries the official model APIs. Google AI Overviews are read from the search results page. The model API is not the same surface as a logged-in consumer app with memory and personalization, so we report it as what it is: the model's answer to a defined prompt, in a defined location, at a point in time, not a claim about what every individual user sees.

How llemmy measures AI visibility (and what we don't claim)

The short version

We report presence as a rate, not a single yes or no, because LLMs are non-deterministic.
Every headline number carries its sample size (n) and a 95% Wilson confidence interval, shown right under the number in the app and in client reports.
A confidence interval handles noise, not bias. We name the biases it cannot fix instead of hiding them.
We separate branded from unbranded prompts, build the headline on earned (unbranded) prompts, and record every source each engine cited so you can open the evidence.

AI-visibility tooling has an honesty problem. A tool that tells you "17% share of voice, rank 3" as if it were a stable fact is selling precision that the underlying system cannot support. LLMs are non-deterministic: run the same prompt three times and you can get three different answers. So we built llemmy's reporting around what is actually measurable, and we are equally clear about what is not.

Presence is a rate, with a sample size and a confidence interval

Your visibility is the share of AI answers that mention your brand: mentions divided by the number of answers measured (n). Because that is a proportion estimated from a finite, noisy sample, we attach a 95% Wilson score interval. The Wilson interval is a better fit than the textbook normal approximation, which breaks down for small samples and for rates close to 0% or 100%. In the app you see it inline, for example:

Visibility 42% · n=210 · 95% CI 35-49%

A wide interval is a signal to collect more answers before reading too much into the number. Share of voice (your mentions divided by all brand mentions across the same answers) gets the same treatment, with its own n. We would rather show you an honest "70%, n=20, 95% CI 48-85%" than a confident-looking "70%" that one extra noisy day could swing.
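To make the difference between the two denominators concrete, here is a small Python sketch. The data shape (one list of mentioned brands per answer), the function name, and the brand names are hypothetical, and it assumes each brand counts at most once per answer; the text above does not say how repeated mentions within one answer are handled.

```python
def share_of_voice(answers: list[list[str]], brand: str) -> tuple[int, int]:
    """Return (your_mentions, all_brand_mentions) across the same answers."""
    yours = sum(brand in brands for brands in answers)
    # Assumption: a brand counts once per answer, however often it is repeated.
    total = sum(len(set(brands)) for brands in answers)
    return yours, total


# Hypothetical data: the brands mentioned in each of five answers.
answers = [
    ["Acme", "Globex"],
    ["Globex"],
    ["Acme", "Initech", "Globex"],
    [],
    ["Initech"],
]

yours, total = share_of_voice(answers, "Acme")
print(f"Visibility {yours / len(answers):.0%} · n={len(answers)}")  # Visibility 40% · n=5
print(f"Share of voice {yours / total:.0%} · n={total}")            # Share of voice 29% · n=7
```

The same two mentions give different rates because visibility divides by answers (n=5) while share of voice divides by brand mentions (n=7), which is why each number carries its own n.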

What a confidence interval does NOT fix

This is the part most tools skip. A confidence interval tells you how stable a measurement is at a given sample size. It says nothing about whether you are measuring the right thing. A biased sample, measured more times, just produces a more confident wrong number. So here is where our numbers come from, plainly:

API vs the real app. We query the official model APIs for ChatGPT, Claude, Gemini and Perplexity, and read Google AI Overviews from the search results page. The model API is not identical to a logged-in consumer app with memory and personalization. We report the model's answer to a defined prompt, not a claim about what each individual user sees.
Personalization. We query from a consistent context, so our numbers are not personalized to any one user's history. That is a feature for comparability and a limit for realism, and we are not going to pretend otherwise.
Model drift. We capture the exact resolved model version on every answer, so drift is visible in your data. We do not silently restate history when a provider ships a new model.
Geography. You can set a location context per project; by default queries are global. Visibility is only as local as what you configure.

We treat these as limits to name, not problems to claim away. If a vendor tells you they have "solved" the gap between an API and a logged-in app, that is the overclaiming you should be skeptical of.
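The gap between noise and bias can be sketched with a few lines of Python. Both rates below are invented for illustration: suppose the surface being sampled mentions a brand 45% of the time, while the population you actually care about would see it 35% of the time. The interval is the standard Wilson score interval.

```python
import math


def wilson_interval(mentions: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Standard Wilson score interval for a proportion."""
    p = mentions / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


TRUE_RATE = 0.35      # hypothetical: what the audience you care about sees
MEASURED_RATE = 0.45  # hypothetical: what the sampled surface returns


def interval_at(n: int) -> tuple[float, float]:
    """Interval you would report if exactly 45% of n answers mentioned the brand."""
    return wilson_interval(round(MEASURED_RATE * n), n)


for n in (40, 400, 4000):
    low, high = interval_at(n)
    covers = low <= TRUE_RATE <= high
    print(f"n={n:>4}  95% CI {low:.1%}-{high:.1%}  covers true rate: {covers}")
```

At n=40 the interval is wide enough to include the true rate by accident. At n=400 and n=4000 it tightens around 45% and excludes 35%. More samples reduced the noise and left the bias untouched.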

Branded vs unbranded, and the evidence behind every number

Prompts that name your brand return a mention close to 100% of the time by construction, so including them would inflate your headline. llemmy classifies prompt intent and excludes branded prompts from the headline visibility score, building it on unbranded, buyer-intent prompts where a mention is earned. Branded prompts still matter, so they flow to a separate sentiment and accuracy view rather than the headline.
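A small Python sketch shows why the split matters. The text above says only that llemmy classifies prompt intent, not how, so the case-insensitive substring match here is a deliberately naive stand-in; the `Answer` class, the function names, and the brand "Acme" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    prompt: str  # the prompt sent to the engine
    text: str    # the engine's answer


def contains_brand(text: str, brand_terms: list[str]) -> bool:
    """Naive stand-in classifier: case-insensitive substring match."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in brand_terms)


def visibility_counts(
    answers: list[Answer], brand_terms: list[str], include_branded: bool = False
) -> tuple[int, int]:
    """Return (mentions, n), by default over unbranded prompts only."""
    pool = [
        a for a in answers
        if include_branded or not contains_brand(a.prompt, brand_terms)
    ]
    mentions = sum(contains_brand(a.text, brand_terms) for a in pool)
    return mentions, len(pool)


# Hypothetical answers; the last two prompts name the brand.
answers = [
    Answer("best invoicing tools for freelancers", "Popular picks include Globex and Initech."),
    Answer("best invoicing tools for freelancers", "Many freelancers use Acme or Globex."),
    Answer("how to automate invoices", "Tools such as Initech can help."),
    Answer("is Acme good for invoicing", "Acme is a solid option for small teams."),
    Answer("Acme pricing", "Acme offers three pricing tiers."),
]

print(visibility_counts(answers, ["Acme"]))                        # (1, 3)
print(visibility_counts(answers, ["Acme"], include_branded=True))  # (3, 5)
```

On this toy data, mixing in the two branded prompts lifts the rate from 1 in 3 to 3 in 5 without any change in earned visibility.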

And we record the sources each engine cited on every answer, so a number is never a black box: you can open the underlying answers and the domains behind them.
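As a rough illustration of what opening the evidence could look like, the Python sketch below tallies cited domains across recorded answers. The record shape (a dict with a `sources` list of URLs), the engine labels, and the example URLs are assumptions for this example, not llemmy's actual schema.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(answers: list[dict]) -> Counter:
    """Count how many answers cited each domain (once per answer)."""
    counts: Counter = Counter()
    for answer in answers:
        domains = set()
        for url in answer.get("sources", []):
            host = urlparse(url).hostname  # lowercased, without the port
            if host:
                domains.add(host.removeprefix("www."))
        counts.update(domains)
    return counts


# Hypothetical records with placeholder URLs.
answers = [
    {"engine": "engine-a", "sources": ["https://www.example.com/review", "https://example.org/top-10"]},
    {"engine": "engine-b", "sources": ["https://example.com/pricing", "https://example.com/blog"]},
    {"engine": "engine-a", "sources": []},
]

for domain, count in cited_domains(answers).most_common():
    print(f"{domain}: cited in {count} answer(s)")
```

Counting each domain once per answer keeps a single answer with many links to the same site from dominating the tally.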

What we will never do

We will not present a precise, stable number without the sample size, the prompt set, the engine, and the window behind it. We will not bury the methodology, and we will not claim to have eliminated biases that no API-based tool can eliminate. Directional truth, honestly bounded, beats false precision. That is the whole point of llemmy.

Questions about any of this, or think we have a measurement wrong? We would genuinely like to hear it. Related reading: What AI engines actually cite, AI share of voice, measured without fooling yourself, and How to track your brand across AI engines.
