GEO playbook

From crawl to citation: how an AI answer gets built

SEO practitioners already think in a pipeline: crawl, index, rank. AI answers have their own version, and it is worth learning as its own model, because each stage is a separate place to win or lose. An engine crawls your page, retrieves it into an answer, and then cites you, or cites a competitor instead. Miss the first stage and the last two are impossible.

The three stages, and why they are separate

A citation is the visible end of a chain, and most tools show you only that end. Pulling the stages apart is the whole game.

The reason to separate them: a drop in visibility has a different fix at each stage. If bots cannot reach you, no amount of better content helps. If you are crawled but never retrieved, the content is the problem. If you are retrieved but a rival is cited, you are close, and the fix is specificity.
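The triage above can be written down as a small decision rule. This is a minimal sketch in Python, not a prescribed method: the function name and its three boolean inputs are hypothetical, and how you establish each one (server logs, a tracking tool) is outside the sketch.

```python
def diagnose(crawled: bool, retrieved: bool, cited: bool) -> str:
    """Return the stage to fix first, checking crawl, then retrieval, then citation."""
    if not crawled:
        return "crawl: fix bot access before touching content"
    if not retrieved:
        return "retrieval: rework the content itself"
    if not cited:
        return "citation: make the claim more specific"
    return "none: the page is being cited"


# A page that is fetched and retrieved, but a rival gets the citation.
print(diagnose(crawled=True, retrieved=True, cited=False))
```

The order of the checks is the point: an earlier stage that fails makes the later answers meaningless.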

Stage 1: get crawled

This is the webmaster's stage, and the most common own-goal. Things to check:
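One check that lends itself to automation is whether robots.txt turns AI crawlers away. The sketch below uses Python's standard urllib.robotparser against robots.txt text you supply. The crawler tokens listed are examples and should be treated as an assumption: confirm the current names in each vendor's documentation. The helper name blocked_agents and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example crawler tokens; treat this list as an assumption and verify
# the current names in each vendor's own documentation.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_agents(robots_txt: str, url: str, agents=AI_USER_AGENTS) -> list[str]:
    """Return the agents that the given robots.txt text disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical rules: one AI crawler is shut out, everything else is allowed.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(sample, "https://example.com/pricing"))
```

With the sample rules, only GPTBot should be reported as blocked. A robots.txt check covers one way of losing the crawl stage; firewalls, rate limits and rendering problems need separate checks.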

Stage 2: get retrieved

Being fetched is table stakes. Retrieval is about being the page an engine wants to build an answer from. Two things move the needle here.

Be answer-shaped. Retrieval favors pages that already contain the answer in a liftable form: a clear heading that matches the question, a direct paragraph under it, a list, a table, a specific number. Content an engine has to infer loses to content it can lift.

Be verifiably fresh. When answers are built by retrieval, recency is a real signal, and engines can only see the freshness you emit. A page rewritten last month with no dateModified reads as abandoned. We covered the evidence and the fix in content freshness and AI citations.
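One way to emit that freshness is structured data. The sketch below assumes dateModified here refers to the schema.org property of that name and builds Article markup in Python. The function name, headline and dates are hypothetical, and whether a given engine reads this markup is not something the sketch can confirm.

```python
import json
from datetime import date


def article_jsonld(headline: str, published: date, modified: date) -> str:
    """Build schema.org Article markup that states when the page last changed."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


# Hypothetical page and dates, for illustration only.
print(article_jsonld("From crawl to citation", date(2024, 1, 15), date(2024, 6, 3)))
```

The resulting JSON would typically go in a script element of type application/ld+json in the page head. Update dateModified only when the content really changes, so the signal stays verifiable.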

Stage 3: get cited

You are retrieved and still not named. This is the closest and most frustrating gap: the engine looked at you and picked someone else. It almost always comes down to specificity.

AI answers cite the source that states the exact thing the answer needs. The page with a specific, dated, attributable figure gets quoted; the page that said "many buyers now use AI" does not, even if it ranks higher in classic search. Own the specific claim: the stat, the definition, the named comparison, the dated fact. That is what gets pulled into the sentence.

If you want a structured way to find these gaps, that is exactly what an AI citation gap analysis is for: the prompts where a rival is named and you are not.
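At its simplest, that gap is a filter over observed answers. The sketch below is a hypothetical illustration in Python, not a description of any particular tool: the data shape (prompt mapped to the set of domains cited), the function name and the example domains are all assumptions.

```python
def citation_gaps(citations: dict[str, set[str]], brand: str, rival: str) -> list[str]:
    """Return prompts whose cited sources include rival but not brand."""
    return sorted(
        prompt
        for prompt, cited in citations.items()
        if rival in cited and brand not in cited
    )


# Hypothetical observations: prompt -> domains cited in the answer.
observed = {
    "best crm for startups": {"rival.example", "review.example"},
    "crm pricing comparison": {"you.example", "rival.example"},
    "what is a crm": {"you.example"},
}

print(citation_gaps(observed, brand="you.example", rival="rival.example"))
```

Here only "best crm for startups" is a gap: the rival is cited and you are not. Prompts where both are cited, or where only you are cited, drop out.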

Reading the timing: correlation, not proof

Here is where the chain gets genuinely useful, and where it is easy to fool yourself. Because each stage is time-stamped, you can watch the sequence: an engine crawled your page on the 3rd, first cited it on the 10th. A seven-day lag. Do that across your pages and you get a feel for how long your content takes to travel from fetched to cited, and whether a change actually shortened it.
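The lag arithmetic is simple enough to sketch. The example below uses hypothetical pages and dates (the first row mirrors the 3rd-to-10th case above) and assumes you already have a first-crawl and first-citation date per page from whatever tracking you use.

```python
from datetime import date
from statistics import median

# Hypothetical observations: page -> (first crawl seen, first citation seen).
observations = {
    "/pricing": (date(2024, 6, 3), date(2024, 6, 10)),
    "/guide": (date(2024, 6, 1), date(2024, 6, 15)),
    "/faq": (date(2024, 6, 5), date(2024, 6, 9)),
}


def lag_days(crawled: date, cited: date) -> int:
    """Days between the first observed crawl and the first observed citation."""
    return (cited - crawled).days


lags = {page: lag_days(*dates) for page, dates in observations.items()}
print(lags)
print("median lag:", median(lags.values()))
```

The median is used rather than the mean so that one slow page does not dominate the summary. Comparing medians before and after a change gives you the timing to report, nothing more.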

What you cannot do is call that proof. A shorter lag after a freshness push is a strong hint, not a verdict. Plenty else moved in those seven days. The honest way to use this is the way a good analyst uses any time series: as evidence to prioritize and investigate, not a single cause to declare. Show the timing, label it as timing, and let the reader make the call. For how we keep every number honest, see how we measure.

One caveat worth stating plainly: "first cited" means the first citation you observed. If you started tracking a prompt last week, a page cited months ago will look artificially late. Treat the first-observed date as the latest the citation could have begun, not a birth certificate.
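A simple guard is to flag first observations that land suspiciously close to the day tracking began. The sketch below is one hypothetical way to do that in Python; the function name is invented, and the seven-day grace window is an arbitrary assumption to tune against your own data.

```python
from datetime import date, timedelta


def may_predate_tracking(tracking_started: date, first_observed: date,
                         grace: timedelta = timedelta(days=7)) -> bool:
    """True when the first observed citation falls so soon after tracking began
    that the page may already have been cited before anyone was watching."""
    return first_observed - tracking_started <= grace


# Hypothetical dates: tracking began on the 1st, citation first seen on the 3rd.
print(may_predate_tracking(date(2024, 6, 1), date(2024, 6, 3)))
```

Flagged pages are better left out of any lag summary, or reported separately, because their computed lag may overstate the real one.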

The one-page checklist

The teams that win AI search are not the ones with the most content. They are the ones who can see the whole chain and fix the exact stage that is broken. That is the loop llemmy is built to run: watch the crawl, the retrieval and the citation on one timeline, and hear about it the day something moves.
