How RAG citations differ from web-search citations

by marcin · grounding, citations, rag, product

A web-search citation points at a URL. A RAG citation points at the passage that actually produced the answer. That’s the whole difference, and it changes how much work the user has to do to trust what they’re reading. ChatGPT and Perplexity will hand you five links and ask you to verify the claim by reading them; a RAG system already retrieved a specific chunk of a specific document, used that chunk to write the answer, and cites the chunk itself. In SageTube, the chunk is a clip inside a YouTube video, and the citation is the timestamp.

What a web-search citation is

When an LLM-with-web-search answers a question, the flow is: the model issues a query, the search engine returns ten URLs, the model reads what it can fetch, and the answer cites those URLs. The citation is a pointer to a page. Where in that page the supporting sentence lives — or whether it lives there at all — is left to the reader.

That’s a real improvement over uncited LLM output. You at least know which sources the model claims to have used, and you can spot-check them. But the verification cost is high. To confirm one factual claim, you open a 3,000-word article and ctrl-F for the relevant phrase. If the article paraphrases or restructures the idea, ctrl-F misses it and you read more. In practice, most users skim the answer, glance at the citations, and trust on vibes.

There’s also a quieter failure mode: the model can cite a URL it didn’t actually use. The citation is just a textual marker such as [1], and the model is fluent enough to drop one wherever it sounds natural. Some web-search systems mitigate this by re-running retrieval against the cited page after generation, but the link between answer text and cited evidence remains soft.

What a RAG citation is

Retrieval-augmented generation reverses the order. Before the model writes anything, the system retrieves a small set of passages — chunks of documents that vector search judged relevant to the query — and the model is constrained to compose its answer from those passages. Every claim it makes is supposed to trace back to a specific chunk in the retrieved set.

The citation in a RAG answer is a pointer to the chunk, not to the source document as a whole. In SageTube’s corpus, that chunk is roughly 350 tokens of transcript — about 60 to 90 seconds of speech — with the video ID and the start/end timestamps attached. When you click the citation in the chat UI, the player opens at the start timestamp and the clip plays. You’re verifying against the exact text the model worked from.
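As a rough illustration of what such a citation carries, here is a minimal Python sketch. The class name, field names, and the `S1` label are hypothetical and not taken from SageTube’s code; the only external fact relied on is that a YouTube watch URL accepts a `t` parameter giving the start offset in seconds.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class TranscriptChunk:
    """One retrievable passage: a slice of transcript plus its position in the video."""
    ref: str            # label used in the prompt, e.g. "S1" (hypothetical scheme)
    video_id: str       # YouTube video ID
    start_seconds: int  # where the clip begins
    end_seconds: int    # where the clip ends
    text: str           # the transcript text the model actually sees


def citation_url(chunk: TranscriptChunk) -> str:
    """Build a watch URL that opens the video at the chunk's start timestamp."""
    query = urlencode({"v": chunk.video_id, "t": f"{chunk.start_seconds}s"})
    return f"https://www.youtube.com/watch?{query}"


def citation_label(chunk: TranscriptChunk) -> str:
    """Format the clip span as m:ss-m:ss for display next to the answer."""
    def mmss(total: int) -> str:
        minutes, seconds = divmod(total, 60)
        return f"{minutes}:{seconds:02d}"
    return f"{mmss(chunk.start_seconds)}-{mmss(chunk.end_seconds)}"
```

The point of the shape is that the citation and the evidence are the same object: the text the model read and the timestamp the user clicks live on one record, so there is nothing to reconcile afterwards.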

The grounding is much tighter for a structural reason: the model doesn’t see the rest of the video. The retrieval step is the bottleneck — only the top-k chunks reach the prompt. If a chunk wasn’t retrieved, the model has no way to cite it, because it has no way to know it exists. The set of possible citations is a closed, small, inspectable list before generation even starts.
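The bottleneck can be sketched in a few lines of Python. This is an illustrative toy, not SageTube’s retrieval code: the function names, the in-memory index, and the brute-force cosine ranking are all assumptions made for the example, and a production system would use a vector database rather than a sorted list.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(
    query_vec: list[float],
    index: dict[str, tuple[list[float], str]],
    k: int,
) -> dict[str, tuple[str, str]]:
    """Return the k chunks most similar to the query, labelled S1..Sk.

    `index` maps a chunk ID to (embedding, text). The result is the complete
    list of passages the model will see, and therefore the complete list of
    things it can legitimately cite.
    """
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1][0]),
        reverse=True,
    )
    return {
        f"S{i}": (chunk_id, text)
        for i, (chunk_id, (_, text)) in enumerate(ranked[:k], start=1)
    }


def build_prompt(question: str, retrieved: dict[str, tuple[str, str]]) -> str:
    """Put only the retrieved passages in front of the model, each with its label."""
    sources = "\n".join(f"[{ref}] {text}" for ref, (_, text) in retrieved.items())
    return (
        "Answer using only these sources and cite them by label.\n"
        f"{sources}\n\nQuestion: {question}"
    )
```

Anything that did not make the top k never appears in the prompt, so the keys of the returned dictionary are the whole citation vocabulary for that answer.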

Hallucinated citations: how RAG closes the door

Web-search systems harden citations by re-verifying URLs after the fact. RAG systems can do something stronger: they can refuse to emit any citation that wasn’t in the retrieved set, because that set is finite and known.

SageTube enforces this at the answer layer. After the model returns its structured output, `AnswerValidator` walks every `evidence_refs` array attached to every claim and strips any reference that doesn’t match a chunk in the retrieved set (see `AnswerValidator::stripAllUnknownRefs` in `webapp/app/Services/AnswerValidator.php`). If a model invents an `S7` when only `S1` through `S5` were retrieved, `S7` is gone before the answer is rendered. If stripping a phantom reference empties a claim’s evidence list, the claim drops; if it empties the whole answer, the validator rewrites the response as `unsupported` and the user sees: “I do not have enough source support to answer reliably from the current corpus.”
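The rule described above is small enough to sketch. What follows is a Python illustration of the behaviour as the post describes it, not a port of the PHP service: the function name, the dictionary layout (`claims`, `status`, `message`), and the choice to also drop claims that arrive with no evidence at all are assumptions made for the example. Only the `evidence_refs` key, the `unsupported` label, and the fallback sentence come from the text.

```python
UNSUPPORTED_MESSAGE = (
    "I do not have enough source support to answer reliably from the current corpus."
)


def strip_unknown_refs(answer: dict, retrieved_refs: set[str]) -> dict:
    """Remove citations that were never retrieved, then drop unsupported claims.

    `answer` is assumed to look like:
        {"claims": [{"text": "...", "evidence_refs": ["S1", "S7"]}, ...]}
    """
    kept_claims = []
    for claim in answer.get("claims", []):
        # Keep only references that point at a chunk the retriever returned.
        valid = [ref for ref in claim.get("evidence_refs", []) if ref in retrieved_refs]
        if valid:
            kept_claims.append({**claim, "evidence_refs": valid})
        # A claim left with no evidence is dropped rather than shown uncited.

    if not kept_claims:
        # Nothing survived: replace the whole answer with the fallback message.
        return {"status": "unsupported", "message": UNSUPPORTED_MESSAGE, "claims": []}
    return {**answer, "status": "ok", "claims": kept_claims}
```

Because the check is a plain set-membership test against a list that exists before generation starts, it needs no second model call and no fuzzy matching.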

That’s a small piece of code doing a load-bearing job. Web-search systems can’t easily do the same — their “retrieved set” is the entire open web, so there’s no clean rule for what counts as a valid citation. The closed corpus is what makes the discipline possible.

Why this matters more for video than for text

For a written article, the gap between page-level and passage-level citation is annoying but bridgeable: you can ctrl-F, you can re-read. For a video, the gap is severe. A URL points you at a 45-minute lecture; the moment that supports the claim is somewhere inside it. Without a timestamp, the citation effectively hands the verification problem back to you, just with a slightly smaller search space.

Timestamps fix that. A 30-second clip is checkable in 30 seconds. The verification cost drops from “scrub through this video” to “press play, listen.” The corpus behind this on SageTube today is 113,396 child chunks averaging 356 tokens each, indexed across 17,317 transcribed videos from 135 channels. Every chunk carries a timestamp, and every answer paragraph the platform produces points at one of them.

A practical test

The next time an AI assistant gives you an answer, look at what it cites and ask: how long would it take me to verify one claim?

If the citation is a URL, the answer is “minutes” — you have to open the page, find the relevant section, and check that the model’s paraphrase matches the source.

If the citation is a passage — a paragraph, a transcript chunk, a timestamp — the answer is “seconds.” You read or watch the cited bit and you’re done.

That gap, multiplied across every claim in every answer, is the difference between AI you double-check and AI you don’t. RAG citations make the double-check cheap enough that it stops being theoretical.


Related: Why every AI answer needs a citation makes the broader argument that uncited LLM output is a productivity trap. For the full SageTube citation architecture, see Citations, Timestamps, and Trust: How SageTube Answers Are Grounded. Try an Expert with citations live at SageTube.

Frequently asked questions

What's the difference between a RAG citation and a web-search citation?
A web-search citation points at a URL, usually a multi-thousand-word page you have to skim for the supporting sentence. A RAG citation points at a specific passage inside a corpus the system has already retrieved: the exact text the model used to compose the answer. In SageTube, that passage is a 30–90 second clip in a YouTube video, anchored to a start and end timestamp.

Why is passage-level citation more trustworthy than page-level?
Because the verification cost is much lower. Skimming a 4,000-word article to find the supporting sentence takes minutes; playing the 30-second clip the AI actually used takes seconds. Lower verification cost means more users actually verify, which is the only mechanism that makes citations meaningful instead of decorative.

Can a RAG system invent a citation?
It can try. SageTube blocks it server-side: `AnswerValidator` inspects every evidence reference the model emits and strips any reference that isn't in the retrieved evidence set. A phantom citation never reaches the user; at worst, the answer is shortened or relabelled `unsupported`.

Why don't search-engine citations work for video?
Because the unit of citation is wrong. A URL points at an entire video; the supporting moment is somewhere inside 50 minutes of speech. Without a timestamp, the citation hands the verification problem back to the user. The only useful citation for a video is one that opens the player at the right second.