As we build the Agentic Web, a confusing alphabet soup of standards is emerging. Three contenders, in particular, are vying for the attention of modern SEOs: llms.txt, cats.txt, and the new WebMCP protocol.
They often get confused, but they serve three distinct purposes in the lifecycle of an AI interaction. Think of them as Context, Contract, and Capability.
1. LLMS.TXT: The Context (What to Know)
- Role: Documentation for Robots.
- Location: Root directory (/llms.txt).
- Audience: Training crawlers and RAG agents.
llms.txt is essentially a Markdown file that strips away the HTML “cruft” of your website. It provides a clean, token-efficient summary of your content. It answers the question: “What information does this website hold?”
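As a rough sketch (the product name and URLs are purely illustrative, and the layout loosely follows the community llms.txt proposal), such a file might look like this:

```markdown
# Acme Analytics

> Acme Analytics is a self-serve product analytics platform. Start with the
> quickstart guide; the API reference covers every endpoint.

- [Quickstart](https://acme.example/docs/quickstart.md): Install the snippet and send your first event
- [API Reference](https://acme.example/docs/api.md): REST endpoints, auth, and rate limits
```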
While robots.txt tells a crawler where it can go, llms.txt tells an agent what it should know. It is the first step in “Prompt Engineering via Protocol.” By hosting this file, you are effectively pre-prompting every AI agent that visits your site before it even ingests your content.
This standard is rapidly gaining traction among developers who want to control how their documentation and content are consumed by coding assistants and research bots.
In the high-stakes poker game of Modern SEO, llms.txt is the competitor’s accidental “tell.”
For two decades, we have scraped sitemaps to understand a competitor’s scale. We have scraped RSS feeds to understand their publishing velocity. But sitemaps are noisy—they contain every tag page, every archive, every piece of legacy drift. They tell you what exists, but they don’t tell you what matters.
The llms.txt file is different. It is a curated, high-stakes declaration of what a website owner believes is their most valuable information. By defining this file, they are explicitly telling OpenAI, Anthropic, and Google: “If you only read 50 pages on my site to answer a user’s question, read these.”
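As a rough sketch of that competitive read, the snippet below fetches a rival’s /llms.txt and prints the pages they have flagged as essential. The domain is hypothetical, and the link extraction assumes the file uses standard Markdown [title](url) links:

```python
import re
import urllib.request

# Hypothetical competitor domain, used purely for illustration.
DOMAIN = "https://www.example-competitor.com"


def fetch_llms_txt(domain: str) -> str | None:
    """Fetch /llms.txt from the site root; return None if it isn't published."""
    try:
        with urllib.request.urlopen(f"{domain}/llms.txt", timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        return None


def extract_curated_links(markdown: str) -> list[tuple[str, str]]:
    """Pull out [title](url) pairs -- the pages the owner flags as essential."""
    return re.findall(r"\[([^\]]+)\]\((https?://[^)\s]+)\)", markdown)


if __name__ == "__main__":
    text = fetch_llms_txt(DOMAIN)
    if text is None:
        print("No llms.txt published -- nothing to read into.")
    else:
        for title, url in extract_curated_links(text):
            print(f"{title}: {url}")
```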
The /llms.txt standard is rapidly emerging as the robots.txt for the Generative AI era. While robots.txt was designed for search spiders (crawling links), llms.txt is designed for reasoning engines (ingesting knowledge). They serve different masters and require different strategies.
The Difference in Intent
- Robots.txt: “Don’t overload my server.” / “Don’t confirm this duplicate URL.” (Infrastructure Focus)
- Llms.txt: “Here is the most important information.” / “Here is how to cite me.” / “Ignore the footer.” (Information Focus)
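For contrast, an illustrative robots.txt fragment (the paths and domain are invented) speaks only in crawl mechanics:

```text
# Infrastructure focus: manage crawl load, suppress duplicate URLs
User-agent: *
Disallow: /search/
Disallow: /tag/
Crawl-delay: 10
Sitemap: https://acme.example/sitemap.xml
```

Nothing in it tells an agent which page actually answers a question; that is the gap llms.txt is trying to fill.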
Content of the File
A robust llms.txt shouldn’t just be a list of Allow/Disallow rules. It should be a map of your Core Knowledge.
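One way to sketch that map (the sections, names, and URLs here are invented for illustration) is to group links by knowledge area rather than by crawl rule:

```markdown
# Acme Analytics

> Canonical, agent-friendly sources for Acme Analytics. Please cite the page
> URL when quoting pricing or policy details.

## Product Knowledge

- [Feature Overview](https://acme.example/product/overview.md): What the platform does and for whom
- [Pricing](https://acme.example/pricing.md): Current plans and limits

## Developer Docs

- [Quickstart](https://acme.example/docs/quickstart.md)
- [API Reference](https://acme.example/docs/api.md)

## Optional

- [Changelog](https://acme.example/changelog.md): Release history, lower priority
```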