Directing Agents with llms.txt

While robots.txt tells a crawler where it can go, llms.txt tells an agent what it should know. It is the first step in “Prompt Engineering via Protocol.” By hosting this file, you are essentially pre-prompting every AI agent that visits your site before it even ingests your content.

This standard is rapidly gaining traction among developers who want to control how their documentation and content are consumed by coding assistants and research bots.

Read more →

Implementing CATS Protocols for Ethical Scraping

The ethical debate around AI training data is fierce. “They stole our content!” is the cry of publishers. “It was fair use!” is the retort of AI labs. CATS (Content Authorization & Transparency Standard) is the technical solution to this legal standoff.

Implementing CATS is not just about blocking bots; it is about establishing a contract.

The CATS Workflow

  1. Discovery: The agent checks /.well-known/cats.json or cats.txt at the root.
  2. Negotiation: The agent parses your policy.
    • “Can I index this?” -> Yes.
    • “Can I train on this?” -> No.
    • “Can I display a snippet?” -> Yes, max 200 chars.
    • “Do I need to pay?” -> Check pricing object.
  3. Compliance: The agent (if ethical) respects these boundaries.
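The negotiation step above can be sketched in a few lines of Python. Note that the JSON layout used here (`permissions`, `snippet`, `pricing` fields) is a hypothetical policy shape for illustration; CATS does not yet have a single published schema.

```python
# Minimal sketch of the CATS negotiation step.
# The policy shape below is hypothetical, not a published schema.
import json

SAMPLE_POLICY = json.loads("""
{
  "version": "1.0",
  "permissions": {
    "index": true,
    "train": false,
    "snippet": {"allowed": true, "max_chars": 200}
  },
  "pricing": {"train": {"currency": "USD", "per_1k_tokens": 0.02}}
}
""")

def negotiate(policy: dict) -> dict:
    """Answer the four negotiation questions from a parsed CATS policy."""
    perms = policy.get("permissions", {})
    snippet = perms.get("snippet", {})
    return {
        "can_index": perms.get("index", False),   # default-deny if unstated
        "can_train": perms.get("train", False),
        "snippet_limit": snippet.get("max_chars", 0) if snippet.get("allowed") else 0,
        "must_pay": "pricing" in policy,          # consult the pricing object
    }
```

An ethical agent would run this check after discovery and before any ingestion; an empty or missing policy falls back to deny-by-default.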

Signaling “Cooperative Node” Status

The search engines of the future will constitute a “Web of Trust.” Sites that implement CATS signal that they are “Cooperative Nodes”: they provide clear metadata about their rights and terms.

Read more →

CATS.TXT: The Constitution for Autonomous Agents

For nearly three decades, the robots.txt file has served as the internet’s “Keep Out” sign. It is a binary, blunt instrument: Allow or Disallow. Crawlers either respect it or they don’t. However, as we enter the age of the Agentic Web, this binary distinction is no longer sufficient. We need a protocol that can express nuance, permissions, licenses, and economic terms. We need CATS (Content Authorization & Transparency Standard), often implemented as cats.txt or authorized_agents.json.
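To make the contrast with robots.txt concrete, here is what a nuanced policy file might look like. The directive names below are illustrative only; the standard is still emerging and real deployments may differ.

```
# cats.txt — hypothetical directive syntax, for illustration
User-Agent: *
Index: allow
Train: disallow
Snippet: allow; max-chars=200
License: CC-BY-NC-4.0
Pricing: see /.well-known/cats.json
```

Where robots.txt could only say Allow or Disallow, each line here expresses a distinct permission, license, or economic term.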

Read more →

Implementing /llms.txt: The New Standard

The /llms.txt standard is rapidly emerging as the robots.txt for the Generative AI era. While robots.txt was designed for search spiders (crawling links), llms.txt is designed for reasoning engines (ingesting knowledge). They serve different masters and require different strategies.

The Difference in Intent

  • Robots.txt: “Don’t overload my server.” / “Don’t crawl this duplicate URL.” (Infrastructure Focus)
  • Llms.txt: “Here is the most important information.” / “Here is how to cite me.” / “Ignore the footer.” (Information Focus)
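The infrastructure focus of robots.txt is visible in its directives, which speak to server load and URL hygiene rather than meaning:

```
# robots.txt — infrastructure focus
User-agent: *
Crawl-delay: 10
Disallow: /search?*
Disallow: /tmp/
```

Nothing in this vocabulary can say what a page means, how to cite it, or which parts matter; that gap is what llms.txt is meant to fill.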

Content of the File

A robust llms.txt shouldn’t just be a list of Allow/Disallow rules. It should be a map of your Core Knowledge.

Read more →