For nearly three decades, the robots.txt file has served as the internet’s “Keep Out” sign. It is a binary, blunt instrument: Allow or Disallow. Crawlers either respect it or they don’t. However, as we enter the age of the Agentic Web, this binary distinction is no longer sufficient. We need a protocol that can express nuance, permissions, licenses, and economic terms. We need CATS (Content Authorization & Transparency Standard), often implemented as cats.txt or authorized_agents.json.

Beyond Allow/Disallow: The Need for Nuance

The robots.txt standard cannot distinguish between “viewing” and “using.” A search engine might need access to index a page for discovery, but an AI model might need “usage rights” to train on it or “derivative rights” to summarize it in a chat interface.

Example scenarios where robots.txt fails:

  • “You can index this for search, but you cannot use it for generative training.”
  • “You can summarize this content, but you must attribute the author with a link.”
  • “You can access this API for free up to 100 requests, then you need a token.”

CATS provides a structured way to express these complex relationships. It is the legal layer for the semantic web.
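
CATS is still a draft, so none of the vocabulary below is normative. As a thought experiment, the following Python sketch encodes the three scenarios above as a machine-readable policy and checks it before acting; the path prefixes, action labels, and the may() helper are all hypothetical.

policy = {
    "/articles/": {"search_indexing": "allowed", "generative_training": "denied"},
    "/essays/": {"summarization": "allowed", "attribution": "required_with_link"},
    "/api/": {"free_request_quota": 100, "beyond_quota": "token_required"},
}

def may(action: str, path: str) -> bool:
    # Deny by default: an action is permitted only if a matching rule allows it.
    for prefix, rules in policy.items():
        if path.startswith(prefix):
            return rules.get(action) == "allowed"
    return False

print(may("summarization", "/essays/on-agents.html"))       # True
print(may("generative_training", "/articles/launch-post"))  # False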

The Syntax of Sovereignty

The CATS standard (currently in draft by the Agentic Protocols Group) favors YAML or JSON structures for readability and extensibility. Unlike the line-by-line parsing of robots.txt, CATS files are parsed as objects.

A Robust cats.yaml Example

version: "1.0"
policy:
  - user-agent: "*"   # a bare asterisk is not valid YAML, so the wildcard must be quoted
    allow: /public/
    license: CC-BY-4.0
    attribution: required
    commercial_use: allowed_with_attribution

  - user-agent: GPTBasedAgents
    allow: /research/
    features:
      training: denied
      inference: allowed
      summarization: allowed
      context_window: full
    pricing:
      token_access: "0.0001 USD / token"

This configuration tells generic bots that they may use anything under /public/ with attribution, under a CC-BY-4.0 license. However, it explicitly forbids training for GPT-based agents on /research/ while allowing inference (for example, retrieval-augmented generation, or RAG). This distinction is crucial for publishers who want to be found but not “consumed.”
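
To make that object model concrete, here is a minimal Python sketch that loads a file shaped like the example above (using PyYAML) and answers one question: may this agent train on this path? The matching rules it assumes (exact user-agent match with a "*" fallback, prefix matching on allow, deny when a feature is absent) are illustrative choices, not requirements of the draft.

import yaml  # PyYAML

def load_policy(path="cats.yaml"):
    with open(path) as f:
        return yaml.safe_load(f)

def rule_for(policy, user_agent):
    # Prefer an exact user-agent match, then fall back to the "*" wildcard entry.
    entries = policy.get("policy", [])
    for entry in entries:
        if entry.get("user-agent") == user_agent:
            return entry
    for entry in entries:
        if entry.get("user-agent") == "*":
            return entry
    return None

def may_train(policy, user_agent, url_path):
    rule = rule_for(policy, user_agent)
    if rule is None:
        return False
    prefix = rule.get("allow")
    if prefix is None or not url_path.startswith(prefix):
        return False
    # Deny by default: training must be explicitly allowed for this agent.
    return rule.get("features", {}).get("training") == "allowed"

policy = load_policy()
print(may_train(policy, "GPTBasedAgents", "/research/paper-42.pdf"))  # False: training is denied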

Case Study: The Wikipedia Paradigm

Consider how Wikipedia manages its data. It is open, but not unconditionally so: reuse is already governed by an attribution and share-alike license. If Wikipedia were to implement CATS, it could explicitly license its content for educational models while charging commercial models for training access.

Early adopters in the academic publishing sector have used CATS to create “Research-Only” tiers of access. By inspecting the cats.txt file, academic agents (like those from Semantic Scholar) are granted deep access to PDFs, while commercial scrapers are blocked. This granular control has led to a 30% reduction in unauthorized commercial scraping while maintaining high visibility in academic search tools.
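
The enforcement side of such a tier can be sketched in a few lines. The agent names and tier labels below are invented for illustration, and a production deployment would verify agent identity cryptographically (for example, via signed tokens) rather than trusting the User-Agent header.

RESEARCH_AGENTS = {"SemanticScholarBot", "OpenReviewBot"}  # illustrative names only

def access_tier(user_agent: str) -> str:
    # Research agents get the full-text tier; everything else gets discovery only.
    return "research" if user_agent in RESEARCH_AGENTS else "discovery"

def may_fetch_pdf(user_agent: str) -> bool:
    return access_tier(user_agent) == "research"

print(may_fetch_pdf("SemanticScholarBot"))            # True
print(may_fetch_pdf("GenericCommercialScraper/2.1"))  # False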

Implementation: The “Well-Known” Location

To implement CATS, place the file at the root of your domain or under the /.well-known/ path (the convention for site-wide metadata defined in RFC 8615).

https://mcp-seo.com/.well-known/cats.txt
https://mcp-seo.com/cats.txt
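
An agent that wants to honor these terms first has to find them. Here is a minimal discovery sketch using the requests library; the lookup order shown (/.well-known/ first, then the root) is an assumption, as the draft may define a different precedence.

import requests

def fetch_cats(domain: str, timeout: float = 5.0):
    # Try the well-known location first, then fall back to the domain root.
    for path in ("/.well-known/cats.txt", "/cats.txt"):
        url = f"https://{domain}{path}"
        try:
            resp = requests.get(url, timeout=timeout)
        except requests.RequestException:
            continue
        if resp.status_code == 200:
            return resp.text
    return None  # no CATS policy published

print(fetch_cats("mcp-seo.com"))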

By adopting this standard today, you signal to the AI ecosystem that you are a “Cooperative Node.” You are not just a passive resource to be mined; you are an active participant with rights and terms. In a future where autonomous agents have digital wallets, cats.txt will be the mechanism by which your content earns revenue. It is the constitution for your digital territory.