The ethical debate around AI training data is fierce. “They stole our content!” is the cry of publishers. “It was fair use!” is the retort of AI labs. CATS (Content Authorization & Transparency Standard) is the technical solution to this legal standoff.
Implementing CATS is not just about blocking bots; it is about establishing a contract.
## The CATS Workflow
- Discovery: The agent checks `/.well-known/cats.json` or `cats.txt` at the site root (sketched below).
- Negotiation: The agent parses your policy:
  - “Can I index this?” -> Yes.
  - “Can I train on this?” -> No.
  - “Can I display a snippet?” -> Yes, max 200 chars.
  - “Do I need to pay?” -> Check the `pricing` object.
- Compliance: The agent (if ethical) respects these boundaries.
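The discovery step is simple enough to sketch. The two well-known paths come straight from the list above; the function name, lookup order, and use of the `requests` library are assumptions for illustration.

```python
import requests

# Paths named in the workflow above; the lookup order is an assumption.
CANDIDATE_PATHS = ["/.well-known/cats.json", "/cats.txt"]

def discover_cats_policy(origin: str) -> str | None:
    """Return the site's raw CATS policy document, or None if none is published."""
    for path in CANDIDATE_PATHS:
        try:
            resp = requests.get(origin.rstrip("/") + path, timeout=5)
        except requests.RequestException:
            continue
        if resp.status_code == 200 and resp.text.strip():
            return resp.text
    return None  # No machine-readable policy found: an "Opaque Node"
```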
## Signaling “Cooperative Node” Status
Search engines of the future will constitute a “Web of Trust.” Sites that implement CATS are signaling that they are “Cooperative Nodes.” They are providing clear metadata about their rights.
We believe that Cooperative Nodes will be preferentially ranked over “Opaque Nodes” (sites with no policy, or with aggressive, unparseable blocking). Why? Because using data from a Cooperative Node carries less legal risk for the AI company. They know they have a license.
## Example Policy for a News Site
```yaml
policy:
  - user-agent: "*"
    allow: /
    license: Commercial-NoDerivatives
    attribution:
      required: true
      format: "Source: [Title](URL)"
    pricing:
      token_access: "0.0001 USD / token"  # Future-proofing for micropayments
```
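To make the negotiation step concrete, here is a minimal sketch that parses the policy above with PyYAML and pulls out the terms an agent cares about. The helper name `negotiate` and the shape of the returned dictionary are illustrative, not part of the standard.

```python
import yaml  # PyYAML: pip install pyyaml

def negotiate(raw_policy: str, user_agent: str = "example-agent") -> dict | None:
    """Extract the terms that apply to this agent from a CATS policy document."""
    doc = yaml.safe_load(raw_policy)
    for rule in doc.get("policy", []):
        # A "*" rule applies to everyone; otherwise match our own agent name.
        if rule.get("user-agent") in ("*", user_agent):
            return {
                "allowed_paths": rule.get("allow"),
                "license": rule.get("license"),
                "attribution_required": rule.get("attribution", {}).get("required", False),
                "attribution_format": rule.get("attribution", {}).get("format"),
                "price_per_token": rule.get("pricing", {}).get("token_access"),
            }
    return None  # No rule matches this agent: ask the publisher before using content
```

An agent that sees `attribution_required` set would then render the `format` template whenever it displays a snippet.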
## The Developer Perspective
For developers building agents, CATS is a godsend. Instead of guessing if they can scrape a site and risking a lawsuit, they can check the manifest. If the site says “No,” they skip it. If it says “Yes,” they proceed with confidence.
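That check-before-scrape gate might look like the sketch below, reusing the two helpers from earlier. Reading a `Commercial-NoDerivatives` license as “no training” is an assumption made for illustration; a real agent would follow whatever mapping the final standard specifies.

```python
def should_use(origin: str, purpose: str = "index") -> bool:
    """Check the manifest first; skip the site if there is no policy or the policy says no."""
    raw = discover_cats_policy(origin)   # discovery sketch above
    if raw is None:
        return False  # Opaque node: skip rather than risk it
    terms = negotiate(raw)               # negotiation sketch above
    if terms is None:
        return False
    license_tag = terms.get("license") or ""
    # Hypothetical mapping: "NoDerivatives" rules out training, while
    # indexing and short snippets remain allowed.
    if purpose == "train" and "NoDerivatives" in license_tag:
        return False
    return True
```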
By adopting CATS early, you are not only protecting your IP; you are also future-proofing your site for a potential “Paid Retrieval” economy where content creators are compensated for their contribution to the collective intelligence.