Implementing CATS Protocols for Ethical Scraping

The ethical debate around AI training data is fierce. “They stole our content!” is the cry of publishers. “It was fair use!” is the retort of AI labs. CATS (Content Authorization & Transparency Standard) is the technical solution to this legal standoff.

Implementing CATS is not just about blocking bots; it is about establishing a contract.

The CATS Workflow

  1. Discovery: The agent checks /.well-known/cats.json or cats.txt at the root.
  2. Negotiation: The agent parses your policy.
    • “Can I index this?” -> Yes.
    • “Can I train on this?” -> No.
    • “Can I display a snippet?” -> Yes, max 200 chars.
    • “Do I need to pay?” -> Check pricing object.
  3. Compliance: The agent (if ethical) respects these boundaries.
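The negotiation step can be sketched in code. Note that CATS is still an emerging standard, so the field names below (`permissions`, `snippet.max_chars`, `pricing`) are illustrative assumptions, not the canonical schema:

```python
import json

# Hypothetical CATS policy -- field names are assumptions, not the real spec.
SAMPLE_POLICY = json.loads("""
{
  "version": "1.0",
  "permissions": {
    "index": true,
    "train": false,
    "snippet": {"allowed": true, "max_chars": 200}
  },
  "pricing": {"train": {"currency": "USD", "per_1k_tokens": 0.02}}
}
""")

def may(policy: dict, action: str) -> bool:
    """Answer one yes/no negotiation question against the policy."""
    value = policy.get("permissions", {}).get(action, False)
    if isinstance(value, dict):  # e.g. snippet: {"allowed": ..., "max_chars": ...}
        return bool(value.get("allowed", False))
    return bool(value)  # default-deny: missing permission means "no"

def snippet_limit(policy: dict) -> int:
    """Maximum snippet length in characters; 0 if snippets are not permitted."""
    snippet = policy.get("permissions", {}).get("snippet", {})
    if isinstance(snippet, dict) and snippet.get("allowed"):
        return int(snippet.get("max_chars", 0))
    return 0
```

An ethical agent would fetch this policy from `/.well-known/cats.json` at discovery time and gate every downstream action through checks like `may(policy, "train")`.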

Signaling “Cooperative Node” Status

The search engines of the future will constitute a “Web of Trust.” Sites that implement CATS signal that they are “Cooperative Nodes”: they provide clear metadata about their rights.

Read more →

The Uncanny Valley of AI Copywriting

“Unleash your potential.” “In today’s digital landscape.” “Delve into the intricacies.” “It’s important to note.”

These phrases are the hallmarks of lazy AI content. They are the “Uncanny Valley” of text: grammatically perfect, but soulless. They are also the first things a classifier detects.

The Classifier’s Job

Search engines and social platforms act as classifiers. They are constantly trying to label content as “Human” or “Machine.”

  • Machine Content: Often down-ranked or labeled as “Low Quality.”
  • Human Content: Given a “Novelty Boost.”
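Real classifiers use learned features, but the simplest detectable signal is exactly the stock phrasing listed above. A toy heuristic, purely for illustration:

```python
# Toy heuristic, not a production classifier: count stock AI phrases
# per 1,000 words as a crude "machine-ness" signal.
STOCK_PHRASES = [
    "unleash your potential",
    "in today's digital landscape",
    "delve into the intricacies",
    "it's important to note",
]

def machine_score(text: str) -> float:
    """Occurrences of hallmark phrases, normalized per 1,000 words."""
    lowered = text.lower()
    words = max(len(lowered.split()), 1)
    hits = sum(lowered.count(phrase) for phrase in STOCK_PHRASES)
    return hits * 1000 / words

sample = ("In today's digital landscape, it's important to note that "
          "you should delve into the intricacies of SEO.")
```

A sentence stuffed with hallmark phrases scores high; idiosyncratic writing scores zero.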

Escaping the Valley

To rank in an AI world, your content must sound idiosyncratic. Unpolished, voice-driven content is becoming a premium signal of humanity.

Read more →

Hreflang for AI Agents: Does it Matter?

In traditional SEO, hreflang tags were the holy grail of internationalization. They told Google: “This page is for French speakers in Canada.” But in a world where AI models are inherently polyglot, does this tag still matter?
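For reference, the hreflang mechanism is a cluster of `<link rel="alternate">` tags, one per locale. A small sketch that generates such a cluster (the domain and locale set are placeholders):

```python
def hreflang_links(locales: dict) -> list:
    """Build the <link rel="alternate"> cluster for one page.

    `locales` maps hreflang codes (e.g. "fr-ca" for French speakers
    in Canada) to the localized URL for that audience.
    """
    return [
        f'<link rel="alternate" hreflang="{code}" href="{url}" />'
        for code, url in sorted(locales.items())
    ]

tags = hreflang_links({
    "en": "https://example.com/",
    "fr-ca": "https://example.com/fr-ca/",
})
```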

The Polyglot LLM

Models like GPT-4 and Gemini are trained on multilingual datasets. They can seamlessly translate between English, Japanese, and Swahili. If a user asks a question in Spanish, the model can retrieve an English source, translate the facts, and generate a Spanish answer.

Read more →

The Impact of RAG on Local Search

Retrieval-Augmented Generation (RAG) is changing how local queries are answered. Query: “Where is a good place for dinner?”

  • Old Logic (Google Maps): Proximity + Rating.
  • RAG Logic: “I read a blog post that mentioned this place had great ambiance.”

The “Vibe” Vector

RAG introduces the “Vibe” factor. The model retrieves reviews, blog posts, and social chatter to construct a “Semantic Vibe” of the location.

  • Vector: “Cosy + Romantic + Italian + Brooklyn”.
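A real system would compare learned embeddings, but the intuition can be shown with descriptor sets and Jaccard overlap, a deliberately simplified stand-in for vector similarity:

```python
def jaccard(a: set, b: set) -> float:
    """Toy stand-in for embedding similarity: descriptor-set overlap."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical venues described by the "vibe" words retrieval surfaced.
query_vibe = {"cosy", "romantic", "italian", "brooklyn"}
venue_a = {"cosy", "romantic", "italian", "brooklyn", "candlelit"}
venue_b = {"loud", "sports-bar", "wings", "brooklyn"}
```

Venue A overlaps the query vibe on four of five descriptors and wins the retrieval; venue B matches only on location.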

Optimization Strategy

To rank in Local RAG, you need text that describes the experience, not just the NAP (Name, Address, Phone).

Read more →

The Shift from Keywords to Contextual Vectors

The landscape of Search Engine Optimization (SEO) is undergoing a seismic shift. For decades, the primary mechanism of discovery was the keyword—a string of characters that users typed into a search bar. “Best shoes.” “Plumber NYC.” “Pizza near me.”

Today, with the advent of Large Language Models (LLMs) and vector databases, we are moving towards an era of contextual vectors.

The Vectorization of Meaning

In traditional SEO, matching “best running shoes” meant having those words on your page in the <title> tag and <h1>.
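The contrast is that vector retrieval scores semantic closeness, not string overlap. A sketch with hand-made three-dimensional "embeddings" (a real system uses a learned model with hundreds of dimensions; these numbers are invented for illustration):

```python
import math

def cosine(u, v) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Invented 3-d vectors along (footwear, running, quality) for illustration.
queries = {
    "best running shoes":       (0.9, 0.9, 0.8),
    "top sneakers for jogging": (0.9, 0.8, 0.7),
    "plumber NYC":              (0.0, 0.0, 0.5),
}
```

"best running shoes" and "top sneakers for jogging" share zero keywords, yet land nearly on top of each other in vector space, which is exactly what keyword matching could never see.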

Read more →

The Missing Reports in GSC for AI Traffic

Google Search Console (GSC) is broken for the AI era. It was designed strictly for “Blue Link” clicks. It currently lumps AI Overview impressions into general search performance, or hides “zero-click” generative impressions entirely.

The Blind Spot

We estimate that 30% of informational queries are now satisfied by AI Overviews without a click. The user sees your brand, reads your snippet, learns the fact, and leaves.

  • Brand Impact: Positive (Awareness).
  • GSC Impact: Zero (No click).

This “Invisible Traffic” builds brand awareness but doesn’t show up in your analytics.
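Until GSC breaks out generative impressions, the closest proxy you can compute yourself is the zero-click share of impressions. A minimal sketch (the figures are placeholders, not real GSC data):

```python
def invisible_share(impressions: int, clicks: int) -> float:
    """Fraction of impressions that produced no click ("invisible traffic").

    Caveat: today's GSC export does not separate AI Overview impressions,
    so this lumps all zero-click impressions into one bucket.
    """
    if impressions == 0:
        return 0.0
    return (impressions - clicks) / impressions

# e.g. 10,000 impressions, 700 clicks -> 93% of views never reached your site.
share = invisible_share(10_000, 700)
```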

Read more →

Rendering for Agents: Headless vs. API

JavaScript-heavy sites have always been tricky for crawlers. For agents, the problem is compounded by cost. Running a headless browser to render React/Vue apps is expensive and slow.

The Economics of Rendering

  • HTML Fetch: $0.0001 / page.
  • Headless Render: $0.005 / page. (50x more expensive).

If you are an AI company crawling billions of pages, you will skip the expensive ones. This means if your content requires JS to render, you are likely being skipped by the long-tail of AI agents.
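The economics above can be made concrete. Using the per-page estimates from the table (which are assumptions, not published crawler pricing):

```python
HTML_FETCH_COST = 0.0001  # assumed $/page, per the estimates above
HEADLESS_COST = 0.005     # assumed $/page for a full headless render

def crawl_cost(pages: int, js_required_share: float) -> float:
    """Estimated crawl bill when a share of pages needs headless rendering."""
    js_pages = pages * js_required_share
    return js_pages * HEADLESS_COST + (pages - js_pages) * HTML_FETCH_COST
```

At a billion pages, an all-HTML web costs about $100k to crawl; if half of it requires JavaScript rendering, the bill jumps past $2.5M. Skipping JS-only pages is the rational move for a budget-constrained crawler.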

Read more →

The Ouroboros Effect: AI Optimization for AI Consumption

The Ouroboros is the ancient symbol of a snake eating its own tail. It is the perfect metaphor for the current state of the web. AI generates content -> Webmasters publish it -> AI scrapes it to train -> AI generates more content.

Model Collapse

Researchers warn of Model Collapse. If models train on their own output, the variance (creativity) of the model degrades. It becomes an echo chamber of “average” probability.
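The variance-degradation dynamic can be illustrated with a deliberately simple toy: model each "generation" as regressing every sample halfway toward the dataset mean, mimicking a model that reproduces only its highest-probability outputs.

```python
import statistics

def next_generation(data):
    """Toy collapse step: each new sample regresses halfway toward the mean,
    standing in for a model that favors "average" outputs."""
    mean = statistics.fmean(data)
    return [(x + mean) / 2 for x in data]

gen = [1.0, 3.0, 7.0, 9.0]  # invented starting "content diversity"
variances = []
for _ in range(5):
    variances.append(statistics.pvariance(gen))
    gen = next_generation(gen)
```

The variance shrinks every generation; in this toy it quarters each step. Real model collapse is messier, but the direction is the same: train on your own output and the tails disappear.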

Read more →

Entity Authority Construction through Digital PR

In the past, Digital PR was about generating “buzz” and backlinks. Success was measured in placement volume and Domain Authority (DA). In the age of Semantic Search and AI, Digital PR is a precise engineering discipline: Entity Authority Construction.

Your goal is not just to get a link; it is to teach the Knowledge Graph who you are.

The Knowledge Graph Goal

Search engines like Google and Bing, and answer engines like Perplexity, organize information into Knowledge Graphs.

Read more →

The Future of Sitemaps: From URLs to API Endpoints

The XML sitemap was invented in 2005. It lists URLs. But as we move towards Agentic AI, the concept of a “page” (URL) helps human navigation, but constrains agent navigation. Agents want actions.

The API Sitemap

We propose a new standard: the API Sitemap. Instead of listing URLs for human consumption, this file lists API endpoints available for agent interaction.

<url>
  <loc>https://api.mcp-seo.com/v1/check-rank</loc>
  <lastmod>2026-01-01</lastmod>
  <changefreq>daily</changefreq>
  <rel>action</rel>
  <openapi_spec>https://mcp-seo.com/openapi.yaml</openapi_spec>
</url>

This allows an agent to discover capabilities rather than just content.
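On the consuming side, an agent could parse such an entry with a few lines of standard XML tooling. This assumes the proposed element names above (`rel`, `openapi_spec`), which are part of the proposal, not an adopted standard:

```python
import xml.etree.ElementTree as ET

# The proposed entry from above; <rel> and <openapi_spec> are proposal-only.
SITEMAP_ENTRY = """
<url>
  <loc>https://api.mcp-seo.com/v1/check-rank</loc>
  <lastmod>2026-01-01</lastmod>
  <changefreq>daily</changefreq>
  <rel>action</rel>
  <openapi_spec>https://mcp-seo.com/openapi.yaml</openapi_spec>
</url>
"""

def discover_action(entry_xml: str) -> dict:
    """Pull the agent-relevant fields out of one proposed <url> entry."""
    node = ET.fromstring(entry_xml)
    return {
        "endpoint": node.findtext("loc"),
        "kind": node.findtext("rel"),
        "spec": node.findtext("openapi_spec"),
    }
```

The agent would then fetch the OpenAPI spec to learn parameters and response shapes, turning the sitemap into a capability index rather than a page list.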

Read more →