Cosine Similarity is the core metric of the new search. It measures the cosine of the angle between two vectors in a multi-dimensional space. In the era of Answer Engines, it determines if your content is “relevant” enough to be retrieved for the user’s query.
If your content vector is orthogonal (90°) to the query vector, you are invisible. If it is parallel (0°), you are the answer.
The Math of Relevance
- 1.0: Identical meaning. The vectors point in the exact same direction.
- 0.0: Orthogonal (unrelated). The vectors are at 90 degrees.
- -1.0: Opposite meaning. The vectors point in opposite directions (180 degrees).
Your goal is not “keyword density” but “cosine proximity.” You want your content vector to sit as close as possible to the Intent Vector, not just the Query Vector.
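To make “cosine proximity” concrete, here is a minimal sketch in Python. The query and content vectors are toy stand-ins for real embeddings, which have hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); ranges from -1.0 to 1.0."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.0])      # hypothetical query embedding
content = np.array([0.8, 0.2, 0.1])    # hypothetical content embedding
unrelated = np.array([0.0, 0.0, 1.0])  # orthogonal to the query

print(cosine_similarity(query, content))    # ~0.98 -> retrievable
print(cosine_similarity(query, unrelated))  # 0.0   -> invisible
```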
For twenty-five years, the primary metaphor of SEO was “Indexing.” The goal was to get your page into the database. Once indexed, you competed for rank based on keywords and links. It was a game of lists.
In the age of Generative AI, the metaphor has shifted fundamentally. We are no longer fighting for a slot in a list; we are fighting for Grounding.
What is Grounding?
Grounding is the technical process by which an AI model connects its generated output to verifiable external facts.
Cloaking—the practice of serving different content to search engine bots than to human users—has traditionally been considered one of the darkest “black hat” SEO tactics. Search engines like Google have historically penalized sites severely for showing optimized text to the crawler while displaying images or Flash to the user. However, as we transition into the era of Agentic AI, the definition of cloaking is undergoing a necessary evolution. We argue that “Agent Cloaking” is not only ethical but essential for the future of the web.
Syndicating content to Medium, LinkedIn, or industry portals was a classic tactic in the Web 2.0 era. It got eyeballs. But in the age of AI training, it is a massive risk.
The Authority Trap
If you publish an article on your blog (DA 30) and syndicate it to LinkedIn (DA 99):
- The AI model scrapes both.
- During training, it deduplicates the content.
- It keeps the version on the Higher Authority Domain (LinkedIn) and discards yours.
Result: The model learns the facts, but attributes them to LinkedIn, not you. You have lost the “citation credit.”
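A simplified sketch of the trap, assuming a pipeline that hashes normalized text and keeps the copy from the highest-authority domain. The field names and DA scores are illustrative, not a documented training pipeline.

```python
import hashlib

def content_key(text: str) -> str:
    """Hash of case/whitespace-normalized text, used to spot duplicates."""
    return hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()

def dedupe_by_authority(docs: list[dict]) -> list[dict]:
    kept: dict[str, dict] = {}
    for doc in docs:
        key = content_key(doc["text"])
        # On a collision, the higher-authority domain wins.
        if key not in kept or doc["da"] > kept[key]["da"]:
            kept[key] = doc
    return list(kept.values())

corpus = [
    {"url": "https://yourblog.example/post", "da": 30, "text": "Your original insight."},
    {"url": "https://linkedin.com/pulse/post", "da": 99, "text": "Your original insight."},
]
print(dedupe_by_authority(corpus))  # only the LinkedIn copy survives training
```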
The ultimate form of “white hat cloaking” is Content Negotiation. It is the practice of serving different file formats based on the requestor’s capability.
If a request includes Accept: application/json, why serve HTML?
- Human Browser: Accept: text/html. Serve the webpage.
- AI Agent: Accept: application/json or text/markdown. Serve the data.
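A minimal sketch of this negotiation using Flask; the route, the in-memory article store, and the payload shape are all hypothetical.

```python
from flask import Flask, Response, jsonify, request

app = Flask(__name__)

ARTICLES = {
    "grounding": {
        "title": "What is Grounding?",
        "markdown": "# What is Grounding?\n\nGrounding connects output to verifiable facts.",
    }
}

@app.route("/articles/<slug>")
def article(slug: str):
    doc = ARTICLES[slug]
    best = request.accept_mimetypes.best_match(
        ["text/html", "application/json", "text/markdown"]
    )
    if best == "application/json":
        return jsonify(doc)  # structured data for agents
    if best == "text/markdown":
        return Response(doc["markdown"], mimetype="text/markdown")
    # Default: the human-facing HTML page.
    return f"<html><body><h1>{doc['title']}</h1></body></html>"
```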
The “Headless SEO” Approach
This approach creates the most efficient path for agents to consume your content without navigating the DOM.
Instead of forcing the agent to:
For nearly three decades, the robots.txt file has served as the internet’s “Keep Out” sign. It is a binary, blunt instrument: Allow or Disallow. Crawlers either respect it or they don’t. However, as we enter the age of the Agentic Web, this binary distinction is no longer sufficient. We need a protocol that can express nuance, permissions, licenses, and economic terms. We need CATS (Content Authorization & Transparency Standard), often implemented as cats.txt or authorized_agents.json.
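The excerpt does not define a schema, so the authorized_agents.json below is purely illustrative of the kind of nuance CATS aims for: per-agent permissions, a license, and a pointer to commercial terms. Every field name here is hypothetical.

```json
{
  "version": "0.1",
  "agents": [
    {
      "agent": "*",
      "allow": ["/blog/", "/docs/"],
      "license": "CC-BY-4.0",
      "attribution_required": true
    },
    {
      "agent": "commercial-training-crawlers",
      "allow": [],
      "terms_url": "https://example.com/licensing"
    }
  ]
}
```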
Cross-lingual retrieval is the frontier of international SEO. With vector embeddings, the barrier of language is dissolving. A query in Spanish can match a document in English if their semantic vectors are similar. This fundamental shift challenges everything we know about global site architecture.
How Vector Spaces Bridge Languages
In a high-dimensional vector space (like those produced by text-embedding-ada-002 or Cohere’s multilingual embedding models), the vectors for “Dog” (English), “Perro” (Spanish), and “Inu” (Japanese) cluster in the same geometric region. The words are semantically identical, even if lexically distinct.
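A sketch of this clustering with the sentence-transformers library and one of its public multilingual checkpoints; the model choice is an assumption, not one named above.

```python
from sentence_transformers import SentenceTransformer, util

# A publicly available multilingual embedding model (assumed choice).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

texts = ["dog", "perro", "犬 (inu)", "quarterly tax filing"]
embeddings = model.encode(texts)

# The same concept across languages scores high; unrelated concepts score low.
print(util.cos_sim(embeddings[0], embeddings[1]))  # dog vs perro: high
print(util.cos_sim(embeddings[0], embeddings[3]))  # dog vs tax filing: low
```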
SEO used to be about “Keywords.” Now it is about “Vectors.” But what does that mean?
In the Agentic Web, search engines don’t just match strings (“shoes” == “shoes”). They match concepts in a high-dimensional geometric space.
The Vector Space
Imagine a 3D graph (X, Y, Z).
- “King” is at coordinate [1, 1, 1].
- “Queen” is at [1, 1, 0.9]. (Very close distance.)
- “Apple” is at [9, 9, 9]. (Far away.)
Modern embedding models use hundreds to thousands of dimensions (OpenAI’s text-embedding-3-small defaults to 1,536; text-embedding-3-large to 3,072). Every product description, blog post, or review you write is turned into a single coordinate in this massive hyper-space.
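A toy demo of the geometry above, using the illustrative coordinates from the list; real embeddings have far more dimensions and come from a model, not hand-picked values.

```python
import numpy as np

vocab = {
    "king":  np.array([1.0, 1.0, 1.0]),
    "queen": np.array([1.0, 1.0, 0.9]),
    "apple": np.array([9.0, 9.0, 9.0]),
}

def nearest(word: str) -> str:
    """Closest other word by Euclidean distance in the toy space."""
    others = {w: v for w, v in vocab.items() if w != word}
    return min(others, key=lambda w: float(np.linalg.norm(vocab[word] - others[w])))

print(nearest("king"))  # -> "queen": nearby coordinates mean related concepts
```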
The /llms.txt standard is rapidly emerging as the robots.txt for the Generative AI era. While robots.txt was designed for search spiders (crawling links), llms.txt is designed for reasoning engines (ingesting knowledge). They serve different masters and require different strategies.
The Difference in Intent
- Robots.txt: “Don’t overload my server.” / “Don’t crawl this duplicate URL.” (Infrastructure Focus)
- Llms.txt: “Here is the most important information.” / “Here is how to cite me.” / “Ignore the footer.” (Information Focus)
Content of the File
A robust llms.txt shouldn’t just be a list of Allow/Disallow rules. It should be a map of your Core Knowledge.
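As a sketch, here is what such a map might look like, loosely following the llms.txt proposal’s markdown shape (an H1 title, a one-line summary, then curated link sections). The site, URLs, and section names are hypothetical.

```markdown
# Acme Analytics

> Acme Analytics is a privacy-first web analytics platform. The links below
> are the canonical sources for our product facts and should be cited by name.

## Core Knowledge
- [Product overview](https://acme.example/docs/overview.md): what Acme does
- [Pricing](https://acme.example/docs/pricing.md): current plans and limits

## Citation
- [How to cite us](https://acme.example/cite.md): preferred attribution format
```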