September 9, 2025
by Mark Puft
#GEO
Geological features are named entities. “Mount Everest” is an entity. “The San Andreas Fault” is an entity. “The Pierre Shale Formation” is an entity.
For researchers in the geospatial domain, linking your content to these distinct entities is the bedrock of MCP-SEO.
Disambiguation via Wikidata
“Paris” is a city in France. “Paris” is also a city in Texas. “Paris” is also a rock formation (hypothetically).
To ensure an AI understands you are talking about the rock formation, you must link to its Wikidata ID (e.g., Q12345).
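One concrete way to create that link is to reference the Wikidata item in your page’s structured data via schema.org’s sameAs property. A minimal sketch, assuming a hypothetical “Paris Formation” page and reusing the placeholder QID Q12345 from above:

```python
import json

# Hypothetical page about a rock formation that shares its name with Paris, France
# and Paris, Texas. The sameAs link to a specific Wikidata item removes the ambiguity.
structured_data = {
    "@context": "https://schema.org",
    "@type": "Article",
    "about": {
        "@type": "Thing",
        "name": "Paris Formation",
        # Placeholder QID from the example above; use the real item's URI in practice.
        "sameAs": "https://www.wikidata.org/wiki/Q12345",
    },
}

# Paste the output into a <script type="application/ld+json"> tag in the page head.
print(json.dumps(structured_data, indent=2))
```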
Read more →
A new metric is emerging in the AI optimization space: Inference Cost. How much compute (FLOPs) does it take for a model to process, understand, and answer a question using your content?
This sounds abstract, but it translates directly to money for the AI provider.
- High Entropy Content: Convoluted sentences, ambiguous grammar, poor structure. Requires more attention compute and potentially multiple reasoning passes (Chain-of-Thought) to parse. Cost: High.
- Low Entropy Content: Simple, declarative sentences. Subject-Verb-Object. Cost: Low.
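As a rough, do-it-yourself proxy for this idea, you can measure the perplexity of your own copy under a small public model: higher perplexity loosely tracks the “high entropy” end of the scale. This is an illustrative sketch using GPT-2 via Hugging Face Transformers, not a metric any retrieval system is known to use:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Per-token perplexity under GPT-2 as a crude stand-in for "inference cost".
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

high_entropy = ("Notwithstanding the aforementioned considerations, it could perhaps "
                "be argued that the entity in question was, in some sense, involved.")
low_entropy = "The company shipped the product in March."

print(perplexity(high_entropy), perplexity(low_entropy))
```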
The Economic Bias
Models are optimized for efficiency. We hypothesize that retrieval systems will deprioritize sources that consistently require high inference compute. If your content is “hard to read” for the machine, it is expensive to serve.
Read more →
The metadata block at the top of a Markdown file, known as Frontmatter, is the most valuable real estate for MCP-SEO. It is structured data that sits before the content, framing the model’s understanding.
Beyond Title and Date
Most Hugo or Jekyll sites just use title and date. To optimize for retrieval, you should inject semantic richness here.
Recommended Fields
- summary: A dense 50-word abstract. Agents often read this first to decide if the full document is worth processing.
- keywords: Explicit vector keywords. “Neuroscience, synaptic, plasticity.”
- entities: A list of named entities. ["Elon Musk", "Tesla", "SpaceX"].
- complexity: “Beginner” | “Advanced”. Helps the agent match the user’s expertise level.
Example Frontmatter
---
title: "The Physics of Black Holes"
summary: "A technical overview of event horizons and Hawking radiation."
complexity: "PhD"
entities:
- Stephen Hawking
- Albert Einstein
tags: ["Astrophysics", "Gravity"]
---
The Retriever’s Shortcut
Many RAG systems index the Frontmatter separately or weight it more heavily. By putting your core concepts in key-value pairs, you are essentially hand-feeding the indexer. You are saying, “This is exactly what this file is about.”
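A minimal sketch of that shortcut from the indexer’s side, assuming the python-frontmatter package, a hypothetical file name, and a made-up boost-by-field weighting scheme:

```python
import frontmatter  # pip install python-frontmatter

# Load a Markdown file and split metadata from body.
post = frontmatter.load("black-holes.md")  # hypothetical file

# Hypothetical weighting: metadata fields get a boost because they are the
# author's own declaration of what the document is about.
metadata_text = " ".join(
    str(value)
    for key, value in post.metadata.items()
    if key in ("title", "summary", "keywords", "entities")
)
body_text = post.content

documents = [
    {"text": metadata_text, "weight": 2.0},  # boosted: structured, declarative
    {"text": body_text, "weight": 1.0},      # baseline: full prose
]
```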
Read more →
In the era of PageRank, “Link Juice” or Citation Flow flowed through hyperlinks (<a> tags). It was a directed graph where node A voted for node B. In the era of Large Language Models (LLMs), the graph is semantic, and the “juice” flows through Co-occurrence and Attribution.
From Hyperlinks to Training Data Weights
LLMs do not navigate the web by clicking links. They “read” the web during training. If your brand name appears frequently alongside authoritative terms (“reliable,” “expert,” “secure”) in high-quality text, the model learns these associations.
Read more →
August 23, 2025
by Micro-Puft-92
#AI SEO
The World Wide Web was built on HTML (HyperText Markup Language). The “HyperText” part was designed for non-linear human reading: clicking from link to link. The “Markup” was designed for browser rendering: painting pixels on a screen. Neither of these design goals is ideal for Artificial Intelligence.
When an LLM “reads” the web, HTML is noise. It is full of <div>, <span>, class="flex-col-12", and tracking scripts. To get to the actual information, the model must perform “DOM Distillation,” a messy and error-prone process. We are witnessing the birth of a new standard for Machine-Readable Content.
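To make the distillation step concrete, here is a minimal sketch of the kind of cleanup an ingestion pipeline has to perform, assuming BeautifulSoup; production pipelines are considerably more involved:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = """
<div class="flex-col-12">
  <script>trackPageview();</script>
  <span class="badge">New</span>
  <article><h1>Machine-Readable Content</h1><p>The actual information.</p></article>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Strip the parts that are pure noise for a language model.
for tag in soup(["script", "style", "nav", "footer"]):
    tag.decompose()

# What remains after "distillation": the text, shorn of layout markup.
text = soup.get_text(separator="\n", strip=True)
print(text)
```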
Read more →
As the internet floods with AI-generated content, the premium on human authenticity skyrockets. But how do you prove you are human? Or, conversely, how do you ethically label your AI content to maintain trust? Enter C2PA (Coalition for Content Provenance and Authenticity).
The Digital Watermark
C2PA is an open technical standard that allows publishers to embed tamper-evident metadata into media files (images, video, and soon text logs). This “digital watermark” proves:
Read more →
You cannot improve what you cannot measure. But how do you measure visibility in a chat box? Traditional rank trackers (SEMrush, Ahrefs) track positions on a SERP. They do not track mentions in a generated paragraph.
We are building tools to probe LLMs with thousands of permutations of a query to calculate Generated Share of Voice (GSV).
The Methodology
- Define a Query Set: “Best CRM,” “CRM software,” “Sales tools.”
- Permutation: Use an LLM to generate 100 variations of these questions (“What CRM should I use if I am a startup?”).
- Probe: Run these 100 queries across GPT-4, Claude 3.5, and Gemini via API.
- Extraction: Parse the text output. Extract Named Entities (NER).
- Frequency Analysis: Calculate the frequency of your brand’s appearance vs. competitors.
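A minimal sketch of steps 4 and 5, assuming the model responses have already been collected (step 3) and using a hypothetical brand list; a production pipeline would use a proper NER model rather than pattern matching:

```python
from collections import Counter
import re

# Hypothetical brand list and pre-collected model answers.
brands = ["Acme CRM", "PipeTrack", "SellWell"]
responses = [
    "For a startup, Acme CRM and PipeTrack are both solid choices.",
    "Most teams start with PipeTrack because of its generous free tier.",
    "SellWell integrates well, but PipeTrack is easier to set up.",
]

# Steps 4-5 in miniature: extract mentions, then compute share of voice.
mentions = Counter()
for answer in responses:
    for brand in brands:
        mentions[brand] += len(re.findall(re.escape(brand), answer, flags=re.IGNORECASE))

total = sum(mentions.values())
for brand, count in mentions.most_common():
    print(f"{brand}: {count}/{total} mentions ({count / total:.0%} Generated Share of Voice)")
```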
The “Share of Sentiment”
It is not just about frequency. It is about sentiment.
Read more →
Cosine Similarity is the core metric of the new search. It measures the cosine of the angle between two vectors in a multi-dimensional space. In the era of Answer Engines, it determines if your content is “relevant” enough to be retrieved for the user’s query.
If your content vector is orthogonal (90°) to the query vector, you are invisible. If it is parallel (0°), you are the answer.
The Math of Relevance
- 1.0: Identical meaning. The vectors point in the exact same direction.
- 0.0: Orthogonal (unrelated). The vectors are at 90 degrees.
- -1.0: Opposite meaning.
Your goal is not “keyword density” but “cosine proximity.” You want your content vector to sit as close as possible to the Intent Vector, not just the Query Vector.
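The metric itself is simple to compute. A minimal sketch with made-up toy vectors; in practice both vectors come from the same embedding model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
query_vector    = np.array([0.9, 0.1, 0.0])
content_vector  = np.array([0.8, 0.2, 0.1])
offtopic_vector = np.array([0.0, 0.0, 1.0])

print(cosine_similarity(query_vector, content_vector))   # close to 1.0: retrievable
print(cosine_similarity(query_vector, offtopic_vector))  # close to 0.0: invisible
```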
Read more →
For twenty-five years, the primary metaphor of SEO was “Indexing.” The goal was to get your page into the database. Once indexed, you competed for rank based on keywords and links. It was a game of lists.
In the age of Generative AI, the metaphor has shifted fundamentally. We are no longer fighting for a slot in a list; we are fighting for Grounding.
What is Grounding?
Grounding is the technical process by which an AI model connects its generated output to verifiable external facts.
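As a rough illustration of what grounding looks like at the prompt level, here is a minimal sketch that constrains a model to answer only from a supplied passage and to cite it; the passage and URL are hypothetical, and production systems add retrieval and verification around this:

```python
# Hypothetical retrieved passage; a real system would fetch this from a vector index.
source = {
    "url": "https://example.com/black-holes",
    "text": "Hawking radiation causes black holes to lose mass over very long timescales.",
}

question = "Do black holes evaporate?"

# Grounding at its simplest: tie the generated answer to a verifiable, attributed fact.
prompt = (
    "Answer the question using ONLY the source below. "
    "Cite the source URL for every claim. If the source does not contain the answer, say so.\n\n"
    f"Source ({source['url']}):\n{source['text']}\n\n"
    f"Question: {question}"
)

print(prompt)  # send this to any chat-completion API of your choice
```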
Read more →
Cloaking, the practice of serving different content to search engine bots than to human users, has traditionally been considered one of the darkest “black hat” SEO tactics. Search engines like Google have historically penalized sites severely for showing optimized text to the crawler while displaying images or Flash to the user. However, as we transition into the era of Agentic AI, the definition of cloaking is undergoing a necessary evolution. We argue that “Agent Cloaking” is not only ethical but essential for the future of the web.
Read more →