RAG Needs Semantics, Not Divs: The API of the Agentic Web

In the rush to build “AI-Powered” search experiences, engineers have hit a wall. They built powerful vector databases. They fine-tuned state-of-the-art embedding models. They scraped millions of documents. And yet, their Retrieval-Augmented Generation (RAG) systems still hallucinate. They still retrieve the wrong paragraph. They still confidently state that “The refund policy is 30 days” when the page actually says “The refund policy is not 30 days.”

Why? Because they are feeding their sophisticated models “garbage in.” They are feeding them raw text stripped of its structural soul. They are feeding them flat strings instead of hierarchical knowledge.
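The difference is easy to demonstrate. Below is a minimal sketch, using only Python's stdlib `html.parser`, of why flat extraction loses the context a retriever needs: the hypothetical document reuses the article's refund-policy example, and the section names and class names are illustrative, not from any real pipeline.

```python
from html.parser import HTMLParser

# Hypothetical snippet: what each sentence means depends on which
# heading it sits under.
DOC = (
    "<h2>Refund Policy</h2><p>The refund policy is not 30 days.</p>"
    "<h2>Trial Policy</h2><p>The trial period is 30 days.</p>"
)

class FlatExtractor(HTMLParser):
    """Naive scrape: concatenates all text, discarding structure."""
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, data):
        self.text.append(data)

class SectionExtractor(HTMLParser):
    """Structure-aware: keys each paragraph to its parent heading."""
    def __init__(self):
        super().__init__()
        self.sections = {}          # heading -> list of paragraphs
        self.current_heading = None
        self.in_heading = False
    def handle_starttag(self, tag, attrs):
        self.in_heading = tag in ("h1", "h2", "h3")
    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self.in_heading = False
    def handle_data(self, data):
        if self.in_heading:
            self.current_heading = data
            self.sections[data] = []
        elif self.current_heading:
            self.sections[self.current_heading].append(data)

flat = FlatExtractor(); flat.feed(DOC)
structured = SectionExtractor(); structured.feed(DOC)

print(" ".join(flat.text))                    # one undifferentiated string
print(structured.sections["Refund Policy"])   # paragraph bound to its heading
```

The flat string forces the embedding model to guess which policy a chunk belongs to; the structured version hands the retriever a labeled unit it can cite precisely.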

Read more →

Semantic HTML is LLM Training Fuel: Why 'Div Soup' Poisons Models

In the early days of the web, we were told to use Semantic HTML for accessibility. We were told it allowed screen readers to navigate our content, providing a better experience for the visually impaired. We were told it might help SEO, though Google’s engineers were always famously coy about whether an <article> tag carried significantly more weight than a well-placed <div>.

In 2025, that game has changed entirely. We are no longer just optimizing for screen readers or the ten blue links on a search results page. We are optimizing for the training sets of Large Language Models (LLMs).
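One concrete reason semantic tags matter for training sets: they let a cleaning pipeline separate main content from boilerplate mechanically. The sketch below is a crude stand-in for that kind of filter, written with Python's stdlib `html.parser`; the tag sets and sample document are assumptions for illustration, not the logic of any real training pipeline.

```python
from html.parser import HTMLParser

# Hypothetical page: one nav menu, one semantic article, one anonymous div.
DOC = (
    "<nav>Home | About | Login</nav>"
    "<article><p>Semantic tags label this as main content.</p></article>"
    "<div><p>In div soup, this paragraph has no such label.</p></div>"
)

class MainContentFilter(HTMLParser):
    """Keeps text inside <article>/<main>; everything else is ambiguous
    or boilerplate and gets dropped."""
    KEEP = {"article", "main"}
    def __init__(self):
        super().__init__()
        self.stack = []   # open-tag stack, so nesting is visible
        self.kept = []
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
    def handle_data(self, data):
        if self.KEEP & set(self.stack):
            self.kept.append(data.strip())

f = MainContentFilter()
f.feed(DOC)
print(f.kept)   # only the <article> paragraph survives
```

Note what happens to the `<div>` paragraph: the text may be valuable, but nothing in the markup says so, and a filter like this has to treat it as noise.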

Read more →

The Ultimate Guide to Fixing Indexing Errors in Google Search Console

Seeing the “Excluded” number rise in your Page Indexing report is enough to give any SEO anxiety. But in the modern agentic web, indexing issues are often diagnostic tools rather than failures. They tell you exactly how Google perceives the value of your content.

This guide decodes the most common error statuses and provides actionable fixes.

The Big Two: Discovered vs. Crawled

The most confusing distinction in GSC is between “Discovered” and “Crawled.” They sound the same, but they mean very different things for your infrastructure.

Read more →

Debugging Agent Crawls with Server Logs

Google Search Console (GSC) has historically been the dashboard of record for SEOs. But in the agentic era, GSC is becoming a lagging indicator. It often fails to report on the activity of new AI agents, RAG bots, and specialized crawlers. To truly understand how the AI ecosystem views your site, you must return to the source: Server Logs.

The Limitations of GSC

GSC is designed for Google Search. It tells you little about how ChatGPT (OpenAI), Claude (Anthropic), or Perplexity are interacting with your site. If GPTBot fails to crawl your site due to a firewall rule, GSC will never tell you.
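Your access logs, by contrast, will tell you. Here is a minimal sketch of the kind of triage the full post walks through: tallying AI crawler hits by user agent and HTTP status from Combined Log Format lines. The log lines and the agent list are illustrative samples, not an exhaustive registry of bot names.

```python
import re
from collections import Counter

# Hypothetical access-log lines (Combined Log Format).
LOG_LINES = [
    '1.2.3.4 - - [10/May/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 compatible; GPTBot/1.0"',
    '5.6.7.8 - - [10/May/2025:10:01:00 +0000] "GET /docs HTTP/1.1" 403 0 "-" "Mozilla/5.0 compatible; ClaudeBot/1.0"',
    '9.9.9.9 - - [10/May/2025:10:02:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 compatible; Googlebot/2.1"',
]

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")  # names to watch for

# Tally (agent, status) pairs: a run of 403s here points at your
# firewall rules - something GSC will never surface.
hits = Counter()
for line in LOG_LINES:
    m = re.search(r'" (\d{3}) \S+ "[^"]*" "([^"]*)"$', line)
    if not m:
        continue
    status, ua = m.group(1), m.group(2)
    for bot in AI_AGENTS:
        if bot in ua:
            hits[(bot, status)] += 1

print(dict(hits))
# {('GPTBot', '200'): 1, ('ClaudeBot', '403'): 1}
```

In this sample, ClaudeBot is being served 403s: an AI crawler is trying to read your docs and your infrastructure is turning it away, invisibly to every dashboard except the logs themselves.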

Read more →

GSV vs. SOV: Why the Metric Changed

Marketing executives love “Share of Voice” (SOV). It was an easy metric: “We own 30% of the visibility on the first page of Google for these keywords.” In practice, that meant that if a results page had 10 organic links and 4 ads, you appeared in roughly 4 of those 14 spots.

Generated Share of Voice (GSV) is a different beast. It is a “winner-take-all” metric.

The Collapse of Real Estate

In a Generative Search Experience (SGE / AI Overview), there is usually only one answer generated. That answer might contain 3-4 citations.
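The arithmetic behind the shift can be sketched in a few lines. SOV is fractional occupancy of a page; GSV, as used in this post, is the fraction of generated answers that cite your domain at all, with no partial credit for a spot you didn't win. The domains and answer samples below are hypothetical.

```python
# Classic SOV: fractional occupancy of a results page.
def sov(your_spots: int, total_spots: int) -> float:
    return your_spots / total_spots

# GSV (as defined here): fraction of generated answers in which your
# domain appears among the citations - winner-take-all per answer.
def gsv(answers: list[list[str]], domain: str) -> float:
    cited = sum(1 for citations in answers if domain in citations)
    return cited / len(answers)

# 10 organic links + 4 ads; you hold 4 spots -> ~29% SOV.
print(round(sov(4, 14), 2))   # 0.29

# Hypothetical sample: 5 generated answers, each with 3-4 citations.
answers = [
    ["acme.com", "rival.com", "wiki.org"],
    ["rival.com", "wiki.org", "blog.net", "docs.io"],
    ["acme.com", "docs.io", "wiki.org"],
    ["rival.com", "blog.net", "wiki.org"],
    ["rival.com", "wiki.org", "docs.io"],
]
print(gsv(answers, "acme.com"))   # 0.4
print(gsv(answers, "rival.com"))  # 0.8
```

Notice that under SOV, both domains could claim respectable percentages of the same pages; under GSV, one answer has room for only a handful of citations, and everyone else scores zero for that query.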

Read more →

The Death of the Backlink? Not Quite.

“Backlinks are dead!” cries the SEO clickbait. “AI doesn’t need links!” Both claims are false. Reports of the backlink’s death are exaggerated, but the backlink’s role has definitely changed.

Discovery vs. Authority

In the past, links were for Authority (PageRank). Today, links are primarily for Discovery. Without links, a crawler cannot find your URL to add it to the training set. If you are an orphan page, you do not exist.
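"Orphan" has a precise graph-theoretic meaning here: a page unreachable by following links from your entry points. A minimal sketch of detecting them, with a hypothetical internal link map and a breadth-first search from the homepage:

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
LINKS = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/post-1", "/"],
    "/pricing": ["/"],
    "/blog/post-1": ["/blog"],
    "/old-landing-page": ["/"],   # links out, but nothing links in
}

def reachable(start: str, links: dict) -> set:
    """BFS from the entry page; anything never visited is an orphan."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

orphans = set(LINKS) - reachable("/", LINKS)
print(orphans)   # {'/old-landing-page'}
```

The old landing page even links back to the homepage, but since nothing links to it, no crawler following your site's graph will ever find it, and it will never enter a training set that way.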

Read more →

Content Density vs. Length: What Agents Prefer

For the last decade, the mantra of content marketing has been “Long-Form Content.” Creating 3,000-word “Ultimate Guides” was the surest way to rank. But as the consumers of content shift from bored humans to efficient AI agents, this strategy is hitting a wall. The new metric of success is Information Density.

The Context Window Constraint

While context windows are growing (128k, even 1M tokens), they are not infinite, and, more importantly, reasoning over long context is expensive and prone to the “Lost in the Middle” phenomenon.
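There is no standard "information density" score, but a crude proxy is easy to build: redundant, padded prose compresses well, while fact-dense prose does not. The sketch below uses stdlib `zlib` compression ratio as that heuristic; it is a rough illustration of the idea, not a real information-theoretic measure.

```python
import zlib

def density_proxy(text: str) -> float:
    """Crude density proxy: compressed size / raw size.
    Repetitive filler compresses well and scores low; terse,
    fact-dense text stays close to 1.0."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

dense = "Set TTL=60s; retries=3; backoff doubles from 100ms; cap at 2s."
padded = ("In this ultimate guide we will explore, discuss, and explore "
          "again the many reasons why guides are ultimate. ") * 20

print(round(density_proxy(dense), 2))
print(round(density_proxy(padded), 2))   # far lower: mostly redundancy
```

An agent paying per token for reasoning has every incentive to prefer the first kind of text; 3,000 words of the second kind just dilutes the facts it came for.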

Read more →