The metadata block at the top of a Markdown file, known as Frontmatter, is the most valuable real estate for MCP-SEO. It is structured data that sits before the content, framing the model’s understanding.
Beyond Title and Date
Most Hugo or Jekyll sites use only title and date. To optimize for retrieval, add fields that describe what the document is about and who it is for.
Recommended Fields
- summary: A dense 50-word abstract. Agents often read this first to decide if the full document is worth processing.
- keywords: Explicit vector keywords, such as “Neuroscience, synaptic, plasticity.”
- entities: A list of named entities, such as ["Elon Musk", "Tesla", "SpaceX"].
- complexity: “Beginner” | “Advanced”. Helps the agent match the user’s expertise level.
Example Frontmatter
---
title: "The Physics of Black Holes"
summary: "A technical overview of event horizons and Hawking radiation."
complexity: "PhD"
entities:
- Stephen Hawking
- Albert Einstein
tags: ["Astrophysics", "Gravity"]
---
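To make the structure concrete, here is a minimal sketch of how an indexer might separate a block like the one above from the body and read its fields. The function names `split_frontmatter` and `parse_simple_frontmatter` are hypothetical, and the parser deliberately handles only the small YAML subset used in this example (quoted scalars, inline lists, and "- item" lists). A production pipeline would more likely rely on a full YAML library.

```python
import ast


def split_frontmatter(text):
    """Return (frontmatter_text, body) for a document that starts with '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat everything as body


def parse_simple_frontmatter(block):
    """Parse only the subset shown in the example, not general YAML."""
    meta = {}
    current_key = None  # key of the '- item' list currently being filled
    for raw in block.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current_key is not None:
            meta[current_key].append(line[2:].strip().strip('"'))
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            # 'entities:' with nothing after it starts a block list
            meta[key] = []
            current_key = key
        elif value.startswith("["):
            # inline list such as ["Astrophysics", "Gravity"]
            meta[key] = ast.literal_eval(value)
            current_key = None
        else:
            meta[key] = value.strip('"')
            current_key = None
    return meta


DOC = '''---
title: "The Physics of Black Holes"
summary: "A technical overview of event horizons and Hawking radiation."
complexity: "PhD"
entities:
- Stephen Hawking
- Albert Einstein
tags: ["Astrophysics", "Gravity"]
---
Body text goes here.
'''

front, body = split_frontmatter(DOC)
meta = parse_simple_frontmatter(front)
print(meta["summary"])
print(meta["entities"])
```

Once the metadata is a plain dictionary, an agent can read the summary or entities without touching the body at all.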
The Retriever’s Shortcut
Many RAG systems index the Frontmatter separately or weight it more heavily. By putting your core concepts in key-value pairs, you are essentially hand-feeding the indexer. You are saying, “This is exactly what this file is about.”
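As a rough illustration of what weighting the Frontmatter more heavily can look like, the sketch below scores a document by counting query-term matches per field and multiplying by a field weight. The weights in `FIELD_WEIGHTS` and the `score` function are assumptions made up for this example. Real retrievers typically use embeddings or BM25 rather than raw term counts, and their weighting schemes vary.

```python
import re

# Hypothetical weights: metadata fields count more than body text.
FIELD_WEIGHTS = {
    "title": 3.0,
    "summary": 2.0,
    "keywords": 2.0,
    "entities": 2.0,
    "body": 1.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    """Sum weighted counts of query terms found in each field of doc."""
    query_terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = doc.get(field, "")
        if isinstance(value, list):
            value = " ".join(value)  # list fields become one string
        matches = sum(1 for token in tokenize(value) if token in query_terms)
        total += weight * matches
    return total


with_metadata = {
    "title": "The Physics of Black Holes",
    "summary": "A technical overview of event horizons and Hawking radiation.",
    "entities": ["Stephen Hawking"],
    "body": "Black holes slowly lose mass.",
}
body_only = {
    "title": "Notes",
    "body": "Hawking radiation is discussed briefly.",
}

query = "hawking radiation"
ranked = sorted(
    [("with_metadata", with_metadata), ("body_only", body_only)],
    key=lambda pair: score(query, pair[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Under these assumed weights, the document that states its topic in the summary and entities fields outranks the one that mentions the same terms only in its body.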
Frontmatter as a “Prompt Injection”
Think of your Frontmatter as a legitimate form of “Prompt Injection.” Through metadata the system is designed to read, you are telling it how to treat your document.
Advanced Tactic: The “Persona” Field
Some experimental RAG agents respect a persona field, for example:
persona: "Technical Reference"
persona: "Marketing Overview"
By tagging your content with its intended persona, you allow agents to filter effectively: “I am a coding bot; I only want Technical Reference pages.” This keeps marketing copy out of the code-generation context, which can reduce hallucination and improve user satisfaction.
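The filtering step can be sketched in a few lines. This assumes each document's Frontmatter has already been parsed into a dictionary; `filter_by_persona` and the sample paths are hypothetical, and the persona field itself is experimental rather than a standard.

```python
def filter_by_persona(docs, wanted):
    """Keep only documents whose 'persona' field equals the wanted value.

    Documents without a persona field are dropped here; an agent could
    just as reasonably choose to keep untagged pages.
    """
    return [doc for doc in docs if doc.get("persona") == wanted]


# Hypothetical parsed metadata for three pages.
docs = [
    {"path": "docs/api.md", "persona": "Technical Reference"},
    {"path": "landing/why-us.md", "persona": "Marketing Overview"},
    {"path": "docs/changelog.md"},  # no persona tag
]

technical = filter_by_persona(docs, "Technical Reference")
print([doc["path"] for doc in technical])
```

Dropping untagged pages is the stricter choice. It protects the context window, at the cost of missing useful content that was never tagged.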
Glossary of Terms
- Agentic Web: The specialized layer of the internet optimized for autonomous agents rather than human browsers.
- RAG (Retrieval-Augmented Generation): The process where an LLM retrieves external data to ground its response.
- Vector Database: A database that stores data as high-dimensional vectors, enabling semantic search.
- Grounding: The act of connecting an AI’s generation to a verifiable source of truth to prevent hallucination.
- Zero-Shot: The ability of a model to perform a task without seeing any examples.
- Token: The basic unit of text for an LLM (roughly 0.75 words).
- Inference Cost: The computational expense required to generate a response.