The web architectural landscape is experiencing a profound transition from deterministic human browsing to semantic-driven, autonomous traversal. In previous analyses, such as Agentic Cloaking: Introducing AXO (Part 1) and Level 0 Agentic Cloaking with Static Web Content, we established the foundational concepts of serving specialized content to agents versus humans. However, before you can effectively cloak or route content, you must first answer a critical question: Who—or what—is actually requesting this page?

Recognizing agentic browsers and AI agents goes far beyond checking just the User-Agent string and navigator.webdriver. Modern bot detection and anti-scraping systems (like Cloudflare, DataDome, and advanced WAFs) use dozens of signals across HTTP headers and client-side JavaScript environments. The landscape in 2026 has split into two very different problems: detecting traditional automation frameworks (like Selenium, Puppeteer, or Playwright) and identifying modern AI agentic browsers (like ChatGPT Agent, Google Mariner, and Claude Computer Use).

This comprehensive research article serves as the definitive guide to Level 1 Agentic Cloaking, focusing strictly on identifying automated and agentic browsers using nothing but basic JavaScript and HTTP request headers. We will purposely exclude behavioral signals (such as mouse movements or typing dynamics) and proprietary detection libraries, focusing purely on what the browser tells you—either willingly or through its unavoidable architecture.


1. The Basics: User-Agent and navigator.webdriver

For years, the first line of defense—and the first layer of detection—has relied on two primary signals: the User-Agent string sent in the HTTP request and the navigator.webdriver property exposed in the JavaScript runtime. While these are often trivial to bypass today, they remain the foundational checks for any agentic cloaking system.

The navigator.webdriver Property

According to the official W3C WebDriver Specification, browsers controlled by automation frameworks (such as Selenium, Puppeteer, or Playwright) must expose a read-only navigator.webdriver property set to true. Human-operated browsers return false or undefined. This boolean flag was designed explicitly to allow servers and applications to know when they are communicating with an automated client.

// Basic client-side check for webdriver activity
if (navigator.webdriver === true) {
  console.log("Browser is under automation control.");
  // Trigger Level 1 Agentic Cloaking routing logic
}

When does this return true?

  • Chrome/Chromium: Whenever the browser is launched with --enable-automation, --headless, or --remote-debugging-port=0.
  • Firefox: When the marionette.enabled preference or --marionette flag is present.
  • Safari: When automation via safaridriver is engaged.

Despite its status as a W3C standard, this signal is frequently bypassed by sophisticated scrapers using flags like --disable-blink-features=AutomationControlled or by overwriting the property using JavaScript getter manipulation. However, for baseline detection, it correctly flags the vast majority of default automation setups.
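One way to catch such getter-based patches is to inspect where the property actually lives. In a pristine Chromium build, webdriver is defined as a getter on Navigator.prototype, never as an own property of the navigator instance. A minimal sketch of that check (the helper name and heuristic are illustrative, not a library API):

```javascript
// Sketch: flag a navigator whose `webdriver` property looks tampered with.
// Genuine browsers expose it as a prototype getter, not an instance property.
function webdriverLooksPatched(nav, proto) {
  // Stealth patches often redefine the property directly on the instance...
  if (Object.getOwnPropertyDescriptor(nav, 'webdriver')) return true;
  // ...or replace the prototype getter with a plain data property.
  const desc = Object.getOwnPropertyDescriptor(proto, 'webdriver');
  return !!desc && typeof desc.get !== 'function';
}

// In the browser you would call:
// webdriverLooksPatched(navigator, Navigator.prototype);
```

Note that a sophisticated patch can also forge the descriptor, so this is one more consistency signal rather than a definitive verdict.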

The User-Agent String

If a browser is launched in standard headless mode without deliberate spoofing, the User-Agent string usually includes the word HeadlessChrome. A typical headless Chrome UA string looks like this:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36

Server-Side Detection Example (Node.js):

function detectHeadlessUA(req) {
    const ua = req.headers['user-agent'] || '';
    if (ua.includes('HeadlessChrome')) {
        return true; // Headless Chrome detected
    }
    if (ua.includes('PhantomJS')) {
        return true; // Legacy PhantomJS detected
    }
    return false;
}

While the User-Agent string is notoriously fragile and easily changed by the client, it serves as an excellent low-cost server-side filter before engaging heavier JavaScript checks. For more details on the expected behavior of the Navigator API, consult the MDN Navigator API Documentation.


2. Advanced navigator Properties

When simple flags are bypassed, bot detectors look for anomalies where the browser claims to be a standard desktop browser but its internal environment lacks the characteristics of a fully-fledged consumer endpoint. These inconsistencies arise because headless implementations often strip out features to save memory and processing power.

Empty or Hardcoded Plugin Arrays

Real browsers typically have plugins registered, such as the ubiquitous Chrome PDF Viewer. In older standard headless modes, navigator.plugins.length is 0. While modern headless architectures (“new headless”) attempt to mimic the headed browser more closely, discrepancies often still exist.

// A simple plugin length consistency check
if (navigator.plugins.length === 0 && navigator.userAgent.includes("Chrome")) {
    console.warn("Suspicious: Chrome claims to have zero plugins.");
}

Missing or Contradictory Languages

A real human browser almost always has preferred languages configured, reflecting the user’s operating system or manual browser settings. Headless agents sometimes expose an empty array for navigator.languages, or the JavaScript value fundamentally contradicts the Accept-Language HTTP header sent to the server.

// Client-side language check
if (!navigator.languages || navigator.languages.length === 0) {
    console.warn("Suspicious: No preferred languages exposed by the browser.");
}

Hardware Constraints: Concurrency and Memory

Properties like navigator.hardwareConcurrency (number of logical processors) and navigator.deviceMemory (RAM in gigabytes) sometimes default to values that do not make sense for the spoofed environment. For example, if a bot claims its User-Agent belongs to a high-end Mac but exposes only a single core and 1GB of RAM (a typical minimal cloud container configuration), this inconsistency is a strong indicator of automation.
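These hardware checks can be folded into a simple consistency score. The sketch below takes the navigator-like object as a parameter so the logic stays testable; the weights and thresholds are illustrative assumptions, not published detection rules:

```javascript
// Sketch: score hardware claims against the advertised platform.
// Thresholds are illustrative; tune them for your own traffic.
function hardwareInconsistencyScore(nav) {
  let score = 0;
  const cores = nav.hardwareConcurrency || 0;
  const memoryGB = nav.deviceMemory || 0; // Chromium-only API, in gigabytes
  const claimsMac = /Macintosh/.test(nav.userAgent || '');

  if (cores === 1) score += 1;                   // single vCPU: container-like
  if (memoryGB > 0 && memoryGB <= 1) score += 1; // 1 GB RAM on a claimed desktop
  if (claimsMac && cores === 1) score += 2;      // no shipping Mac is single-core
  return score;
}
```

In the browser you would call it as hardwareInconsistencyScore(navigator) and feed the result into your overall risk score.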


3. Framework-Specific JavaScript Markers

The automation frameworks themselves leave distinctive “fingerprints” in the JavaScript window or document objects. Because these tools must communicate between the Node.js/Python backend and the browser runtime, they often inject global variables or modify the DOM in detectable ways.

Selenium and ChromeDriver

ChromeDriver famously injects variables starting with cdc_ or $cdc_ into the DOM. These variables are essentially internal markers used for coordination.

// Detecting Selenium / ChromeDriver artifacts
function detectSelenium() {
    const seleniumGlobals = [
        '_selenium', 
        '_Selenium_IDE_Recorder',
        '__selenium_evaluate', 
        '__selenium_unwrap'
    ];
    
    // Check for standard globals
    for (const prop of seleniumGlobals) {
        if (prop in window) return true;
    }
    
    // Check for the randomized cdc_ / $cdc_ globals in the document object
    for (const key in document) {
        if (/^\$?cdc_/.test(key)) return true;
    }
    
    return false;
}

If you examine the Chromium source code for ChromeDriver, you can see these prefixes hardcoded into the communication layer.

Playwright

Playwright, the modern successor to Puppeteer for many automation tasks, uses injected bindings to bridge the external controller and the browser context. This often leaves behind specific window variables.

// Detecting Playwright artifacts
function detectPlaywright() {
    return '__playwright__binding__' in window || '__pwInitScripts' in window;
}

Puppeteer and the Chrome DevTools Protocol (CDP)

Puppeteer and Playwright communicate with Chrome via the Chrome DevTools Protocol (CDP). Because of how CDP serializes objects to send them over the WebSocket connection, advanced bot detectors can deliberately trigger an error and check its stack trace.

When CDP’s Runtime.enable is active, executing console.log() on an object triggers serialization of getters. We can catch this serialization to detect the CDP connection.

// The Error Stack Serialization Trick (Highly Effective for CDP)
let cdpDetected = false;
const e = new Error();

Object.defineProperty(e, 'stack', {
  get() {
    cdpDetected = true;
    return "mocked_stack";
  }
});

// This console.log triggers CDP serialization if a tool like Puppeteer is attached
console.log(e);

if (cdpDetected) {
  console.log("CDP automation detected via serialization side-effect.");
}

This trick is one of the strongest JavaScript-based signals available in 2026, because the CDP layer is fundamental to how modern automation tools control the browser.

Furthermore, older headless Chromium builds often lack the window.chrome object that is universally present in headed, user-facing Chromium distributions. Checking for a missing or incomplete window.chrome is another useful heuristic, although the newer headless mode has narrowed this gap.
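A sketch of that heuristic, written to take the window object and UA string as parameters so it stays testable (the brand exclusions are an assumption to avoid flagging Chromium derivatives that legitimately differ):

```javascript
// Sketch: a browser whose UA claims Chrome should expose a window.chrome object.
function chromeObjectMissing(win, ua) {
  const claimsChrome = /Chrome/.test(ua) && !/Edg|OPR/.test(ua);
  if (!claimsChrome) return false; // only meaningful for Chrome-claiming UAs
  return !win.chrome || typeof win.chrome !== 'object';
}

// In the browser: chromeObjectMissing(window, navigator.userAgent)
```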


4. HTTP Request Headers and Network Fingerprints

Server-side detection allows you to spot an agentic browser before it even executes a single line of JavaScript. This is highly efficient and forms the backbone of edge-layer Level 1 Agentic Cloaking.

The Missing Accept-Language

Headless Chrome frequently omits the Accept-Language HTTP header entirely when making requests. A normal Chromium browser operated by a human will almost never do this, as the operating system locales are passed down natively.
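Continuing the Node.js style used earlier, a minimal server-side filter for this signal (a heuristic contribution to a risk score, not a definitive verdict, since privacy tools and proxies can also strip the header):

```javascript
// Sketch: a Chrome UA arriving with no Accept-Language header is suspicious.
function missingAcceptLanguage(headers) {
  const ua = headers['user-agent'] || '';
  const lang = headers['accept-language'];
  return /Chrome/.test(ua) && (!lang || lang.trim() === '');
}
```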

UA Client Hints (Sec-CH-UA)

Modern Chromium browsers send structured Sec-CH-UA HTTP headers (User-Agent Client Hints) rather than relying solely on the monolithic UA string. In headless mode, this header explicitly announces the headless client:

Sec-CH-UA: "Not_A Brand";v="99", "HeadlessChrome";v="133", "Chromium";v="133"

Even if a bot developer spoofs the main User-Agent string, if they fail to spoof the Sec-CH-UA headers, the discrepancy is immediately obvious.
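A server-side consistency check along these lines (the version threshold of 90 is an assumption; Client Hints ship by default in all recent Chromium releases):

```javascript
// Sketch: cross-check Sec-CH-UA against the classic User-Agent string.
function clientHintsMismatch(headers) {
  const ua = headers['user-agent'] || '';
  const hints = headers['sec-ch-ua'] || '';

  // Explicit headless brand token in the structured hints
  if (/HeadlessChrome/i.test(hints)) return true;

  // UA claims a modern Chrome, but no Client Hints arrived at all
  const m = ua.match(/Chrome\/(\d+)/);
  if (m && Number(m[1]) >= 90 && hints === '') return true;

  return false;
}
```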

Sec-Fetch-* Metadata Inconsistencies

The Fetch Metadata Request Headers provide context about how a request was made. A normal human navigating to a page will generate:

Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-Dest: document

Automated scripts that fetch pages via XHR/fetch calls or unusual navigation routines may send contradictory headers, such as a full-page HTML request carrying a Sec-Fetch-Dest: empty header.
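That specific contradiction can be expressed as a one-liner on the server (again, a heuristic input to a risk score rather than a block rule):

```javascript
// Sketch: a request asking for a full HTML document but carrying
// Sec-Fetch-Dest: empty (typical of fetch()/XHR) is contradictory.
function secFetchAnomaly(headers) {
  const accept = headers['accept'] || '';
  const dest = headers['sec-fetch-dest'];
  return accept.includes('text/html') && dest === 'empty';
}
```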

TLS/SSL Fingerprinting (JA3 / JA4)

At the deepest level of the request lies the negotiation of the secure connection. The way an automation framework’s underlying network library (like Python’s requests or Node.js’s https module) performs its TLS handshake is fundamentally different from how a standard Chrome browser does it. Even if you fake every HTTP header flawlessly to look like a normal Mac browser, the TLS fingerprint (often measured via JA3 or JA4 hashes) will reveal the network library behind the traffic.


5. Identifying the New Wave: AI Agentic Browsers

While traditional automation (Selenium, Playwright, Puppeteer) can be caught using the JavaScript and Header techniques detailed above, the landscape has evolved radically. AI agentic browsers—such as ChatGPT Agent, Google Mariner, and Anthropic’s Claude Computer Use—are entirely different beasts.

These AI agents use real Chromium browsers. Their User-Agent strings are identical to legitimate human traffic, they possess the correct window.chrome runtime, their navigator.webdriver flag is naturally false, and their Sec-CH-UA headers are perfectly consistent. You cannot detect them using traditional bot scripts.

So how do we identify them for Level 1 Agentic Cloaking?

IP Address Verification

The primary method relies on infrastructure analysis. Major AI companies publicly document the IP ranges their agents use to browse the web.

  • OpenAI publishes an updated JSON list of its egress IP addresses.
  • Google allows you to perform reverse DNS lookups on incoming IPs to verify if they resolve to *.googlebot.com or *.google.com.
  • Anthropic similarly publishes IP ranges for ClaudeBot.
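Once the published lists are fetched and cached, matching an incoming address against them is plain bit arithmetic. A minimal IPv4 sketch (the CIDR ranges in the test are placeholders, not real provider data):

```javascript
// Sketch: match an IPv4 address against cached provider CIDR ranges.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split('/');
  const bits = Number(bitsStr);
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

// `ranges` maps a provider name to its list of CIDR blocks.
function matchAgentProvider(ip, ranges) {
  for (const [provider, cidrs] of Object.entries(ranges)) {
    if (cidrs.some((c) => inCidr(ip, c))) return provider;
  }
  return null;
}
```

Production systems additionally need IPv6 support and a refresh job that re-fetches the provider lists, since the published ranges change over time.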

HTTP Message Signatures (RFC 9421)

To allow webmasters to reliably verify that a request genuinely originated from their AI agent (and not a malicious actor spoofing an IP or headers), forward-thinking providers like OpenAI now utilize RFC 9421 HTTP Message Signatures.

When ChatGPT Agent Mode traverses a site, it signs the HTTP request:

Signature: sig1=:0uNhUvjtBPzJc8UFGSbw7pcrqytuNg...==:
Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1755779377;keyid="otMqcjr17m...";alg="ed25519"
Signature-Agent: "https://chatgpt.com"

To verify this on your server:

  1. Extract the Signature-Agent header.
  2. Fetch the public key from the provider’s well-known directory (e.g., https://chatgpt.com/.well-known/http-message-signatures-directory).
  3. Cryptographically verify the signature against the incoming request headers using the exact algorithm specified in Signature-Input.

This cryptographic proof prevents spoofing and all but eliminates false positives. It is the gold standard for agentic identification in 2026.


6. Evasion Techniques and the Practical Limits of Detection

Detection is an arms race. For every metric identified, a counter-measure is developed.

  • Header Spoofing: Simple User-Agent and Accept-Language headers are easily spoofed. However, spoofing Client Hints (Sec-CH-UA) accurately across all entropy requests requires significantly more effort.
  • JavaScript Patching (Stealth Plugins): Plugins like puppeteer-extra-plugin-stealth actively patch properties like navigator.webdriver and inject mocked window.chrome objects before the page’s scripts run. However, detecting these patches (by looking for .toString() anomalies on native functions) becomes a meta-game of its own.
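One common move in that meta-game: a genuinely native function stringifies to [native code], while a plain JavaScript replacement does not (unless the patch also proxies toString, which is exactly the next round of the arms race):

```javascript
// Sketch: detect a JS-level replacement of a supposedly native function.
function looksLikePatchedNative(fn) {
  if (typeof fn !== 'function') return false;
  return !/\[native code\]/.test(Function.prototype.toString.call(fn));
}

// In the browser, candidates would include navigator.permissions.query,
// HTMLCanvasElement.prototype.toDataURL, and similar frequently patched APIs.
```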

The Limits of Detection

Ultimately, full-stack emulation (running a true, headful browser inside a cloud container) reduces the network and TLS distinguishability to near zero. When an agent is using an identical technology stack to a human user, the only remaining differentiators are behavioral biometrics (how the mouse moves, how scrolling occurs) and IP origin.

Furthermore, legitimate headless usage is widespread. Continuous Integration (CI) systems, uptime monitors, and SEO auditing tools all rely on headless browsers. Hard blocking traffic based solely on the absence of a plugin or an aggressive JS signal will inevitably result in unacceptable business damage. Therefore, Level 1 Agentic Cloaking must use these signals to generate a risk score, not an absolute binary block list.
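A risk-scoring skeleton consistent with that advice; the weights and thresholds here are illustrative defaults, not calibrated values:

```javascript
// Sketch: aggregate individual detection signals into one score, then route.
const SIGNAL_WEIGHTS = {
  webdriverFlag: 40,
  headlessUAToken: 30,
  frameworkGlobal: 45,
  cdpSerialization: 50,
  missingAcceptLanguage: 15,
  pluginAnomaly: 10,
};

function riskScore(firedSignals) {
  return firedSignals.reduce((sum, s) => sum + (SIGNAL_WEIGHTS[s] || 0), 0);
}

function routeBy(score) {
  if (score >= 60) return 'hard-challenge';
  if (score >= 30) return 'soft-challenge';
  return 'allow';
}
```

Keeping weights in one table makes it easy to tune them against observed false positives, which is the feedback loop shown in the flowchart below.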


7. The Detection Pipeline: A Decision Flowchart

The following flowchart model reflects the multi-layered approach to recognizing agentic and headless browsers, moving from fast, easily spoofable signatures to complex consistency checks.

  flowchart TD
  A[Request Received] --> B[Server: Collect Header Signals]
  B --> C{High-confidence Token? HeadlessChrome/PhantomJS}
  C -->|Yes| D[Increase Score Significantly -> Challenge/Block based on context]
  C -->|No| E[Response delivers JS Sampler - First-party]
  E --> F[Client: Capture JS/Runtime/Render Signals]
  F --> G[Server: Consistency Checks - Header vs JS vs Render]
  G --> H{Score high?}
  H -->|High| I[Hard Challenge / Block + Telemetry]
  H -->|Medium| J[Soft Challenge - Rate-limit / Step-up Auth]
  H -->|Low| K[Allow + Observation]
  J --> L[Feedback Loop: Check for False Positives]
  I --> L
  K --> L

8. Detection Summary Matrix

The following table aggregates the primary non-behavioral signals used to identify agentic browsers, outlining their reliability and the core mechanism behind their effectiveness.

| Detection Signal | Observability | Typical Evidence | Reliability | Notes on Evasion & False Positives |
| --- | --- | --- | --- | --- |
| navigator.webdriver | JavaScript | navigator.webdriver === true | Medium | A W3C standard. Extremely reliable out of the box, but trivially patched or spoofed by stealth libraries. |
| HeadlessChrome UA token | HTTP & JS | User-Agent string contains HeadlessChrome/... | Medium | High false positive rate if used strictly as an adversarial signal, as many legit QA tools use it. Easily spoofed. |
| Client Hints (Sec-CH-UA) | HTTP & JS | sec-ch-ua contains "HeadlessChrome" | High | Harder to spoof consistently alongside the main UA string, providing excellent consistency checks. |
| Playwright/Puppeteer globals | JavaScript | __pwInitScripts or __playwright__binding__ in window | High | Very specific to automation environments. Rarely causes false positives. Can be evasively patched. |
| Selenium/ChromeDriver globals | JavaScript | $cdc_-prefixed keys in document | High | Explicit framework leak. Highly reliable, but bot operators can compile custom ChromeDriver binaries to rename cdc_. |
| Accept-Language anomalies | HTTP header | Accept-Language missing entirely | Low-Medium | Plausible deniability exists (privacy extensions, specific proxy setups), but strongly indicative of headless bots. |
| CDP serialization side-effects | JavaScript | Error stack getter fired upon console.log() | Very High | Triggers on the fundamental communication protocol of modern automation. Hard to bypass without deep framework surgery. |
| HTTP Message Signatures | HTTP header | Signature-Agent and Signature-Input per RFC 9421 | Definitive | Cryptographic proof used by advanced AI agents (e.g., ChatGPT). Impossible to forge without the private key. |

Conclusion

Level 1 Agentic Cloaking relies on a deep understanding of browser architecture, network protocols, and the mechanics of automation frameworks. While legacy indicators like the User-Agent string still offer value as a preliminary filter, modern detection has shifted towards cryptographic HTTP signatures (RFC 9421), CDP serialization side-effects, and intricate JavaScript environment consistency checks.

To successfully implement these strategies, system architects must evaluate the specific risk profile of their traffic. Overly aggressive blocking can result in significant collateral damage, cutting off legitimate accessibility tools, beneficial SEO scrapers, and enterprise monitoring services. Therefore, detection signals should feed into a nuanced risk-scoring engine rather than a crude binary firewall.

By intelligently layering these technical signals, publishers can reliably identify and categorize agentic traffic with minimal false positives. This capability is the fundamental prerequisite for serving optimized, machine-readable JSON-LD schemas to AI agents while simultaneously rendering rich, visually compelling HTML to human users. As the web fractures into a dual ecosystem, mastering this identification layer is no longer optional—it is the bedrock of Agentic SEO and the future of digital content distribution.


Key References and Documentation