The Question
When an AI agent browses the web on your behalf, every page it reads costs tokens. Tokens cost money and, more importantly, they displace reasoning. An agent spending 80% of its context window ingesting raw HTML has 20% left to think.
We ran the same research task through two different browsing tools to measure the real cost difference: Playwright (traditional browser automation using accessibility snapshots) and Charlotte (an MCP server implementing LEAN principles for token-efficient content delivery).
The question wasn't "which tool is better." It was: when does token efficiency start to matter?
Methodology
Two rounds. Same model (Claude Opus 4.6, 1M context). Same prompts. Clean context between every run. No hints, no intervention, no memory of the other run.
The only variable was the browsing tool available to the agent.
Round 1: Simple Site
Target: clocktowerassoc.com (6 pages, server-side rendered, clean semantic HTML)
Task: Research the company's services, specifications, case studies, and contact form. Produce a structured report.
Round 2: Complex Site
Target: wikipedia.org (articles ranging from roughly 100,000 to over 400,000 characters of DOM per page)
Task: Research the Voyager 1 space probe across multiple Wikipedia articles. Answer six specific questions requiring navigation to at least 4 different pages and synthesis of information spread across all of them.
Both tasks required multi-page navigation, data extraction, and synthesis. The difference was page weight.
Round 1 Results: The Simple Site
| Metric | Playwright | Charlotte |
|---|---|---|
| Total input tokens | ~75,000 | ~95,000 |
| Total output tokens | ~3,500 | ~4,500 |
| Total tokens | ~78,500 | ~99,500 |
| Estimated cost | $1.65 | $1.90 |
| Tool calls | 9 | 17 |
| Pages accessed | 6 | 7 |
| Task completion | 5/5 | 5/5 |
| Time | ~3 min | ~2 min |
Playwright won. By about 21,000 tokens and $0.25.
On a small, well-structured site with clean server-rendered HTML, Playwright's accessibility snapshots are already a compressed representation of the DOM. Charlotte's LEAN layering (orientation fetches, content resolution negotiation, semantic search) adds protocol overhead that doesn't amortize across six lightweight pages. More tool calls, more context per call, for the same result.
Both agents answered all five questions completely, with comparable quality. Playwright caught one additional detail Charlotte missed (broader service capabilities mentioned on the homepage but not the services page).
This is exactly the scenario the LEAN specification warns about: "For exhaustive extraction, full-page audits, or five-call sessions, the overhead may not be worth it."
Round 2 Results: The Complex Site
The first Playwright tool call told the story before the task even started:
```
playwright - Navigate to URL (url: "https://en.wikipedia.org/wiki/Voyager_1")
⎿ Error: result (402,647 characters) exceeds maximum allowed tokens.
```
402,647 characters from a single page load. Before the agent had navigated anywhere else or answered a single question, one page was already too large to return in a single tool result, forcing the agent into chunked extraction just to read it.
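To put that character count in token terms, a common rule of thumb is roughly four characters per token for English text; the exact ratio varies with markup density and tokenizer, so treat this as a back-of-envelope sketch, not a measurement:

```python
# Rough conversion from raw DOM characters to an approximate token count.
# The ~4 chars/token ratio is a heuristic assumption; dense markup like an
# accessibility tree can tokenize less efficiently than plain prose.
CHARS_PER_TOKEN = 4

def approx_tokens(char_count: int) -> int:
    return char_count // CHARS_PER_TOKEN

page_chars = 402_647  # the single Voyager 1 page load above
print(approx_tokens(page_chars))  # on the order of 100,000 tokens
```

By this estimate, one Wikipedia article is roughly 100,000 tokens of input before the agent has done any reasoning at all.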
Here's how the full runs compared:
| Metric | Playwright | Charlotte |
|---|---|---|
| Total input tokens | ~550,000 | ~350,000 |
| Total output tokens | ~8,000 | ~25,000 |
| Total tokens | ~558,000 | ~375,000 |
| Estimated cost | $8.85 | $6.75 |
| Tool calls | ~35 | 31 |
| Pages accessed | 4 | 5 |
| Task completion | 6/6 | 6/6 |
| Time | ~5 min | ~4 min |
Charlotte consumed 36% fewer input tokens while visiting more pages.
Charlotte accessed five Wikipedia articles to Playwright's four and still used 200,000 fewer input tokens. Total token savings: 33%. Cost savings: $2.10 per task, or 24%.
Both agents answered all six questions comprehensively. Quality was comparable, with both producing detailed reports covering launch history, planetary discoveries, current distance, Golden Record contents, instrument status, and key personnel.
Where the Savings Came From
Content resolution. Charlotte delivers page content at the orientation level first, a compressed summary of structure and key information rather than the full DOM. The agent drills into specific sections on demand. Playwright returns the entire accessibility tree of every page on every load.
Navigation efficiency. Charlotte's orientation view gives the agent enough structural understanding to make targeted follow-up requests. Playwright's agent had to parse through hundreds of thousands of characters of accessibility tree markup to find the information it needed, then use chunked extraction via Python to break the oversized results into processable pieces.
Cumulative context pressure. Each Playwright page load pushed large volumes of content into the context window. Across four pages, this accumulated to over half a million input tokens. Charlotte's approach kept the active context smaller at each step.
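The shape of that accumulation can be sketched with a toy cost model. All per-page numbers here are hypothetical, chosen only to illustrate the mechanism, not measured from either tool:

```python
# Illustrative model of cumulative context pressure. Full-DOM loading
# re-ingests a large snapshot on every page; a layered approach fetches
# a small orientation view plus a few targeted section reads.
# All constants below are assumed, illustrative values.
FULL_SNAPSHOT = 100_000   # tokens per full-page accessibility tree
ORIENTATION = 3_000       # tokens for a structural overview of a page
SECTION_READ = 8_000      # tokens per drilled-into section
SECTIONS_PER_PAGE = 3     # how many sections the agent actually needs

def full_dom_cost(pages: int) -> int:
    return pages * FULL_SNAPSHOT

def layered_cost(pages: int) -> int:
    return pages * (ORIENTATION + SECTIONS_PER_PAGE * SECTION_READ)

for n in (1, 4, 10):
    print(n, full_dom_cost(n), layered_cost(n))
```

The gap per page is fixed, so it compounds linearly with session length: the more pages a task touches, the larger the absolute savings.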
Where Charlotte Paid a Price
Charlotte's output tokens were higher (25,000 vs 8,000). Wikipedia's page sizes were large enough that even Charlotte's content delivery hit output limits, forcing the agent into a workaround pattern: saving results to disk and extracting specific sections with Python scripts. This added 15 bash calls to the tool count. The information got through, but the operational overhead of working around output size limits inflated both the call count and the output token cost.
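The workaround pattern itself is simple: dump the oversized result to disk, then pull back only the sections the task needs. A minimal sketch, assuming heading lines can serve as section anchors (the file name and markers below are hypothetical, not taken from the actual transcripts):

```python
# Sketch of the save-to-disk workaround: extract one section of a large
# saved result by slicing between two heading markers.
from pathlib import Path

def extract_section(path: str, heading: str, next_heading: str) -> str:
    """Return the text from `heading` up to (not including) `next_heading`."""
    text = Path(path).read_text(encoding="utf-8")
    start = text.find(heading)
    if start == -1:
        return ""
    end = text.find(next_heading, start + len(heading))
    return text[start:end] if end != -1 else text[start:]
```

Each extraction like this is one more bash/Python call in the transcript, which is exactly how the workaround inflated Charlotte's tool count and output tokens.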
This is an honest limitation. Charlotte's content resolution handles page weight well on the input side, but the output pipeline needs better handling of extremely large pages. That said, even with the workaround overhead, the total token count was still 33% lower.
The Crossover Point
| | Simple Site | Complex Site |
|---|---|---|
| Winner | Playwright | Charlotte |
| Token difference | +21,000 (Charlotte higher) | -183,000 (Charlotte lower) |
| Cost difference | +$0.25 | -$2.10 |
LEAN has a crossover point, and it's determined by page complexity.
On a six-page brochure site with clean, lightweight HTML, the protocol overhead of orientation fetches, content resolution layers, and semantic search costs more than it saves. The pages are small enough that ingesting them whole is cheap.
On content-heavy sites where individual pages carry 100,000+ characters of DOM, the economics invert. The overhead of LEAN's layered approach is trivially small compared to the savings from not ingesting full page DOMs repeatedly. A 36% reduction in input tokens across just four pages translates directly to lower cost and, crucially, more context window available for the agent to reason.
What This Means at Scale
The Wikipedia test involved 4-5 pages. A realistic agent workflow across a large site (product catalog, documentation portal, enterprise intranet) could involve 50-100+ pages in a single session.
If the per-page savings from Round 2 hold across a longer session, the math compounds:
| Session Length | Playwright (projected) | Charlotte (projected) | Savings |
|---|---|---|---|
| 5 pages | ~558,000 tokens | ~375,000 tokens | 33% |
| 50 pages | ~5.5M tokens | ~3.75M tokens | ~1.75M tokens |
| 100 pages | ~11M tokens | ~7.5M tokens | ~3.5M tokens |
At Opus 4.6 input pricing ($15/MTok), 3.5 million saved tokens is $52.50 per session. Across hundreds of agent sessions per day, this is the difference between agentic workflows being economically viable and being cost-prohibitive.
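The per-session dollar figures follow directly from the stated input price; spelled out (using the article's own rounded token-savings numbers):

```python
# Dollar savings from saved input tokens at the article's stated price.
INPUT_PRICE_PER_MTOK = 15.00  # Opus 4.6 input pricing, $ per million tokens

def savings_usd(saved_tokens: int) -> float:
    return saved_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK

print(savings_usd(3_500_000))  # 52.5 — the 100-page session above
print(savings_usd(1_750_000))  # the 50-page row
```

At hundreds of sessions per day, multiply accordingly: 200 sessions at the 100-page scale is over $10,000 per day in input-token savings alone.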
These are projections. Real-world variance depends on page sizes, task complexity, and how much of each page the agent actually needs. But the directional claim holds: on content-heavy sites, token-efficient content delivery isn't a nice-to-have. It's an economic prerequisite for agents operating at scale.
Takeaways
Simple sites don't need token optimization. If your pages are lightweight and your agent sessions are short, traditional browser automation works fine. The overhead of a LEAN-aware tool doesn't pay for itself.
Complex sites do. The moment page weight crosses a threshold, roughly where individual pages exceed 50,000-100,000 characters of DOM, the economics flip. Every page load without content resolution is a tax on the agent's context window, its budget, and its ability to reason.
Most real-world sites are complex. The average e-commerce product page, SaaS documentation site, or enterprise portal is closer to Wikipedia than to a six-page brochure. If you're building agent workflows against real websites, token efficiency is an architectural concern, not an afterthought.
Tools Used
Playwright MCP — Open-source browser automation via Microsoft's Playwright framework, using accessibility snapshots for page content delivery.
Charlotte — Open-source MCP server implementing LEAN (Layered Efficiency for Agentic Navigation) across four layers: tool availability, content resolution, element resolution, and response resolution. Reference implementation for the LEAN specification. github.com/Clocktower-and-Associates/charlotte
LEAN Specification — github.com/Clocktower-and-Associates/LEAN
Full specifications: clocktowerassoc.com/specs
Published by Clocktower & Associates. Methodology, prompts, and full transcripts available on request.