29 MCP tools and the Tier system — how we designed for token budgets

MCP tools are easy to build. Building them so an LLM doesn't blow through its context window on the first few calls is harder. Here's the Tier 1/1.5/2 system we landed on and why.


The naive approach to an MCP knowledge tool: user asks Claude to find something, Claude calls `search`, Claude gets back full document content, done. Works great until you have 20 documents and Claude's context window is half gone after the first search.

Context window isn't unlimited. Every token in a tool response is a token that can't go to reasoning or output. An MCP server that dumps full document content into every response is actively hostile to the LLM using it. We thought about this upfront, which is why we have the Tier system.

## The problem we were solving

Say you search for "Cloudflare Workers" and get 10 results. If each document averages 2,000 tokens, you've just consumed 20,000 tokens — before Claude has even decided which one is relevant. That's most of a context window, gone.

What the LLM actually needs for search results: enough to decide which documents are worth reading. Not the full documents. So Tier 1 gives you exactly that:

```json
{
  "id": "bnd_abc",
  "name": "Cloudflare Workers limits",
  "summary": "Overview of CPU time limits, bundle size, memory constraints...",
  "keyPoints": ["50ms CPU limit (free plan)", "10MB bundle max", "No filesystem"],
  "contentTokenCount": 2840,
  "textUrl": "https://bind.ly/@bindly/cloudflare-workers-limits?format=md"
}
```

~300 tokens. The LLM knows what's in the document, whether it's relevant, and exactly how expensive it would be to load the full content (`contentTokenCount: 2840`). That last field is important — the LLM can decide "I have 4,000 tokens left, loading this fits, do it" rather than fetching blind.

## Tier 1.5 — the preview layer we almost didn't build

After shipping Tier 1 and Tier 2, we found a gap. Sometimes Tier 1 isn't enough to know if a document is relevant. The title sounds right, the summary sounds right, but is the actual content what you need?
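Before getting to the answer: the budget check that `contentTokenCount` enables can be sketched in client code. This is a minimal sketch, assuming a hypothetical `Tier1Result` shape and `pickFetchable` helper — neither is part of the actual server:

```typescript
// Hypothetical client-side type mirroring the Tier 1 fields shown above.
interface Tier1Result {
  id: string;
  name: string;
  summary: string;
  contentTokenCount: number;
}

// Return the results whose full content fits in the remaining token budget,
// cheapest first, so the budget stretches across as many documents as possible.
function pickFetchable(results: Tier1Result[], remainingTokens: number): Tier1Result[] {
  const picked: Tier1Result[] = [];
  let budget = remainingTokens;
  const byCost = [...results].sort((a, b) => a.contentTokenCount - b.contentTokenCount);
  for (const r of byCost) {
    if (r.contentTokenCount <= budget) {
      picked.push(r);
      budget -= r.contentTokenCount;
    }
  }
  return picked;
}
```

Cheapest-first is just one policy; an LLM would more likely weigh relevance. The mechanic is the same either way: compare `contentTokenCount` against the remaining budget before fetching, instead of fetching blind.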
Tier 1.5 adds `contentPreview` — the first 500 characters of the document, plus a truncation note. It's stored in D1 (`version_metas.content_preview`), not fetched from R2. Same performance as Tier 1, but the LLM gets a real taste of the content before committing to a full fetch.

The truncation note matters:

```
...Workers run in V8 isolates, not Node.js. The distinction matters because
Workers deliberately restricts the Node.js API surface...

(... truncated 8,640 characters. Request tier 2 for full content.)
```

That `8,640 characters` is the LLM's signal for how much more there is. If the preview already answered the question, stop. If you need more, now you know the cost.

## Batch operations — the N+1 problem for tool calls

Tool calls have fixed overhead: the call itself, the response parsing, the context it uses. Fetching 5 bindings one-by-one costs 5× the overhead of fetching them in one call. `mcp_get_bindings({ ids: ["bnd_a", "bnd_b", "bnd_c"], tier: "1" })` — one call, three results. Simple to implement, significant in practice when you're building a Set or doing any multi-document workflow.

`mcp_get_set_context` is the more ambitious version. You tell it your token budget:

```typescript
mcp_get_set_context({ setId: "set_xyz", maxTokens: 8000, tier: "1.5" })
```

It packs as many Set versions as fit within 8,000 tokens, in position order, and tells you if it ran out of budget. One call to load an entire curated collection. Without this, loading a 12-item Set would be: `mcp_get_set` to list version IDs, then 12× `mcp_get_version`. Thirteen calls versus one.

## What's in every response

Every MCP tool response includes a `_meta.context` block:

```json
{
  "_meta": {
    "context": {
      "binding": {
        "contentTokenCount": 2840,
        "publicUrl": "https://bind.ly/@bindly/...",
        "textUrl": "https://bind.ly/@bindly/...?format=md"
      },
      "agentMeta": { "source": "llm", "model": "claude-opus-4-6" }
    }
  }
}
```

`publicUrl` and `textUrl` let the LLM cite sources correctly.
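An aside on the budget packing that `mcp_get_set_context` does: it can be sketched as a greedy loop over versions in position order. This is an illustration only; the `SetVersion` shape and `packSetContext` helper are assumptions, not the real server code:

```typescript
// Hypothetical shapes -- the server's actual types may differ.
interface SetVersion {
  id: string;
  position: number;
  tokenCount: number; // cost of including this version at the requested tier
  content: string;
}

interface PackedContext {
  included: SetVersion[];
  truncated: boolean; // true if the budget ran out before the Set was exhausted
}

// Greedy pack in position order: include versions until the next one
// would exceed maxTokens, then report that the budget ran out.
function packSetContext(versions: SetVersion[], maxTokens: number): PackedContext {
  const included: SetVersion[] = [];
  let used = 0;
  const ordered = [...versions].sort((a, b) => a.position - b.position);
  for (const v of ordered) {
    if (used + v.tokenCount > maxTokens) {
      return { included, truncated: true };
    }
    included.push(v);
    used += v.tokenCount;
  }
  return { included, truncated: false };
}
```

Stopping at the first version that doesn't fit, rather than skipping ahead to smaller ones, is one reasonable reading of "packs as many as fit, in position order" — it keeps the curated ordering intact, which matters for a collection meant to be read in sequence.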
`agentMeta` records which model created or retrieved this — so if Claude creates a Binding, it's tagged as `source: "llm"`, `model: "claude-opus-4-6"`. Humans editing via the web UI get `source: "human"`. Provenance is first-class.

## The tool we almost built but didn't

We had `mcp_get_diff` in v0 — given two version numbers, return a git-style unified diff. Seemed useful for "what changed between version 2 and 3?" We cut it in v1. The LLM can fetch both versions and compare them itself. Building a diff endpoint means maintaining diff-generation code, deciding on a diff format, and handling edge cases. The benefit over "fetch both, compare" was marginal. When in doubt, don't add tools.

The full list ended up at 29 tools: Binding (7), Version (2), Set (7), Space (2), Share (3), Comment (3), Utility (5). Each one exists because we needed it in a real workflow. None of them are there because it "might be useful someday."