
# Cloudflare Service Bindings — why we stopped making HTTP calls between our own Workers

In v0, the MCP Worker called the API like this:

```typescript
const response = await fetch(`${env.BINDLY_API_BASE_URL}/api/bindings`, {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${userKey}` },
  body: JSON.stringify(body)
})
```

`BINDLY_API_BASE_URL` was `https://bindly-api.fly.dev`. A real HTTP request. DNS lookup, TCP connection, TLS handshake, cross-datacenter transit. On a good day, ~20ms overhead. On a bad day — cold start, network hiccup — much worse. An MCP tool that makes 3–4 API calls would accumulate 60–200ms of networking tax before doing any actual work.

Service Bindings remove all of that.

## What they actually do

In `wrangler.toml`:

```toml
[[services]]
binding = "API"
service = "bindly-api"
```

Now in the MCP Worker:

```typescript
const response = await env.API.fetch(
  new Request('https://api/api/bindings', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: JSON.stringify(body)
  })
)
```

The URL hostname (`https://api/...`) is ignored. Cloudflare routes the call directly to the `bindly-api` Worker inside its own network. Same region. No DNS. No TLS. No external transit. Sub-millisecond.

The API Worker receives this as a normal inbound request and processes it through Hono's router — it doesn't know or care whether the caller is a browser, another Worker, or a Service Binding. Same auth middleware, same business logic, same response format.
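That caller-agnosticism is the whole trick, and it can be shown with a minimal sketch — a bare fetch handler rather than the real Hono app, with a made-up route and auth check standing in for the actual middleware:

```typescript
// Minimal sketch of a caller-agnostic handler (illustration only — the real
// API Worker routes through Hono, and this auth check is a stand-in).
async function handle(request: Request): Promise<Response> {
  // Identical auth check whether the caller is a browser, another Worker,
  // or a Service Binding — all of them arrive as a standard Request.
  const auth = request.headers.get('Authorization')
  if (!auth?.startsWith('Bearer ')) {
    return new Response('Unauthorized', { status: 401 })
  }

  const url = new URL(request.url)
  if (url.pathname === '/api/bindings' && request.method === 'POST') {
    const body = await request.json()
    return Response.json({ ok: true, received: body })
  }
  return new Response('Not found', { status: 404 })
}
```

Nothing in the handler inspects where the `Request` came from — which is exactly why one code path serves all three kinds of caller.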

## Why the MCP Worker has no database access

The MCP Worker's `Env` interface:

```typescript
interface Env {
  API: Fetcher    // Service Binding to API Worker
  // No DB, no R2, no Vectorize
}
```

All data operations go through `env.API`. This is a deliberate architectural choice: the API Worker is the single data access point. Authorization checks, permission validation, rate limiting — all of that lives in the API Worker. If the MCP Worker had direct D1 access, we'd have to duplicate those checks or risk bypassing them.

The constraint also makes the MCP Worker simpler. It doesn't manage database connections, doesn't know the schema, doesn't care about migration state. It knows how to format MCP tool responses and how to call the API.
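Concretely, the Worker's entire data layer can collapse into one wrapper around `env.API`. Here's a hedged sketch of such a helper — the `makeCallApi` name, the error shape, and the JSON-only assumption are illustrative, not the actual implementation. (A `Fetcher`'s `fetch` accepts the same `(input, init)` arguments as global `fetch`, used here instead of constructing a `Request`.)

```typescript
// Minimal stand-in for Cloudflare's Fetcher type, so the sketch is
// self-contained outside a Workers project.
interface Fetcher {
  fetch(input: string, init?: RequestInit): Promise<Response>
}

// Hypothetical factory: every MCP tool calls the returned function, and
// nothing else in the Worker touches data.
function makeCallApi(env: { API: Fetcher }, token: string) {
  return async (path: string, body: unknown): Promise<unknown> => {
    // The hostname is ignored — the Service Binding routes by binding name.
    const response = await env.API.fetch(`https://api${path}`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(body)
    })
    if (!response.ok) {
      // Turn 4xx/5xx into exceptions so tools can map them to MCP errors.
      throw new Error(`API responded with ${response.status}`)
    }
    return response.json()
  }
}
```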

## Failure handling

When the API Worker fails, `env.API.fetch()` throws or returns a 5xx. There's no automatic retry — the calling Worker decides what to do:

```typescript
try {
  return await callApi('/api/mcp/bindings', body)
} catch {
  // No retry, no leaked internals — just a message the LLM can relay.
  return {
    content: [{ type: "text", text: "Bindly API is temporarily unavailable. Try again in a moment." }],
    isError: true
  }
}
```

The Gateway Worker handles API failures differently — it falls back to stale KV cache for public pages and adds `X-Cache-Status: stale-fallback` to the response. The MCP Worker just returns a clean error the LLM can relay to the user.
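The Gateway's path can be sketched like this — the `X-Cache-Status: stale-fallback` header is the real behavior described above, but the cache-key scheme, the `KVLike` interface, and the 503 last resort are assumptions:

```typescript
// Sketch of the Gateway's stale-cache fallback. KVLike stands in for a KV
// namespace binding; only get() is needed here.
interface KVLike {
  get(key: string): Promise<string | null>
}

async function fetchWithStaleFallback(
  api: { fetch(req: Request): Promise<Response> },  // Service Binding to the API
  kv: KVLike,
  req: Request
): Promise<Response> {
  try {
    const res = await api.fetch(req)
    if (res.ok) return res
    throw new Error(`API returned ${res.status}`)
  } catch {
    // API is down or erroring: serve the last cached copy if one exists.
    const stale = await kv.get(new URL(req.url).pathname)
    if (stale !== null) {
      return new Response(stale, {
        headers: {
          'Content-Type': 'text/html',
          'X-Cache-Status': 'stale-fallback'
        }
      })
    }
    return new Response('Service unavailable', { status: 503 })
  }
}
```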

## Local development

In production, Cloudflare resolves Service Bindings automatically. Locally, Wrangler handles it:

```bash
# Start API Worker
cd v1/packages/api && wrangler dev --port 8788

# Start Gateway (wrangler reads [[services]] in wrangler.toml,
# finds 'bindly-api' running on the same machine, connects automatically)
cd v1/packages/gateway && wrangler dev --port 8787

# Start MCP Worker
cd v1/packages/mcp && wrangler dev --port 8789
```

You don't configure URLs. Wrangler discovers Workers running locally by their `name` field in `wrangler.toml`. The same code that does `env.API.fetch(...)` in production works identically in local dev. No environment-specific branches, no `if (process.env.NODE_ENV === 'development')` hacks.
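For reference, the `name` that local discovery matches is the top-level field in each Worker's own config (file path assumed):

```toml
# v1/packages/api/wrangler.toml (path assumed)
name = "bindly-api"  # what service = "bindly-api" in another Worker's [[services]] resolves to
```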

## Why we still have 4 Workers instead of 1

The obvious question: if inter-Worker calls are cheap, why not merge everything into one Worker?

Independent deployability. A bug in the MCP Worker's tool formatting doesn't require redeploying the API Worker. A routing change in the Gateway doesn't touch MCP. A Gateway deploy takes ~5 seconds and the API keeps running unchanged.

For a solo developer shipping frequently, this matters. The blast radius per deploy is smaller. If a deploy breaks something, rollback is targeted to the Worker that changed.

The Service Binding communication overhead is effectively zero. So we get isolation without paying a latency penalty.