
# Why we went all-in on Cloudflare

The v0 stack was Fly.io, Fastify, SQLite. It was fine for a while. Then it wasn't.

The specific breaking point: Fly.io autostops idle apps. When Claude called our MCP server mid-conversation, the container was asleep. It would take 10–30 seconds to wake up. Claude would time out. The user would see a tool error. We'd get blamed.

We tried disabling autostop. We tried always-on instances. The cold start problem is structural in container-based hosting — containers take time to start, period. The only real fix was getting off containers entirely.

Cloudflare Workers don't have containers. They run in V8 isolates that spin up in under 5ms. That's not a marketing claim — it's the actual p99 we see in production. The cold start problem disappeared the day we deployed.

## What we moved to and why

**D1 for the database.** We went back to SQLite, but the hosted kind. D1 is SQLite that Cloudflare manages — no server, no connection strings, no pool configuration. You bind it to the Worker and call `env.DB.prepare(sql).bind(...).run()`. It just works.
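A minimal sketch of that binding call, assuming a hypothetical `bindings` table and an `Env` interface that mirrors the wrangler bindings (names are illustrative, not Bindly's actual schema); reads use `.all()` rather than `.run()`:

```typescript
// Hand-rolled interfaces standing in for the D1 binding types; in a real
// Worker these come from @cloudflare/workers-types.
interface D1Result<T> { results: T[] }
interface D1PreparedStatement {
  bind(...values: unknown[]): D1PreparedStatement;
  all<T>(): Promise<D1Result<T>>;
}
interface D1Database { prepare(sql: string): D1PreparedStatement }
interface Env { DB: D1Database }

interface BindingRow { id: string; title: string; summary: string }

// Read one Binding's metadata. Full content lives in R2, not here.
async function getBinding(env: Env, id: string): Promise<BindingRow | null> {
  const { results } = await env.DB
    .prepare("SELECT id, title, summary FROM bindings WHERE id = ?1")
    .bind(id)
    .all<BindingRow>();
  return results[0] ?? null;
}
```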

The concern people have with D1 is single-writer. That's real — D1 serializes writes. For Bindly it doesn't matter. Our write volume is low, and D1 handles reads concurrently with no contention. The constraint shaped the schema toward read-heavy patterns, which is the right design for a knowledge platform anyway.

**R2 for content blobs.** Every Version's Markdown lives in R2 at `content/{bindingId}/{versionId}.md`. We deliberately separated content from metadata. D1 stores titles, summaries, key points, previews. R2 stores the full text.

Why does this matter? When you fetch a Binding at Tier 1 or Tier 1.5 — browsing, searching — we never touch R2. All that data is in D1. R2 only comes in at Tier 2, when you actually want the full content. This keeps list operations fast and keeps costs proportional to actual content reads.
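The split can be sketched like this: the key layout comes straight from the `content/{bindingId}/{versionId}.md` convention above, while the `R2Bucket` interface and helper names are assumptions for illustration:

```typescript
// Build the R2 object key for a Version's full Markdown.
function contentKey(bindingId: string, versionId: string): string {
  return `content/${bindingId}/${versionId}.md`;
}

// Stand-in for the R2 binding type (normally from @cloudflare/workers-types).
interface R2ObjectBody { text(): Promise<string> }
interface R2Bucket { get(key: string): Promise<R2ObjectBody | null> }

// Tier 2 only: list and search paths never call this, so R2 reads stay
// proportional to actual content reads.
async function fetchContent(
  bucket: R2Bucket,
  bindingId: string,
  versionId: string
): Promise<string | null> {
  const obj = await bucket.get(contentKey(bindingId, versionId));
  return obj ? await obj.text() : null;
}
```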

**Workers everywhere.** No Node.js. This means no `fs`, no `path`, no `child_process`. When we first hit this it felt limiting. It turned out to be clarifying. The Web Crypto API is actually good. `ReadableStream` works fine. Removing filesystem access forces you to think about what data actually needs to go where.

We use Hono as the framework. It's built for Workers — no compatibility shims, no polyfills, TypeScript types that actually understand the Workers environment.

**Vectorize + Workers AI for search.** We generate embeddings with `@cf/baai/bge-m3` (multilingual, so Korean and English both work) at write time. Results go into Vectorize. Search takes the query, generates an embedding, finds nearest neighbors. No external vector database, no API key for embeddings, no separate billing.
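The query path can be sketched as below. The binding interfaces and response shapes here are simplified assumptions, not the exact Workers AI / Vectorize types; the model id `@cf/baai/bge-m3` is the one named above:

```typescript
// Simplified stand-ins for the AI and Vectorize bindings.
interface EmbeddingResult { data: number[][] }
interface AiBinding {
  run(model: string, input: { text: string[] }): Promise<EmbeddingResult>;
}
interface VectorizeMatch { id: string; score: number }
interface VectorizeIndex {
  query(vector: number[], opts: { topK: number }): Promise<{ matches: VectorizeMatch[] }>;
}

// Embed the query with the same model used at write time, then find
// nearest neighbors in the index.
async function semanticSearch(
  ai: AiBinding,
  index: VectorizeIndex,
  query: string,
  topK = 5
): Promise<VectorizeMatch[]> {
  const { data } = await ai.run("@cf/baai/bge-m3", { text: [query] });
  const { matches } = await index.query(data[0], { topK });
  return matches;
}
```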

**KV for two things.** The cache namespace (`CACHE`) stores rendered SSR HTML with a 60-second TTL — fallback when the API is down. The landing namespace (`LANDING`) stores the landing page HTML, all help docs as Markdown, `llms.txt`, `ai-plugin.json`. Updating a help page is `wrangler kv put` — no Worker redeploy needed.
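The fallback behavior of the `CACHE` namespace can be sketched as follows, with a hypothetical `renderWithFallback` helper (the 60-second TTL is the one quoted above; KV's minimum `expirationTtl` is also 60 seconds):

```typescript
// Stand-in for the KV binding type (normally from @cloudflare/workers-types).
interface KVNamespace {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl: number }): Promise<void>;
}

// Serve fresh SSR output, keep the last good copy in KV for 60 seconds,
// and fall back to that copy when rendering (i.e. the API) fails.
async function renderWithFallback(
  cache: KVNamespace,
  key: string,
  render: () => Promise<string>
): Promise<string> {
  try {
    const html = await render();
    await cache.put(key, html, { expirationTtl: 60 });
    return html;
  } catch {
    const stale = await cache.get(key);
    if (stale !== null) return stale;
    throw new Error("render failed and no cached copy available");
  }
}
```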

## What we gave up

No long-running processes. Workers time out at 30 seconds (CPU limit is lower). No heavy batch jobs inline with a request.

No filesystem. Everything goes through bindings. You can't accidentally depend on local state.

D1 is SQLite, pinned to whatever SQLite version Cloudflare ships, so newer SQL features aren't guaranteed. `ALTER TABLE` is limited: you can add and rename columns, but not much else. You have to design the schema right the first time and use additive-only migrations after that.

These aren't dealbreakers. They're constraints that make the architecture simpler. We'd rather have a system that fits in our head than one with unlimited flexibility that requires ops expertise to run.

The single thing that took the most adjustment: accepting that there's no SSH into a server. When something breaks in production, you read logs via `wrangler tail`. You don't poke around in the filesystem. That discipline is actually good for reliability — production environments should be immutable.