Why we rebuilt Bindly from scratch — the v0 to v1 story

The honest story behind rebuilding Bindly entirely: the specific failures of v0 (Fly.io cold starts, SQLite file locking, a 10,000-line monolith), and why a complete rewrite on Cloudflare was the right call instead of incremental fixes.


# Why we rebuilt Bindly from scratch

Bindly v0 worked. Until it didn't.

v0 ran on Fly.io, Node.js (Fastify), and SQLite. The data model was a file on disk, managed through a class called `FileStorage` with 30+ methods. It shipped. Users signed up. For a while, that was enough.

Then the problems started compounding.

## The failures, one by one

### Cold starts destroyed the LLM use case

Fly.io autostops idle apps to save resources. An app that hasn't received a request in a few minutes gets put to sleep. The next request wakes it up, which takes 10-30 seconds, sometimes more. For a normal web app, this is annoying but tolerable. For an MCP server that Claude calls mid-conversation, it's fatal. The user asks Claude something, Claude tries to look up a Binding, and the MCP call times out. Claude tells the user the tool isn't responding. The user assumes Bindly is broken.

We tried tuning Fly.io's autostop settings. We paid for always-on instances. We added health checks to keep containers warm. None of it worked reliably. The cold-start problem was structural: containers take time to start, and no amount of configuration changes that.

### SQLite file locking broke concurrent writes

The database was `better-sqlite3` pointing at a file on disk. `better-sqlite3` is synchronous and excellent for single-threaded use. But when two requests tried to write simultaneously, one would get an `EBUSY` error.

The worst part: these errors were silent. A user would create a Binding, the write would fail, and the Binding simply wouldn't be there when they looked for it. No error in the UI, because the failure happened at a layer the UI didn't observe.

Fly.io's volume mounts added another layer of fragility. The file lived on a persistent volume that could only be attached to one machine. Multiple instances meant one was always reading stale data.

### A 10,792-line API monolith

By the time we started planning v1, the main API file was 10,792 lines of TypeScript.
Everything was in there: authentication, Binding CRUD, space management, MCP tool definitions, search, sharing, admin functions. Finding anything required a text search through a 400KB file. Adding a feature meant reading thousands of lines to understand the context, making the change, and hoping nothing unexpected broke. The test suite was sparse; the monolith was too tangled to test individual parts effectively.

### Five packages for "MCP" infrastructure

The MCP integration lived across five packages: `delta-layer/mcp-remote`, `delta-layer/core`, `delta-layer/storage`, `delta-layer/types`, and `delta-layer/gateway-worker`. Each had its own `package.json`, its own tests (or lack thereof), its own build config.

The types package alone was 2,112 lines of deeply nested TypeScript. A `Binding` type had properties with properties with properties; the structure reflected how data was stored (in a filesystem hierarchy) rather than how it was used.

The public content lived on a separate domain (`text.bind.ly`) served by a separate Worker that made HTTP calls to the main API. Every public page load was a cross-domain HTTP call, with all the latency and failure modes that implies.

## Why incremental fixes wouldn't work

The natural impulse is to fix things one at a time. Swap Fly.io for Railway. Replace SQLite with Postgres. Refactor the monolith into modules.

The problem is that these fixes don't compose. Replacing Fly.io with another container platform still means managing containers. Replacing SQLite with Postgres still means managing a database server and connection pools. The fundamental architecture, a Node.js backend running on long-lived servers, was the wrong model for what Bindly needed to be.
What we actually needed:

- An API that starts in milliseconds, not seconds
- A database that's globally distributed without a database server
- Object storage designed for blobs, not bolted on with a filesystem abstraction
- Semantic search without a separate vector database

Cloudflare Workers answered all four. The rewrite wasn't "fixing Bindly"; it was rebuilding it on infrastructure that fit the problem.

## What the rewrite produced

v1 is a complete break from v0. Nothing from the old codebase runs in production:

| Component | v0 | v1 |
| --------- | -- | -- |
| Runtime | Node.js / Fly.io | Cloudflare Workers (V8 isolates) |
| API framework | Fastify | Hono |
| Database | SQLite file (`better-sqlite3`) | Cloudflare D1 |
| Content storage | Filesystem (`FileStorage`) | Cloudflare R2 |
| Public content | Separate domain + Worker | Gateway Worker (same domain) |
| MCP communication | HTTP calls to `BINDLY_API_BASE_URL` | Service Bindings (zero-latency) |
| Domains | 6+ (bindly.app, bind.ly, text.bind.ly, ...) | 3 (bind.ly, bindly.app, mcp.bind.ly) |
| Infrastructure | Fly.io + external services | Cloudflare-only |

The codebase went from 30+ packages to 4 Workers + 1 SPA + 3 shared packages. The 10,792-line monolith became 7 Hono route files averaging ~300 lines each. The 2,112-line types package became ~200 lines of flat, pragmatic interfaces.

## The data reset decision

One question we wrestled with: what about existing users' data? The answer was straightforward when we thought it through honestly.

Bindly v0 had a small number of users, mostly us and a handful of testers. None of us had irreplaceable data in the system. Anything important had been exported or was available elsewhere. A migration path from SQLite files to D1+R2 would have required months of work to preserve data that was mostly test fixtures.

So we reset. v1 starts fresh. No migration scripts, no compatibility shims, no transition period.
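To make the types flattening concrete, here is a minimal sketch of the shape change. These interfaces are hypothetical illustrations, not the actual Bindly types: the v0 style mirrored the storage hierarchy, while the v1 style mirrors how the data is used.

```typescript
// v0 style (hypothetical): nesting mirrors the filesystem hierarchy
// the data was stored in, so every read digs through layers.
interface NestedBinding {
  meta: { ids: { space: string; binding: string } };
  content: { body: { text: string }; format: { mime: string } };
}

// v1 style (hypothetical): a flat, pragmatic interface that mirrors
// how the data is actually used at call sites.
interface FlatBinding {
  spaceId: string;
  bindingId: string;
  text: string;
  mime: string;
}

// A one-time adapter from the nested shape to the flat one.
function flatten(b: NestedBinding): FlatBinding {
  return {
    spaceId: b.meta.ids.space,
    bindingId: b.meta.ids.binding,
    text: b.content.body.text,
    mime: b.content.format.mime,
  };
}
```

The payoff is at the call sites: `binding.text` instead of `binding.content.body.text`, and far fewer lines of type definitions to maintain.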
## What the rewrite validated

The technical decisions held up better than we expected. Workers cold starts are under 5ms; the Fly.io problem is simply gone. D1 writes are fast, and concurrent reads scale automatically. R2 costs are negligible. Service Bindings between Workers are effectively free.

The architecture that seemed like over-engineering on paper (four Workers, separate service bindings, explicit routing) turned out to be exactly right. Each Worker deploys independently. A bug in the MCP Worker doesn't touch the Gateway. A routing change in the Gateway doesn't require an API deployment. A deploy takes 5 seconds and the rest of the system keeps running.

For a solo developer shipping frequently, that blast-radius isolation matters more than almost anything else.
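For readers unfamiliar with Service Bindings: a binding declared in one Worker's `wrangler.toml` exposes another Worker as a `Fetcher` on `env`, so a "request" between them is an in-process call rather than a cross-domain HTTP round trip. A minimal sketch, with the binding name (`API`) and service name illustrative rather than Bindly's actual configuration:

```typescript
// wrangler.toml of the calling Worker (illustrative):
//
//   [[services]]
//   binding = "API"
//   service = "bindly-api"

interface Env {
  API: Fetcher; // the bound Worker, exposed by the [[services]] entry above
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Dispatches the request directly to the bound Worker. No DNS lookup,
    // no TLS handshake, no cross-domain hop, unlike v0's HTTP calls to
    // BINDLY_API_BASE_URL.
    return env.API.fetch(request);
  },
};
```

This only runs inside the Workers runtime, but it shows why the table above describes MCP communication in v1 as zero-latency.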