D1 schema design decisions — tradeoffs for Cloudflare Workers

The design decisions behind Bindly's D1 (SQLite on Cloudflare) schema: why certain tables are structured the way they are, what tradeoffs were made for Workers compatibility, and patterns for handling D1's SQLite constraints.


# D1 schema design decisions — tradeoffs for Cloudflare Workers

D1 is SQLite hosted by Cloudflare. For most of us, this means familiar SQL syntax with a few constraints that change how you design things. Some of these constraints pushed us toward better designs than we would have made otherwise. Some just required workarounds. Here's what's non-obvious about the schema and why it ended up the way it did.

## The full schema at a glance

```sql
-- Auth
users (id, email, name, image, username, avatar_type, avatar_value, password_hash, email_verified, role, created_at, updated_at)
accounts (id, user_id, provider, provider_account_id, access_token, ...)
sessions (id, user_id, expires_at, created_at)
email_verifications (id, user_id, token, type, expires_at, created_at)
oauth_codes (id, user_id, client_id, code, code_challenge, expires_at, ...)

-- Spaces
spaces (id, name, slug, type, is_public, is_verified, description, owner_id, binding_count, created_at, updated_at)
space_members (space_id, user_id, role, joined_at)
space_invitations (id, space_id, invited_email, token, role, expires_at, created_by, created_at)

-- Keys
user_keys (id, user_id, key_hash, name, last_used_at, created_at)
space_api_keys (id, space_id, key_hash, name, permissions, created_at)

-- Bindings
bindings (id, space_id, slug, name, source_type, source_url, current_version, created_by, created_at, updated_at)
versions (id, binding_id, version, title, hash, previous_hash, size, content_token_count, content_key, created_at)
version_metas (binding_id, version, summary, summary_token_count, content_preview, key_points, entities, source, model)

-- Sets
sets (id, space_id, slug, name, description, version_count, total_token_count, created_by, created_at, updated_at)
set_versions (set_id, version_id, position, added_at)

-- Labels
labels (id, space_id, name, color, created_at)
label_bindings (label_id, binding_id)

-- Shares
shares (id, target_type, target_id, created_by, expires_at, is_expired, created_at)

-- Comments
comments (id, target_type, target_id, content, source, model, author_id, created_at)

-- Admin
admin_users (id, email, password_hash, name, created_at, updated_at)
admin_audit_log (id, admin_id, action, target_type, target_id, metadata, created_at)
```

## Decision: INTEGER timestamps, not DATETIME

SQLite has no dedicated datetime storage class; a `DATETIME` column is conventionally stored as TEXT (ISO 8601), REAL (Julian day), or INTEGER (Unix time). We use `INTEGER` Unix timestamps throughout:

```sql
created_at INTEGER NOT NULL,  -- Unix milliseconds
updated_at INTEGER NOT NULL,
expires_at INTEGER,           -- NULL = no expiry
```

The benefits are real: sortable with `ORDER BY created_at DESC`, comparable with `WHERE expires_at < ?` using `Date.now()`, no timezone ambiguity, smaller storage footprint.

The tradeoff: values look like `1744000000000` in D1's console, which is slightly harder to read when debugging. We decided it was worth it. We've never wished we had ISO strings in the DB.

## Decision: binding_count cached on spaces

```sql
spaces (
  ...
  binding_count INTEGER DEFAULT 0,  -- denormalized count
  ...
)
```

The "real" count is `SELECT COUNT(*) FROM bindings WHERE space_id = ?`. We could query this on every space list request. With hundreds of spaces on a screen, that's hundreds of COUNT queries. Instead, we update `binding_count` atomically on every create/delete:

```typescript
// On binding create:
await db.prepare('UPDATE spaces SET binding_count = binding_count + 1 WHERE id = ?')
  .bind(spaceId).run()

// On binding delete:
await db.prepare('UPDATE spaces SET binding_count = binding_count - 1 WHERE id = ?')
  .bind(spaceId).run()
```

D1's single-writer model means these increments are serialized; no race conditions.
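A crash between the INSERT of a binding and the counter UPDATE would still lose a count. Purely as a sketch (not a description of Bindly's actual code), the two statements could be combined with D1's `batch()`, which runs its statements in a single transaction; the `createBinding` signature and the `D1Like` types here are illustrative stand-ins for the real Workers types:

```typescript
// Minimal structural types so the sketch stands alone without
// @cloudflare/workers-types. Assumption: D1's batch() executes the
// prepared statements in one transaction (all-or-nothing).
interface D1PreparedStatement {
  bind(...values: unknown[]): D1PreparedStatement
}
interface D1Like {
  prepare(sql: string): D1PreparedStatement
  batch(statements: D1PreparedStatement[]): Promise<unknown>
}

// Hypothetical create path: insert the binding row and bump the cached
// counter together, so a crash cannot land between the two statements.
export async function createBinding(
  db: D1Like,
  b: { id: string; spaceId: string; name: string; now: number }
): Promise<void> {
  await db.batch([
    db.prepare(
      'INSERT INTO bindings (id, space_id, name, current_version, created_at, updated_at) VALUES (?, ?, ?, 0, ?, ?)'
    ).bind(b.id, b.spaceId, b.name, b.now, b.now),
    db.prepare(
      'UPDATE spaces SET binding_count = binding_count + 1 WHERE id = ?'
    ).bind(b.spaceId),
  ])
}
```

Separate statements plus a periodic recompute is a perfectly workable alternative; batching just narrows the drift window on the create path.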
The counter can drift if there are bugs (crash after insert, before increment), which is why the Cron Trigger periodically recomputes it:

```sql
UPDATE spaces SET binding_count = (
  SELECT COUNT(*) FROM bindings WHERE bindings.space_id = spaces.id
)
```

`sets.version_count` and `sets.total_token_count` use the same pattern.

## Decision: version_metas as a separate table

`version_metas` is 1:1 with `versions`. Every version has exactly one meta row. Why not add the columns to `versions`?

```sql
-- versions: performance-critical metadata
versions (id, binding_id, version, hash, content_key, content_token_count, ...)

-- version_metas: rich metadata, separate table
version_metas (binding_id, version, summary, key_points, entities, content_preview, ...)
```

Two reasons this matters:

**Tier 1 queries don't need meta columns.** A Tier 1 request returns `id, name, slug, version, contentTokenCount, updatedAt` from `bindings + versions`. No summary, no key points, no preview needed. A JOIN to `version_metas` would load those columns anyway (SQLite doesn't have column-level projections that avoid reading the data). Separate tables mean separate queries.

**`content_preview` is large.** It stores 500 characters of content text. Over thousands of versions, storing this in the main `versions` table would bulk up a frequently scanned table. In a separate table, it's only loaded when requested.

## Decision: set_versions without a versions FK

```sql
CREATE TABLE set_versions (
  set_id TEXT NOT NULL REFERENCES sets(id) ON DELETE CASCADE,
  version_id TEXT NOT NULL,  -- NO foreign key to versions
  position INTEGER NOT NULL,
  added_at INTEGER NOT NULL,
  PRIMARY KEY (set_id, version_id)
);
```

The missing FK is intentional. If `version_id` had `REFERENCES versions(id) ON DELETE CASCADE`, deleting a Version would silently remove it from every Set that referenced it. We treat Sets like playlists.
If a song is removed from a streaming service, your playlist shows it as "unavailable"; it doesn't disappear. Sets work the same way.

When fetching a Set, we LEFT JOIN:

```sql
SELECT
  sv.set_id, sv.version_id, sv.position,
  v.id AS v_id, v.version AS version_number,
  b.name AS binding_name
FROM set_versions sv
LEFT JOIN versions v ON sv.version_id = v.id
LEFT JOIN bindings b ON v.binding_id = b.id
WHERE sv.set_id = ?
ORDER BY sv.position
```

When `v.id IS NULL`, that Version has been deleted. The API returns:

```json
{ "versionId": "ver_abc123", "status": "deleted", "position": 3 }
```

The Set keeps its shape. The gap is visible. Users know something was there.

## Decision: shares with is_expired flag

```sql
shares (
  id TEXT PRIMARY KEY,
  target_type TEXT NOT NULL,    -- 'binding' | 'set'
  target_id TEXT NOT NULL,
  expires_at INTEGER,           -- NULL = never expires
  is_expired INTEGER DEFAULT 0, -- 1 = manually revoked or TTL expired
  created_at INTEGER NOT NULL
)
```

When a share expires, `is_expired` is set to 1 but the row is kept. The Gateway Worker returns **410 Gone** (not 404) for expired share links. 410 signals "this existed, it's been permanently removed", which is better UX than "doesn't exist". Keeping the row allows the 410 response without storing the share ID elsewhere.

Expired share IDs can be reused: a new share with the same ID updates `is_expired = 0` and sets a new `expires_at`.
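Those status rules condense into a small pure function. This is a sketch: `ShareRow` is an assumed shape mirroring the `shares` columns, not the Gateway Worker's actual types:

```typescript
// Hypothetical row shape mirroring the shares table above.
interface ShareRow {
  id: string
  expiresAt: number | null // Unix ms; NULL = never expires
  isExpired: number        // SQLite boolean: 0 | 1
}

// Map a shares lookup to an HTTP status:
// no row -> 404 (never existed), revoked or past TTL -> 410 (existed,
// gone), otherwise 200.
export function shareStatus(row: ShareRow | null, now: number): 200 | 404 | 410 {
  if (row === null) return 404
  if (row.isExpired === 1) return 410
  if (row.expiresAt !== null && row.expiresAt < now) return 410
  return 200
}
```

A caller comparing `expires_at < Date.now()` and flipping `is_expired` lazily on read would keep the flag and the TTL consistent without a background job.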
## Decision: Drizzle with SQL migration files

We use Drizzle ORM for schema definition:

```typescript
// v1/shared/db/schema.ts
export const bindings = sqliteTable('bindings', {
  id: text('id').primaryKey(),
  spaceId: text('space_id').notNull().references(() => spaces.id),
  slug: text('slug'),
  name: text('name').notNull(),
  currentVersion: integer('current_version').default(0),
  createdAt: integer('created_at').notNull(),
  updatedAt: integer('updated_at').notNull(),
})
```

But migrations are generated as SQL files and reviewed before application:

```bash
# Generate migration SQL
npx drizzle-kit generate

# Review the generated SQL
cat drizzle/0008_add_invitations.sql

# Apply to production
npx wrangler d1 migrations apply bindly-db --remote
```

The review step matters. We've caught Drizzle generating unexpected destructive SQL when column names were similar. Seeing the SQL before running it takes ten seconds and has saved us from at least one bad migration.

## Migration discipline: additive-only in production

| Operation | Safety |
| --------- | ------ |
| `ADD COLUMN` with `DEFAULT` or nullable | Safe |
| `CREATE TABLE` | Safe |
| `CREATE INDEX` | Safe |
| `DROP COLUMN` | Unsafe without deploy planning |
| `ALTER COLUMN` type changes | Never (SQLite doesn't support them anyway) |

The deploy order:

1. Run the D1 migration (add the new column/table)
2. Deploy the Worker (code that uses the new column/table)
3. Later, in a separate migration: remove the old column if needed

This means the Worker never tries to use a column that doesn't exist, and old columns are only dropped after we're certain nothing references them.
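A code-side habit that pairs with this deploy order: read a newly added nullable column with a fallback, so the same Worker build behaves sensibly whether or not old rows have been backfilled yet. A hypothetical sketch — pretend `avatar_type` arrived in a later migration than `image`; the `'initials'` default is invented for illustration:

```typescript
// Hypothetical row shape during a migration transition: avatar_type
// was (in this example) added later as a nullable column, so old rows
// return NULL and pre-migration databases lack it entirely.
interface UserRow {
  image: string | null
  avatar_type?: string | null
}

// Derive a value when the new column is NULL/absent. The 'initials'
// fallback is an illustrative choice, not Bindly's actual logic.
export function avatarType(row: UserRow): string {
  return row.avatar_type ?? (row.image ? 'image' : 'initials')
}
```

Once every row has been backfilled and nothing reads the fallback, the column can be made authoritative and the old one dropped in a separate migration.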
## What SQLite constraints mean for Bindly

| Constraint | Impact | Our approach |
| ---------- | ------ | ------------ |
| No `ALTER COLUMN` | Can't change column type | Design types correctly upfront |
| `INTEGER` is flexible | Stores any integer size | Used for booleans (0/1), timestamps, counts |
| No `BOOLEAN` type | Use INTEGER 0/1 | `is_public`, `is_verified`, `is_expired` |
| No `ARRAY` type | Use JSON strings | `key_points TEXT` stores `'["point1","point2"]'` |
| Single writer | No concurrent writes | Atomic increments, batch reads |
| `ON DELETE CASCADE` | Supported | Used for space → bindings → versions |

JSON storage for arrays (`key_points`, `entities`, `permissions`) is parsed in application code:

```typescript
const keyPoints = JSON.parse(row.key_points ?? '[]') as string[]
```

Not elegant, but reliable. SQLite's `json_extract()` can query JSON columns when needed; we've used it for permission checks on space API keys without having to deserialize in application code.
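That one-liner throws on malformed JSON and the `as string[]` cast trusts the data. A slightly more defensive sketch (the `parseJsonArray` helper name is assumed, not from the codebase) tolerates NULL, bad JSON, and non-array values instead of throwing inside a request handler:

```typescript
// Defensive parse for TEXT columns that store JSON arrays of strings
// (key_points, entities, permissions). Returns [] on NULL, malformed
// JSON, or a non-array value, and drops non-string elements.
export function parseJsonArray(raw: string | null | undefined): string[] {
  if (raw == null) return []
  try {
    const parsed = JSON.parse(raw)
    return Array.isArray(parsed)
      ? parsed.filter((x): x is string => typeof x === 'string')
      : []
  } catch {
    return []
  }
}
```

The tradeoff is silence: a corrupted column reads as empty rather than surfacing an error, which is usually the right call for display paths and the wrong one for permission checks.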