D1 schema design decisions — tradeoffs for Cloudflare Workers

The design decisions behind Bindly's D1 (SQLite on Cloudflare) schema: why certain tables are structured the way they are, what tradeoffs were made for Workers compatibility, and patterns for handling D1's SQLite constraints.


# D1 schema design decisions — tradeoffs for Cloudflare Workers

D1 is SQLite hosted by Cloudflare. For most of us, this means familiar SQL syntax with a few constraints that change how you design things. Some of these constraints pushed us toward better designs than we would have made otherwise. Some just required workarounds. Here's what's non-obvious about the schema and why it ended up the way it did.

## The full schema at a glance

```sql
-- Auth
users (id, email, name, image, username, avatar_type, avatar_value, password_hash, email_verified, role, created_at, updated_at)
accounts (id, user_id, provider, provider_account_id, access_token, ...)
sessions (id, user_id, expires_at, created_at)
email_verifications (id, user_id, token, type, expires_at, created_at)
oauth_codes (id, user_id, client_id, code, code_challenge, expires_at, ...)

-- Spaces
spaces (id, name, slug, type, is_public, is_verified, description, owner_id, binding_count, created_at, updated_at)
space_members (space_id, user_id, role, joined_at)
space_invitations (id, space_id, invited_email, token, role, expires_at, created_by, created_at)

-- Keys
user_keys (id, user_id, key_hash, name, last_used_at, created_at)
space_api_keys (id, space_id, key_hash, name, permissions, created_at)

-- Bindings
bindings (id, space_id, slug, name, source_type, source_url, current_version, created_by, created_at, updated_at)
versions (id, binding_id, version, title, hash, previous_hash, size, content_token_count, content_key, created_at)
version_metas (binding_id, version, summary, summary_token_count, content_preview, key_points, entities, source, model)

-- Sets
sets (id, space_id, slug, name, description, version_count, total_token_count, created_by, created_at, updated_at)
set_versions (set_id, version_id, position, added_at)

-- Labels
labels (id, space_id, name, color, created_at)
label_bindings (label_id, binding_id)

-- Shares
shares (id, target_type, target_id, created_by, expires_at, is_expired, created_at)

-- Comments
comments (id, target_type, target_id, content, source, model, author_id, created_at)

-- Admin
admin_users (id, email, password_hash, name, created_at, updated_at)
admin_audit_log (id, admin_id, action, target_type, target_id, metadata, created_at)
```

## Decision: INTEGER timestamps, not DATETIME

SQLite has no dedicated datetime storage class; a `DATETIME` column is conventionally stored as TEXT (ISO 8601), REAL (Julian day), or INTEGER (Unix time). We use `INTEGER` Unix timestamps throughout:

```sql
created_at INTEGER NOT NULL,  -- Unix milliseconds
updated_at INTEGER NOT NULL,
expires_at INTEGER,           -- NULL = no expiry
```

The benefits are real: sortable with `ORDER BY created_at DESC`, comparable with `WHERE expires_at < ?` using `Date.now()`, no timezone ambiguity, smaller storage footprint.

The tradeoff: values look like `1744000000000` in D1's console, which is slightly harder to read when debugging. We decided it was worth it. We've never wished we had ISO strings in the DB.

## Decision: binding_count cached on spaces

```sql
spaces (
  ...
  binding_count INTEGER DEFAULT 0,  -- denormalized count
  ...
)
```

The "real" count is `SELECT COUNT(*) FROM bindings WHERE space_id = ?`. We could query this on every space list request. With hundreds of spaces on a screen, that's hundreds of COUNT queries. Instead, we update `binding_count` atomically on every create/delete:

```typescript
// On binding create:
await db.prepare('UPDATE spaces SET binding_count = binding_count + 1 WHERE id = ?')
  .bind(spaceId).run()

// On binding delete:
await db.prepare('UPDATE spaces SET binding_count = binding_count - 1 WHERE id = ?')
  .bind(spaceId).run()
```

D1's single-writer model means these increments are serialized; no race conditions.
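A crash between the INSERT of a binding and the counter UPDATE would still lose a count. Purely as a sketch (not a description of Bindly's actual code), the two statements could be combined with D1's `batch()`, which runs its statements in a single transaction; the `createBinding` signature and the `D1Like` types here are illustrative stand-ins for the real Workers types:

```typescript
// Minimal structural types so the sketch stands alone without
// @cloudflare/workers-types. Assumption: D1's batch() executes the
// prepared statements in one transaction (all-or-nothing).
interface D1PreparedStatement {
  bind(...values: unknown[]): D1PreparedStatement
}
interface D1Like {
  prepare(sql: string): D1PreparedStatement
  batch(statements: D1PreparedStatement[]): Promise<unknown>
}

// Hypothetical create path: insert the binding row and bump the cached
// counter together, so a crash cannot land between the two statements.
export async function createBinding(
  db: D1Like,
  b: { id: string; spaceId: string; name: string; now: number }
): Promise<void> {
  await db.batch([
    db.prepare(
      'INSERT INTO bindings (id, space_id, name, current_version, created_at, updated_at) VALUES (?, ?, ?, 0, ?, ?)'
    ).bind(b.id, b.spaceId, b.name, b.now, b.now),
    db.prepare(
      'UPDATE spaces SET binding_count = binding_count + 1 WHERE id = ?'
    ).bind(b.spaceId),
  ])
}
```

Separate statements plus a periodic recompute is a perfectly workable alternative; batching just narrows the drift window on the create path.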
The counter can drift if there are bugs (crash after insert, before increment), which is why the Cron Trigger periodically recomputes it:

```sql
UPDATE spaces SET binding_count = (
  SELECT COUNT(*) FROM bindings WHERE bindings.space_id = spaces.id
)
```

`sets.version_count` and `sets.total_token_count` use the same pattern.

## Decision: version_metas as a separate table

`version_metas` is 1:1 with `versions`. Every version has exactly one meta row. Why not add the columns to `versions`?

```sql
-- versions: performance-critical metadata
versions (id, binding_id, version, hash, content_key, content_token_count, ...)

-- version_metas: rich metadata, separate table
version_metas (binding_id, version, summary, key_points, entities, content_preview, ...)
```

Two reasons this matters:

**Tier 1 queries don't need meta columns.** A Tier 1 request returns `id, name, slug, version, contentTokenCount, updatedAt` from `bindings + versions`. No summary, no key points, no preview needed. A JOIN to `version_metas` would load those columns anyway (SQLite doesn't have column-level projections that avoid reading the data). Separate tables mean separate queries.

**`content_preview` is large.** It stores 500 characters of content text. Over thousands of versions, storing this in the main `versions` table would bulk up a frequently scanned table. In a separate table, it's only loaded when requested.

## Decision: set_versions without a versions FK

```sql
CREATE TABLE set_versions (
  set_id TEXT NOT NULL REFERENCES sets(id) ON DELETE CASCADE,
  version_id TEXT NOT NULL,  -- NO foreign key to versions
  position INTEGER NOT NULL,
  added_at INTEGER NOT NULL,
  PRIMARY KEY (set_id, version_id)
);
```

The missing FK is intentional. If `version_id` had `REFERENCES versions(id) ON DELETE CASCADE`, deleting a Version would silently remove it from every Set that referenced it. We treat Sets like playlists.
If a song is removed from a streaming service, your playlist shows it as "unavailable"; it doesn't disappear. Sets work the same way.

When fetching a Set, we LEFT JOIN:

```sql
SELECT
  sv.set_id, sv.version_id, sv.position,
  v.id AS v_id, v.version AS version_number,
  b.name AS binding_name
FROM set_versions sv
LEFT JOIN versions v ON sv.version_id = v.id
LEFT JOIN bindings b ON v.binding_id = b.id
WHERE sv.set_id = ?
ORDER BY sv.position
```

When `v.id IS NULL`, that Version has been deleted. The API returns:

```json
{ "versionId": "ver_abc123", "status": "deleted", "position": 3 }
```

The Set keeps its shape. The gap is visible. Users know something was there.

## Decision: shares with is_expired flag

```sql
shares (
  id TEXT PRIMARY KEY,
  target_type TEXT NOT NULL,    -- 'binding' | 'set'
  target_id TEXT NOT NULL,
  expires_at INTEGER,           -- NULL = never expires
  is_expired INTEGER DEFAULT 0, -- 1 = manually revoked or TTL expired
  created_at INTEGER NOT NULL
)
```

When a share expires, `is_expired` is set to 1 but the row is kept. The Gateway Worker returns **410 Gone** (not 404) for expired share links. 410 signals "this existed, it's been permanently removed", which is better UX than "doesn't exist". Keeping the row allows the 410 response without storing the share ID elsewhere.

Expired share IDs can be reused: a new share with the same ID updates `is_expired = 0` and sets a new `expires_at`.
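Those status rules condense into a small pure function. This is a sketch: `ShareRow` is an assumed shape mirroring the `shares` columns, not the Gateway Worker's actual types:

```typescript
// Hypothetical row shape mirroring the shares table above.
interface ShareRow {
  id: string
  expiresAt: number | null // Unix ms; NULL = never expires
  isExpired: number        // SQLite boolean: 0 | 1
}

// Map a shares lookup to an HTTP status:
// no row -> 404 (never existed), revoked or past TTL -> 410 (existed,
// gone), otherwise 200.
export function shareStatus(row: ShareRow | null, now: number): 200 | 404 | 410 {
  if (row === null) return 404
  if (row.isExpired === 1) return 410
  if (row.expiresAt !== null && row.expiresAt < now) return 410
  return 200
}
```

A caller comparing `expires_at < Date.now()` and flipping `is_expired` lazily on read would keep the flag and the TTL consistent without a background job.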
## Decision: Drizzle with SQL migration files

We use Drizzle ORM for schema definition:

```typescript
// v1/shared/db/schema.ts
export const bindings = sqliteTable('bindings', {
  id: text('id').primaryKey(),
  spaceId: text('space_id').notNull().references(() => spaces.id),
  slug: text('slug'),
  name: text('name').notNull(),
  currentVersion: integer('current_version').default(0),
  createdAt: integer('created_at').notNull(),
  updatedAt: integer('updated_at').notNull(),
})
```

But migrations are generated as SQL files and reviewed before application:

```bash
# Generate migration SQL
npx drizzle-kit generate

# Review the generated SQL
cat drizzle/0008_add_invitations.sql

# Apply to production
npx wrangler d1 migrations apply bindly-db --remote
```

The review step matters. We've caught Drizzle generating unexpected destructive SQL when column names were similar. Seeing the SQL before running it takes ten seconds and has saved us from at least one bad migration.

## Migration discipline: additive-only in production

| Operation | Safety |
| --------- | ------ |
| `ADD COLUMN` with `DEFAULT` or nullable | Safe |
| `CREATE TABLE` | Safe |
| `CREATE INDEX` | Safe |
| `DROP COLUMN` | Unsafe without deploy planning |
| `ALTER COLUMN` type changes | Never (SQLite doesn't support them anyway) |

The deploy order:

1. Run the D1 migration (add the new column/table)
2. Deploy the Worker (code that uses the new column/table)
3. Later, in a separate migration: remove the old column if needed

This means the Worker never tries to use a column that doesn't exist, and old columns are only dropped after we're certain nothing references them.
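A code-side habit that pairs with this deploy order: read a newly added nullable column with a fallback, so the same Worker build behaves sensibly whether or not old rows have been backfilled yet. A hypothetical sketch — pretend `avatar_type` arrived in a later migration than `image`; the `'initials'` default is invented for illustration:

```typescript
// Hypothetical row shape during a migration transition: avatar_type
// was (in this example) added later as a nullable column, so old rows
// return NULL and pre-migration databases lack it entirely.
interface UserRow {
  image: string | null
  avatar_type?: string | null
}

// Derive a value when the new column is NULL/absent. The 'initials'
// fallback is an illustrative choice, not Bindly's actual logic.
export function avatarType(row: UserRow): string {
  return row.avatar_type ?? (row.image ? 'image' : 'initials')
}
```

Once every row has been backfilled and nothing reads the fallback, the column can be made authoritative and the old one dropped in a separate migration.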
## What SQLite constraints mean for Bindly

| Constraint | Impact | Our approach |
| ---------- | ------ | ------------ |
| No `ALTER COLUMN` | Can't change column type | Design types correctly upfront |
| `INTEGER` is flexible | Stores any integer size | Used for booleans (0/1), timestamps, counts |
| No `BOOLEAN` type | Use INTEGER 0/1 | `is_public`, `is_verified`, `is_expired` |
| No `ARRAY` type | Use JSON strings | `key_points TEXT` stores `'["point1","point2"]'` |
| Single writer | No concurrent writes | Atomic increments, batch reads |
| `ON DELETE CASCADE` | Supported | Used for space → bindings → versions |

JSON storage for arrays (`key_points`, `entities`, `permissions`) is parsed in application code:

```typescript
const keyPoints = JSON.parse(row.key_points ?? '[]') as string[]
```

Not elegant, but reliable. SQLite's `json_extract()` can query JSON columns when needed; we've used it for permission checks on space API keys without having to deserialize in application code.
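That one-liner throws on malformed JSON and the `as string[]` cast trusts the data. A slightly more defensive sketch (the `parseJsonArray` helper name is assumed, not from the codebase) tolerates NULL, bad JSON, and non-array values instead of throwing inside a request handler:

```typescript
// Defensive parse for TEXT columns that store JSON arrays of strings
// (key_points, entities, permissions). Returns [] on NULL, malformed
// JSON, or a non-array value, and drops non-string elements.
export function parseJsonArray(raw: string | null | undefined): string[] {
  if (raw == null) return []
  try {
    const parsed = JSON.parse(raw)
    return Array.isArray(parsed)
      ? parsed.filter((x): x is string => typeof x === 'string')
      : []
  } catch {
    return []
  }
}
```

The tradeoff is silence: a corrupted column reads as empty rather than surfacing an error, which is usually the right call for display paths and the wrong one for permission checks.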