Skip to main content

Schema Migrations Runbook

db/migrations/ is intentionally empty after the launch-cutover baseline reset. The migration runner and CI gate remain in place so future live-schema deltas can be added back cleanly; when there are no V*.sql files, the runner no-ops.

Cloud Spanner (PostgreSQL dialect) schema changes for the gateway flow through scripts/migrations/run.sh, gated by the CI schema-compat check and executed automatically by deploy.yml before any Cloud Run revision is promoted. This page covers the day-to-day authoring flow and the expand/contract pattern for unsafe changes.

TL;DR

  1. If a live post-reset delta is needed, add a new file under db/migrations/.
  2. Use PostgreSQL-dialect DDL; prefer additive changes.
  3. Run scripts/migrations/run.sh --dry-run against staging.
  4. Open the PR — CI runs scripts/migrations/check_schema_compat.sh which blocks rolling-unsafe patterns unless opted out.
  5. Merge + release. deploy.yml runs the migration before flipping Cloud Run traffic; migration failure aborts the deploy.

Tracking table

When versioned files exist, the runner records every applied migration in schema_migrations:

columnpurpose
idmigration filename, primary key
checksumsha256 of the file body at apply time
applied_atserver-side CURRENT_TIMESTAMP
applied_byvalue of MIGRATION_ACTOR (or $USER / ci)

Reruns are idempotent — already-applied files are skipped. If a file's on-disk checksum drifts from the recorded one, the runner refuses to continue: applied migrations are immutable. Revert the file or add a new migration on top.

Authoring guidelines

  • Additive first. New tables, columns, indexes are safe to apply before any code change ships, and safe to leave in place during a rollback.
  • No backfill in the same migration. Large UPDATE statements should run as a separate step outside the deploy hot path.
  • PostgreSQL-dialect, Spanner-compatible. Avoid unsupported constructs (sequences, stored procedures). When in doubt, cross-check against the .sq files under libs/schema/src/main/sqldelight/.
  • Name files with a monotonic version prefix: V011__add_foo.sql.

The CI schema-compat check

scripts/migrations/check_schema_compat.sh runs in CI (see .github/workflows/ci.yml) and rejects patterns that break rolling deploys where old and new Cloud Run revisions serve traffic simultaneously:

  • DROP TABLE / DROP COLUMN / DROP INDEX
  • ALTER COLUMN ... TYPE ...
  • ALTER COLUMN ... SET NOT NULL on an existing column
  • RENAME COLUMN / RENAME TABLE

Each such change breaks either the old code (column vanished) or the new code (write fails NOT NULL). Use the expand/contract pattern instead.

Expand/contract pattern

Breaking schema changes ship over multiple releases:

  1. Expand. Add the new shape alongside the old one. New column, new table, dual-write from the application. Deploy.
  2. Migrate. Backfill the new shape from the old. Read path learns to prefer the new shape; old shape becomes a fallback.
  3. Flip. Readers stop touching the old shape entirely. Deploy. Wait at least one Cloud Run revision lifecycle in production so no old revisions are still scheduling work against it.
  4. Contract. Drop the old shape in a new migration. Because no live revision references it, the drop is now safe — add the -- migration: unsafe-ok marker to bypass the schema-compat gate.

Opting a migration out of the gate

When a migration is intentionally unsafe (usually the contract step above), add one of the following lines anywhere in the file:

-- migration: unsafe-ok

The PR description must justify the opt-out and link to the expand-phase migration it is contracting.

Running locally

# Default: production instance/database (requires appropriate IAM)
scripts/migrations/run.sh --dry-run

# Staging
scripts/migrations/run.sh \
--project=pinpoint-gateway \
--instance=gateway-spanner \
--database=gateway-db-staging \
--dry-run

# Apply
scripts/migrations/run.sh

If no V*.sql files exist, the script exits successfully without applying anything. If versioned files do exist, it prints the pending set, applies them in lexical order, and records each one in schema_migrations.

CI/CD integration

deploy.yml runs the migration runner before any Cloud Run revision is promoted:

  • Staging: runs on every push to main that touches services, targeting gateway-db-staging.
  • Production: runs on every release event, targeting the production gateway-db. Migration failure fails the job and the subsequent deploy-production job does not start.

The runner uses the same WIF service account as the deploy, so no long-lived Spanner credentials are stored in GitHub.

Incident response

If a migration fails mid-deploy:

  1. The deploy job exits non-zero; Cloud Run traffic is not flipped.
  2. Check the runner log for the failing statement.
  3. If the failure is partial (some DDL landed, some didn't):
    • Inspect schema_migrations in the target DB to see what's recorded.
    • The runner only inserts the schema_migrations row after the file applies cleanly, so a failed file will not be marked applied. Fix the SQL, push a corrected migration (or a follow-up migration), and re-run the deploy.
  4. For data loss scenarios, consult Spanner Disaster Recovery.