Schema Migrations Runbook
db/migrations/ is intentionally empty after the launch-cutover baseline
reset. The migration runner and CI gate remain in place so future live-schema
deltas can be added back cleanly; when there are no V*.sql files, the runner
no-ops.
Cloud Spanner (PostgreSQL dialect) schema changes for the gateway flow
through scripts/migrations/run.sh, gated by the CI schema-compat check
and executed automatically by deploy.yml before any Cloud Run revision
is promoted. This page covers the day-to-day authoring flow and the
expand/contract pattern for unsafe changes.
TL;DR
- If a live post-reset delta is needed, add a new file under
db/migrations/. - Use PostgreSQL-dialect DDL; prefer additive changes.
- Run
scripts/migrations/run.sh --dry-runagainst staging. - Open the PR — CI runs
scripts/migrations/check_schema_compat.shwhich blocks rolling-unsafe patterns unless opted out. - Merge + release.
deploy.ymlruns the migration before flipping Cloud Run traffic; migration failure aborts the deploy.
Tracking table
When versioned files exist, the runner records every applied migration in
schema_migrations:
| column | purpose |
|---|---|
id | migration filename, primary key |
checksum | sha256 of the file body at apply time |
applied_at | server-side CURRENT_TIMESTAMP |
applied_by | value of MIGRATION_ACTOR (or $USER / ci) |
Reruns are idempotent — already-applied files are skipped. If a file's on-disk checksum drifts from the recorded one, the runner refuses to continue: applied migrations are immutable. Revert the file or add a new migration on top.
Authoring guidelines
- Additive first. New tables, columns, indexes are safe to apply before any code change ships, and safe to leave in place during a rollback.
- No backfill in the same migration. Large
UPDATEstatements should run as a separate step outside the deploy hot path. - PostgreSQL-dialect, Spanner-compatible. Avoid unsupported
constructs (sequences, stored procedures). When in doubt, cross-check
against the
.sqfiles underlibs/schema/src/main/sqldelight/. - Name files with a monotonic version prefix:
V011__add_foo.sql.
The CI schema-compat check
scripts/migrations/check_schema_compat.sh runs in CI (see
.github/workflows/ci.yml) and rejects patterns that break rolling
deploys where old and new Cloud Run revisions serve traffic
simultaneously:
DROP TABLE/DROP COLUMN/DROP INDEXALTER COLUMN ... TYPE ...ALTER COLUMN ... SET NOT NULLon an existing columnRENAME COLUMN/RENAME TABLE
Each such change breaks either the old code (column vanished) or the new code (write fails NOT NULL). Use the expand/contract pattern instead.
Expand/contract pattern
Breaking schema changes ship over multiple releases:
- Expand. Add the new shape alongside the old one. New column, new table, dual-write from the application. Deploy.
- Migrate. Backfill the new shape from the old. Read path learns to prefer the new shape; old shape becomes a fallback.
- Flip. Readers stop touching the old shape entirely. Deploy. Wait at least one Cloud Run revision lifecycle in production so no old revisions are still scheduling work against it.
- Contract. Drop the old shape in a new migration. Because no
live revision references it, the drop is now safe — add the
-- migration: unsafe-okmarker to bypass the schema-compat gate.
Opting a migration out of the gate
When a migration is intentionally unsafe (usually the contract step above), add one of the following lines anywhere in the file:
-- migration: unsafe-ok
The PR description must justify the opt-out and link to the expand-phase migration it is contracting.
Running locally
# Default: production instance/database (requires appropriate IAM)
scripts/migrations/run.sh --dry-run
# Staging
scripts/migrations/run.sh \
--project=pinpoint-gateway \
--instance=gateway-spanner \
--database=gateway-db-staging \
--dry-run
# Apply
scripts/migrations/run.sh
If no V*.sql files exist, the script exits successfully without applying
anything. If versioned files do exist, it prints the pending set, applies them
in lexical order, and records each one in schema_migrations.
CI/CD integration
deploy.yml runs the migration runner before any Cloud Run
revision is promoted:
- Staging: runs on every push to
mainthat touches services, targetinggateway-db-staging. - Production: runs on every
releaseevent, targeting the productiongateway-db. Migration failure fails the job and the subsequentdeploy-productionjob does not start.
The runner uses the same WIF service account as the deploy, so no long-lived Spanner credentials are stored in GitHub.
Incident response
If a migration fails mid-deploy:
- The deploy job exits non-zero; Cloud Run traffic is not flipped.
- Check the runner log for the failing statement.
- If the failure is partial (some DDL landed, some didn't):
- Inspect
schema_migrationsin the target DB to see what's recorded. - The runner only inserts the
schema_migrationsrow after the file applies cleanly, so a failed file will not be marked applied. Fix the SQL, push a corrected migration (or a follow-up migration), and re-run the deploy.
- Inspect
- For data loss scenarios, consult Spanner Disaster Recovery.