feat(cli, migrate): add stash encrypt commands + @cipherstash/migrate#357
Adds first-class support for migrating existing plaintext columns to
`eql_v2_encrypted` in production databases — the flow that currently has
no good answer in either Stack or Proxy land.
Per-column lifecycle:
schema-added → dual-writing → backfilling → backfilled → cut-over → dropped
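The lifecycle is a linear sequence, so a forward-only transition check reduces to an index comparison. This is a minimal sketch that assumes a transition may only move one phase forward; the text does not say whether `advance` enforces this. `PHASES` and `isNextPhase` are hypothetical names, not exports of `@cipherstash/migrate`.

```typescript
// Hypothetical: the lifecycle phases in order, as listed above.
const PHASES = [
  'schema-added',
  'dual-writing',
  'backfilling',
  'backfilled',
  'cut-over',
  'dropped',
] as const

type Phase = (typeof PHASES)[number]

// True only when `to` is exactly one step after `from` in the lifecycle.
function isNextPhase(from: Phase, to: Phase): boolean {
  return PHASES.indexOf(to) === PHASES.indexOf(from) + 1
}

console.log(isNextPhase('dual-writing', 'backfilling')) // true
console.log(isNextPhase('schema-added', 'cut-over')) // false
```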
State lives in three layers so Proxy interop stays clean:
- `.cipherstash/migrations.json` — repo-side intent (indexes, target phase)
- `eql_v2_configuration` — EQL intent, unchanged; Proxy reads as before
- `cipherstash.cs_migrations` — NEW append-only event log for per-column
runtime state (phase, backfill cursor, rows processed). Installed by
`stash db install`. Designed to upstream into EQL as `eql_v2_migrations`
in a later release so Stack and Proxy own it jointly.
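Because `cs_migrations` is append-only, a column's current runtime state is its most recent event. The fold below is a minimal sketch that assumes each event carries a monotonically increasing `id`. The `CsMigrationEvent` shape and `latestEventPerColumn` are hypothetical and may not match the real table columns or the package's own readback helpers.

```typescript
// Hypothetical event shape; the real cs_migrations columns may differ.
interface CsMigrationEvent {
  id: number // increasing insert order
  tableName: string
  columnName: string
  event: string // e.g. 'backfill_started', 'backfill_checkpoint', 'backfilled'
  phase: string
  rowsProcessed: number
}

// Keep only the newest event (highest id) for each table.column key.
function latestEventPerColumn(events: readonly CsMigrationEvent[]): Map<string, CsMigrationEvent> {
  const latest = new Map<string, CsMigrationEvent>()
  for (const e of events) {
    const key = `${e.tableName}.${e.columnName}`
    const prev = latest.get(key)
    if (prev === undefined || e.id > prev.id) latest.set(key, e)
  }
  return latest
}

const sample: CsMigrationEvent[] = [
  { id: 1, tableName: 'users', columnName: 'email', event: 'backfill_started', phase: 'backfilling', rowsProcessed: 0 },
  { id: 3, tableName: 'users', columnName: 'ssn', event: 'backfilled', phase: 'backfilled', rowsProcessed: 5000 },
  { id: 2, tableName: 'users', columnName: 'email', event: 'backfill_checkpoint', phase: 'backfilling', rowsProcessed: 1000 },
]
```

In Postgres the same read could be expressed as `SELECT DISTINCT ON (table_name, column_name) ... ORDER BY table_name, column_name, id DESC`, again assuming those column names.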
New CLI commands under `stash encrypt`:
- status: per-column table (phase, EQL state, indexes, progress, drift)
- plan: diff intent vs observed
- advance: record a phase transition (dual-writing is user-declared)
- backfill: chunked, resumable, idempotent; txn-per-chunk with checkpoint;
  SIGINT-safe; uses the user's encryption client via jiti dynamic
  import; auto-detects single-column PK
- cutover: `eql_v2.rename_encrypted_columns()` in a txn; optional Proxy
  refresh via CIPHERSTASH_PROXY_URL
- drop: generates a DROP COLUMN <col>_plaintext migration file
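Generating the `drop` migration is mostly a matter of quoting identifiers safely. This sketch assumes an unqualified table name and the post-cutover naming implied above, where the original column now lives at `<col>_plaintext`. `quoteIdent` and `dropPlaintextSql` are hypothetical helpers, not the CLI's implementation.

```typescript
// Double-quote a Postgres identifier, doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`
}

// Hypothetical: SQL for a migration file that drops the retained plaintext column.
// `table` is assumed to be unqualified; a schema-qualified name needs each part quoted.
function dropPlaintextSql(table: string, column: string): string {
  return `ALTER TABLE ${quoteIdent(table)} DROP COLUMN ${quoteIdent(`${column}_plaintext`)};`
}
```

For example, `dropPlaintextSql('users', 'email')` returns `ALTER TABLE "users" DROP COLUMN "email_plaintext";`.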
New package `@cipherstash/migrate` exposes the same primitives as a library
(`runBackfill`, `appendEvent`, `progress`, `renameEncryptedColumns`, …) so
users can embed backfill in their own workers/cron without the CLI process.
Design doc: docs/plans/encryption-migrations.md
Manual e2e script: packages/cli/scripts/e2e-encrypt.sh
Phase 1 scope: Protect/Stack client-side backfill. Proxy-mode backfill
(SQL-through-Proxy using the same cs_migrations state) is Phase 2.
🦋 Changeset detected. Latest commit: 700009a. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
Expand TypeDoc across the @cipherstash/migrate public API and the stash
encrypt command option interfaces. No behaviour change — docs only.
Highlights:
- BackfillOptions: each field now explains the three separate
namespaces (physical table/column vs. schema column key) and common
defaults (chunkSize = 1000, encryptedColumn = <col>_encrypted).
- BackfillCommandOptions: CLI flag semantics with an example of when
schemaColumnKey needs to differ from column.
- MigrationEvent / MigrationPhase: describes the event-vs-phase
mapping and the backfill_started/backfill_checkpoint distinction.
- EQL wrappers: explain that renameEncryptedColumns is the cut-over
primitive, and that reloadConfig must run through Proxy.
- installMigrationsSchema: documents why cs_migrations is kept
separate from eql_v2_configuration (CHECK constraint, global
state enum, write-frequency mismatch).
- Manifest: field-level documentation of cast_as values, index kinds,
and how targetPhase interacts with advance/plan/drop.
- Module-level @packageDocumentation in src/index.ts for TypeDoc's
package overview.
…stgres
Adds packages/migrate/src/__tests__/backfill.integration.test.ts —
gated on PG_TEST_URL so it skips in CI without a Postgres available.
Covers the full backfill state machine against a real transactional
Postgres using a stub encryption client (no CipherStash credentials
required):
- happy-path completion + correct terminal state event
- idempotency on re-run (row-level hash unchanged; zero new writes)
- resume from checkpoint after mid-run AbortSignal
- error event recorded + exception rethrown on encrypt failure
- pre-encrypted rows preserved (the `encrypted IS NULL` guard)
- empty-table fast path
- event log ordering (backfill_started → checkpoint* → backfilled)
- latestByColumn / progress readbacks
Run locally:
cd local && docker compose up -d
PG_TEST_URL=postgres://cipherstash:password@localhost:5432/cipherstash \
pnpm -F @cipherstash/migrate test backfill.integration
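The properties these integration tests cover (the `encrypted IS NULL` guard, checkpoints, resume, idempotent re-runs, and `backfill_started → checkpoint* → backfilled` ordering) can be modelled without a database. This is a minimal in-memory sketch that assumes a numeric single-column primary key and a synchronous `encrypt` stand-in. `backfill`, `Row`, and `BackfillEvent` are hypothetical and do not reflect `runBackfill`'s real signature.

```typescript
// Hypothetical in-memory model of the backfill state machine.
interface Row {
  id: number // single-column numeric primary key
  plaintext: string | null
  encrypted: string | null
}

type BackfillEvent =
  | { event: 'backfill_started' }
  | { event: 'backfill_checkpoint'; cursor: number; rowsProcessed: number }
  | { event: 'backfilled'; rowsProcessed: number }

interface Options {
  chunkSize: number
  startAfter?: number // cursor from the last checkpoint when resuming
  signal?: { readonly aborted: boolean } // an AbortSignal fits this shape
}

function backfill(rows: Row[], encrypt: (p: string) => string, opts: Options): BackfillEvent[] {
  const events: BackfillEvent[] = [{ event: 'backfill_started' }]
  const ordered = [...rows].sort((a, b) => a.id - b.id)
  let cursor = opts.startAfter ?? Number.NEGATIVE_INFINITY
  let processed = 0 // rows written during this run only
  while (true) {
    // Stop between chunks; the last checkpoint is the resume point.
    if (opts.signal?.aborted) return events
    // Keyset page, analogous to WHERE id > $cursor ORDER BY id LIMIT $chunkSize.
    const chunk = ordered.filter((r) => r.id > cursor).slice(0, opts.chunkSize)
    if (chunk.length === 0) break
    for (const row of chunk) {
      // Mirrors the `encrypted IS NULL` guard: pre-encrypted rows are never overwritten.
      if (row.encrypted === null && row.plaintext !== null) {
        row.encrypted = encrypt(row.plaintext)
        processed++
      }
    }
    // In a database, the chunk's writes and this checkpoint would commit in one transaction.
    cursor = chunk[chunk.length - 1]!.id
    events.push({ event: 'backfill_checkpoint', cursor, rowsProcessed: processed })
  }
  events.push({ event: 'backfilled', rowsProcessed: processed })
  return events
}
```

Re-running over the same rows writes nothing, and passing the last checkpoint's `cursor` as `startAfter` resumes where an aborted run stopped.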
…ation
`stash db install --drizzle` now appends the cipherstash.cs_migrations schema DDL to the generated EQL migration file, so `drizzle-kit migrate` rolls the tracking table out to every environment alongside EQL itself.
Before this change the drizzle path only wrote EQL SQL; the cs_migrations schema was installed directly against the connected DB (in the non-drizzle branch) and never appeared in migration history. That meant prod deploys running from drizzle migrations alone got EQL but no cs_migrations, and `stash encrypt ...` would fail with "schema cipherstash does not exist" until someone ran an out-of-band install.
Also exports MIGRATIONS_SCHEMA_SQL from @cipherstash/migrate so other consumers can embed the DDL in their own migration pipelines.
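A pipeline that writes its own migration files could embed the exported DDL in the same way. This is a minimal sketch that assumes `MIGRATIONS_SCHEMA_SQL` is a plain SQL string; the constant below is a placeholder for it. `appendMigrationsSchema` is hypothetical, and its duplicate check is an illustrative choice rather than something the commit says the CLI does.

```typescript
// Placeholder: in practice this would be MIGRATIONS_SCHEMA_SQL from '@cipherstash/migrate'.
const CS_MIGRATIONS_DDL = '-- cipherstash.cs_migrations DDL goes here'

// Hypothetical: append the tracking-table DDL after the generated EQL SQL.
function appendMigrationsSchema(eqlSql: string, ddl: string = CS_MIGRATIONS_DDL): string {
  // Regenerating the file must not duplicate the DDL.
  if (eqlSql.includes(ddl)) return eqlSql
  return `${eqlSql.trimEnd()}\n\n${ddl}\n`
}
```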
…orts
loadEncryptionContext used to require the user's encryption client file
to export an EncryptedTable-shaped object (tableName + build()). Users
following the drizzle pattern typically only export the pgTable and the
initialised client, leaving the extractEncryptionSchema(...) result as
a non-exported const — which the loader couldn't see. Backfill would
then fail with "Table X was not found in the encryption client exports.
Available: (none)".
Now the loader does a second pass over module exports, detects drizzle
pgTables via Symbol.for('drizzle:Name'), dynamic-imports
@cipherstash/stack/drizzle, and calls extractEncryptionSchema() on each
to derive the EncryptedTable on the fly. Silently no-ops if the drizzle
subpath isn't installed (Supabase / generic projects are unaffected).
Manually-exported EncryptedTables still win over auto-derived ones
(the set-if-absent check preserves the explicit export).
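The loader's second pass can be sketched as a symbol check plus set-if-absent. In this sketch, `derive` stands in for `extractEncryptionSchema()` and is injected so the code runs without `@cipherstash/stack/drizzle`. `drizzleTableName` and `collectEncryptedTables` are hypothetical, and keying the result by table name is an illustrative choice.

```typescript
const DRIZZLE_NAME = Symbol.for('drizzle:Name')

// Returns the table name for drizzle pgTables, undefined for anything else.
function drizzleTableName(value: unknown): string | undefined {
  if (typeof value !== 'object' || value === null) return undefined
  const name = (value as Record<symbol, unknown>)[DRIZZLE_NAME]
  return typeof name === 'string' ? name : undefined
}

// Hypothetical second pass: derive an entry for each exported pgTable,
// but never replace an explicitly exported EncryptedTable (set-if-absent).
function collectEncryptedTables<T>(
  moduleExports: Record<string, unknown>,
  explicit: ReadonlyMap<string, T>,
  derive: (pgTable: object) => T,
): Map<string, T> {
  const tables = new Map<string, T>(explicit)
  for (const value of Object.values(moduleExports)) {
    const name = drizzleTableName(value)
    if (name === undefined) continue
    if (!tables.has(name)) tables.set(name, derive(value as object))
  }
  return tables
}
```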
Two correctness bugs in the backfill path, diagnosed from a real run
that wrote plaintext values through to the encrypted column:
1) The CLI defaulted `schemaColumnKey` to the plaintext column name
(`--column`). But under the drizzle convention the EncryptedTable's
column keys are the *encrypted* column names — because that's what
the user declared via `encryptedType('foo_encrypted', ...)`. With
the wrong key, `bulkEncryptModels` saw a model key that didn't
match any configured encrypted column and returned the models
unchanged. The runner then wrote the plaintext into the encrypted
column, which Postgres rendered as `(82.60)`-shaped composite values
because `eql_v2_encrypted` is a composite type. Default now uses
the encrypted column name.
2) Added a leak guard inside runBackfill: after bulkEncryptModels
returns, inspect `data[0][schemaColumnKey]`. Real ciphertext is
always an object (the EQL envelope with c/k/v fields); if we see
a primitive, throw with an actionable message that names the key
the schema should use. Prevents any future schema/key mismatch
from silently corrupting data — it fails loudly on the first chunk
before any write commits.
Updated the TypeDoc on BackfillOptions to make the two conventions
(drizzle-extracted vs handwritten encryptedTable) explicit.
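The corrected defaulting, together with the documented `chunkSize` and `encryptedColumn` defaults, fits in a few lines. This is a sketch only: `BackfillFlags` and `resolveBackfillDefaults` are hypothetical names, not the CLI's real option types.

```typescript
// Hypothetical subset of the backfill flags.
interface BackfillFlags {
  table: string
  column: string
  encryptedColumn?: string
  schemaColumnKey?: string
  chunkSize?: number
}

function resolveBackfillDefaults(flags: BackfillFlags) {
  const encryptedColumn = flags.encryptedColumn ?? `${flags.column}_encrypted`
  return {
    table: flags.table,
    column: flags.column,
    encryptedColumn,
    // Drizzle convention: EncryptedTable column keys are the encrypted column names.
    schemaColumnKey: flags.schemaColumnKey ?? encryptedColumn,
    chunkSize: flags.chunkSize ?? 1000,
  }
}
```

A handwritten `encryptedTable` keyed by the plaintext name would pass `schemaColumnKey` explicitly instead of relying on the default.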
… leak guard
Replace the hand-rolled object-shape check in runBackfill with the
canonical isEncryptedPayload helper already exported by @cipherstash/stack.
The helper checks for the actual EQL envelope shape (v, i, and either
c or sv) rather than just `typeof === 'object'`, so it also catches
non-null objects that happen to lack ciphertext fields.
Also validates every row in the returned chunk (not just the first)
and reports the offending primary key in the error message so a user
hitting a partial failure knows which row to look at.
Integration test stubs updated to return valid-shaped payloads
({v, i, c}) so they still exercise the write path under the new guard.
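A chunk-wide check that reports the offending primary key can be sketched as follows. `looksEncrypted` is a hypothetical local approximation of the envelope shape described above (v, i, and either c or sv). It is not the real `isEncryptedPayload`, which is what the runner should use. `assertChunkEncrypted` and the `{ pk, model }` row shape are also hypothetical.

```typescript
// Hypothetical approximation of the EQL envelope shape: v, i, and c or sv.
function looksEncrypted(value: unknown): boolean {
  if (typeof value !== 'object' || value === null) return false
  const o = value as Record<string, unknown>
  return 'v' in o && 'i' in o && ('c' in o || 'sv' in o)
}

// Check every row in the chunk before any write, naming the first bad row.
function assertChunkEncrypted(
  rows: ReadonlyArray<{ pk: unknown; model: Record<string, unknown> }>,
  schemaColumnKey: string,
): void {
  for (const { pk, model } of rows) {
    if (!looksEncrypted(model[schemaColumnKey])) {
      throw new Error(
        `Row ${String(pk)}: "${schemaColumnKey}" was not encrypted. ` +
          'Check that the EncryptedTable declares a column with this key.',
      )
    }
  }
}
```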
…ryption
pg's node driver returns `numeric` as a JS string (to preserve
precision), but an EncryptedTable schema declaring `dataType('number')`
expects a JS number — so bulkEncryptModels errored out with "Cannot
convert String to Float. String values can only be used with Utf8Str".
Fix is split across both packages:
- @cipherstash/migrate: new optional `transformPlaintext` callback on
BackfillOptions. Invoked on each row's plaintext before it goes into
the model passed to bulkEncryptModels. Library stays generic; does
not know anything about schemas.
- @cipherstash/cli: new `buildPlaintextCoercer` inspects
`tableSchema.build().columns[schemaColumnKey].cast_as` and returns
an appropriate coercer:
number / double / real / int / decimal → Number(string)
bigint / big_int → BigInt(string)
date / timestamp → new Date(string)
boolean → "true"/"false" → boolean
string / text / json / jsonb / unknown → identity
Null and undefined are always passed through unchanged.
The backfill "Backfilling x.y → y_enc" log line now also prints the
schema's cast_as value so a user diagnosing a type-coercion issue can
see immediately whether the coercer is reading the right dataType from
the EncryptedTable (vs. falling through to identity).
Refactored buildPlaintextCoercer to return { transform, castAs } so
the caller can log the detected value; behaviour unchanged.
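The `cast_as`-to-coercer mapping above can be sketched as a switch that returns both the transform and the detected value. `buildCoercer` is a hypothetical stand-in for `buildPlaintextCoercer`. Two behaviours are assumptions not stated in the commit: non-string values pass through untouched, and boolean strings other than "true"/"false" are left as-is.

```typescript
type Coercer = (value: unknown) => unknown

function buildCoercer(castAs: string | undefined): { transform: Coercer; castAs: string | undefined } {
  // Coerce only strings; null, undefined and non-string values pass through unchanged.
  const fromString =
    (parse: (s: string) => unknown): Coercer =>
    (value) =>
      typeof value === 'string' ? parse(value) : value

  switch (castAs) {
    case 'number':
    case 'double':
    case 'real':
    case 'int':
    case 'decimal':
      return { transform: fromString(Number), castAs }
    case 'bigint':
    case 'big_int':
      return { transform: fromString(BigInt), castAs }
    case 'date':
    case 'timestamp':
      return { transform: fromString((s) => new Date(s)), castAs }
    case 'boolean':
      return { transform: fromString((s) => (s === 'true' ? true : s === 'false' ? false : s)), castAs }
    default:
      // string / text / json / jsonb / unknown: identity
      return { transform: (value) => value, castAs }
  }
}
```

With this mapping, the `'82.60'` string that pg returns for a `numeric` column becomes the number `82.6` before it reaches `bulkEncryptModels`.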
… by protect-ffi
Investigation into "Cannot convert String to Date" for a column with cast_as: 'date' turned up a genuine protect-ffi 0.21.2 limitation: its JsPlaintext wire enum has only String/Number/Boolean/JsonB variants, with no JS Date representation. napi-rs serialises a JS Date to an ISO string via Date.toJSON, and the Rust side then refuses it because string values are only valid for Utf8Str columns. The Rust-internal NaiveDate / Timestamp types exist but have no JS-visible wire format.
Not a tool bug; not fixable here. But running a backfill that will inevitably fail on the first chunk is a poor UX.
Add a pre-flight check: if the schema declares cast_as 'date' or 'timestamp', print a warning explaining the FFI limitation and the mitigation (change to dataType: 'string' / ISO strings) and prompt before continuing. Accepts --yes-style confirmation via the standard clack confirm UI.
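The pre-flight check amounts to a lookup before any chunk is read. In this sketch, `dateCastWarning` and the message wording are hypothetical. Per the commit, the CLI shows the warning and then asks for confirmation through clack's confirm prompt; the sketch leaves that step to the caller.

```typescript
// cast_as values the protect-ffi 0.21.2 wire format cannot carry, per the investigation above.
const UNSUPPORTED_BY_FFI: ReadonlySet<string> = new Set(['date', 'timestamp'])

// Hypothetical: returns a warning to show before prompting, or null if the backfill can proceed.
function dateCastWarning(column: string, castAs: string | undefined): string | null {
  if (castAs === undefined || !UNSUPPORTED_BY_FFI.has(castAs)) return null
  return (
    `Column "${column}" declares cast_as '${castAs}', which protect-ffi cannot encrypt from a JS Date. ` +
    "Consider dataType: 'string' with ISO-8601 values; otherwise the backfill is likely to fail on the first chunk."
  )
}
```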
Summary
Adds first-class support for migrating existing plaintext columns to
`eql_v2_encrypted`, a production-shaped flow that today has no good answer in either Stack or Proxy land. Ships as a new CLI command group + library, usable by both Stack (Protect.js) and Proxy users.
Lifecycle
Each column walks through:
schema-added → dual-writing → backfilling → backfilled → cut-over → dropped
State model (three layers, kept separate on purpose)
- `.cipherstash/migrations.json`: desired columns, index set, target phase. Code-reviewable intent.
- `eql_v2_configuration`: unchanged. Proxy continues to read this as its source of truth.
- `cipherstash.cs_migrations`: append-only event log of per-column phase, backfill cursor, and rows processed. Installed by `stash db install`. Designed to be upstreamed into EQL as `eql_v2_migrations` in a later release so Stack and Proxy own it jointly.
Why a new table instead of reusing `eql_v2_configuration`
Its CHECK constraint rejects custom metadata; its state enum is global (only one {active, pending, encrypting} at a time), so it can't represent multiple columns in different phases; and backfill-cadence writes would collide with Proxy's 60s config refresh. Full reasoning in the design doc.
New CLI commands (under `stash encrypt`)
- `status`: per-column table of phase, EQL state, indexes, progress, and drift
- `plan`: diff intent (`.cipherstash/migrations.json`) vs observed state
- `advance --to <phase>`: record a phase transition (dual-writing is user-declared)
- `backfill`: chunked, resumable, idempotent; txn-per-chunk with checkpoint; SIGINT-safe
- `cutover`: `eql_v2.rename_encrypted_columns()` in a txn; optional Proxy refresh via `CIPHERSTASH_PROXY_URL`
- `drop`: generates a `DROP COLUMN <col>_plaintext` migration file
New package `@cipherstash/migrate`
Exposes the same primitives (`runBackfill`, `appendEvent`, `progress`, `renameEncryptedColumns`, …) so users can embed backfill in their own workers/cron without the CLI. Example in `packages/migrate/README.md`.
Phase 1 scope / Phase 2 follow-ups
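Phase 1 scope is Protect/Stack client-side backfill. Phase 2 follow-ups: Proxy-mode backfill (SQL-through-Proxy using the same `cs_migrations` state), `stash db introspect --json` / `stash env set` CLI subcommands, and upstreaming `cs_migrations` → `eql_v2_migrations` in EQL.
Test plan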
- `pnpm --filter @cipherstash/migrate test`: 14 unit tests pass (state DAO, manifest round-trip, SQL identifier quoting)
- `pnpm --filter @cipherstash/cli test`: all 126 existing tests still pass
- `pnpm -w build`: full workspace builds clean
- `pnpm exec biome check <changed files>`: clean
- `./dist/bin/stash.js --help` shows the six new `encrypt` subcommands
- `bash packages/cli/scripts/e2e-encrypt.sh`: seeds a 5000-row `users` table, runs install → advance → backfill (with SIGINT + resume) → status → cutover → drop. Requires CipherStash credentials in env.
- `SELECT email FROM users` via Proxy returns plaintext; direct Postgres returns ciphertext JSON.
Design doc
`docs/plans/encryption-migrations.md`: full architecture including state-layer rationale, index-on-backfill implications, Proxy compatibility gotchas, and phased rollout.