What an operator running pk-auth in production needs to know: secrets, persistence, observability, rotation, and the most common ways a deployment goes wrong.
pk-auth ships as a JVM library — Spring Boot 4, Dropwizard 5, or Micronaut 4 adapters all consume the same core. A typical production deployment needs:
- JDK 21 (records, sealed types, virtual threads). Earlier JDKs will not compile.
- Postgres 16+ (when using `pk-auth-persistence-jdbi`): Flyway migrations run at startup, no manual schema work.
- DynamoDB (when using `pk-auth-persistence-dynamodb`): two tables, a single-table `PkAuthCore` carrying every pk-auth auth item plus a separate `PkAuthUsers` table for the host-app user records the `UserLookup` SPI reads. See ADR 0008 for the table layout.
- At least one trusted dispatcher for magic links + OTP if you enable those flows. The testkit's `LoggingEmailSender` / `LoggingSmsSender` log secrets to stdout; never use them in production.
| Setting | Min length | Notes |
|---|---|---|
| `pkauth.jwt.secret` (HS256) | 32 bytes | Hard fail at boot if shorter. Rotate by issuing a fresh secret and tolerating a grace window (issue + verify in parallel; pk-auth itself does not rotate, so the host should run two issuers behind a load balancer until tokens expire). |
| `pkauth.relying-party.id` | n/a | The eTLD+1 (e.g. `example.com`, NOT `auth.example.com`). Cross-subdomain passkeys all bind to this. Once a credential is registered against an RP ID, it cannot be re-registered against a different one without a fresh enrollment. |
| `pkauth.relying-party.origins` | n/a | Strict allow-list of `https://` origins. WebAuthn rejects mismatches; expand the list as you add subdomains. |
| OTP pepper (`pkauth.otp.pepper`) | 16 bytes decoded (32+ recommended) | Base64-encoded per-deployment pepper for OTP hashes only: OTP codes are hashed with `HMAC-SHA256(pepper, code)`, not Argon2id. (Backup codes use Argon2id with no pepper.) Hard fail at boot if the value is not valid Base64 or decodes to fewer than 16 bytes. If unset, the adapter auto-generates a throwaway per-startup pepper only when `pkauth.dev-mode=true` (dev only: it invalidates outstanding OTPs across restarts and across cluster instances); with dev-mode off, an unset pepper is a hard boot failure. Treat as a long-lived secret; rotating it invalidates every existing OTP hash. |
Recommended: stash secrets in a KMS/Secrets Manager and inject as environment
variables (PKAUTH_JWT_SECRET, PKAUTH_OTP_PEPPER). The adapters bind both.
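The length rules in the table can be checked before a deploy rather than discovered at boot. The following is a minimal sketch, not pk-auth's own validation code: the class name `SecretPreflight` is hypothetical, and it assumes the JWT secret is used as raw UTF-8 bytes and the pepper uses the standard (not URL-safe) Base64 alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Pre-deployment sanity check for the two secrets (hypothetical helper, not part of pk-auth). */
final class SecretPreflight {
    static final int MIN_JWT_SECRET_BYTES = 32;
    static final int MIN_PEPPER_BYTES = 16;

    /** True when the HS256 secret is at least 32 bytes. Assumes the raw UTF-8 bytes form the key. */
    static boolean jwtSecretOk(String secret) {
        return secret != null
            && secret.getBytes(StandardCharsets.UTF_8).length >= MIN_JWT_SECRET_BYTES;
    }

    /** True when the pepper is valid standard Base64 and decodes to at least 16 bytes. */
    static boolean pepperOk(String base64Pepper) {
        if (base64Pepper == null) {
            return false;
        }
        try {
            return Base64.getDecoder().decode(base64Pepper).length >= MIN_PEPPER_BYTES;
        } catch (IllegalArgumentException invalidBase64) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Reads the same environment variables the adapters bind.
        boolean ok = jwtSecretOk(System.getenv("PKAUTH_JWT_SECRET"))
            && pepperOk(System.getenv("PKAUTH_OTP_PEPPER"));
        System.out.println(ok ? "secrets look well-formed" : "secret check failed");
    }
}
```

Run it in the same environment the service will start in, and feed the result into the pipeline's pass/fail decision.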
CeremonyConfig.userVerification defaults to REQUIRED. With this default
WebAuthn4J enforces the asserted `UV` flag on every registration and
authentication, so each ceremony must carry a per-ceremony biometric or PIN —
this is what makes a passkey a genuine factor (something you have and
something you are/know).
Relaxing it to PREFERRED or DISCOURAGED accepts a present-but-unverified
authenticator (mere user-presence, no biometric/PIN). A passkey then degrades to
"something you have" alone, which materially weakens the factor for every user.
Generally do not relax UV. The only standard reason to opt out is supporting
UV-incapable roaming hardware security keys; if you do, scope it deliberately and
understand the trade-off.
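For orientation, the UV flag is bit 2 (`0x04`) of the flags byte in WebAuthn authenticator data, which follows the 32-byte RP ID hash. The sketch below only illustrates what "user present" versus "user verified" means at the byte level. pk-auth delegates the real check to WebAuthn4J, and the class name `AuthenticatorFlags` is hypothetical.

```java
/** Decodes the flags byte of WebAuthn authenticator data (illustrative only). */
final class AuthenticatorFlags {
    static final int UP = 0x01; // user present: a touch or similar gesture
    static final int UV = 0x04; // user verified: biometric or PIN

    static boolean userPresent(byte[] authenticatorData) {
        return (flags(authenticatorData) & UP) != 0;
    }

    static boolean userVerified(byte[] authenticatorData) {
        return (flags(authenticatorData) & UV) != 0;
    }

    /** The flags byte sits at offset 32, after the 32-byte rpIdHash. */
    private static int flags(byte[] authenticatorData) {
        // Minimum layout: 32-byte rpIdHash + 1 flags byte + 4-byte signature counter.
        if (authenticatorData == null || authenticatorData.length < 37) {
            throw new IllegalArgumentException("authenticator data must be at least 37 bytes");
        }
        return authenticatorData[32] & 0xFF;
    }

    public static void main(String[] args) {
        byte[] presenceOnly = new byte[37];
        presenceOnly[32] = (byte) UP; // the case PREFERRED / DISCOURAGED would accept
        System.out.println("present=" + userPresent(presenceOnly)
            + " verified=" + userVerified(presenceOnly));
    }
}
```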
Two ceremony knobs control which COSE signature algorithms are used (ADR 0019):
- `pkauth.ceremony.offered-algorithms`: advertised to the authenticator in registration create-options. Default: `[ES256, EdDSA, RS256]`.
- `pkauth.ceremony.accepted-algorithms`: the verify allow-list; a credential whose algorithm is absent is rejected on registration. Default (and the enforced superset): `[ES256, EdDSA, RS256, ES384, RS384]`.

offered must be a subset of accepted. The defaults are the historical union,
so leaving them unset changes nothing. You may narrow either (e.g. drop RSA),
but narrowing accepted can reject already-registered credentials that use a
removed algorithm — drive a re-enrollment campaign first (see
AdminService.listCredentialsByAlgorithm). In Spring and Micronaut these bind
from application.yml; in Dropwizard they are the PkAuthConfig.Ceremony
record's offeredAlgorithms / acceptedAlgorithms components. All three example
apps set them explicitly (to the defaults) as living documentation. There is no
post-quantum signature algorithm to select yet — see docs/threat-model.md
(Post-quantum readiness).
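The subset rule is easy to check before you narrow either list. This is a sketch under stated assumptions: `AlgorithmPolicyCheck` and its enum are hypothetical and are not pk-auth types. The numeric identifiers are the values from the IANA COSE algorithms registry.

```java
import java.util.EnumSet;
import java.util.Set;

/** Checks that every offered algorithm is also accepted (hypothetical helper). */
final class AlgorithmPolicyCheck {
    /** COSE algorithm identifiers as registered with IANA. */
    enum CoseAlg {
        ES256(-7), EdDSA(-8), ES384(-35), RS256(-257), RS384(-258);

        final int coseId;

        CoseAlg(int coseId) {
            this.coseId = coseId;
        }
    }

    /** Returns the offered algorithms missing from the accepted list; empty means the config is valid. */
    static Set<CoseAlg> offeredButNotAccepted(Set<CoseAlg> offered, Set<CoseAlg> accepted) {
        Set<CoseAlg> extra = EnumSet.noneOf(CoseAlg.class);
        extra.addAll(offered);
        extra.removeAll(accepted);
        return extra;
    }

    public static void main(String[] args) {
        // The documented defaults.
        Set<CoseAlg> offered = EnumSet.of(CoseAlg.ES256, CoseAlg.EdDSA, CoseAlg.RS256);
        Set<CoseAlg> accepted = EnumSet.of(
            CoseAlg.ES256, CoseAlg.EdDSA, CoseAlg.RS256, CoseAlg.ES384, CoseAlg.RS384);
        System.out.println("offered but not accepted: " + offeredButNotAccepted(offered, accepted));
    }
}
```

A non-empty result means the configuration breaks the subset rule. For example, dropping RS256 from accepted while still offering it would return `RS256`.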
- Flyway resources live in `pk-auth-persistence-jdbi/src/main/resources/db/migration`.
- Migrations run automatically when the SPI is wired (see ADR 0003).
- The shipped baseline is split across `V1__credentials.sql`, `V2__challenges.sql`, `V3__backup_codes.sql`, `V4__otp_codes.sql`, and `V5__example_users.sql`: five tables (`credentials`, `challenges`, `backup_codes`, `otp_codes`, `users`) with no `pkauth_` prefix. `V6__audit_soft_delete.sql` adds the append-only `pkauth_audit_events` table. `V7__credentials_hard_delete.sql` drops the `revoked_at` / `revoked_reason` columns on `credentials`; credential delete is a hard delete, with the audit record captured as a structured log event (`pkauth.credential.deleted`). `V8__create_access_tokens.sql` and `V9__create_refresh_tokens.sql` add the 1.1.0 `access_tokens` and `refresh_tokens` tables; `V10__refresh_tokens_amr.sql` adds the `amr` (RFC 8176 authentication-method-reference) column to `refresh_tokens`.
- Magic-link tokens are not persisted: the JWT is the credential, and the consumed-JTI store is in-memory by default (see the `ConsumedJtiStore` SPI for a multi-replica override).
- The unique key on credential ID is byte-array shaped; do not introduce a string-encoded column without a migration.

- Two physical tables (see ADR 0008): `PkAuthCore` holds every pk-auth auth item (credentials, challenges, backup codes, OTP codes, and the 1.1.0 token rows), and `PkAuthUsers` holds the host-app user records the `UserLookup` SPI reads. Provision both before the app starts; the adapter does not create them.
- The DynamoDB-native TTL attribute is `ttl` (epoch seconds). Enable TTL on the `ttl` attribute of the `PkAuthCore` table. It is set on `Challenge` and `OneTimePasscode` items so DynamoDB evicts them after expiry. (Magic-link tokens are never persisted, in any backend.)
- 1.1.0 adds `access_tokens` and `refresh_tokens` items on the same `PkAuthCore` table (ADR 0015, 0013), both pruned by the native `ttl` attribute. Access-token rows set `ttl` to their `expiresAt` epoch second; refresh-token rows set it to `expiresAt + cleanupRetention` (default 30 days) so used/revoked rows survive the forensic-retention window before the background sweep removes them, matching the JDBI cleanup semantics. TTL must be enabled on the table for this to work.
- Capacity-mode: on-demand is recommended for steady reads but bursty registration; provisioned only makes sense once you have a stable signing/verification baseline.

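The two `ttl` rules above amount to simple arithmetic on `expiresAt`. The sketch below shows that arithmetic only. The class and method names are hypothetical, and pk-auth's DynamoDB module computes these values itself.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrates how the ttl epoch-second values described above are derived (hypothetical helper). */
final class TokenTtl {
    /** Default forensic-retention window for refresh tokens, per the text above. */
    static final Duration DEFAULT_CLEANUP_RETENTION = Duration.ofDays(30);

    /** Access tokens: ttl is the expiry instant itself, in epoch seconds. */
    static long accessTokenTtl(Instant expiresAt) {
        return expiresAt.getEpochSecond();
    }

    /** Refresh tokens: ttl is expiry plus the retention window, in epoch seconds. */
    static long refreshTokenTtl(Instant expiresAt, Duration cleanupRetention) {
        return expiresAt.plus(cleanupRetention).getEpochSecond();
    }

    public static void main(String[] args) {
        Instant expiresAt = Instant.now().plus(Duration.ofHours(1));
        System.out.println("access ttl  = " + accessTokenTtl(expiresAt));
        System.out.println("refresh ttl = " + refreshTokenTtl(expiresAt, DEFAULT_CLEANUP_RETENTION));
    }
}
```

When you inspect an item in the console, a refresh-token `ttl` should sit exactly 2,592,000 seconds (30 days) past its `expiresAt` under the default retention.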
The new stateful access-token store (ADR 0015) and refresh-token store (ADR 0013) keep used/revoked rows around for a configurable retention window so operators have a forensic trail. Schedule a daily cleanup job:
JDBI / Postgres: call the SPI methods or run the canonical SQL:

```sql
-- Access tokens: drop rows whose expiry passed more than a day ago.
DELETE FROM access_tokens WHERE expires_at < NOW() - INTERVAL '1 day';

-- Refresh tokens: keep used/revoked rows for the configured retention
-- (default 30 days) so a forensic look-back survives.
DELETE FROM refresh_tokens
WHERE expires_at < NOW() - INTERVAL '30 days'
  AND (used_at IS NOT NULL OR revoked_at IS NOT NULL);
```

DynamoDB: native TTL handles routine expiry asynchronously. If you
need synchronous pruning (operator action / test), call
`DynamoDbAccessTokenStore.deleteExpiredBefore(Instant)` and
`DynamoDbRefreshTokenRepository.deleteExpiredBefore(Instant)`;
both walk the primary items and remove anything past the cutoff.

A daily cron is sufficient for both tables; neither row count grows unboundedly because TTL is set at issue time.
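If you would rather run the daily job in-process than from cron, one possible shape is sketched below. `ExpiredRowPruner` is a hypothetical stand-in for whichever store you wire in. It mirrors the `deleteExpiredBefore(Instant)` call named above, but its return type is an assumption. The cutoffs match the canonical SQL: one day for access tokens, the retention window for refresh tokens.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Runs token cleanup once a day (a sketch, not a pk-auth class). */
final class TokenCleanupJob {
    /** Hypothetical stand-in for a store exposing deleteExpiredBefore(Instant). */
    interface ExpiredRowPruner {
        void deleteExpiredBefore(Instant cutoff);
    }

    /** Rows that expired before this instant are eligible for deletion. */
    static Instant cutoff(Instant now, Duration retention) {
        return now.minus(retention);
    }

    static ScheduledExecutorService scheduleDaily(
            ExpiredRowPruner accessTokens,
            ExpiredRowPruner refreshTokens,
            Duration refreshRetention) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                Instant now = Instant.now();
                accessTokens.deleteExpiredBefore(cutoff(now, Duration.ofDays(1)));
                refreshTokens.deleteExpiredBefore(cutoff(now, refreshRetention));
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("token cleanup failed: " + e);
            }
        }, 0, 1, TimeUnit.DAYS);
        return scheduler; // the caller shuts this down on application stop
    }
}
```

In a multi-replica deployment every instance would run this job. The deletes are idempotent, but an external scheduler avoids the redundant work.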
Every ceremony and admin operation emits structured logs at INFO. Suggested fields to forward into your SIEM:
- `userHandle` (base64url), `challengeId`, `credentialId`
- `ceremony.phase` (`start` / `finish`) and `ceremony.step` (`registration` / `authentication`)
- `verification.kind` (`signature` / `originPolicy` / `rpIdPolicy` / `counterRegression` / `attestationPolicy`)
- `result` (`success` / `denied:<reason>`)

Counter regression and origin mismatch both surface as INFO log entries with a
distinct result.denied.reason. Alert on either — they are signals of credential
cloning or a misconfigured RP.
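An alert rule for these two signals can be a small filter on the `result` field. The sketch below assumes the denial reason strings reuse the `verification.kind` names (`counterRegression`, `originPolicy`). That is an assumption, so confirm the exact values against your own logs. `DenialAlerts` is a hypothetical class.

```java
import java.util.Optional;
import java.util.Set;

/** Classifies a logged result value for alerting (hypothetical helper). */
final class DenialAlerts {
    // Assumption: reasons reuse the verification.kind names; adjust to what your logs emit.
    static final Set<String> ALERT_REASONS = Set.of("counterRegression", "originPolicy");

    private static final String DENIED_PREFIX = "denied:";

    /** Extracts the reason from a "denied:<reason>" result; empty for "success" or anything else. */
    static Optional<String> deniedReason(String result) {
        if (result == null || !result.startsWith(DENIED_PREFIX)) {
            return Optional.empty();
        }
        return Optional.of(result.substring(DENIED_PREFIX.length()));
    }

    /** True when the denial reason is one an operator should be paged or notified about. */
    static boolean shouldAlert(String result) {
        return deniedReason(result).filter(ALERT_REASONS::contains).isPresent();
    }

    public static void main(String[] args) {
        for (String result : new String[] {"success", "denied:counterRegression", "denied:signature"}) {
            System.out.println(result + " -> alert=" + shouldAlert(result));
        }
    }
}
```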
Recommended dashboards:
- p99 of `registration.finish` / `authentication.finish` (target < 200ms with Postgres on the same VPC).
- 4xx by reason on `/auth/passkeys/*` (origin mismatch is almost always config drift; counter regression is almost always an issue).
- Backup-code redemption and OTP attempt rates per user (the SPIs already rate-limit, but operator-side alerts catch credential-stuffing).

- Passkey rotation: users delete and re-add via `DELETE /auth/admin/credentials/{id}` and a fresh registration ceremony. The "last credential" guard returns 409; that is intentional. Encourage users to add a second passkey before removing the first.
- JWT secret rotation: roll via the dual-issuer pattern in §2.
- RP ID change: a one-way migration. Every existing passkey is invalidated. Plan a re-enrollment campaign with backup codes / magic links as the bridge.
| Symptom | Likely cause | First check |
|---|---|---|
| Browser shows "Relying party not registrable" | RP ID doesn't match the page's domain | The pkauth.relying-party.id config and the page's actual host |
| 4xx on `authentication.finish` with `counter_regression` | A counter wound back: either a credential clone or a counter-0 (synced) passkey crossing devices | Inspect the credential's `backupEligible` flag; if `true`, consider switching the policy to warn |
| `Challenge expired` 4xx | Five-minute default TTL elapsed | Often a slow user; do not extend the TTL, re-issue `start` instead |
| DynamoDB `ConditionalCheckFailedException` on `takeOnce` | Two clients tried to consume the same challenge | Expected; only one succeeds. If the rate is high, inspect for double-submit on the client |
| Spring Security 7 chain mounts before the pk-auth filter | Filter order regression | Verify PkAuthSecurityConfig.pkAuthSecurityFilterChain has the higher precedence in the host's chain |
The account-admin surface (/auth/admin/** — list / rename / delete passkeys,
regenerate backup codes, email / phone verification) lives in the optional
com.codeheadsystems:pk-auth-admin-api module. If a deployment drives those
operations out-of-band (an internal console, a separate service) and wants a
smaller public HTTP surface, the admin endpoints can be turned off by
configuration alone — no source changes to pk-auth. In every adapter the rule
is the same: the admin routes mount only when pk-auth-admin-api is on the
runtime classpath. Leave it off and no /auth/admin/** routes are registered
(requests get a clean 404); the ceremony, JWT, and refresh endpoints are
unaffected.
| Adapter | How admin is wired | To disable |
|---|---|---|
| Spring Boot (`pk-auth-spring-boot-starter`) | `pk-auth-admin-api` is `compileOnly`; `PkAuthAdminAutoConfiguration` is `@ConditionalOnClass(AdminService)` | Do not add `pk-auth-admin-api` as a runtime dependency (the starter does not pull it transitively) |
| Dropwizard (`pk-auth-dropwizard`) | `pk-auth-admin-api` is `compileOnly`; the bundle mounts `PkAuthAdminResource` only when admin is wired | Omit `pk-auth-admin-api`, or register the bundle with the no-admin constructor `new PkAuthBundle(persistence)` |
| Micronaut (`pk-auth-micronaut`) | `pk-auth-admin-api` is `compileOnly`; `PkAuthAdminFactory` is `@Requires(classes = AdminService.class)` and `PkAuthAdminController` is `@Requires(beans = AdminService.class)` | Do not add `pk-auth-admin-api` as a runtime dependency (the adapter does not pull it transitively) |

For Maven/Gradle consumers this is purely a dependency decision in the host
application — the admin module is opt-in. The three example apps under
examples/ declare pk-auth-admin-api explicitly because they exercise the full
admin walkthrough; a production host that wants the ceremony surface only simply
leaves that line out.
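To confirm from inside a running host whether the optional module actually made it onto the runtime classpath, a class-presence probe is enough. This is the same kind of check the conditional annotations perform. The sketch is generic: `OptionalModuleProbe` is hypothetical, and you pass in the fully qualified name of `AdminService` as it appears in your dependency, since this guide does not state its package.

```java
/** Reports whether a class is loadable, without initializing it (hypothetical helper). */
final class OptionalModuleProbe {
    static boolean isPresent(String className) {
        try {
            // initialize=false: only check that the class can be found and linked.
            Class.forName(className, false, OptionalModuleProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the fully qualified AdminService class name as the first argument.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(className + " present: " + isPresent(className));
    }
}
```

A `false` result for the admin service class on a production host is the expected state when the admin surface is meant to be off.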
Note (Micronaut). Keep `PkAuthAdminFactory` (the `@Requires`-gated `AdminService` bean) separate from `PkAuthFactory`. Because Micronaut's generated bean definition for a `@Factory` references the return types of its factory methods, hosting the optional `AdminService` bean on the main factory would make `PkAuthFactory` unloadable when `pk-auth-admin-api` is absent (`NoClassDefFoundError` on the first ceremony request). The split keeps the always-on factory free of any reference to the optional module.

pk-auth's own primitives are post-quantum aware (see docs/threat-model.md →
Post-quantum readiness), but the one genuine harvest-now, decrypt-later risk
in a real deployment is outside this library: an adversary who records your
TLS sessions today can decrypt them later once a cryptographically-relevant
quantum computer (CRQC) breaks the classical key exchange that protected them. Those
sessions carry pk-auth's bearer material in transit — access JWTs, refresh tokens,
magic-link URLs, OTP codes. Unlike a WebAuthn assertion (challenge–response, nothing
to harvest), a recorded ciphertext is harvestable.
TLS is terminated at your edge — load balancer, reverse proxy, or CDN — not in pk-auth. So the mitigation is an operator action, not a library change:
- Enable a hybrid post-quantum key exchange at the TLS terminator / CDN. The current deployable standard is the hybrid group `X25519MLKEM768` (X25519 + ML-KEM-768 / FIPS 203), which stays classically secure even if the PQC half is later faulted, and is already supported by recent OpenSSL/BoringSSL, major CDNs, and current browsers. Configure it wherever you terminate TLS:
  - Nginx/OpenSSL 3.5+: include the hybrid group in `ssl_ecdh_curve` / `Groups` / `Curves` (e.g. `X25519MLKEM768:X25519`).
  - Behind a CDN (Cloudflare, etc.): enable post-quantum / hybrid key agreement in the edge TLS settings.
- Keep token TTLs short. A short access-token TTL (default 1 hour) and rotating refresh tokens bound the value of any session an attacker does eventually decrypt.
This does not change anything pk-auth signs or stores; it hardens the channel the tokens travel over. There is no pk-auth setting for it — it lives entirely in your TLS-terminating layer.
See docs/threat-model.md for the formal STRIDE pass.