Run Codex auth refreshers in target namespaces#1283
Conversation
|
/kelos review |
|
🤖 Kelos Task Status Task |
There was a problem hiding this comment.
🤖 Kelos Reviewer Agent @gjkim42
Review Summary
Verdict: COMMENT
Overall correctness: patch is incorrect
Scope: Moves managed Codex auth refresher CronJobs into the target Secret namespace with namespace-local ServiceAccount, Role, and RoleBinding cleanup.
Findings Overview
| Priority | Count | File:Line | Summary |
|---|---|---|---|
| P0 | 0 | — | none |
| P1 | 1 | internal/controller/codex_auth_refresher_controller.go:61 | Secret owner refs require missing finalizers RBAC |
| P2 | 0 | — | none |
| P3 | 0 | — | none |
Findings
Correctness
- [P1] internal/controller/codex_auth_refresher_controller.go:61 —
controllerutil.SetControllerReferencedefaultsblockOwnerDeletionto true, and Kubernetes'OwnerReferencesPermissionEnforcementadmission controller requiresupdateon the referenced owner'sfinalizerssubresource before a client can set that bit. The new children are owned by Secrets, but the controller RBAC only grantssecretsget/list/watch/update, notsecrets/finalizers, so creating or updating these CronJob/Role/RoleBinding objects can be rejected on default clusters and the refresher will never be installed. Either grantsecrets/finalizersin the generated/controller RBAC or setcontrollerutil.WithBlockOwnerDeletion(false)for the Secret owner references.
Key takeaways
- This would be a
REQUEST_CHANGESreview, but GitHub does not allow this bot account to request changes on its own PR. - The target-namespace RBAC model is otherwise aligned with the PR goal, but the new owner references need matching RBAC semantics before this can work reliably after upgrade.
| ``` | ||
|
|
||
| For each labeled Secret with a non-empty `CODEX_AUTH_JSON` key, the controller manages a dedicated CronJob. The CronJob runs the Codex image under the controller's ServiceAccount, seeds `auth.json` from that one Secret, invokes Codex so the CLI performs its own refresh, and writes the refreshed file back to only the `CODEX_AUTH_JSON` key — other keys are preserved and the token is never logged. Removing the label or deleting the Secret removes its CronJob. Secrets without a `CODEX_AUTH_JSON` bundle, or whose bundle carries no `refresh_token` (e.g. API-key credentials), are skipped. Externally-managed Secrets (ExternalSecrets, Vault, sealed-secrets) will overwrite the refreshed value on their next sync and are not supported. | ||
| For each labeled Secret with a non-empty `CODEX_AUTH_JSON` key, the controller manages a dedicated CronJob in the Secret's namespace. The CronJob runs the Codex image under a namespace-local `kelos-codex-auth-refresher` ServiceAccount, and the controller creates per-Secret Role/RoleBinding objects that grant that ServiceAccount `get` and `update` on only the target Secret. The CronJob seeds `auth.json` from that one Secret, invokes Codex so the CLI performs its own refresh, and writes the refreshed file back to only the `CODEX_AUTH_JSON` key — other keys are preserved and the token is never logged. Removing the label or deleting the Secret removes its CronJob and per-Secret RBAC. Secrets without a `CODEX_AUTH_JSON` bundle, or whose bundle carries no `refresh_token` (e.g. API-key credentials), are skipped. Externally-managed Secrets (ExternalSecrets, Vault, sealed-secrets) will overwrite the refreshed value on their next sync and are not supported. |
There was a problem hiding this comment.
We should not be too specific for internal implementation.
|
|
||
| type CodexAuthRefresherBuilder struct { | ||
| // Namespace is the controller namespace used to clean up legacy CronJobs | ||
| // created before refreshers moved into the target Secret namespace. |
There was a problem hiding this comment.
You don't have to care about the migration since the feature is not released yet.
| } | ||
| logger.Info("Updated Codex auth refresher CronJob", "cronJob", current.Name, "secret", secret.Name, "namespace", secret.Namespace) | ||
| } | ||
| if err := r.deleteLegacyCronJob(ctx, secret.Namespace, secret.Name); err != nil { |
There was a problem hiding this comment.
Don't care about the legacy implementation and its migration.
|
🤖 Kelos Task Status Task |
|
Update the prompt as well to 'Reply with the single word OK.' |
|
/kelos pick-up |
|
🤖 Kelos Task Status Task |
|
🤖 Kelos Agent @gjkim42 /kelos review Addressed the latest review feedback: removed unreleased legacy cleanup, softened docs, updated the Codex refresher prompt, and made Secret owner references non-blocking. Local |
|
🤖 Kelos Task Status Task |
|
🤖 Kelos Agent @gjkim42 CI is green and the review feedback changes are pushed. I requested |
|
🤖 Kelos Task Status Task |
There was a problem hiding this comment.
🤖 Kelos Reviewer Agent @gjkim42
Review Summary
Verdict: COMMENT
Overall correctness: patch is correct
Scope: Runs managed Codex auth refresher CronJobs and per-Secret RBAC in the target Secret namespace.
Findings Overview
| Priority | Count | File:Line | Summary |
|---|---|---|---|
| P0 | 0 | — | none |
| P1 | 0 | — | none |
| P2 | 1 | internal/controller/codex_auth_refresher_controller.go:460 | Managed ServiceAccount drift does not requeue reconciliation |
| P3 | 0 | — | none |
Findings
Correctness
- [P2] internal/controller/codex_auth_refresher_controller.go:460 —
ensureServiceAccountnow creates and updates the sharedkelos-codex-auth-refresherServiceAccount, butSetupWithManageronly watches Secrets, CronJobs, Roles, and RoleBindings. If the ServiceAccount is deleted in a target namespace while the refreshable Secret and per-Secret resources remain unchanged, no event requeues that Secret, so the existing CronJob keeps creating pods that fail because the referenced ServiceAccount is missing. Add a ServiceAccount watch or another namespace-to-secret requeue path so the controller restores it like the other managed access resources.
There was a problem hiding this comment.
🤖 Kelos Reviewer Agent @gjkim42
Review Summary
Verdict: COMMENT
Overall correctness: patch is correct
Scope: Runs managed Codex auth refresher CronJobs and namespace-local per-Secret RBAC in the target Secret namespace.
Findings Overview
| Priority | Count | File:Line | Summary |
|---|---|---|---|
| P0 | 0 | — | none |
| P1 | 0 | — | none |
| P2 | 1 | internal/controller/codex_auth_refresher_controller.go:460 | Managed ServiceAccount drift does not requeue reconciliation |
| P3 | 0 | — | none |
Findings
Correctness
- [P2] internal/controller/codex_auth_refresher_controller.go:460 —
ensureServiceAccountcreates and updates the sharedkelos-codex-auth-refresherServiceAccount, butSetupWithManageronly watches Secrets, CronJobs, Roles, and RoleBindings. If that ServiceAccount is deleted in a target namespace while the opted-in Secret and per-Secret Role/RoleBinding remain unchanged, no watched event requeues the Secret, so the existing CronJob keeps creating pods that fail because the referenced ServiceAccount is missing. Add a ServiceAccount watch with a namespace-to-refreshable-Secret mapping, or another requeue path, so the controller restores this shared dependency like the other managed resources.
Key takeaways
- No blocking P0/P1 issues found in the current patch.
- The previously reported non-blocking ServiceAccount drift path still applies.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Runs managed Codex auth refresher CronJobs in the same namespace as the Secret they refresh.
The controller now creates namespace-local refresher access for each opted-in Secret:
kelos-codex-auth-refresherServiceAccount in the target namespaceresourceNamesto the target SecretIt uses non-blocking Secret owner references for managed CronJobs, Roles, and RoleBindings so the controller does not require Secret finalizer update permissions. The refresher prompt is now
Reply with the single word OK.Docs describe the target-namespace behavior without exposing unnecessary implementation detail.Which issue(s) this PR is related to:
Fixes #1282
Special notes for your reviewer:
Verification run locally:
make verifymake testmake test-integrationDoes this PR introduce a user-facing change?