[OSPRH-29575] Clean up certificate handling#105
Conversation
|
Skipping CI for Draft Pull Request. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR consolidates operator/system, kube-root, OpenShift service CA, and optional user CAs into a single reconciled ConfigMap and mounts it into deployments; it updates constants, errors, controller watches, deployment wiring, Postgres TLS config paths/SSL mode, and test fixtures. ChangesCombined CA Bundle System
Sequence DiagramsequenceDiagram
participant Operator as OpenStackLightspeed Operator
participant KubeAPI as Kubernetes API
participant ConfigMapSources as ConfigMaps (kube-root-ca, service-ca, user CA)
participant CombinedCM as combined-ca-bundle ConfigMap
participant Deployment as LCore Deployment
Operator->>KubeAPI: Watch OpenStackLightspeed CRs & ConfigMaps
KubeAPI->>Operator: Notify on ConfigMap resourceVersion change
Operator->>ConfigMapSources: Read system CA, kube-root CA, service CA, optional user CA
ConfigMapSources-->>Operator: Return CA PEM data
Operator->>Operator: Parse PEMs, skip expired, dedupe by SHA256(DER)
Operator->>Operator: Merge and encode combined PEM
Operator->>CombinedCM: Create/patch combined-ca-bundle ConfigMap
CombinedCM-->>Operator: ConfigMap created/updated
Operator->>Deployment: Ensure pod templates mount combined CA at configured path
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/test all |
|
/test all |
|
/test all |
01d0e1c to
ef39cc1
Compare
|
/test all |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
internal/controller/ca_bundle.go (1)
71-72: 💤 Low valueConsider extracting
"CERTIFICATE"to a constant.The string
"CERTIFICATE"appears multiple times in this file. Extracting it to a package-level constant would satisfy the linter and improve maintainability.♻️ Suggested constant definition
+const pemTypeCertificate = "CERTIFICATE" + type caCert struct {Then replace occurrences:
- if block.Type != "CERTIFICATE" { + if block.Type != pemTypeCertificate {block := &pem.Block{ - Type: "CERTIFICATE", + Type: pemTypeCertificate, Bytes: c.cert.Raw, }Also applies to: 109-109
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/ca_bundle.go` around lines 71 - 72, Extract the repeated literal "CERTIFICATE" into a package-level constant (e.g., const pemCertificate = "CERTIFICATE") and replace all literal uses in this file—such as the block.Type comparison in the error path that currently uses block.Type != "CERTIFICATE" and any other occurrences (including the fmt.Errorf message and the occurrence around line 109)—with that constant; ensure the error message still prints block.Type but references the constant name in the code instead of the raw string.internal/controller/openstacklightspeed_controller.go (1)
306-326: ⚡ Quick winConsider also triggering reconciliation when system CA ConfigMaps change.
The handler only triggers reconciliation when a user-provided CA ConfigMap (matching
Spec.TLSCACertBundle) changes. However,reconcileCombinedCABundleConfigMapalso mergesopenshift-service-ca.crtandkube-root-ca.crtinto the combined bundle. If those system ConfigMaps rotate (e.g., service CA renewal), the combined bundle won't be updated until another reconciliation occurs.Consider extending the handler to also trigger on changes to the system CA ConfigMaps:
♻️ Suggested enhancement
func (r *OpenStackLightspeedReconciler) NotifyOpenStackLightspeedsByCAConfigMap(ctx context.Context, obj client.Object) []ctrl.Request { + // System CA ConfigMaps affect all instances - trigger reconciliation for all + if obj.GetName() == OpenShiftServiceCAConfigMap || obj.GetName() == KubeRootCAConfigMap { + return r.NotifyAllOpenStackLightspeeds(ctx, obj) + } + + // User-provided CA ConfigMaps - only trigger for matching instances var lightspeedList apiv1beta1.OpenStackLightspeedList if err := r.List(ctx, &lightspeedList, client.InNamespace(obj.GetNamespace())); err != nil { return nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/openstacklightspeed_controller.go` around lines 306 - 326, The current NotifyOpenStackLightspeedsByCAConfigMap only enqueues reconcilations when a user-specified CA ConfigMap matches Spec.TLSCACertBundle; extend it to also detect system CA ConfigMap changes (names "openshift-service-ca.crt" and "kube-root-ca.crt") and, when the incoming obj.GetName() matches either of those, enqueue ctrl.Requests for all OpenStackLightspeed items in the list (same way as the loop currently builds requests) so reconcileCombinedCABundleConfigMap will run and rebuild the combined bundle; update NotifyOpenStackLightspeedsByCAConfigMap to check obj.GetName() against those system names and append requests for every item when a match is found.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/controller/ca_bundle_test.go`:
- Around line 590-592: The test assertion expects "service CA" but
reconcileCombinedCABundleConfigMap wraps errors with the ConfigMap name
(openshift-service-ca.crt); update the assertion in ca_bundle_test.go to check
for the actual ConfigMap name or the exact wrapped message from
reconcileCombinedCABundleConfigMap (e.g., assert strings.Contains(err.Error(),
"openshift-service-ca.crt") or match the full wrapped error), referencing the
reconcileCombinedCABundleConfigMap function to ensure the test matches its error
text.
- Around line 628-630: The test assertion is checking for "kube root CA" but the
actual error message contains "kube-root-ca.crt"; update the assertion in the
test (the strings.Contains(err.Error(), "kube root CA") check) to look for
"kube-root-ca.crt" (or a more robust substring like "kube-root-ca" or a
case-insensitive match) so the test matches the real error text returned by err.
---
Nitpick comments:
In `@internal/controller/ca_bundle.go`:
- Around line 71-72: Extract the repeated literal "CERTIFICATE" into a
package-level constant (e.g., const pemCertificate = "CERTIFICATE") and replace
all literal uses in this file—such as the block.Type comparison in the error
path that currently uses block.Type != "CERTIFICATE" and any other occurrences
(including the fmt.Errorf message and the occurrence around line 109)—with that
constant; ensure the error message still prints block.Type but references the
constant name in the code instead of the raw string.
In `@internal/controller/openstacklightspeed_controller.go`:
- Around line 306-326: The current NotifyOpenStackLightspeedsByCAConfigMap only
enqueues reconcilations when a user-specified CA ConfigMap matches
Spec.TLSCACertBundle; extend it to also detect system CA ConfigMap changes
(names "openshift-service-ca.crt" and "kube-root-ca.crt") and, when the incoming
obj.GetName() matches either of those, enqueue ctrl.Requests for all
OpenStackLightspeed items in the list (same way as the loop currently builds
requests) so reconcileCombinedCABundleConfigMap will run and rebuild the
combined bundle; update NotifyOpenStackLightspeedsByCAConfigMap to check
obj.GetName() against those system names and append requests for every item when
a match is found.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f7880f53-8b9a-4e6d-8e8e-c55751fd4542
📒 Files selected for processing (20)
internal/controller/assets/postgres.confinternal/controller/ca_bundle.gointernal/controller/ca_bundle_test.gointernal/controller/common.gointernal/controller/constants.gointernal/controller/errors.gointernal/controller/lcore_config.gointernal/controller/lcore_deployment.gointernal/controller/lcore_reconciler.gointernal/controller/llama_stack_config.gointernal/controller/openstacklightspeed_controller.gointernal/controller/postgres_deployment.gotest/kuttl/common/expected-configs/lightspeed-stack-update.yamltest/kuttl/common/expected-configs/lightspeed-stack.yamltest/kuttl/common/expected-configs/ogx_config-update.yamltest/kuttl/common/expected-configs/ogx_config.yamltest/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/errors-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/08-assert-openstacklightspeed-update.yamltest/kuttl/tests/update-openstacklightspeed/11-assert-configmaps-update.yaml
💤 Files with no reviewable changes (5)
- test/kuttl/common/expected-configs/ogx_config.yaml
- test/kuttl/common/expected-configs/ogx_config-update.yaml
- internal/controller/common.go
- internal/controller/llama_stack_config.go
- internal/controller/postgres_deployment.go
|
/test all |
|
/test all |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/controller/ca_bundle_test.go (1)
434-638: ⚡ Quick winAdd reconcile coverage for
Spec.TLSCACertBundle(including multi-key user ConfigMap).Current reconcile tests don’t exercise the user-specified CA ConfigMap path, which is central to this change. Add a test that sets
instance.Spec.TLSCACertBundle, provides a ConfigMap with multiple keys, and verifies all valid certs are merged (plus invalid-key failure case).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/ca_bundle_test.go` around lines 434 - 638, Add a new unit test that exercises reconcileCombinedCABundleConfigMap using an OpenStackLightspeed instance with Spec.TLSCACertBundle set to a user ConfigMap name; create that user ConfigMap with multiple data keys containing valid PEM certs (and include a case with an invalid/non-cert key to assert failure), call reconcileCombinedCABundleConfigMap, then fetch the resulting CombinedCABundleConfigMapName and validate CombinedCABundleKey by parsing with caBundle.getCertsFromPEM and asserting the expected number of merged/deduplicated certs; reference functions/types: reconcileCombinedCABundleConfigMap, OpenStackLightspeed.Spec.TLSCACertBundle, CombinedCABundleConfigMapName, CombinedCABundleKey, and caBundle.getCertsFromPEM.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/controller/ca_bundle.go`:
- Around line 63-101: The function getCertsFromPEM should reject trailing
non-PEM garbage: after the for loop that repeatedly calls pem.Decode(rest) and
before returning nil, check whether the remaining bytes in rest contain any
non-whitespace characters; if so return an error (e.g., fmt.Errorf("trailing
non-PEM data")) so inputs like "valid cert + junk bytes" fail validation. Locate
the loop that uses pem.Decode(rest) and the variable rest, and add this
post-loop validation to prevent accepting malformed ConfigMap values.
---
Nitpick comments:
In `@internal/controller/ca_bundle_test.go`:
- Around line 434-638: Add a new unit test that exercises
reconcileCombinedCABundleConfigMap using an OpenStackLightspeed instance with
Spec.TLSCACertBundle set to a user ConfigMap name; create that user ConfigMap
with multiple data keys containing valid PEM certs (and include a case with an
invalid/non-cert key to assert failure), call
reconcileCombinedCABundleConfigMap, then fetch the resulting
CombinedCABundleConfigMapName and validate CombinedCABundleKey by parsing with
caBundle.getCertsFromPEM and asserting the expected number of
merged/deduplicated certs; reference functions/types:
reconcileCombinedCABundleConfigMap, OpenStackLightspeed.Spec.TLSCACertBundle,
CombinedCABundleConfigMapName, CombinedCABundleKey, and
caBundle.getCertsFromPEM.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1830c078-881c-4c9e-ba9e-7a12b59a421c
📒 Files selected for processing (20)
internal/controller/assets/postgres.confinternal/controller/ca_bundle.gointernal/controller/ca_bundle_test.gointernal/controller/common.gointernal/controller/constants.gointernal/controller/errors.gointernal/controller/lcore_config.gointernal/controller/lcore_deployment.gointernal/controller/lcore_reconciler.gointernal/controller/llama_stack_config.gointernal/controller/openstacklightspeed_controller.gointernal/controller/postgres_deployment.gotest/kuttl/common/expected-configs/lightspeed-stack-update.yamltest/kuttl/common/expected-configs/lightspeed-stack.yamltest/kuttl/common/expected-configs/ogx_config-update.yamltest/kuttl/common/expected-configs/ogx_config.yamltest/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/errors-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/08-assert-openstacklightspeed-update.yamltest/kuttl/tests/update-openstacklightspeed/11-assert-configmaps-update.yaml
💤 Files with no reviewable changes (5)
- test/kuttl/common/expected-configs/ogx_config.yaml
- test/kuttl/common/expected-configs/ogx_config-update.yaml
- internal/controller/llama_stack_config.go
- internal/controller/common.go
- internal/controller/postgres_deployment.go
✅ Files skipped from review due to trivial changes (1)
- test/kuttl/common/openstack-lightspeed-instance/errors-openstack-lightspeed-instance.yaml
🚧 Files skipped from review as they are similar to previous changes (9)
- test/kuttl/tests/update-openstacklightspeed/11-assert-configmaps-update.yaml
- test/kuttl/common/expected-configs/lightspeed-stack-update.yaml
- internal/controller/lcore_config.go
- test/kuttl/tests/update-openstacklightspeed/08-assert-openstacklightspeed-update.yaml
- internal/controller/lcore_reconciler.go
- test/kuttl/common/expected-configs/lightspeed-stack.yaml
- internal/controller/openstacklightspeed_controller.go
- internal/controller/constants.go
- test/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yaml
umago
left a comment
There was a problem hiding this comment.
I can't fault the patch, it LGTM! Thanks for working on this and simplifying the process we handle these certificates.
There were multiple issues with how the operator handled certificates: 1) The lightspeed-stack pod used REQUESTS_CA_BUNDLE and SSL_CERT_FILE environment variables, which bypassed the system-configured certificates. 2) When a user provided a custom CA certificate, it was expected under the cert.crt key in their ConfigMap. This was undocumented and the required key name was not obvious. 3) PostgresDB appeared to be configured for mTLS because ssl_ca_file was set in postgres.conf and the openshift-service-ca.crt was mounted into the PostgresDB pod. This created a false sense of mTLS being in place, but with the default pg_hba.conf, client certificate verification [1] is not enabled. Neither OGX [3][4] nor Lightspeed Stack [5] supports providing client certificates to PostgresDB. 4) PostgresDB SSL connection settings were configured for OGX even though they have no effect. OGX does not support configuring the SSL mode for its PostgresDB connection [3][4], so the PostgresDB certificate verification cannot be strictly enforced on the OGX side (the default is "prefer" [2], which does not enforce certificate verification and can fall back to unencrypted communication). OGX uses a non-strict config mode, so unrecognized options are silently ignored. 5) The operator did not watch for changes to ConfigMaps. When the content of the CA bundle ConfigMap was updated, the operator did not automatically reconcile. 6) Not a bug strictly speaking, but Lightspeed Stack used ssl_mode "require" when it could have used "verify-full", which checks both that the certificate is signed by a trusted CA and that the server hostname matches the CN field in the certificate. This commit simplifies certificate handling with the following changes: - Introduce a single CA bundle ConfigMap (openstack-lightspeed-ca-bundle) containing the system CAs, user-provided CA certificates from the OpenStackLightspeed CRD, kube-root-ca.crt, and openshift-service-ca.crt. This bundle is mounted into all containers in the lightspeed-stack-deployment pod, eliminating the need for REQUESTS_CA_BUNDLE and SSL_CERT_FILE. - When a user specifies a ConfigMap with custom CA certificates, iterate over all keys, validate that each holds a valid certificate, and append it to the CA bundle (resolves openstack-lightspeed#2). - Stop mounting openshift-service-ca into the Postgres pod and remove ssl_ca_file from postgres.conf. These gave a false sense of client certificate validation; actually enforcing it requires configuring pg_hba.conf [1], and neither OGX [3][4] nor Lightspeed Stack [5] currently supports providing client certificates. - Remove ssl_mode, ca_cert_path, and gss_encmode from storage.backends.postgres_backend in ogx_config.yaml. These options are not supported by OGX [3][4] and gave a false sense of SSL being configured. - Add a Watch() on ConfigMaps to the reconciler so that whenever a user updates the CA bundle ConfigMap, the reconcile loop runs automatically. - Configure Lightspeed Stack with ssl_mode "verify-full" for its PostgreSQL connection, ensuring both CA trust and hostname verification. [1] https://www.postgresql.org/docs/current/ssl-tcp.html#SSL-CLIENT-CERTIFICATES [2] https://magicstack.github.io/asyncpg/current/api/index.html#asyncpg.connection.connect [3] https://github.com/ogx-ai/ogx/blob/34d7901/src/ogx/core/storage/datatypes.py#L200 [4] https://github.com/ogx-ai/ogx/blob/34d7901/src/ogx/core/storage/sqlalchemy_sqlstore.py#L125 [5] https://github.com/lightspeed-core/lightspeed-stack/blob/7503ebd/src/models/config.py#L181 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lpiwowar, umago The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
45676d8
into
openstack-lightspeed:lcore-migration
Issue
There were multiple issues with how the operator handled certificates:
The lightspeed-stack container used
REQUESTS_CA_BUNDLEandSSL_CERT_FILEenvironment variables, which bypassed the system-configured certificates.When a user provided a custom CA certificate, it was expected under the
cert.crtkey in their ConfigMap. This was undocumented and the required key name was not obvious.PostgresDB appeared to be configured for mTLS because
ssl_ca_filewas set inpostgres.confand theopenshift-service-ca.crtwas mounted into the PostgresDB pod. This created a false sense of mTLS being in place, but with the defaultpg_hba.conf, client certificate verification [1] is not enabled. Neither OGX [3][4] nor Lightspeed Stack [5] supports providing client certificates to PostgresDB.PostgresDB SSL connection settings were configured for OGX even though they have no effect. OGX does not support configuring the SSL mode for its PostgresDB connection [3][4], so the PostgresDB certificate verification cannot be strictly enforced on the OGX side (the default is "prefer" [2], which does not enforce certificate verification and can fall back to unencrypted communication). OGX uses a non-strict config mode, so unrecognized options are silently ignored. Securing the connection between OGX container and Postgres DB container should be done as part of follow up work (OSPRH-30503).
The operator did not watch for changes to ConfigMaps. When the content of the CA bundle ConfigMap was updated, the operator did not automatically reconcile.
Not a bug strictly speaking, but Lightspeed Stack used ssl_mode "require" when it could have used "verify-full", which checks both that the certificate is signed by a trusted CA and that the server hostname matches the CN field in the certificate.
Proposed Solution
This commit simplifies certificate handling with the following changes:
Introduce a single combined CA bundle ConfigMap (
openstack-lightspeed-combined-ca-bundle) containing the system CAs, user-provided CA certificates from theOpenStackLightspeedCRD,kube-root-ca.crt, andopenshift-service-ca.crt. This bundle is mounted into all containers in thelightspeed-stack-deploymentpod, eliminating the need forREQUESTS_CA_BUNDLEandSSL_CERT_FILE.When a user specifies a ConfigMap with custom CA certificates, iterate over all keys, validate that each holds a valid certificate, and append it to the combined CA bundle (resolves
#2).Stop mounting
openshift-service-ca.crtinto the Postgres pod and removessl_ca_filefrompostgres.conf. These gave a false sense of client certificate validation. Actually enforcing it requires configuringpg_hba.conf[1], and neither OGX [3][4][6] nor Lightspeed Stack [5] currently supports providing client certificates.Remove
ssl_mode,ca_cert_path, andgss_encmodefromstorage.backends.postgres_backendinogx_config.yaml. These options are not supported by OGX [3][4][6] and gave a false sense of SSL being configured.Add a
Watch()on ConfigMaps to the reconciler so that whenever a user updates the CA bundle ConfigMap, the reconcile loop runs automatically.Configure Lightspeed Stack with
ssl_mode"verify-full" for its PostgreSQL connection, ensuring both CA trust and hostname verification.[1] https://www.postgresql.org/docs/current/ssl-tcp.html#SSL-CLIENT-CERTIFICATES
[2] https://magicstack.github.io/asyncpg/current/api/index.html#asyncpg.connection.connect
[3] https://github.com/ogx-ai/ogx/blob/34d7901/src/ogx/core/storage/datatypes.py#L200
[4] https://github.com/ogx-ai/ogx/blob/34d7901/src/ogx/core/storage/sqlstore/sqlalchemy_sqlstore.py#L125
[5] https://github.com/lightspeed-core/lightspeed-stack/blob/7503ebd/src/models/config.py#L181
[6] ogx-ai/ogx#5978
Summary by CodeRabbit
New Features
Behavior Changes
Tests