Skip to content

MGMT-24419: Delete DataImage after cluster is installed and remove detached annotation#844

Draft
giladravid16 wants to merge 1 commit into
openshift:mainfrom
giladravid16:MGMT-24419
Draft

MGMT-24419: Delete DataImage after cluster is installed and remove detached annotation#844
giladravid16 wants to merge 1 commit into
openshift:mainfrom
giladravid16:MGMT-24419

Conversation

@giladravid16

@giladravid16 giladravid16 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Having the detached annotation prevents users from performing hardware lifecycle operations, and keeping the DataImage causes the BMO to remount it after firmware updates. So we want to remove both of them.

Summary by CodeRabbit

  • New Features

    • Improved cluster installation flow to handle cleanup of associated storage artifacts before marking an installation complete.
    • Added automatic host reboot handling when cleanup requires it.
  • Bug Fixes

    • Installation monitoring now waits and requeues while cleanup is still in progress, instead of proceeding too early.
    • End-of-install behavior now uses a more reliable cleanup path and condition reporting.

…tached annotation

Having the detached annotation prevents users from performing hardware lifecycle operations, and keeping the DataImage causes the BMO to remount it after firmware updates. So we want to remove both of them.
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 24, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 24, 2026
@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented Jun 24, 2026

Copy link
Copy Markdown

@giladravid16: This pull request references MGMT-24419 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "5.0." or "openshift-5.0.", but it targets "ACM 5.0" instead.

Details

In response to this:

Having the detached annotation prevents users from performing hardware lifecycle operations, and keeping the DataImage causes the BMO to remount it after firmware updates. So we want to remove both of them.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Walkthrough

A new controllers/common.go file centralizes package-level helper functions for BMH and DataImage lifecycle management (get, delete, reboot, annotation utilities). All reconciler instance methods providing these functions are removed from imageclusterinstall_controller.go and imageclusterinstall_monitor.go, with call sites updated to the package-level equivalents. The monitor's post-install flow replaces detached-annotation patching with a DataImage deletion and termination-wait gate before marking installation complete.

Changes

BMH/DataImage helper extraction and monitor install-completion gate

Layer / File(s) Summary
New package-level BMH/DataImage helpers
controllers/common.go
Adds getDataImage, deleteDataImage, removeBMHDataImage, attachAndRebootBMH, getBMH, setAnnotationIfNotExists, and annotationExists as package-level functions, consolidating shared BMH and DataImage lifecycle logic previously duplicated as reconciler instance methods.
Controller call-site migration
controllers/imageclusterinstall_controller.go
Switches validateConfiguration, ensureBMHDataImage, labelBMHForBackup, labelDataImageForBackup, and handleFinalizer from r.* instance method calls to package-level equivalents; removes the now-duplicate instance methods (getBMH, getDataImage, removeBMHDataImage, attachAndRebootBMH, deleteDataImage) and local annotation helpers.
Monitor DataImage deletion gate
controllers/imageclusterinstall_monitor.go
Adds RBAC for DataImage deletion and a dataImageRemovalPendingMessage constant; migrates getBMH call; introduces a DataImage-termination pre-check at the start of checkClusterStatus; replaces the detached-annotation patch in the install-completed path with removeBMHDataImage + handleDataImageDeletion wait; removes the monitor's own getBMH method.
Monitor tests for deletion gate
controllers/imageclusterinstall_monitor_test.go
Updates the existing "cluster installed" test to create a DataImage and assert its removal and BMH reboot annotation; adds two tests verifying requeue behavior and ClusterInstallStopped/ClusterInstallCompleted condition messages while DataImage termination is pending.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Microshift Test Compatibility ⚠️ Warning New Ginkgo tests use openshift hive objects (ClusterDeployment/ClusterImageSet) and BMH/DataImage with no MicroShift skip/tag guard. Add a MicroShift guard or tag the tests with the relevant apigroup/[Skipped:MicroShift], or move this coverage out of MicroShift-run suites.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: deleting the DataImage after installation and removing the detached annotation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The modified Ginkgo titles are static string literals; I found no dynamic suffixes, timestamps, UUIDs, or other run-dependent names.
Test Structure And Quality ✅ Passed PASS: The modified Ginkgo tests use BeforeEach/AfterEach cleanup, no indefinite waits, and the new It blocks stay focused on the DataImage-reconcile flow.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PASS: The added Ginkgo specs are controller/fake-client tests and only exercise BMH/DataImage lifecycle; they don't assume multiple nodes or HA.
Topology-Aware Scheduling Compatibility ✅ Passed The PR only changes BMH/DataImage cleanup logic in controllers; no affinity, nodeSelector, topology spread, PDB, or replica scheduling constraints were added.
Ote Binary Stdout Contract ✅ Passed No process-level stdout writes were added: the touched controller/test files contain no fmt.Print*, klog stdout setup, or suite entrypoint logging.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PASS: The PR only adds controller unit tests using a fake client; no IPv4 parsing, host/IP URL joins, or external connectivity is exercised.
No-Weak-Crypto ✅ Passed No weak algorithms, custom crypto, or secret/token comparisons were added; the only crypto imports are sha256/sha512 digest registrations.
Container-Privileges ✅ Passed Changed Kubernetes manifests set privileged/allowPrivilegeEscalation false, drop ALL caps, hostNetwork/hostPID false, and runAsNonRoot true; no forbidden settings found.
No-Sensitive-Data-In-Logs ✅ Passed Touched logs only print Kubernetes object names/IDs and status messages; no passwords, tokens, kubeconfig contents, or other secret data are logged.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giladravid16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2026
@giladravid16

Copy link
Copy Markdown
Contributor Author

/test e2e-ibio

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
controllers/imageclusterinstall_monitor_test.go (1)

200-205: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert detached annotation removal too.

This updated assertion verifies rebooting, but the fixture BMH never has detachedAnnotation, so the test would miss a regression where the annotation remains and blocks lifecycle operations. Seed it before reconcile and assert it is absent here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/imageclusterinstall_monitor_test.go` around lines 200 - 205, The
test in imageclusterinstall_monitor_test.go only checks rebootAnnotation on the
BMH after DataImage removal, so it can miss a regression where
detachedAnnotation is still present. Update the test setup around the bmh
fixture and reconcile path to seed detachedAnnotation before the operation, then
extend the existing BMH assertions to verify that detachedAnnotation has been
removed alongside the reboot behavior. Use the same bmh object and the existing
annotation checks in this test to keep the assertion aligned with the DataImage
removal flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@controllers/common.go`:
- Around line 60-74: In removeBMHDataImage, detached annotation cleanup is
skipped when deleteDataImage returns nil, so the BareMetalHost is never fetched
in the already-missing DataImage case. Update the flow so the BareMetalHost
lookup and detachedAnnotation removal always happen after deleteDataImage, and
only call attachAndRebootBMH when a DataImage deletion was actually initiated
and dataImage is non-nil. Use removeBMHDataImage, deleteDataImage, and
attachAndRebootBMH to locate the logic.

---

Nitpick comments:
In `@controllers/imageclusterinstall_monitor_test.go`:
- Around line 200-205: The test in imageclusterinstall_monitor_test.go only
checks rebootAnnotation on the BMH after DataImage removal, so it can miss a
regression where detachedAnnotation is still present. Update the test setup
around the bmh fixture and reconcile path to seed detachedAnnotation before the
operation, then extend the existing BMH assertions to verify that
detachedAnnotation has been removed alongside the reboot behavior. Use the same
bmh object and the existing annotation checks in this test to keep the assertion
aligned with the DataImage removal flow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 909251f3-c3a2-405e-aa52-143d0e250f69

📥 Commits

Reviewing files that changed from the base of the PR and between 6621941 and 7e1d6a7.

📒 Files selected for processing (4)
  • controllers/common.go
  • controllers/imageclusterinstall_controller.go
  • controllers/imageclusterinstall_monitor.go
  • controllers/imageclusterinstall_monitor_test.go

Comment thread controllers/common.go
Comment on lines +60 to +74
func removeBMHDataImage(ctx context.Context, c client.Client, log logrus.FieldLogger, bmhRef types.NamespacedName) (*bmh_v1alpha1.DataImage, error) {
dataImage, err := deleteDataImage(ctx, c, log, bmhRef)
if err != nil || dataImage == nil {
return dataImage, err
}

bmh := &bmh_v1alpha1.BareMetalHost{}
if err := c.Get(ctx, bmhRef, bmh); err != nil {
if k8sapierrors.IsNotFound(err) {
log.Warnf("Referenced BareMetalHost %s/%s does not exist, not waiting for dataImage deletion", bmhRef.Namespace, bmhRef.Name)
return nil, nil
}
return dataImage, fmt.Errorf("failed to get BareMetalHost %s/%s: %w", bmhRef.Namespace, bmhRef.Name, err)
}
return dataImage, attachAndRebootBMH(ctx, c, log, bmh)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Decouple detached-annotation cleanup from DataImage existence.

When deleteDataImage returns nil for an already-missing DataImage, Line 62 returns before fetching the BMH, so detachedAnnotation is never removed. That leaves the lifecycle-blocking annotation in exactly the already-cleaned-up/DataImage-missing case this PR should handle.

Possible fix direction
 	dataImage, err := deleteDataImage(ctx, c, log, bmhRef)
-	if err != nil || dataImage == nil {
+	if err != nil {
 		return dataImage, err
 	}
 
 	bmh := &bmh_v1alpha1.BareMetalHost{}

Then patch the BMH so detached removal always runs, while adding the reboot annotation only when a DataImage deletion was actually requested.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func removeBMHDataImage(ctx context.Context, c client.Client, log logrus.FieldLogger, bmhRef types.NamespacedName) (*bmh_v1alpha1.DataImage, error) {
dataImage, err := deleteDataImage(ctx, c, log, bmhRef)
if err != nil || dataImage == nil {
return dataImage, err
}
bmh := &bmh_v1alpha1.BareMetalHost{}
if err := c.Get(ctx, bmhRef, bmh); err != nil {
if k8sapierrors.IsNotFound(err) {
log.Warnf("Referenced BareMetalHost %s/%s does not exist, not waiting for dataImage deletion", bmhRef.Namespace, bmhRef.Name)
return nil, nil
}
return dataImage, fmt.Errorf("failed to get BareMetalHost %s/%s: %w", bmhRef.Namespace, bmhRef.Name, err)
}
return dataImage, attachAndRebootBMH(ctx, c, log, bmh)
func removeBMHDataImage(ctx context.Context, c client.Client, log logrus.FieldLogger, bmhRef types.NamespacedName) (*bmh_v1alpha1.DataImage, error) {
dataImage, err := deleteDataImage(ctx, c, log, bmhRef)
if err != nil {
return dataImage, err
}
bmh := &bmh_v1alpha1.BareMetalHost{}
if err := c.Get(ctx, bmhRef, bmh); err != nil {
if k8sapierrors.IsNotFound(err) {
log.Warnf("Referenced BareMetalHost %s/%s does not exist, not waiting for dataImage deletion", bmhRef.Namespace, bmhRef.Name)
return nil, nil
}
return dataImage, fmt.Errorf("failed to get BareMetalHost %s/%s: %w", bmhRef.Namespace, bmhRef.Name, err)
}
return dataImage, attachAndRebootBMH(ctx, c, log, bmh)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/common.go` around lines 60 - 74, In removeBMHDataImage, detached
annotation cleanup is skipped when deleteDataImage returns nil, so the
BareMetalHost is never fetched in the already-missing DataImage case. Update the
flow so the BareMetalHost lookup and detachedAnnotation removal always happen
after deleteDataImage, and only call attachAndRebootBMH when a DataImage
deletion was actually initiated and dataImage is non-nil. Use
removeBMHDataImage, deleteDataImage, and attachAndRebootBMH to locate the logic.

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

@giladravid16: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-ibio 7e1d6a7 link true /test e2e-ibio

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@giladravid16

Copy link
Copy Markdown
Contributor Author

/testwith openshift/image-based-install-operator/main/e2e-ibio #844 openshift/release#80991

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants