Skip to content

OCPBUGS-90560: Resolve GCP boot image from worker machines#1512

Open
nader-ziada wants to merge 1 commit into
openshift:mainfrom
nader-ziada:fix-gcp-boot-image
Open

OCPBUGS-90560: Resolve GCP boot image from worker machines#1512
nader-ziada wants to merge 1 commit into
openshift:mainfrom
nader-ziada:fix-gcp-boot-image

Conversation

@nader-ziada

@nader-ziada nader-ziada commented Jun 23, 2026

Copy link
Copy Markdown

Instead of hardcoding a stale RHCOS 4.14 boot disk image for GCP defaults, resolve the image dynamically from existing worker machines at webhook startup. Falls back to the hardcoded default when no workers are found. Filters by architecture (x86-64/aarch64) to avoid selecting the wrong image on mixed-arch clusters.

Introduces ResolveDefaulterConfig() so both Machine and MachineSet defaulters share a single client and boot image resolution call.

Summary by CodeRabbit

  • New Features
    • Added configurable defaulting for GCP boot image selection for both Machine and MachineSet webhooks, resolving the best match from existing worker machine settings by architecture and falling back to hardcoded defaults when no match is available.
  • Refactor
    • Updated webhook defaulter wiring to use a shared, configuration-driven approach for consistent behavior across Machine and MachineSet.
  • Tests
    • Expanded unit tests to cover GCP boot image resolution, resolved-image defaulting, and fallback behavior for both Machine and MachineSet webhooks.

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 23, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@nader-ziada: This pull request references Jira Issue OCPBUGS-90560, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Instead of hardcoding a stale RHCOS 4.14 boot disk image for GCP defaults, resolve the image dynamically from existing worker machines at webhook startup. Falls back to the hardcoded default when no workers are found. Filters by architecture (x86-64/aarch64) to avoid selecting the wrong image on mixed-arch clusters.

Introduces ResolveDefaulterConfig() so both Machine and MachineSet defaulters share a single client and boot image resolution call.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 53eab1ad-437f-4134-95de-3546436725a0

📥 Commits

Reviewing files that changed from the base of the PR and between fef4618 and 8389bf4.

📒 Files selected for processing (5)
  • cmd/machineset/main.go
  • pkg/webhooks/machine_webhook.go
  • pkg/webhooks/machine_webhook_test.go
  • pkg/webhooks/machineset_webhook.go
  • pkg/webhooks/machineset_webhook_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/webhooks/machineset_webhook.go
  • pkg/webhooks/machineset_webhook_test.go
  • pkg/webhooks/machine_webhook_test.go
  • cmd/machineset/main.go
  • pkg/webhooks/machine_webhook.go

Walkthrough

Adds a shared DefaulterConfig struct resolved at webhook startup. For GCP clusters, ResolveDefaulterConfig scans existing worker Machines to find an architecture-specific boot disk image. This resolved image is threaded into NewMachineDefaulter and NewMachineSetDefaulter constructors (both now accept *DefaulterConfig) and used by defaultGCPDisks with fallback to the previous static default. All test calls to these constructors are updated to pass the new parameter, and new tests validate boot image resolution and defaulting behavior.

Changes

GCP Boot Image Resolution and Configurable Defaulting

Layer / File(s) Summary
DefaulterConfig type and resolution contract
pkg/webhooks/machine_webhook.go
Introduces DefaulterConfig{PlatformStatus, ClusterID, GCPBootImage} struct and adds gcpBootImage field to admissionConfig. Creates ResolveDefaulterConfig() function that fetches infra and resolves GCP boot images via platform detection. Updates NewMachineDefaulter signature to accept *DefaulterConfig.
GCP boot image resolution helper
pkg/webhooks/machine_webhook.go
Implements resolveGCPBootImage that lists worker Machines in openshift-machine-api, unmarshals their GCP providerSpec, and returns the first boot disk image matching the current architecture suffix (x86-64 or aarch64), or empty string on error/no match.
GCP disk defaulting with resolved image
pkg/webhooks/machine_webhook.go
Updates defaultGCPDisks to accept gcpBootImage parameter and compute boot disk image from resolved value or fallback to static default with info log. Updates call site in GCP defaulting logic and boot disk defaulting to use computed image when disk.Image is empty.
MachineSet defaulter refactoring
pkg/webhooks/machineset_webhook.go
Changes NewMachineSetDefaulter to accept *DefaulterConfig and return *admission.Webhook directly (removes inline infra fetch and error return). Updates createMachineSetDefaulter to include gcpBootImage in admissionConfig.
Webhook wiring in main.go
cmd/machineset/main.go
Resolves shared defaulter configuration at startup via ResolveDefaulterConfig() and wires both Machine and MachineSet defaulters through the resolved config. Removes local error handling from MachineSet defaulter construction.
Boot image resolution and GCP defaulting tests
pkg/webhooks/machine_webhook_test.go, pkg/webhooks/machineset_webhook_test.go
Adds TestResolveGCPBootImage validating worker-only machine selection and arch filtering. Adds TestGCPDefaultsWithResolvedBootImage and TestGCPDefaultsFallbackToHardcodedImage (both Machine and MachineSet variants) verifying resolved images propagate and fallback occurs. Adds encoding/json import for test unmarshaling. Updates all existing createMachineDefaulter and createMachineSetDefaulter calls across test flows to pass empty string as new gcpBootImage argument.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main objective of the changeset: resolving GCP boot images from worker machines, which is the core feature implemented across the webhook files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All 5 new test functions use static, deterministic names (TestResolveGCPBootImage, TestGCPDefaultsWithResolvedBootImage, TestGCPDefaultsFallbackToHardcodedImage, and two MachineSet equivalents) wit...
Test Structure And Quality ✅ Passed The custom check reviews Ginkgo test code quality (BeforeEach/AfterEach, It blocks, Eventually/Consistently), but the modified test files use standard Go testing package with func TestXyz patterns,...
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All test additions are standard Go unit tests (TestFunc pattern with *testing.T), not Ginkgo e2e tests. Check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All new tests are standard Go unit tests (testing.T) in pkg/webhooks/. The check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies webhook defaulting logic only; introduces no pod scheduling constraints (affinity, nodeSelector, PDB, replicas, topology spread). Code gracefully handles SNO/TNF/TNA topologies by falli...
Ote Binary Stdout Contract ✅ Passed PR properly configures all logging to stderr: klog uses flag.Set("logtostderr", "true"), standard log package defaults to stderr, and ResolveDefaulterConfig() has no module-level or process-level s...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Check not applicable: PR adds standard Go unit tests (TestName(t *testing.T)), not Ginkgo e2e tests (It/Describe/Context/When patterns). Check requires Ginkgo patterns to apply.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms (MD5/SHA1/DES/RC4/3DES/Blowfish/ECB), custom crypto implementations, or insecure secret comparisons detected in any modified files.
Container-Privileges ✅ Passed PR modifies only Go source files (webhooks and main.go); no K8s manifests with container security configurations are changed.
No-Sensitive-Data-In-Logs ✅ Passed All logging statements in the PR log non-sensitive data: machine resource names, public GCP boot image paths (rhcos-cloud project), and error messages. No credentials, PII, tokens, or confidential...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/webhooks/machine_webhook_test.go`:
- Line 6805: The TestResolveGCPBootImage test cases have hardcoded expected and
mismatch image values that are specific to x86-64 architecture, but the
resolveGCPBootImage function filters results by the arch parameter. Make the
test architecture-aware by parameterizing the expected image assertions based on
the architecture being tested (such as by deriving expected images from the arch
field in each test case) so that assertions pass for both x86-64 and ARM64
architectures. This should be applied to all the test cases mentioned at the
specified line ranges.
- Line 6827: In the machine_webhook_test.go file, the json.Marshal calls at
lines 6827, 6857, 6887, 6911, and 6941 are currently ignoring errors by using
blank identifier (_). Instead of discarding the error, capture it in a variable
and check for non-nil errors. When json.Marshal fails, the test should fail
immediately using the testing.T helper (commonly using t.Fatalf or t.Errorf) to
prevent invalid ProviderSpec payloads from being silently used in the test
fixtures. This ensures test reliability and follows the project guideline of
never ignoring error returns in Go files.

In `@pkg/webhooks/machine_webhook.go`:
- Around line 354-366: The function resolveGCPBootImage currently treats all
List API failures as "no image found" by returning an empty string, which masks
RBAC and API outages. Modify resolveGCPBootImage to return both a string and an
error instead of just a string. When the c.List() call fails in the error
handling block, return the actual error along with an empty string rather than
silently returning an empty string. Reserve the empty string return for
successful list operations that find no matching worker machines. Additionally,
replace context.Background() with a context that supports cancellation and
timeouts, such as one derived from a timeout or cancellation mechanism
appropriate to your codebase standards.
- Around line 378-379: The loop iterating over providerSpec.Disks dereferences
each disk element without checking if it is nil, which will cause a panic if the
slice contains nil elements. Add a nil guard check inside the for loop that
iterates over providerSpec.Disks to ensure each disk is not nil before accessing
its Boot and Image fields.
- Around line 378-380: In the disk iteration loop where you check disk.Boot and
disk.Image, replace the strings.Contains call with strings.HasSuffix to ensure
the architecture suffix matches exactly at the end of the image path rather than
anywhere within it. This prevents accidentally selecting the wrong image variant
if the suffix text appears elsewhere in the image path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4245399b-3c94-4d86-a8ab-a45017c1c659

📥 Commits

Reviewing files that changed from the base of the PR and between d7772c6 and fef4618.

📒 Files selected for processing (5)
  • cmd/machineset/main.go
  • pkg/webhooks/machine_webhook.go
  • pkg/webhooks/machine_webhook_test.go
  • pkg/webhooks/machineset_webhook.go
  • pkg/webhooks/machineset_webhook_test.go

Comment thread pkg/webhooks/machine_webhook_test.go
Comment thread pkg/webhooks/machine_webhook_test.go
Comment thread pkg/webhooks/machine_webhook.go
Comment thread pkg/webhooks/machine_webhook.go
Comment thread pkg/webhooks/machine_webhook.go
Instead of hardcoding a stale RHCOS 4.14 boot disk image for GCP
defaults, resolve the image dynamically from existing worker machines
at webhook startup. Falls back to the hardcoded default when no workers
are found. Filters by architecture (x86-64/aarch64) to avoid selecting
the wrong image on mixed-arch clusters.

Introduces ResolveDefaulterConfig() so both Machine and MachineSet
defaulters share a single client and boot image resolution call.
@nader-ziada

Copy link
Copy Markdown
Author

/retest

@theobarberbany

Copy link
Copy Markdown
Contributor

/pipeline auto

@theobarberbany

Copy link
Copy Markdown
Contributor

/test ?

@theobarberbany

Copy link
Copy Markdown
Contributor

/test e2e-gcp-ovn
/test e2e-gcp-operator

@nader-ziada

Copy link
Copy Markdown
Author

/retest

@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@nader-ziada: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@nader-ziada

Copy link
Copy Markdown
Author

@theobarberbany all the tests passed, is there other specific tests to kick off?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants