OTA-1550 Automate accept risks for OTA-1543#2226
OTA-1550 Automate accept risks for OTA-1543#2226JianLi-RH wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a Ginkgo E2E test suite for Changes
Sequence Diagram(s)sequenceDiagram
participant Test
participant OC as "oc CLI"
participant API as "ClusterVersion API"
participant Web as "Upstream Graph (HTTP)"
Test->>OC: run `oc adm upgrade --help` / `oc adm upgrade accept --help`
Test->>Web: fetch faux upstream graph (GetWebResource)
Web-->>Test: return graph JSON
Test->>OC: patch ClusterVersion to point to faux upstream (CLI.Patch)
OC->>API: PATCH /apis/config.openshift.io/v1/clusterversions
API-->>OC: updated ClusterVersion
Test->>OC: run `oc adm upgrade` (polling for available versions)
OC->>API: GET /status for ClusterVersion
API-->>OC: clusterversion status (available/desired versions)
OC-->>Test: upgrade output (recommended / non-recommended)
Test->>OC: `oc adm upgrade --accept-risk` (add/replace/clear)
OC->>API: PATCH .spec.desiredUpdate.acceptRisks
API-->>OC: updated acceptRisks status
OC-->>Test: verify acceptRisks reflected in output/status (polling)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: JianLi-RH The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/uncc @ardaguclu @tchap |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
test/e2e/upgrade.go (4)
229-241: Remove or properly skip the commented-out upgrade code.This commented-out block creates maintenance burden. If the upgrade step is intentionally deferred (e.g., due to long runtime), consider using
g.Skip()with a reason, or a separate test spec with appropriate labels. If it's obsolete, remove it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 229 - 241, The commented-out upgrade block inside the test that uses g.By("Upgrade to not recommend version") should be removed or explicitly skipped; replace the commented code with a clear g.Skip(...) call (with a concise reason) in the same test/spec or move the upgrade logic into a separate test spec annotated/labeled for long-running E2E upgrades (so it’s not hidden in comments). Locate the block around the g.By("Upgrade to not recommend version") call and either delete the commented lines or add a g.Skip invocation there, or extract the logic into a new It/Context with appropriate labels for deferred/long-running upgrade tests.
52-52: Use camelCase for local variables per Go conventions.Variables like
new_channel,recommend_version,notrecommend_version_1use snake_case. Go convention is camelCase (e.g.,newChannel,recommendVersion,notRecommendVersion1).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` at line 52, The variables in test/e2e/upgrade.go use snake_case; rename them to camelCase to follow Go conventions: change new_channel to newChannel, recommend_version to recommendVersion, notrecommend_version_1 to notRecommendVersion1 (and any similar snake_case variables) and update every declaration and all references/usages (assignments, function calls, comparisons) to the new names so the code compiles and behavior is unchanged.
15-27: Remove misplacedGinkgoRecoverand empty hooks.
defer g.GinkgoRecover()at theDescribelevel has no effect—it only runs once when the suite is set up, not during test execution. Additionally, emptyBeforeEach/AfterEachblocks add no value.♻️ Suggested cleanup
var _ = g.Describe(`[Jira:"oc adm upgrade"] accept`, func() { - - defer g.GinkgoRecover() - var ( oc = NewCLI("oc", KubeConfigPath()) ) - - g.BeforeEach(func() { - }) - - g.AfterEach(func() { - }) g.It("Accepted Risks for OCP Cluster Updates", g.Label("TechPreview", "88175", "Slow", "manual"), func() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 15 - 27, Remove the misplaced defer g.GinkgoRecover() call and the empty g.BeforeEach / g.AfterEach hooks from the Describe block: the GinkgoRecover defer at the Describe level (in the Describe function that declares oc and calls g.GinkgoRecover) should be deleted and any empty BeforeEach/AfterEach functions removed to clean up the test; keep the Describe, the oc variable declaration (NewCLI("oc", KubeConfigPath())), and any real setup/teardown only if needed.
125-137: Migrate from deprecatedwait.Polltowait.PollUntilContextTimeout.
wait.Pollis deprecated in k8s.io/apimachinery and will be removed in a future Kubernetes release. The context-basedwait.PollUntilContextTimeoutis the recommended replacement, providing better cancellation support. This pattern appears at lines 125, 144, 168, and 207.♻️ Example migration
+ ctx := context.Background() - err = wait.Poll(10*time.Second, 10*time.Minute, func() (bool, error) { + err = wait.PollUntilContextTimeout(ctx, 10*time.Second, 10*time.Minute, true, func(ctx context.Context) (bool, error) { out, err = oc.Run("adm").Args("upgrade").Output()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 125 - 137, The test uses the deprecated wait.Poll; replace calls to wait.Poll with context-aware wait.PollUntilContextTimeout by creating a context with timeout (matching the original 10*time.Minute) and passing a cancelable context into wait.PollUntilContextTimeout, update the polling callback signature to accept the context and return (bool, error) as before, and call cancel when done; update the specific usage here (the anonymous function passed to wait.Poll, the variable names out/err and the surrounding AssertWaitPollNoErr call) so the behavior and retry interval (10*time.Second) remain the same; repeat the same migration at the other occurrences noted (lines referencing wait.Poll at the other locations) to remove the deprecated API usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/upgrade.go`:
- Around line 71-77: The chained type assertions retrieving version strings
(e.g., recommend_version, notrecommend_version_1/2/3) are unsafe and may panic;
update the extraction to check types and presence before casting: after
confirming nodes from result, for each index 1–4 fetch nodes[i], assert it's a
map[string]interface{} (with an ok check), assert the "version" field exists and
is a string (with an ok check), and use o.Expect to fail tests with clear
messages if any check fails; reference the variables recommend_version and
notrecommend_version_1/2/3 and replace the current direct assertions with
guarded checks that provide descriptive errors instead of allowing panics.
- Around line 36-54: GetVersion(oc) currently discards its error so the later
o.Expect(err).NotTo(o.HaveOccurred()) is checking an uninitialized err; change
the call to capture and check the error (use clusterVersion, err :=
GetVersion(oc) and immediately o.Expect(err).NotTo(o.HaveOccurred())), and then
ensure any subsequent o.Expect(err) assertions correspond to the most recent
oc.Run/... call that returned err (or remove the stray o.Expect(err) if it was
meant to assert an earlier, already-checked call); reference symbols:
GetVersion, clusterVersion, err, and the
oc.AsAdmin().WithoutNamespace().Run("get")... calls.
In `@test/e2e/util.go`:
- Around line 1291-1300: GetVersion may panic by indexing version[0..2] when
fullVersion isn't three dot-separated parts; update the function (GetVersion) to
validate the split result before accessing indices: check len(version) and if
fewer than 3 parts either return a descriptive error or populate ClusterVersion
with FullVersion and empty Major/Minor/Patch (or default values) to avoid index
out of range; ensure the returned ClusterVersion always reflects FullVersion and
does not perform unsafe indexing on the version slice.
- Around line 409-427: In GetWebResource, the function currently returns nil,
err when resp.StatusCode != http.StatusOK but err is nil, causing callers to see
(nil, nil); change this to return an explicit error (e.g., fmt.Errorf including
resp.StatusCode and http.StatusText) so non-200 responses are reported; locate
the status-check in GetWebResource and replace the nil error return with a
constructed error (optionally include response body or URL for more context)
while keeping resp.Body.Close and subsequent io.ReadAll logic intact.
---
Nitpick comments:
In `@test/e2e/upgrade.go`:
- Around line 229-241: The commented-out upgrade block inside the test that uses
g.By("Upgrade to not recommend version") should be removed or explicitly
skipped; replace the commented code with a clear g.Skip(...) call (with a
concise reason) in the same test/spec or move the upgrade logic into a separate
test spec annotated/labeled for long-running E2E upgrades (so it’s not hidden in
comments). Locate the block around the g.By("Upgrade to not recommend version")
call and either delete the commented lines or add a g.Skip invocation there, or
extract the logic into a new It/Context with appropriate labels for
deferred/long-running upgrade tests.
- Line 52: The variables in test/e2e/upgrade.go use snake_case; rename them to
camelCase to follow Go conventions: change new_channel to newChannel,
recommend_version to recommendVersion, notrecommend_version_1 to
notRecommendVersion1 (and any similar snake_case variables) and update every
declaration and all references/usages (assignments, function calls, comparisons)
to the new names so the code compiles and behavior is unchanged.
- Around line 15-27: Remove the misplaced defer g.GinkgoRecover() call and the
empty g.BeforeEach / g.AfterEach hooks from the Describe block: the
GinkgoRecover defer at the Describe level (in the Describe function that
declares oc and calls g.GinkgoRecover) should be deleted and any empty
BeforeEach/AfterEach functions removed to clean up the test; keep the Describe,
the oc variable declaration (NewCLI("oc", KubeConfigPath())), and any real
setup/teardown only if needed.
- Around line 125-137: The test uses the deprecated wait.Poll; replace calls to
wait.Poll with context-aware wait.PollUntilContextTimeout by creating a context
with timeout (matching the original 10*time.Minute) and passing a cancelable
context into wait.PollUntilContextTimeout, update the polling callback signature
to accept the context and return (bool, error) as before, and call cancel when
done; update the specific usage here (the anonymous function passed to
wait.Poll, the variable names out/err and the surrounding AssertWaitPollNoErr
call) so the behavior and retry interval (10*time.Second) remain the same;
repeat the same migration at the other occurrences noted (lines referencing
wait.Poll at the other locations) to remove the deprecated API usage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9dc82d2f-d56e-48de-a469-7d566a8383f4
📒 Files selected for processing (2)
test/e2e/upgrade.gotest/e2e/util.go
57d292b to
1be2183
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
test/e2e/upgrade.go (1)
36-39:⚠️ Potential issue | 🟠 MajorHandle the
GetVersionerror before usingclusterVersion.Line 36 still discards the error, so
clusterVersion.FullVersioncan be empty and later assertions point at the graph fetch instead of the real failure.🐛 Proposed fix
- clusterVersion, _ := GetVersion(oc) + clusterVersion, err := GetVersion(oc) + o.Expect(err).NotTo(o.HaveOccurred()) g.By("checking the help info of `oc adm upgrade --help`") out, err := oc.Run("adm").Args("upgrade", "--help").Output()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 36 - 39, GetVersion's error is being ignored; change the code that calls GetVersion (the call assigning clusterVersion, _ := GetVersion(oc)) to capture and check the error (e.g., clusterVersion, err := GetVersion(oc) and fail the test or return if err != nil) before using clusterVersion.FullVersion; ensure any subsequent assertions (and the oc.Run("adm").Args("upgrade", "--help") block) only run after verifying GetVersion succeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/upgrade.go`:
- Around line 63-64: The graph query currently hardcodes "arch=amd64" when
building fullGraph (using variables graph, new_channel,
clusterVersion.FullVersion and then calling GetWebResource), which breaks on
non-amd64 clusters; change the code to determine the cluster architecture at
runtime (e.g., read the cluster's arch from clusterVersion or the cluster API)
and substitute that value into the query string, or if the test does not support
the cluster's arch, explicitly skip the test for unsupported arches instead of
forcing amd64. Ensure fullGraph uses the derived arch value (or the skip) before
calling GetWebResource.
- Around line 117-120: Capture the original .spec.desiredUpdate.acceptRisks
value at test start (before any oc.Run("adm").Args("upgrade"... mutations) and
store it in a variable; then explicitly set a known starting state (e.g., empty
list) before exercising the accept-risks flows (the blocks that call
oc.Run("adm").Args("upgrade","accept","--clear") and the related --replace tests
between the current region and lines 122-205) so the test is hermetic; finally,
in the test cleanup/teardown restore the saved original value back to
.spec.desiredUpdate.acceptRisks to avoid leaking this test's changes to later
tests. Ensure you reference the same oc.Run invocations (oc, out, err) and the
upgrade accept/replace flows when adding the snapshot/normalize/restore steps.
- Around line 69-82: The code currently indexes nodes with fixed positions
(extractVersion(1), extractVersion(2), extractVersion(3), extractVersion(4))
which will panic if fewer than five nodes are returned; before calling
extractVersion, assert the slice length (nodes) is at least 5 using
o.Expect(len(nodes) >= 5). Alternatively, avoid positional indexing by finding
nodes by a stable property (e.g., id/version/role) and call extractVersion on
the found index(s) instead; update recommendVersion/notRecommendVersion*
assignments to use the validated indexes or the discovered indexes from the
property-based lookup.
- Around line 84-94: The code unconditionally issues a JSON Patch remove using
the string output stored in overrides which will fail if /spec/overrides doesn't
exist and will not preserve structured JSON for re-adding; change the logic to
GET the clusterversion as JSON (use
oc.AsAdmin().WithoutNamespace().Run("get").Args("clusterversion","version","-o","json").Output())
and unmarshal to check for the presence of .spec.overrides, if present marshal
that node as JSON (or keep it as an interface{} value) then perform the Patch
remove (using JSONp{"remove","/spec/overrides",nil}) and only install the defer
add patch when you actually removed it, using the previously captured structured
JSON value for the add operation (refer to the overrides variable, the Patch
calls, and the defer block).
In `@test/e2e/util.go`:
- Around line 410-413: GetWebResource currently calls http.Get which uses the
default client with no timeout; create and use an http.Client with a sensible
Timeout (e.g., 5–15s) and call client.Get instead inside GetWebResource so the
request will fail fast if the external endpoint stalls, then continue to handle
resp and err as before. Ensure you replace the http.Get call in GetWebResource
with client.Get using the new client and keep existing response-body
handling/closing logic.
---
Duplicate comments:
In `@test/e2e/upgrade.go`:
- Around line 36-39: GetVersion's error is being ignored; change the code that
calls GetVersion (the call assigning clusterVersion, _ := GetVersion(oc)) to
capture and check the error (e.g., clusterVersion, err := GetVersion(oc) and
fail the test or return if err != nil) before using clusterVersion.FullVersion;
ensure any subsequent assertions (and the oc.Run("adm").Args("upgrade",
"--help") block) only run after verifying GetVersion succeeded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ce413192-fdcd-4d3e-9789-496857174b55
📒 Files selected for processing (2)
test/e2e/upgrade.gotest/e2e/util.go
test/e2e/upgrade.go
Outdated
| fullGraph := fmt.Sprintf("%s?channel=%s&version=%s&arch=amd64", graph, new_channel, clusterVersion.FullVersion) | ||
| graphData, err := GetWebResource(fullGraph) |
There was a problem hiding this comment.
Don't hardcode arch=amd64 in the graph query.
This spec only skips on TechPreview. On arm64/s390x/ppc64le clusters, querying fauxinnati as amd64 can return the wrong graph or no graph at all. Please derive the arch from the cluster here, or explicitly skip unsupported multi-arch cases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/upgrade.go` around lines 63 - 64, The graph query currently
hardcodes "arch=amd64" when building fullGraph (using variables graph,
new_channel, clusterVersion.FullVersion and then calling GetWebResource), which
breaks on non-amd64 clusters; change the code to determine the cluster
architecture at runtime (e.g., read the cluster's arch from clusterVersion or
the cluster API) and substitute that value into the query string, or if the test
does not support the cluster's arch, explicitly skip the test for unsupported
arches instead of forcing amd64. Ensure fullGraph uses the derived arch value
(or the skip) before calling GetWebResource.
test/e2e/upgrade.go
Outdated
| nodes, ok := result["nodes"].([]interface{}) | ||
| o.Expect(ok).To(o.BeTrue(), "get nodes failed") | ||
|
|
||
| extractVersion := func(index int) string { | ||
| node, ok := nodes[index].(map[string]interface{}) | ||
| o.Expect(ok).To(o.BeTrue(), "node %d is not a map", index) | ||
| version, ok := node["version"].(string) | ||
| o.Expect(ok).To(o.BeTrue(), "node %d missing version string", index) | ||
| return version | ||
| } | ||
| recommendVersion := extractVersion(1) | ||
| notRecommendVersion1 := extractVersion(2) | ||
| notRecommendVersion2 := extractVersion(3) | ||
| notRecommendVersion3 := extractVersion(4) |
There was a problem hiding this comment.
Guard the fixed node indexes before calling extractVersion.
extractVersion(4) will panic if the graph ever returns fewer than five nodes. At minimum, assert the length before indexing; ideally, select the target nodes by property instead of position.
🛡️ Minimal fix
nodes, ok := result["nodes"].([]interface{})
o.Expect(ok).To(o.BeTrue(), "get nodes failed")
+ o.Expect(len(nodes)).To(o.BeNumerically(">=", 5), "expected at least 5 nodes in the graph")
extractVersion := func(index int) string {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/upgrade.go` around lines 69 - 82, The code currently indexes nodes
with fixed positions (extractVersion(1), extractVersion(2), extractVersion(3),
extractVersion(4)) which will panic if fewer than five nodes are returned;
before calling extractVersion, assert the slice length (nodes) is at least 5
using o.Expect(len(nodes) >= 5). Alternatively, avoid positional indexing by
finding nodes by a stable property (e.g., id/version/role) and call
extractVersion on the found index(s) instead; update
recommendVersion/notRecommendVersion* assignments to use the validated indexes
or the discovered indexes from the property-based lookup.
test/e2e/upgrade.go
Outdated
| g.By("clear risks when the accept risk list is empty") | ||
| out, err = oc.Run("adm").Args("upgrade", "accept", "--clear").Output() | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| o.Expect(out).To(o.ContainSubstring("info: Accept risks are not changed")) |
There was a problem hiding this comment.
Make the accept-risks flow hermetic from the start.
Lines 117-120 assume .spec.desiredUpdate.acceptRisks is already empty, and the final --replace leaves the cluster modified even on success. Snapshot the original value before the first mutation, normalize to a known starting state, and restore it in cleanup so later specs do not inherit this test's upgrade policy.
Also applies to: 122-205
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/upgrade.go` around lines 117 - 120, Capture the original
.spec.desiredUpdate.acceptRisks value at test start (before any
oc.Run("adm").Args("upgrade"... mutations) and store it in a variable; then
explicitly set a known starting state (e.g., empty list) before exercising the
accept-risks flows (the blocks that call
oc.Run("adm").Args("upgrade","accept","--clear") and the related --replace tests
between the current region and lines 122-205) so the test is hermetic; finally,
in the test cleanup/teardown restore the saved original value back to
.spec.desiredUpdate.acceptRisks to avoid leaking this test's changes to later
tests. Ensure you reference the same oc.Run invocations (oc, out, err) and the
upgrade accept/replace flows when adding the snapshot/normalize/restore steps.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (5)
test/e2e/util.go (1)
409-427:⚠️ Potential issue | 🟠 MajorAdd a deadline to this HTTP fetch.
http.Getuses the default client with no timeout. If the external upgrade-graph endpoint stalls, the test will hang indefinitely until the outer test timeout fires.⏱️ Proposed fix
func GetWebResource(url string) ([]byte, error) { e2e.Logf("Getting web resource: %s", url) - resp, err := http.Get(url) + client := &http.Client{Timeout: 30 * time.Second} + resp, err := client.Get(url) if err != nil { return nil, err },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util.go` around lines 409 - 427, GetWebResource uses http.Get with no timeout which can hang; change it to perform the request with a deadline by creating an http.Client with a reasonable Timeout (or use context.WithTimeout + http.NewRequestWithContext) and use that client to do the request inside GetWebResource; ensure you still close resp.Body and return bodyBytes/error unchanged (refer to function GetWebResource and variables resp/bodyBytes) and pick a sensible timeout (e.g. several seconds or configurable) so the test won't stall.test/e2e/upgrade.go (4)
70-83:⚠️ Potential issue | 🟠 MajorGuard the fixed node indexes before calling
extractVersion.
extractVersion(4)will panic if the graph returns fewer than five nodes. Add a length assertion before indexing.🛡️ Minimal fix
nodes, ok := result["nodes"].([]interface{}) o.Expect(ok).To(o.BeTrue(), "get nodes failed") + o.Expect(len(nodes)).To(o.BeNumerically(">=", 5), "expected at least 5 nodes in the graph, got %d", len(nodes)) extractVersion := func(index int) string {,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 70 - 83, The code calls extractVersion with fixed indexes (1..4) without ensuring nodes has at least five elements, which can panic; before calling extractVersion, assert the slice length (e.g., check len(nodes) >= 5) using the existing test helper (o.Expect) or similar guard, and fail the test with a clear message if there are too few nodes so extractVersion(1), extractVersion(2), extractVersion(3), and extractVersion(4) are only invoked when nodes contains the required indexes.
118-120:⚠️ Potential issue | 🟠 MajorMake the accept-risks flow hermetic.
The test assumes
.spec.desiredUpdate.acceptRisksis empty at the start, but doesn't verify or restore this state. If a previous test or cluster state left risks configured, the test may fail spuriously or leave the cluster modified after completion.Capture the original
acceptRisksvalue at test start and restore it in cleanup (e.g., viadefer).,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 118 - 120, Before calling the adm upgrade accept --clear flow, read and save the current .spec.desiredUpdate.acceptRisks for the ClusterVersion resource (e.g., by using the existing oc client used in this test file) and install a defer to restore that saved acceptRisks value at cleanup; specifically, just prior to the adm command invocation around the oc.Run("adm").Args("upgrade","accept","--clear") call capture the current acceptRisks, and in a deferred function issue the corresponding patch/update (using the oc client) to set .spec.desiredUpdate.acceptRisks back to the original value so the test is hermetic and leaves cluster state unchanged.
63-64:⚠️ Potential issue | 🟠 MajorDon't hardcode
arch=amd64in the graph query.This spec only skips on TechPreview. On arm64/s390x/ppc64le clusters, querying fauxinnati as amd64 can return the wrong graph or no graph at all. Please derive the arch from the cluster here, or explicitly skip unsupported multi-arch cases.
,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 63 - 64, The query string hardcodes "arch=amd64" when building fullGraph which fails on non-amd64 clusters; change the code that constructs fullGraph (using variables graph, new_channel, clusterVersion.FullVersion) to derive the architecture from the test cluster (e.g., read the cluster/node architecture field you already have accessible in this test—use the clusterVersion or node status/architecture value instead of the literal "amd64") or, if multi-arch is unsupported here, add an explicit skip for unsupported architectures before constructing fullGraph; update the format string to use the derived arch variable rather than the hardcoded value.
85-95:⚠️ Potential issue | 🟠 MajorUnconditional remove of
/spec/overrideswill fail when the field doesn't exist.The comment says "if exists" but the code unconditionally executes a JSON Patch "remove" operation. Per RFC 6902, "remove" fails if the target path doesn't exist. Additionally, the jsonpath output (a string with quotes) used in the "add" restore won't faithfully recreate the original structure.
Check whether
/spec/overridesexists before attempting removal, and capture it as structured JSON for proper restoration.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 85 - 95, The current code always issues a JSON Patch "remove" on clusterversion/version /spec/overrides using the string output in overrides, causing errors when the path is absent and preventing faithful restore; change the logic to first GET clusterversion/version -o json (via oc.AsAdmin().WithoutNamespace().Run(...).Args(...).Output()), parse the returned JSON to check if spec.overrides exists, only call Patch(..., []JSONp{{"remove","/spec/overrides",nil}}) when the field is present, and when saving for restore store the actual parsed JSON value (not the jsonpath string) and use that structured value in the deferred Patch add operation (JSONp add with the parsed overrides) so removal is conditional and restoration preserves structure.
🧹 Nitpick comments (6)
test/e2e/upgrade.go (4)
23-27: Remove empty lifecycle hooks.The
BeforeEachandAfterEachblocks are empty and serve no purpose. Remove them to reduce noise.🧹 Proposed fix
- g.BeforeEach(func() { - }) - - g.AfterEach(func() { - }) - g.It("Accepted Risks for OCP Cluster Updates", g.Label("TechPreview", "88175", "Slow", "manual"), func() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 23 - 27, Remove the no-op test lifecycle hooks: delete the empty BeforeEach and AfterEach blocks (the calls to g.BeforeEach(func() {}) and g.AfterEach(func() {})) from the test setup so there are no empty hooks left; ensure no other setup/teardown logic is accidentally removed and that the test suite still compiles after removing these functions.
235-246: Commented-out test code should be removed or tracked.The "Upgrade to not recommend version" step is fully commented out. If this is intentional and deferred work, consider adding a TODO with a tracking issue reference. Otherwise, remove dead code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 235 - 246, The block under the test step g.By("Upgrade to not recommend version") contains fully commented-out code (calls like oc.Run("adm").Args("upgrade", "--to", notRecommendVersion3) and the wait.Poll/assert logic) and should be cleaned up: either delete the dead commented code or replace it with a single-line TODO that references a tracking issue/PR and explains why the upgrade test is disabled (include the variables notRecommendVersion3/notRecommendVersion1 in the TODO for clarity); ensure the surrounding test flow (the g.By call) is updated accordingly so no misleading empty step remains.
17-17: Incorrect usage ofg.GinkgoRecover().
GinkgoRecover()is intended to be called inside goroutines spawned during test execution to catch panics and report them as test failures. Placingdefer g.GinkgoRecover()at theDescribeblock level has no effect since the main test flow doesn't need recovery in this manner.Remove this line.
🔧 Proposed fix
var _ = g.Describe(`[Jira:"oc adm upgrade"] accept`, func() { - - defer g.GinkgoRecover() var ( oc = NewCLI("oc", KubeConfigPath())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` at line 17, Remove the top-level defer g.GinkgoRecover() call in test/e2e/upgrade.go — GinkgoRecover should only be deferred inside goroutines to convert panics into test failures, so delete the lone "defer g.GinkgoRecover()" (found in the Describe/test setup) and, if you have any goroutines in this file, wrap their entry functions with a deferred g.GinkgoRecover() inside the goroutine bodies instead.
131-142:wait.Pollis deprecated; usewait.PollUntilContextTimeoutinstead.
wait.Pollhas been deprecated in favor of context-aware variants. This applies to all polling loops in this file (lines 131, 150, 174, 213).♻️ Example refactor for the first polling loop
- err = wait.Poll(10*time.Second, 10*time.Minute, func() (bool, error) { + err = wait.PollUntilContextTimeout(context.Background(), 10*time.Second, 10*time.Minute, true, func(ctx context.Context) (bool, error) { out, err = oc.Run("adm").Args("upgrade").Output() if err != nil { e2e.Logf("get oc adm upgrade output failed. Trying again") return false, nil } if !strings.Contains(out, notRecommendVersion1) { e2e.Logf("non-recommend version %s is not available. Trying again", notRecommendVersion1) return false, nil } return true, nil })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 131 - 142, Replace deprecated wait.Poll usage with the context-aware wait.PollUntilContextTimeout: change the call to wait.PollUntilContextTimeout(ctx, 10*time.Second, 10*time.Minute, func(ctx context.Context) (bool, error) { ... }) (or use context.Background()/context.TODO() if no test context is available), update the callback signature to accept context.Context, keep the same logic calling oc.Run("adm").Args("upgrade").Output() and checking strings.Contains(out, notRecommendVersion1), and add the "context" import; apply the same replacement for the other polling loops (the ones around lines 150, 174, 213) to use wait.PollUntilContextTimeout consistently.test/e2e/util.go (2)
41-46: Consider more descriptive field names forJSONp.The field names
Oper,Path,Valuare abbreviated inconsistently. While the JSON serialization uses standard JSON Patch field names (op,path,value), the Go field names could be clearer (e.g.,OporOperation,Value).This is a minor readability nit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util.go` around lines 41 - 46, The JSONp struct uses abbreviated Go field names (Oper, Path, Valu) which hurts readability; rename the struct fields to clearer identifiers (e.g., Op or Operation, Path (keep or PathName), and Value) while keeping the json tags unchanged (`json:"op"`, `json:"path"`, `json:"value,omitempty"`) so external JSON stays the same; update any usages of JSONp, JSONp.Oper, JSONp.Path, and JSONp.Valu throughout tests to the new field names to ensure compilation.
1025-1031: Usingo.Expectin a utility function can cause panics outside Ginkgo context.
isTechPreviewuseso.Expect(err).NotTo(o.HaveOccurred())which will panic if called from code that isn't running inside a Ginkgo test. Consider returning an error or a tuple(bool, error)to make this function safer and more reusable.♻️ Proposed safer signature
-func isTechPreview(oc *CLI) bool { +func isTechPreview(oc *CLI) (bool, error) { techPreview, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("clusterversion", "version", "-o=jsonpath='{.status.conditions[?(@.type==\"Upgradeable\")].message}'").Output() - o.Expect(err).NotTo(o.HaveOccurred()) + if err != nil { + return false, err + } e2e.Logf("Upgradeable: %v\n", techPreview) - return strings.Contains(techPreview, "TechPreviewNoUpgrade") + return strings.Contains(techPreview, "TechPreviewNoUpgrade"), nil }Then update callers (in
upgrade.go) to handle the error:isTech, err := isTechPreview(oc) o.Expect(err).NotTo(o.HaveOccurred()) if !isTech { g.Skip("Skipping test: only tech-preview clusters supported") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util.go` around lines 1025 - 1031, Change isTechPreview to return (bool, error) instead of asserting with o.Expect: detect the cluster by running oc.AsAdmin().WithoutNamespace().Run("get")...Output(), then return false with the encountered error if err != nil, otherwise return strings.Contains(...), nil; update all callers (e.g., in upgrade.go) to check the error (isTech, err := isTechPreview(oc); o.Expect(err).NotTo(o.HaveOccurred())) and then use isTech to decide whether to skip (g.Skip) or continue, removing any direct o.Expect calls from inside isTechPreview.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/upgrade.go`:
- Around line 50-51: The code discards errors when fetching original cluster
values into oldUpstream and oldChannel via
oc.AsAdmin().WithoutNamespace().Run("get").Args(...).Output(), which can lead
the deferred restoration to overwrite real config with empty strings; update
both calls to capture and check the returned error (e.g., errUpstream,
errChannel) and if either is non-nil, fail the test or return the error (use
t.Fatalf or return) instead of proceeding, so the deferred restoration block
that writes oldUpstream/oldChannel back only runs when those values were
successfully read.
---
Duplicate comments:
In `@test/e2e/upgrade.go`:
- Around line 70-83: The code calls extractVersion with fixed indexes (1..4)
without ensuring nodes has at least five elements, which can panic; before
calling extractVersion, assert the slice length (e.g., check len(nodes) >= 5)
using the existing test helper (o.Expect) or similar guard, and fail the test
with a clear message if there are too few nodes so extractVersion(1),
extractVersion(2), extractVersion(3), and extractVersion(4) are only invoked
when nodes contains the required indexes.
- Around line 118-120: Before calling the adm upgrade accept --clear flow, read
and save the current .spec.desiredUpdate.acceptRisks for the ClusterVersion
resource (e.g., by using the existing oc client used in this test file) and
install a defer to restore that saved acceptRisks value at cleanup;
specifically, just prior to the adm command invocation around the
oc.Run("adm").Args("upgrade","accept","--clear") call capture the current
acceptRisks, and in a deferred function issue the corresponding patch/update
(using the oc client) to set .spec.desiredUpdate.acceptRisks back to the
original value so the test is hermetic and leaves cluster state unchanged.
- Around line 63-64: The query string hardcodes "arch=amd64" when building
fullGraph which fails on non-amd64 clusters; change the code that constructs
fullGraph (using variables graph, new_channel, clusterVersion.FullVersion) to
derive the architecture from the test cluster (e.g., read the cluster/node
architecture field you already have accessible in this test—use the
clusterVersion or node status/architecture value instead of the literal "amd64")
or, if multi-arch is unsupported here, add an explicit skip for unsupported
architectures before constructing fullGraph; update the format string to use the
derived arch variable rather than the hardcoded value.
- Around line 85-95: The current code always issues a JSON Patch "remove" on
clusterversion/version /spec/overrides using the string output in overrides,
causing errors when the path is absent and preventing faithful restore; change
the logic to first GET clusterversion/version -o json (via
oc.AsAdmin().WithoutNamespace().Run(...).Args(...).Output()), parse the returned
JSON to check if spec.overrides exists, only call Patch(...,
[]JSONp{{"remove","/spec/overrides",nil}}) when the field is present, and when
saving for restore store the actual parsed JSON value (not the jsonpath string)
and use that structured value in the deferred Patch add operation (JSONp add
with the parsed overrides) so removal is conditional and restoration preserves
structure.
In `@test/e2e/util.go`:
- Around line 409-427: GetWebResource uses http.Get with no timeout which can
hang; change it to perform the request with a deadline by creating an
http.Client with a reasonable Timeout (or use context.WithTimeout +
http.NewRequestWithContext) and use that client to do the request inside
GetWebResource; ensure you still close resp.Body and return bodyBytes/error
unchanged (refer to function GetWebResource and variables resp/bodyBytes) and
pick a sensible timeout (e.g. several seconds or configurable) so the test won't
stall.
---
Nitpick comments:
In `@test/e2e/upgrade.go`:
- Around line 23-27: Remove the no-op test lifecycle hooks: delete the empty
BeforeEach and AfterEach blocks (the calls to g.BeforeEach(func() {}) and
g.AfterEach(func() {})) from the test setup so there are no empty hooks left;
ensure no other setup/teardown logic is accidentally removed and that the test
suite still compiles after removing these functions.
- Around line 235-246: The block under the test step g.By("Upgrade to not
recommend version") contains fully commented-out code (calls like
oc.Run("adm").Args("upgrade", "--to", notRecommendVersion3) and the
wait.Poll/assert logic) and should be cleaned up: either delete the dead
commented code or replace it with a single-line TODO that references a tracking
issue/PR and explains why the upgrade test is disabled (include the variables
notRecommendVersion3/notRecommendVersion1 in the TODO for clarity); ensure the
surrounding test flow (the g.By call) is updated accordingly so no misleading
empty step remains.
- Line 17: Remove the top-level defer g.GinkgoRecover() call in
test/e2e/upgrade.go — GinkgoRecover should only be deferred inside goroutines to
convert panics into test failures, so delete the lone "defer g.GinkgoRecover()"
(found in the Describe/test setup) and, if you have any goroutines in this file,
wrap their entry functions with a deferred g.GinkgoRecover() inside the
goroutine bodies instead.
- Around line 131-142: Replace deprecated wait.Poll usage with the context-aware
wait.PollUntilContextTimeout: change the call to
wait.PollUntilContextTimeout(ctx, 10*time.Second, 10*time.Minute, func(ctx
context.Context) (bool, error) { ... }) (or use
context.Background()/context.TODO() if no test context is available), update the
callback signature to accept context.Context, keep the same logic calling
oc.Run("adm").Args("upgrade").Output() and checking strings.Contains(out,
notRecommendVersion1), and add the "context" import; apply the same replacement
for the other polling loops (the ones around lines 150, 174, 213) to use
wait.PollUntilContextTimeout consistently.
In `@test/e2e/util.go`:
- Around line 41-46: The JSONp struct uses abbreviated Go field names (Oper,
Path, Valu) which hurts readability; rename the struct fields to clearer
identifiers (e.g., Op or Operation, Path (keep or PathName), and Value) while
keeping the json tags unchanged (`json:"op"`, `json:"path"`,
`json:"value,omitempty"`) so external JSON stays the same; update any usages of
JSONp, JSONp.Oper, JSONp.Path, and JSONp.Valu throughout tests to the new field
names to ensure compilation.
- Around line 1025-1031: Change isTechPreview to return (bool, error) instead of
asserting with o.Expect: detect the cluster by running
oc.AsAdmin().WithoutNamespace().Run("get")...Output(), then return false with
the encountered error if err != nil, otherwise return strings.Contains(...),
nil; update all callers (e.g., in upgrade.go) to check the error (isTech, err :=
isTechPreview(oc); o.Expect(err).NotTo(o.HaveOccurred())) and then use isTech to
decide whether to skip (g.Skip) or continue, removing any direct o.Expect calls
from inside isTechPreview.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9b489c72-c85e-40a8-8d24-dbdfcbbad04c
📒 Files selected for processing (2)
test/e2e/upgrade.gotest/e2e/util.go
ee76ab8 to
834aa52
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
test/e2e/upgrade.go (3)
70-70: Use snake_case for local variable naming consistency.Go convention prefers
camelCasefor local variables.new_channelshould benewChannelfor consistency with the rest of the codebase (e.g.,oldUpstream,oldChannel).✏️ Suggested rename
graph := "https://fauxinnati-fauxinnati.apps.ota-stage.q2z4.p1.openshiftapps.com/api/upgrades_info/graph" - new_channel := "OCP-88175" + newChannel := "OCP-88175" _, err = oc.AsAdmin().WithoutNamespace().Patch("", "clusterversion/version", []JSONp{ {"add", "/spec/upstream", graph}, - {"add", "/spec/channel", new_channel}, + {"add", "/spec/channel", newChannel}, })Also update references at lines 81 and any other occurrences.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` at line 70, Rename the local variable new_channel to newChannel to follow Go naming conventions and match surrounding variables like oldUpstream and oldChannel; update every reference to new_channel (including the usage near the upgrade test where it's passed/compared/assigned) to use newChannel so identifiers remain consistent and compile.
15-27: Remove unnecessary constructs:defer GinkgoRecover()and empty hooks.
defer g.GinkgoRecover()at line 17 is typically used inside goroutines to catch panics; at theDescribelevel it serves no purpose since Ginkgo already handles panics in specs.- The empty
BeforeEachandAfterEachhooks (lines 23-27) add noise without functionality.♻️ Proposed cleanup
var _ = g.Describe(`[Jira:"oc adm upgrade"] accept`, func() { - - defer g.GinkgoRecover() - var ( oc = NewCLI("oc", KubeConfigPath()) ) - - g.BeforeEach(func() { - }) - - g.AfterEach(func() { - }) g.It("Accepted Risks for OCP Cluster Updates", g.Label("TechPreview", "88175", "Slow", "manual"), func() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 15 - 27, Remove the unnecessary defer g.GinkgoRecover() call and the empty g.BeforeEach and g.AfterEach hooks from the g.Describe block; locate the g.Describe(`[Jira:"oc adm upgrade"] accept`, func() { ... }) declaration and delete the g.GinkgoRecover() line and the empty BeforeEach/AfterEach functions (they add no behavior), leaving only the variable declarations (e.g., oc := NewCLI("oc", KubeConfigPath())) and any real test code.
240-251: Remove or address the commented-out upgrade block.Large blocks of commented-out code reduce readability. If this functionality is planned for future implementation, replace it with a
TODOcomment or track it in the issue tracker. Otherwise, remove it.♻️ Suggested cleanup
g.By("Upgrade to not recommend version") - // out, err = oc.Run("adm").Args("upgrade", "--to", notRecommendVersion3).Output() - // o.Expect(err).NotTo(o.HaveOccurred()) - // expectedStatus := fmt.Sprintf("Cluster version is %s", notRecommendVersion3) - // err = wait.Poll(30*time.Second, 90*time.Minute, func() (bool, error) { - // out, _ = oc.Run("get").Args("clusterversion").Output() - // if !strings.Contains(out, expectedStatus) { - // return false, nil - // } - // return true, nil - // }) - // AssertWaitPollNoErr(err, fmt.Sprintf("upgrade to version %s failed", notRecommendVersion1)) + // TODO(OTA-1543): Implement actual upgrade verification once runtime allows longer test execution + e2e.Logf("Skipping actual upgrade to %s - requires extended runtime", notRecommendVersion3) })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 240 - 251, Remove the large commented-out upgrade block (the lines using oc.Run("adm").Args("upgrade", "--to", notRecommendVersion3) and subsequent wait.Poll/AssertWaitPollNoErr logic) or replace it with a single-line TODO referencing a tracking issue; if keeping for future work, add a TODO comment indicating the issue/PR number and why it's disabled and keep only a minimal one-line note instead of the entire commented code, and ensure any referenced identifiers (notRecommendVersion3, expectedStatus, wait.Poll, AssertWaitPollNoErr, oc.Run) are either removed from the file if unused or left only in active code paths.test/e2e/util.go (1)
41-46: Consider using conventional field nameValueinstead ofValu.The field name
Valuis non-standard and may confuse readers expecting the typicalValuespelling. While the JSON tag ensures correct serialization, the Go identifier could be clearer.✏️ Suggested rename
type JSONp struct { Oper string `json:"op"` Path string `json:"path"` - Valu interface{} `json:"value,omitempty"` + Value interface{} `json:"value,omitempty"` }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util.go` around lines 41 - 46, The struct JSONp uses a non-standard field identifier Valu which is confusing; rename the field to Value (i.e., change the struct field from Valu to Value while keeping the existing json tag `json:"value,omitempty"`) so the code reads clearer and remains backwards-compatible for JSON encoding; update any references to JSONp.Valu throughout the test/e2e package to JSONp.Value to compile.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/upgrade.go`:
- Around line 36-49: The json.Unmarshal call shadows the outer err (using :=)
and its error isn't checked, so if Unmarshal fails the deferred add will try to
restore invalid data; change `err := json.Unmarshal(...)` to `err =
json.Unmarshal(...)` to avoid shadowing, immediately check `err` (e.g.
o.Expect(err).NotTo(o.HaveOccurred()) or return/fail) before using `data`, and
only install the deferred Patch add(...) after a successful unmarshal so the
defer restores valid data.
In `@test/e2e/util.go`:
- Around line 170-177: In the Patch function, the json.Marshal call's error is
checked after logging, causing a misleading empty log when marshal fails; move
the e2e.Logf("will patching: ...", ...) to after the error check (i.e., after
verifying err from json.Marshal) and ensure you log the marshal error via
e2e.Logf or return it immediately if err != nil; reference the Patch function
and the variables p and err from the json.Marshal(...) call to locate where to
reorder the logging.
---
Nitpick comments:
In `@test/e2e/upgrade.go`:
- Line 70: Rename the local variable new_channel to newChannel to follow Go
naming conventions and match surrounding variables like oldUpstream and
oldChannel; update every reference to new_channel (including the usage near the
upgrade test where it's passed/compared/assigned) to use newChannel so
identifiers remain consistent and compile.
- Around line 15-27: Remove the unnecessary defer g.GinkgoRecover() call and the
empty g.BeforeEach and g.AfterEach hooks from the g.Describe block; locate the
g.Describe(`[Jira:"oc adm upgrade"] accept`, func() { ... }) declaration and
delete the g.GinkgoRecover() line and the empty BeforeEach/AfterEach functions
(they add no behavior), leaving only the variable declarations (e.g., oc :=
NewCLI("oc", KubeConfigPath())) and any real test code.
- Around line 240-251: Remove the large commented-out upgrade block (the lines
using oc.Run("adm").Args("upgrade", "--to", notRecommendVersion3) and subsequent
wait.Poll/AssertWaitPollNoErr logic) or replace it with a single-line TODO
referencing a tracking issue; if keeping for future work, add a TODO comment
indicating the issue/PR number and why it's disabled and keep only a minimal
one-line note instead of the entire commented code, and ensure any referenced
identifiers (notRecommendVersion3, expectedStatus, wait.Poll,
AssertWaitPollNoErr, oc.Run) are either removed from the file if unused or left
only in active code paths.
In `@test/e2e/util.go`:
- Around line 41-46: The struct JSONp uses a non-standard field identifier Valu
which is confusing; rename the field to Value (i.e., change the struct field
from Valu to Value while keeping the existing json tag `json:"value,omitempty"`)
so the code reads clearer and remains backwards-compatible for JSON encoding;
update any references to JSONp.Valu throughout the test/e2e package to
JSONp.Value to compile.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cdffda8b-653d-4dd6-a933-ff34c3fd1110
📒 Files selected for processing (2)
test/e2e/upgrade.gotest/e2e/util.go
test/e2e/upgrade.go
Outdated
| g.By("remove overrides from clusterversion version if exists") | ||
| overrides, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("clusterversion", "version", "-o", "jsonpath={.spec.overrides}").Output() | ||
| if err == nil && strings.Contains(overrides, "ClusterImagePolicy") { | ||
| e2e.Logf("overrides: <<%s>>", overrides) | ||
| var data []map[string]interface{} | ||
| err := json.Unmarshal([]byte(overrides), &data) | ||
| _, err = oc.AsAdmin().WithoutNamespace().Patch("", "clusterversion/version", []JSONp{ | ||
| {"remove", "/spec/overrides", nil}, | ||
| }) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| defer oc.AsAdmin().WithoutNamespace().Patch("", "clusterversion/version", []JSONp{ | ||
| {"add", "/spec/overrides", data}, | ||
| }) | ||
| } |
There was a problem hiding this comment.
Bug: json.Unmarshal error is never checked.
Line 41 declares a new err variable that shadows the outer scope, and line 42 immediately overwrites it. If json.Unmarshal fails, data will be nil/empty, and the deferred add operation will restore invalid data.
🐛 Proposed fix
g.By("remove overrides from clusterversion version if exists")
overrides, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("clusterversion", "version", "-o", "jsonpath={.spec.overrides}").Output()
if err == nil && strings.Contains(overrides, "ClusterImagePolicy") {
e2e.Logf("overrides: <<%s>>", overrides)
var data []map[string]interface{}
- err := json.Unmarshal([]byte(overrides), &data)
+ if err := json.Unmarshal([]byte(overrides), &data); err != nil {
+ e2e.Failf("Failed to unmarshal overrides: %v", err)
+ }
_, err = oc.AsAdmin().WithoutNamespace().Patch("", "clusterversion/version", []JSONp{
{"remove", "/spec/overrides", nil},
})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/upgrade.go` around lines 36 - 49, The json.Unmarshal call shadows
the outer err (using :=) and its error isn't checked, so if Unmarshal fails the
deferred add will try to restore invalid data; change `err :=
json.Unmarshal(...)` to `err = json.Unmarshal(...)` to avoid shadowing,
immediately check `err` (e.g. o.Expect(err).NotTo(o.HaveOccurred()) or
return/fail) before using `data`, and only install the deferred Patch add(...)
after a successful unmarshal so the defer restores valid data.
834aa52 to
56a5ee3
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (5)
test/e2e/util.go (1)
173-176:⚠️ Potential issue | 🟡 MinorMove patch payload logging after marshal error handling.
On Line 174,
pis logged before the marshal error check on Line 175. If marshal fails, the log is misleading.Suggested fix
func (c *CLI) Patch(namespace string, resource string, patch []JSONPatchOperation) (output string, err error) { p, err := json.Marshal(patch) - e2e.Logf("will patching: %s with: %s in namespace: %s", resource, string(p), namespace) if err != nil { e2e.Logf("Marshal json Error - json.Marshal: '%v'", err) return "", err } + e2e.Logf("will patching: %s with: %s in namespace: %s", resource, string(p), namespace)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util.go` around lines 173 - 176, The log statement prints the marshaled patch payload before checking json.Marshal error, which can be misleading; move the e2e.Logf("will patching: %s with: %s in namespace: %s", resource, string(p), namespace) so it executes only after verifying err == nil (i.e., after json.Marshal(patch) and the nil-check), and keep the existing error log e2e.Logf("Marshal json Error - json.Marshal: '%v'", err) unchanged to report marshal failures.test/e2e/upgrade.go (4)
88-101:⚠️ Potential issue | 🟠 MajorGuard node count before fixed-position indexing.
extractVersion(4)can panic when the graph returns fewer than five nodes; add a length assertion before Line 98-101 calls.Suggested fix
nodes, ok := result["nodes"].([]interface{}) o.Expect(ok).To(o.BeTrue(), "get nodes failed") + o.Expect(len(nodes)).To(o.BeNumerically(">=", 5), "expected at least 5 nodes in graph response")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 88 - 101, The test assumes at least five nodes but indexes nodes[1..4] directly which can panic; before calling extractVersion (and before using nodes), assert the length: check len(nodes) >= 5 (or adapt to expected count) and fail the test with a clear message if not, or replace the fixed-index calls with a safe loop over existing nodes; update references around variables nodes and the extractVersion function and the calls that assign recommendVersion / notRecommendVersion1/2/3 to use this guarded length check.
42-43:⚠️ Potential issue | 🟠 MajorCheck
json.Unmarshalerror before patching overrides.On Line 42, the unmarshal error is overwritten on Line 43 before it is validated, so malformed
overridescan propagate into patch/restore logic.Suggested fix
if err == nil && strings.Contains(overrides, "ClusterImagePolicy") { e2e.Logf("overrides: <<%s>>", overrides) var data []map[string]interface{} - err := json.Unmarshal([]byte(overrides), &data) + err = json.Unmarshal([]byte(overrides), &data) + o.Expect(err).NotTo(o.HaveOccurred(), "failed to unmarshal .spec.overrides") _, err = oc.Patch("", "clusterversion/version", []JSONPatchOperation{ {"remove", "/spec/overrides", nil}, })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 42 - 43, The code currently calls json.Unmarshal into variable err and then immediately reassigns err with the result of oc.Patch (overwriting any unmarshal error); check and handle the json.Unmarshal error before calling oc.Patch to prevent malformed overrides from being used—specifically, after json.Unmarshal([]byte(overrides), &data) verify err is nil and return or fail the test if not, then proceed to call oc.Patch("", "clusterversion/version", []JSONPatchOperation{...}); do the same guard for any later restore/patch logic that expects valid unmarshaled data.
82-83:⚠️ Potential issue | 🟠 MajorAvoid hardcoding
arch=amd64in graph query.Line 82 forces
amd64, which can produce incorrect graph behavior on arm64/s390x/ppc64le/multi-arch clusters.Suggested fix
- fullGraph := fmt.Sprintf("%s?channel=%s&version=%s&arch=amd64", graph, newChannel, clusterVersion.FullVersion) + arch := ClusterArchitecture(oc) + if arch == string(MULTI) { + g.Skip("Skipping test: fauxinnati graph query requires a single architecture") + } + fullGraph := fmt.Sprintf("%s?channel=%s&version=%s&arch=%s", graph, newChannel, clusterVersion.FullVersion, arch)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 82 - 83, The graph query hardcodes arch=amd64 which breaks non-amd64 clusters; change the construction of fullGraph to use the actual runtime architecture (e.g. set arch := runtime.GOARCH and use that in fmt.Sprintf) or derive the cluster architecture from cluster metadata if available, then call GetWebResource(fullGraph) with the computed arch instead of the literal "amd64"; add the runtime import if you use runtime.GOARCH and update references to fullGraph/graph/newChannel/clusterVersion.FullVersion accordingly.
124-127:⚠️ Potential issue | 🟠 MajorMake accept-risks flow hermetic and restore original state.
This flow assumes an empty starting state and does not restore the original
spec.desiredUpdate.acceptRisksat the end, so it can leak cluster state across specs.Also applies to: 151-155, 170-239
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 124 - 127, The test block beginning with g.By("clear risks when the accept risk list is empty") mutates cluster state without restoring it; save the current spec.desiredUpdate.acceptRisks before calling oc.Run("adm").Args("upgrade","accept","--clear").Output(), and after the assertions restore the original value (use the same oc.Run/oc.Apply helpers used elsewhere in this file to patch the ClusterVersion). If the test assumes an empty starting list, make it hermetic by first capturing the original acceptRisks, explicitly setting acceptRisks to an empty list for the test, and then deferring or ensuring a final patch that restores the saved acceptRisks; apply the same pattern to the other affected blocks referenced (around the other tests in the 151-155 and 170-239 ranges).
🧹 Nitpick comments (1)
test/e2e/upgrade.go (1)
24-29: Remove empty hooks and dead commented block to reduce noise.Empty
BeforeEach/AfterEachand the commented upgrade block add maintenance overhead without behavior.Also applies to: 241-252
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade.go` around lines 24 - 29, Remove the no-op test hooks and dead commented upgrade block to reduce noise: delete the empty BeforeEach and AfterEach function blocks (the g.BeforeEach and g.AfterEach declarations) and remove the commented upgrade block referenced around lines 241-252 so only active test logic remains; ensure no remaining references expect those hooks before committing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/upgrade.go`:
- Around line 47-49: The deferred oc.Patch calls silently ignore errors; replace
each defer oc.Patch(...) with g.DeferCleanup wrappers that call oc.Patch and
assert no error (e.g. g.DeferCleanup(func(){ if err := oc.Patch("",
"clusterversion/version", []JSONPatchOperation{{"add","/spec/overrides",data}});
g.Expect(err).NotTo(HaveOccurred()) })) so cleanup failures are reported—do this
for both occurrences of oc.Patch and reference the oc.Patch function and
g.DeferCleanup/g.Expect in the changes.
---
Duplicate comments:
In `@test/e2e/upgrade.go`:
- Around line 88-101: The test assumes at least five nodes but indexes
nodes[1..4] directly which can panic; before calling extractVersion (and before
using nodes), assert the length: check len(nodes) >= 5 (or adapt to expected
count) and fail the test with a clear message if not, or replace the fixed-index
calls with a safe loop over existing nodes; update references around variables
nodes and the extractVersion function and the calls that assign recommendVersion
/ notRecommendVersion1/2/3 to use this guarded length check.
- Around line 42-43: The code currently calls json.Unmarshal into variable err
and then immediately reassigns err with the result of oc.Patch (overwriting any
unmarshal error); check and handle the json.Unmarshal error before calling
oc.Patch to prevent malformed overrides from being used—specifically, after
json.Unmarshal([]byte(overrides), &data) verify err is nil and return or fail
the test if not, then proceed to call oc.Patch("", "clusterversion/version",
[]JSONPatchOperation{...}); do the same guard for any later restore/patch logic
that expects valid unmarshaled data.
- Around line 82-83: The graph query hardcodes arch=amd64 which breaks non-amd64
clusters; change the construction of fullGraph to use the actual runtime
architecture (e.g. set arch := runtime.GOARCH and use that in fmt.Sprintf) or
derive the cluster architecture from cluster metadata if available, then call
GetWebResource(fullGraph) with the computed arch instead of the literal "amd64";
add the runtime import if you use runtime.GOARCH and update references to
fullGraph/graph/newChannel/clusterVersion.FullVersion accordingly.
- Around line 124-127: The test block beginning with g.By("clear risks when the
accept risk list is empty") mutates cluster state without restoring it; save the
current spec.desiredUpdate.acceptRisks before calling
oc.Run("adm").Args("upgrade","accept","--clear").Output(), and after the
assertions restore the original value (use the same oc.Run/oc.Apply helpers used
elsewhere in this file to patch the ClusterVersion). If the test assumes an
empty starting list, make it hermetic by first capturing the original
acceptRisks, explicitly setting acceptRisks to an empty list for the test, and
then deferring or ensuring a final patch that restores the saved acceptRisks;
apply the same pattern to the other affected blocks referenced (around the other
tests in the 151-155 and 170-239 ranges).
In `@test/e2e/util.go`:
- Around line 173-176: The log statement prints the marshaled patch payload
before checking json.Marshal error, which can be misleading; move the
e2e.Logf("will patching: %s with: %s in namespace: %s", resource, string(p),
namespace) so it executes only after verifying err == nil (i.e., after
json.Marshal(patch) and the nil-check), and keep the existing error log
e2e.Logf("Marshal json Error - json.Marshal: '%v'", err) unchanged to report
marshal failures.
---
Nitpick comments:
In `@test/e2e/upgrade.go`:
- Around line 24-29: Remove the no-op test hooks and dead commented upgrade
block to reduce noise: delete the empty BeforeEach and AfterEach function blocks
(the g.BeforeEach and g.AfterEach declarations) and remove the commented upgrade
block referenced around lines 241-252 so only active test logic remains; ensure
no remaining references expect those hooks before committing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e49e15bc-ccd0-46e9-be15-b58484d4c9c2
📒 Files selected for processing (2)
test/e2e/upgrade.gotest/e2e/util.go
| defer oc.Patch("", "clusterversion/version", []JSONPatchOperation{ | ||
| {"add", "/spec/overrides", data}, | ||
| }) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n test/e2e/upgrade.go | head -100Repository: openshift/oc
Length of output: 4564
🏁 Script executed:
# Search for Patch method definition to understand return type
rg -A 5 "func.*Patch\(" test/e2e/ -t go | head -50Repository: openshift/oc
Length of output: 482
🏁 Script executed:
# Search for g.DeferCleanup usage patterns in the test suite
rg "g\.DeferCleanup" test/e2e/ -B 2 -A 5 | head -80Repository: openshift/oc
Length of output: 430
🏁 Script executed:
# Look for other deferred cleanup patterns in the test suite
rg "defer.*Patch\(" test/e2e/ -B 2 -A 2Repository: openshift/oc
Length of output: 617
🏁 Script executed:
# Get the full Patch method to understand better
rg -A 15 "func \(c \*CLI\) Patch" test/e2e/util.goRepository: openshift/oc
Length of output: 701
🏁 Script executed:
# Look at the surrounding context for line 43-45 to see proper error handling
sed -n '40,50p' test/e2e/upgrade.goRepository: openshift/oc
Length of output: 465
🏁 Script executed:
# Check if there are any other similar patterns in upgrade.go
rg "defer.*Run\(" test/e2e/upgrade.go -B 1 -A 1Repository: openshift/oc
Length of output: 38
Add error handling to deferred cleanup patches.
Both deferred oc.Patch() calls at lines 47-49 and 77-80 drop errors silently. If cleanup fails, the cluster remains in a mutated state with no indication of failure. Use g.DeferCleanup() with error assertions to catch restoration failures:
Suggested fix
- defer oc.Patch("", "clusterversion/version", []JSONPatchOperation{
- {"add", "/spec/overrides", data},
- })
+ g.DeferCleanup(func() {
+ _, restoreErr := oc.Patch("", "clusterversion/version", []JSONPatchOperation{
+ {"add", "/spec/overrides", data},
+ })
+ o.Expect(restoreErr).NotTo(o.HaveOccurred())
+ })
@@
- defer oc.Patch("", "clusterversion/version", []JSONPatchOperation{
- {"add", "/spec/upstream", oldUpstream},
- {"add", "/spec/channel", oldChannel},
- })
+ g.DeferCleanup(func() {
+ _, restoreErr := oc.Patch("", "clusterversion/version", []JSONPatchOperation{
+ {"add", "/spec/upstream", oldUpstream},
+ {"add", "/spec/channel", oldChannel},
+ })
+ o.Expect(restoreErr).NotTo(o.HaveOccurred())
+ })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/upgrade.go` around lines 47 - 49, The deferred oc.Patch calls
silently ignore errors; replace each defer oc.Patch(...) with g.DeferCleanup
wrappers that call oc.Patch and assert no error (e.g. g.DeferCleanup(func(){ if
err := oc.Patch("", "clusterversion/version",
[]JSONPatchOperation{{"add","/spec/overrides",data}});
g.Expect(err).NotTo(HaveOccurred()) })) so cleanup failures are reported—do this
for both occurrences of oc.Patch and reference the oc.Patch function and
g.DeferCleanup/g.Expect in the changes.
e6448d3 to
61ec941
Compare
|
Although I haven't get a passed run for the PR, I think you can start to review it, it almost done. /cc @hongkailiu |
|
The case handles non-tp clusters well: |
|
Need update fauxinnati. What we expect risks are: But now fauxinnati and test case are inconsistent: |
61ec941 to
5cdcc42
Compare
5cdcc42 to
84622ed
Compare
|
@JianLi-RH: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@hongkailiu Test passed.
|
This will be a manual case (introduces a new label "manual") because it needs too long time to be finished.
Eipc: https://issues.redhat.com/browse/OTA-1543
We can run it follow below steps:
The case needs upstream: https://fauxinnati-fauxinnati.apps.ota-stage.q2z4.p1.openshiftapps.com/api/upgrades_info/graph and channel "OCP-88175"
Summary by CodeRabbit
Tests
Tools