test: add DumpOnFailure debug infrastructure for all e2e tests#1115
Conversation
Introduce a composable DebugFunc interface and DumpOnFailure helper in the e2e framework. Tests register debug functions (namespace dumps, CR-specific diagnostics) that only run when the test fails. Apply it across all e2e test suites for consistent post-failure diagnostics. Co-authored-by: Cursor <cursoragent@cursor.com>
📝 WalkthroughWalkthroughThis PR adds diagnostic utilities to the e2e test framework to automatically capture Kubernetes namespace state when tests fail. Three new public types and methods are introduced to Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e/uiplugin_cluster_health_analyzer_test.go (1)
179-216: ⚡ Quick winConsider allowing the List operation to run even when Get fails.
If fetching the specific UIPlugin fails at line 186, the function returns early and skips the List operation (lines 206-214) that shows all UIPlugins in the cluster. That cluster-wide view could still provide valuable diagnostics even when the specific plugin is not found.
🔍 Proposed adjustment to maximize diagnostic coverage
func dumpUIPluginDebug(pluginName string) framework.DebugFunc { return func(t *testing.T) { t.Helper() ctx := context.WithoutCancel(t.Context()) var plugin uiv1.UIPlugin if err := f.K8sClient.Get(ctx, client.ObjectKey{Name: pluginName}, &plugin); err != nil { t.Logf("Failed to get UIPlugin %q: %v", pluginName, err) - return + } else { + t.Logf("UIPlugin %q generation=%d, resourceVersion=%s", pluginName, plugin.Generation, plugin.ResourceVersion) + t.Logf("UIPlugin spec.type=%s", plugin.Spec.Type) + if plugin.Spec.Monitoring != nil { + if plugin.Spec.Monitoring.ClusterHealthAnalyzer != nil { + t.Logf("UIPlugin spec.monitoring.clusterHealthAnalyzer.enabled=%v", plugin.Spec.Monitoring.ClusterHealthAnalyzer.Enabled) + } + if plugin.Spec.Monitoring.Incidents != nil { + t.Logf("UIPlugin spec.monitoring.incidents.enabled=%v", plugin.Spec.Monitoring.Incidents.Enabled) + } + } + if len(plugin.Status.Conditions) == 0 { + t.Log("UIPlugin has no status conditions") + } + for _, c := range plugin.Status.Conditions { + t.Logf("UIPlugin condition: type=%s status=%s reason=%s message=%s", c.Type, c.Status, c.Reason, c.Message) + } } - t.Logf("UIPlugin %q generation=%d, resourceVersion=%s", pluginName, plugin.Generation, plugin.ResourceVersion) - t.Logf("UIPlugin spec.type=%s", plugin.Spec.Type) - if plugin.Spec.Monitoring != nil { - if plugin.Spec.Monitoring.ClusterHealthAnalyzer != nil { - t.Logf("UIPlugin spec.monitoring.clusterHealthAnalyzer.enabled=%v", plugin.Spec.Monitoring.ClusterHealthAnalyzer.Enabled) - } - if plugin.Spec.Monitoring.Incidents != nil { - t.Logf("UIPlugin spec.monitoring.incidents.enabled=%v", plugin.Spec.Monitoring.Incidents.Enabled) - } - } - if len(plugin.Status.Conditions) == 0 { - t.Log("UIPlugin has no status conditions") - } - for _, c := range plugin.Status.Conditions { - t.Logf("UIPlugin condition: type=%s status=%s reason=%s message=%s", c.Type, c.Status, c.Reason, c.Message) - } var plugins uiv1.UIPluginList if err := f.K8sClient.List(ctx, &plugins); err != nil { t.Logf("Failed to list UIPlugins: %v", err) } else { t.Logf("Total UIPlugins in cluster: %d", len(plugins.Items)) for _, p := range plugins.Items { t.Logf(" UIPlugin: name=%s type=%s conditions=%d", p.Name, p.Spec.Type, len(p.Status.Conditions)) } } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/uiplugin_cluster_health_analyzer_test.go` around lines 179 - 216, The dumpUIPluginDebug helper returns early when f.K8sClient.Get for the named plugin (in function dumpUIPluginDebug, variable plugin) fails, which prevents the cluster-wide List (f.K8sClient.List, variable plugins) from running; change the flow to log the Get error but do not return—continue to run the List block so the cluster-wide UIPlugin diagnostics are always emitted even if the specific Get fails, and keep the existing logging for successful Get results separate from the List logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Around line 179-216: The dumpUIPluginDebug helper returns early when
f.K8sClient.Get for the named plugin (in function dumpUIPluginDebug, variable
plugin) fails, which prevents the cluster-wide List (f.K8sClient.List, variable
plugins) from running; change the flow to log the Get error but do not
return—continue to run the List block so the cluster-wide UIPlugin diagnostics
are always emitted even if the specific Get fails, and keep the existing logging
for successful Get results separate from the List logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5d2395a2-0c84-40d5-8df7-afd8359f2cdb
📒 Files selected for processing (9)
test/e2e/framework/framework.gotest/e2e/monitoring_stack_controller_test.gotest/e2e/observability_installer_test.gotest/e2e/operator_metrics_test.gotest/e2e/po_admission_webhook_test.gotest/e2e/prometheus_operator_test.gotest/e2e/thanos_querier_controller_test.gotest/e2e/uiplugin_cluster_health_analyzer_test.gotest/e2e/uiplugin_test.go
simonpasquier
left a comment
There was a problem hiding this comment.
If we want to be fancy, we could use https://pkg.go.dev/testing#T.ArtifactDir to store the debug information as test artifacts and persist them to disk/object storage but it can be a follow-up.
|
|
||
| // DebugNamespace returns a DebugFunc that dumps deployments, pods, and events | ||
| // for the given namespaces. | ||
| func (f *Framework) DebugNamespace(namespaces ...string) DebugFunc { |
There was a problem hiding this comment.
(nit) since it can accept several namespaces
| func (f *Framework) DebugNamespace(namespaces ...string) DebugFunc { | |
| func (f *Framework) DebugNamespaces(namespaces ...string) DebugFunc { |
| if !t.Failed() { | ||
| return | ||
| } | ||
| for _, fn := range fns { |
There was a problem hiding this comment.
should we print the test name before ranging over the functions?
| } | ||
|
|
||
| func testObservabilityInstallerTracing(t *testing.T) { | ||
| f.DumpOnFailure(t, f.DebugNamespace(f.OperatorNamespace, "tracing-observability")) |
There was a problem hiding this comment.
should we declare a const value?
| func dumpClusterHealthAnalyzerDebug(t *testing.T, pluginName string) { | ||
| t.Helper() | ||
| ctx := context.WithoutCancel(t.Context()) | ||
| func dumpUIPluginDebug(pluginName string) framework.DebugFunc { |
There was a problem hiding this comment.
I assume that we could move this to test/e2e/framework eventually?
|
/approve none of my comments are blocking. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: DavidRajnoha, simonpasquier The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary
DebugFunctype andDumpOnFailurehelper in the e2e framework — tests register diagnostic functions that only fire when the test failsDebugNamespace(ns...)which dumps deployments (with conditions), pods (with container statuses), and events for the given namespacesdumpClusterHealthAnalyzerDebuginto a reusabledumpUIPluginDebugreturning aDebugFuncDumpOnFailureinto all e2e test suites for consistent post-failure diagnosticsHow it works
Cleanups run in LIFO order, so
DumpOnFailureshould be called beforeCleanUpto ensure debug info is captured while resources still exist.Example output (truncated) — namespace with 6 deployments, 7 pods, 137 events
Example — empty namespace (instant, no noise)
Example — no output on passing tests
Test plan
go build ./test/e2e/...compiles cleanlyMade with Cursor