WIP: THREESCALE-14224 - Fix perpetual reconcile loops: deployment churn, status writes, HPA deletes#1167
Draft
urbanikb wants to merge 5 commits into3scale:masterfrom
Draft
WIP: THREESCALE-14224 - Fix perpetual reconcile loops: deployment churn, status writes, HPA deletes#1167urbanikb wants to merge 5 commits into3scale:masterfrom
urbanikb wants to merge 5 commits into3scale:masterfrom
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ReconcileHpa() was calling DeleteResource() directly when HPA is disabled, which issues a Client().Delete() API call every reconcile cycle regardless of whether the HPA exists. After the first successful delete, subsequent cycles hit a 404 on every cycle indefinitely. Replace with TagObjectToDelete + ReconcileResource, which does a cache-backed Get first and is a no-op when the object is already gone. This is consistent with how PDB and monitoring resources are handled. Add table-driven tests covering all three HPA targets (backend-listener, backend-worker, apicast-production) across enabled, disabled, and async-disable annotation scenarios. Related: THREESCALE-14224 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace unconditional Status().Update() loop: skip write when newStatus equals current status; previously the && apiManagerAvailable guard prevented the short-circuit when routes were not ready, causing a write every reconcile cycle even when nothing had changed - Replace Requeue:true (feeds rate limiter AddRateLimited) with RequeueAfter:30s (uses AddAfter, no backoff accumulation) when Available is False, as a safety net for missed watch events - Replace namespace-wide route informer and unreliable ownerRef-chain mapper with a label-filtered watch (3scale.net/created-by=zync) and a domain-name mapper that enqueues the APIManager whose expected tenant hosts overlap with the changed route's host Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… reconcile Probe fields left at zero (TimeoutSeconds, PeriodSeconds, SuccessThreshold, FailureThreshold) are defaulted by the API server on write. DeploymentProbesMutator uses reflect.DeepEqual, so the desired spec (zeros) never matched the live spec (K8s defaults), causing an update every reconcile cycle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #1167 +/- ##
==========================================
- Coverage 41.84% 40.97% -0.88%
==========================================
Files 203 204 +1
Lines 20859 26067 +5208
==========================================
+ Hits 8729 10681 +1952
- Misses 11350 14570 +3220
- Partials 780 816 +36
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Fixes three independent sources of unnecessary reconcile work that caused either continuous API writes or missed reconcile triggers.
1. Deployment update churn (K8s default normalization)
The K8s API server applies defaults (probe integers, HTTPGet scheme, container termination fields, volume DefaultModes, pod-level policies, deployment strategy) to stored Deployments. Without normalization, every reconcile pass detected a false-positive diff between the in-memory desired state and the API-server-enriched existing state, issuing an unnecessary update on every cycle.
Introduces
normalizeDeploymentDefaults(and its call chain throughnormalizePodSpecDefaults,normalizeContainerDefaults,normalizeProbeDefaults,normalizeVolumeDefaults) inpkg/reconcilers/k8s_defaults.go, called at the top ofDeploymentMutatorbefore anyDMutateFnruns.2. Status reconciler write loop and unreliable route watch
Status().Update()with a no-op whennewStatusequals the current status; the previous&& apiManagerAvailableguard prevented the short-circuit when routes were not ready, causing a write every reconcile cycle even when nothing had changed.Requeue: true(feeds the rate-limiter backoff queue) withRequeueAfter: 30swhen Available is False, as a safety net for missed watch events.3scale.net/created-by=zync) and a domain-name mapper that enqueues the APIManager whose expected tenant hosts overlap with the changed route's host.3. HPA reconciler unnecessary delete calls
ReconcileHpa()was callingDeleteResource()directly when HPA is disabled, issuing aClient().Delete()API call every reconcile cycle regardless of whether the HPA still existed. After the first successful delete, subsequent cycles hit a 404 indefinitely. Replaced withTagObjectToDelete+ReconcileResource, consistent with how PDB and monitoring resources are handled.Testing
test/envtest/suite with 21 tests — one per normalized field — using a real API server to confirm each default is handled and no false-positive update is produced.make test-envtesttarget included in the GitHub Actionsunit-testsjob.Status
🤖 Generated with Claude Code