@Karthik-K-N Karthik-K-N commented Dec 19, 2025

Since both the node controller and the node readiness controller try to update the rule's status, a conflict can occur in which the status written by the NRR controller is overwritten by the node controller. This change compares the rule generations and makes sure both are at the same generation before updating the status.

From the error logs:

2025-12-17T20:09:42+05:30	DEBUG	Updating rule status	{"controller": "nodereadiness-controller", "object": {"name":"network-readiness-rule"}, "namespace": "", "name": "network-readiness-rule", "reconcileID": "46cd3c25-2e2a-4e51-a39a-5852d051e052", "rule": "network-readiness-rule", "nodeEvaluations": 2, "appliedNodes": 2}
.
.
.
2025-12-17T20:09:53+05:30	DEBUG	Updating rule status	{"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"nrg-test-worker2"}, "namespace": "", "name": "nrg-test-worker2", "reconcileID": "49acd35b-8ad5-4da8-902f-102b25da14a1", "rule": "network-readiness-rule", "nodeEvaluations": 1, "appliedNodes": 0}
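
The overwrite visible in these two log lines can be reproduced with a minimal in-memory sketch. Everything here is an assumption for illustration: `Store`, `Rule`, `RuleStatus`, and `simulateOverwrite` are hypothetical stand-ins rather than the controller's real types, and the store is plain last-write-wins with no resourceVersion handling.

```go
package main

import "fmt"

// RuleStatus is a hypothetical, trimmed-down stand-in for the rule status.
// The field names mirror the keys in the log lines above.
type RuleStatus struct {
	NodeEvaluations int
	AppliedNodes    int
}

// Rule is a hypothetical stand-in for the stored rule object.
type Rule struct {
	Generation int64
	Status     RuleStatus
}

// Store is an in-memory stand-in for the API server (assumption: last write wins).
type Store struct{ rule Rule }

// Get returns a copy of the stored rule, like a read from a cache.
func (s *Store) Get() Rule { return s.rule }

// UpdateStatus replaces the whole status, as a full status update would.
func (s *Store) UpdateStatus(st RuleStatus) { s.rule.Status = st }

// simulateOverwrite replays the sequence from the logs and returns the final status.
func simulateOverwrite() RuleStatus {
	store := &Store{rule: Rule{Generation: 1}}

	// The node controller takes its copy before the rule controller writes.
	stale := store.Get()

	// The rule controller evaluates both nodes and writes the full status.
	store.UpdateStatus(RuleStatus{NodeEvaluations: 2, AppliedNodes: 2})

	// The node controller writes a status built from its stale copy,
	// which only knows about the single node it just evaluated.
	stale.Status.NodeEvaluations = 1
	store.UpdateStatus(stale.Status)

	return store.Get().Status
}

func main() {
	fmt.Printf("%+v\n", simulateOverwrite()) // prints {NodeEvaluations:1 AppliedNodes:0}
}
```

The final status matches the second log line: the values written by the rule controller are gone.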

Fixes: #43

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Karthik-K-N
Once this PR has been reviewed and has the lgtm label, please assign sergeykanzhelev for approval. For more information see the Code Review Process.


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 19, 2025
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 19, 2025

return err
}

if latestRule.GetGeneration() != rule.GetGeneration() {
Contributor

The core of the issue, it seems to me, is that the NodeReconciler overwrites correct (recent) values with stale or empty values from the cache.

For instance:

  1. T0: NodeReconciler starts with rule.gen=1
  2. T1: RuleReconciler updates to rule.gen=2
  3. T2: NodeReconciler updates the status by copying the rule from the cache?

I think instead of calling the full updateRuleStatus(), NodeReconciler should use Patch to update ONLY the fields it modifies.
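
A rough sketch of the "patch only the fields it modifies" idea, using plain Go maps instead of the real API types. `RuleStatus`, `patchNode`, `appliedCount`, and the node name `nrg-test-worker` are hypothetical and introduced only for illustration; the point is that the node reconciler merges its single node's result into the latest status rather than replacing the whole status.

```go
package main

import "fmt"

// RuleStatus is a hypothetical status shape: node name -> whether the rule is applied.
type RuleStatus struct {
	Evaluations map[string]bool
}

// patchNode merges one node's result into a copy of the latest status and
// leaves every other node's entry untouched. The input is not mutated.
func patchNode(latest RuleStatus, node string, applied bool) RuleStatus {
	merged := RuleStatus{Evaluations: make(map[string]bool, len(latest.Evaluations)+1)}
	for name, v := range latest.Evaluations {
		merged.Evaluations[name] = v
	}
	merged.Evaluations[node] = applied
	return merged
}

// appliedCount returns how many nodes currently have the rule applied.
func appliedCount(s RuleStatus) int {
	n := 0
	for _, applied := range s.Evaluations {
		if applied {
			n++
		}
	}
	return n
}

func main() {
	// Latest status as written by the rule reconciler.
	latest := RuleStatus{Evaluations: map[string]bool{
		"nrg-test-worker":  true,
		"nrg-test-worker2": false,
	}}

	// The node reconciler only touches the node it just evaluated.
	merged := patchNode(latest, "nrg-test-worker2", true)
	fmt.Printf("tracked=%d applied=%d\n", len(merged.Evaluations), appliedCount(merged)) // prints tracked=2 applied=2
}
```

In controller-runtime terms, the analogous call would presumably be `Status().Patch` with a `client.MergeFrom` base; whether that fits this controller's status layout is not confirmed here.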

Contributor

Let me try this and see if it helps.

Contributor Author

> NodeReconciler should use Patch to update ONLY the fields it modifies.

This will be the best solution of all.

Contributor

My changes are here: ajaysundark@0f1b4fc. I haven't gotten a chance to test this fully yet.

@ajaysundark ajaysundark Dec 24, 2025

#49 is up for your review; it fixes the status updates.

A rule generation mismatch suggests that the node reconciliation is dealing with an older rule, and the next rule reconciliation should handle this pending node update anyway. So including this check makes sense (it could be an optimization that saves some redundant reconciliation cycles).
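
As a minimal sketch of such a generation check, the function below skips the write when the reconciler's observed generation differs from the latest one, and reports whether it wrote so the caller can log or requeue. `Rule`, `RuleStatus`, and `updateStatusIfCurrent` are hypothetical names, not the controller's actual code.

```go
package main

import "fmt"

// RuleStatus is a hypothetical, trimmed-down status.
type RuleStatus struct {
	AppliedNodes int
}

// Rule is a hypothetical stand-in for the rule object.
type Rule struct {
	Generation int64
	Status     RuleStatus
}

// updateStatusIfCurrent writes newStatus into latest only when the caller
// computed it against the same generation. It returns false when the write
// was skipped because the caller worked from a different generation.
func updateStatusIfCurrent(latest *Rule, observedGeneration int64, newStatus RuleStatus) bool {
	if latest.Generation != observedGeneration {
		return false
	}
	latest.Status = newStatus
	return true
}

func main() {
	latest := Rule{Generation: 2, Status: RuleStatus{AppliedNodes: 2}}

	// The node reconciler computed its status against generation 1.
	updated := updateStatusIfCurrent(&latest, 1, RuleStatus{AppliedNodes: 0})
	fmt.Printf("updated=%v appliedNodes=%d\n", updated, latest.Status.AppliedNodes) // prints updated=false appliedNodes=2
}
```

Returning a boolean rather than silently dropping the update is a design choice of this sketch; it leaves the decision about requeueing to the caller.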


if latestRule.GetGeneration() != rule.GetGeneration() {
	log.V(4).Info("Rule generation mismatch during status update, avoiding retry to let new reconciliation handle it")
	return nil
Contributor

I think returning silently here is problematic, as it would skip the node update entirely.

Successfully merging this pull request may close these issues.

Fix status update inconsistency
