[release-4.21] OCPBUGS-90543: fix etcd operator deadlock when etcd-endpoints configmap is stale #1637
…rList fails

When the etcd-endpoints configmap contains stale IPs (e.g. after VM migration), the etcd client pool cannot reach any member, causing MemberList to fail. This creates a deadlock: the operator cannot update the configmap because it needs MemberList to get member addresses, but MemberList fails because the configmap holds stale addresses.

Break the deadlock directly in EtcdEndpointsController.syncConfigMap(): when MemberList fails, fall back to control-plane node internal IPs discovered via the node lister and network config. This populates the configmap with reachable IPs, allowing the etcd client to reconnect. On the next successful sync, the MemberList result overwrites the fallback entries with authoritative member data.

Also adds a WithMemberListError option to FakeEtcdClient for testing.

Co-authored-by: Cursor <cursoragent@cursor.com>
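The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `fakeClient`, `withMemberListError`, `newFakeClient`, `resolveEndpoints`, and `errNoEndpoints` are hypothetical names chosen for the example, members are reduced to plain address strings, and the node lister and network config lookup is assumed to have already produced `nodeIPs`.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errNoEndpoints is a hypothetical error standing in for a MemberList
// failure caused by stale addresses in the configmap.
var errNoEndpoints = errors.New("no reachable etcd endpoints")

// fakeClient is a hypothetical test double; it is not the operator's
// FakeEtcdClient and only models the MemberList call.
type fakeClient struct {
	members       []string
	memberListErr error
}

// option configures a fakeClient (functional-option pattern).
type option func(*fakeClient)

// withMemberListError makes every MemberList call fail with err.
func withMemberListError(err error) option {
	return func(c *fakeClient) { c.memberListErr = err }
}

// newFakeClient builds a fakeClient and applies the given options in order.
func newFakeClient(members []string, opts ...option) *fakeClient {
	c := &fakeClient{members: members}
	for _, o := range opts {
		o(c)
	}
	return c
}

// MemberList returns the configured error if one is set, else the members.
func (c *fakeClient) MemberList() ([]string, error) {
	if c.memberListErr != nil {
		return nil, c.memberListErr
	}
	return c.members, nil
}

// resolveEndpoints picks the addresses to write into the configmap.
// It prefers MemberList and falls back to nodeIPs only when MemberList
// fails. The boolean reports whether the fallback path was taken.
func resolveEndpoints(c *fakeClient, nodeIPs []string) ([]string, bool, error) {
	source, usedFallback := nodeIPs, true
	members, err := c.MemberList()
	if err == nil {
		// Authoritative data is available, so ignore the node IPs.
		source, usedFallback = members, false
	} else if len(nodeIPs) == 0 {
		// Nothing to fall back to: surface the original failure.
		return nil, false, fmt.Errorf("member list failed and no node IPs are known: %w", err)
	}
	// Copy before sorting so the caller's slice is left untouched.
	addrs := append([]string(nil), source...)
	sort.Strings(addrs)
	return addrs, usedFallback, nil
}
```

In this sketch, calling `resolveEndpoints` with a client built via `withMemberListError(errNoEndpoints)` and the node IPs `10.0.0.2` and `10.0.0.1` takes the fallback branch and returns both IPs in sorted order. Once the client can list members again, the same call returns the member addresses and reports that no fallback was used.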
[APPROVALNOTIFIER] This PR is NOT APPROVED
@openshift-cherrypick-robot: Jira Issue OCPBUGS-88490 has been cloned as Jira Issue OCPBUGS-90543. Will retitle bug to link to clone.
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-90543, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
@openshift-cherrypick-robot: all tests passed!
This is an automated cherry-pick of #1631
/assign dpateriya