/kind bug
What steps did you take and what happened:
I performed an experiment to test the behavior of updating network infrastructure on a running cluster.
- Created a valid GCPCluster with a defined network and subnets.
- Waited for the infrastructure to be provisioned (VPC and Subnets created).
- Updated the GCPCluster manifest, changing the spec.network.name and spec.network.subnets to new values (simulating a move to a new VPC).
- Applied the change.
Observed Behavior: The API server accepted the update (the fields are mutable). The CAPG controller reconciliation loop then proceeded to create the new subnets defined in the updated spec. However, it did not migrate existing Control Plane or Worker machines to the new network, nor did it clean up the old network resources.
This resulted in a "split" state:
- The GCPCluster object references a new VPC.
- Existing VMs (Control Plane/Workers) remain stranded in the old VPC.
- New MachineDeployments (or rollouts) attempt to provision in the new VPC, potentially causing connectivity failures (e.g., if the new VPC is isolated/not peered).
What did you expect to happen:
I expected the API server to reject the update to these fields, since these are destructive changes for any running cluster.
Infrastructure fields that cannot be reconciled without breaking cluster continuity (such as spec.network.name, spec.project, and spec.region) should be marked as immutable to prevent users from accidentally entering this undefined state.
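For illustration, below is a minimal sketch of the kind of update-time check a validating webhook could perform for these fields. The struct types are simplified stand-ins for the relevant parts of the GCPCluster spec (the real CAPG API types differ in detail), and the validateImmutableFields helper and its messages are hypothetical, not the project's actual validation code.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// Simplified stand-ins for the relevant GCPCluster spec fields; the real
// CAPG types are richer (pointers, subnet lists, etc.).
type NetworkSpec struct {
	Name string
}

type GCPClusterSpec struct {
	Project string
	Region  string
	Network NetworkSpec
}

// validateImmutableFields compares the old and new spec and returns an error
// for every field that should not change on a running cluster.
func validateImmutableFields(oldSpec, newSpec GCPClusterSpec) field.ErrorList {
	var errs field.ErrorList
	if newSpec.Project != oldSpec.Project {
		errs = append(errs, field.Invalid(field.NewPath("spec", "project"), newSpec.Project, "field is immutable"))
	}
	if newSpec.Region != oldSpec.Region {
		errs = append(errs, field.Invalid(field.NewPath("spec", "region"), newSpec.Region, "field is immutable"))
	}
	if newSpec.Network.Name != oldSpec.Network.Name {
		errs = append(errs, field.Invalid(field.NewPath("spec", "network", "name"), newSpec.Network.Name, "field is immutable"))
	}
	return errs
}

func main() {
	oldSpec := GCPClusterSpec{Project: "example-project", Region: "us-central1", Network: NetworkSpec{Name: "old-vpc"}}
	newSpec := oldSpec
	newSpec.Network.Name = "new-vpc"

	// Simulates what a webhook's ValidateUpdate would report for the change
	// described in the reproduction steps above.
	for _, e := range validateImmutableFields(oldSpec, newSpec) {
		fmt.Println(e)
	}
}
```

If webhook changes are undesirable, another option could be a CEL validation rule on the CRD (e.g. `self == oldSelf` on these fields), which expresses the same immutability declaratively.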
Anything else you would like to add:
I'd like to hear the community's opinion on this behavior. I find it very concerning to operate clusters this way, especially in production environments.
Environment:
- CAPI version: v0.24.1
- CAPG version: v0.17
- Kubernetes Version: v1.33.5-gke.1308000 (GKE)