
Add Operation CRD and agent daemon operation controller with soft-reboot support#59

Draft
bcho wants to merge 23 commits into main from hbc/agent-op-cr-poc

Conversation


@bcho bcho commented Apr 20, 2026

Summary

Replace the temporary ConfigMap-based operation shim with a proper Operation custom resource (unbounded-kube.io/v1alpha3) for managing machine operations like soft-reboot.

Changes

  • Operation CRD: New cluster-scoped CR with SoftReboot and HardReboot types, status subresource, owner references for GC, and TTL-based cleanup
  • Agent daemon: Rewrite operation watcher to watch Operation CRs instead of ConfigMaps; replace reconciler_softrestart with generic reconciler_operation that dispatches by type
  • kubectl plugin: Rewrite soft-reboot command to create an Operation CR with owner reference and TTL (default 300s), then watch status until completion
  • RBAC: Replace ConfigMap rules with operations and operations/status verbs
  • Tests: 7 operation reconciler tests covering all phases, TTL cleanup, and edge cases
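For reference, a minimal manifest for the new CR, sketched from the fields listed above (the metadata.name is illustrative; the field names follow the E2E output in this PR):

```yaml
apiVersion: unbounded-kube.io/v1alpha3
kind: Operation
metadata:
  name: agent-softreboot-example   # illustrative name
spec:
  machineRef: agent
  type: SoftReboot                 # or HardReboot
  ttlSecondsAfterFinished: 300     # TTL-based cleanup after completion
```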

E2E Validation

Validated on oc-vm3 (Ubuntu 24.04, Standard_D2as_v7):

$ kubectl get machines -o wide
NAME    HOST   PHASE   K8S VERSION   AGE
agent                                14m

$ ./bin/kubectl-unbounded machine soft-reboot agent
  --> Soft-rebooting Machine agent...
  operation         agent-softreboot-1776734492
  --> Operation SoftReboot: agent-softreboot-1776734492 in progress...
  --> Operation SoftReboot: agent-softreboot-1776734492 completed
  ready

$ kubectl get operations -o wide
NAME                          MACHINE   TYPE         PHASE       AGE
agent-softreboot-1776734492   agent     SoftReboot   Completed   4m5s

$ kubectl get operations -o yaml
apiVersion: unbounded-kube.io/v1alpha3
kind: Operation
metadata:
  name: agent-softreboot-1776734492
  ownerReferences:
  - apiVersion: unbounded-kube.io/v1alpha3
    kind: Machine
    name: agent
    uid: d5fc240c-0b2a-4a3f-9468-1d2aabf691bb
spec:
  machineRef: agent
  ttlSecondsAfterFinished: 300
  type: SoftReboot
status:
  completedAt: "2026-04-21T01:21:33Z"
  phase: Completed
  startedAt: "2026-04-21T01:21:32Z"

Node returned to Ready after soft reboot completed successfully.

bcho force-pushed the hbc/agent-op-cr-poc branch 2 times, most recently from 12852aa to df6331a on April 20, 2026 at 22:54
Introduce a unified action queue pattern for the agent daemon that
processes both Machine CR updates and operation requests sequentially
through a single worker goroutine. Operations are delivered via
ConfigMaps labeled unbounded.io/agent-op=<machine-name> as a temporary
shim until a dedicated Operation CRD is introduced.

- Refactor daemon to use Action-typed workqueue with discriminated dispatch
- Extract machine reconciler into reconciler_machine.go
- Add ConfigMap operation shim (opshim.go) and watch loop (opwatch.go)
- Implement soft-reboot: systemctl restart of nspawn service, avoiding
  the machinectl disable/enable cycle that breaks re-enablement
- Add kubectl unbounded machine soft-reboot command
- Move Redfish check from getMachine to runReboot where it belongs
- Make nspawn.conf [Files] section unconditional with /lib/modules bind
- Add RBAC for ConfigMap access (annotated as POC/temporary)
- Add tests for opshim, soft-restart reconciler, and machine reconciler
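The soft-reboot mechanism described above can be sketched as a restart of the running nspawn unit. This is an illustrative sketch, not the daemon's actual Go code; the systemd-nspawn@&lt;machine&gt;.service unit name is an assumption based on systemd conventions, not taken from this repo.

```shell
# Sketch only: restart the nspawn unit in place instead of running the
# machinectl disable/enable cycle, which breaks re-enablement.
# The unit name pattern is an assumption based on systemd conventions.
soft_reboot_cmd() {
  echo "systemctl restart systemd-nspawn@${1}.service"
}

soft_reboot_cmd kube1   # -> systemctl restart systemd-nspawn@kube1.service
```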
bcho force-pushed the hbc/agent-op-cr-poc branch from df6331a to b412e35 on April 20, 2026 at 22:57

bcho commented Apr 21, 2026

Events:
  Type     Reason                   Age                    From             Message
  ----     ------                   ----                   ----             -------
  Normal   Starting                 2m4s                   kube-proxy
  Normal   Starting                 15s                    kube-proxy
  Normal   Starting                 11m                    kube-proxy
  Normal   NodeHasNoDiskPressure    6m11s (x8 over 11m)    kubelet          Node oc-vm3 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     6m11s (x8 over 11m)    kubelet          Node oc-vm3 status is now: NodeHasSufficientPID
  Normal   NodeHasSufficientMemory  6m11s (x8 over 11m)    kubelet          Node oc-vm3 status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientMemory  2m28s (x7 over 2m29s)  kubelet          Node oc-vm3 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    2m28s (x7 over 2m29s)  kubelet          Node oc-vm3 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     2m28s (x7 over 2m29s)  kubelet          Node oc-vm3 status is now: NodeHasSufficientPID
  Normal   RegisteredNode           2m24s                  node-controller  Node oc-vm3 event: Registered Node oc-vm3 in Controller
  Normal   NodeReady                113s                   kubelet          Node oc-vm3 status is now: NodeReady
  Normal   Starting                 18s                    kubelet          Starting kubelet.
  Normal   NodeHasSufficientMemory  18s (x2 over 18s)      kubelet          Node oc-vm3 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    18s (x2 over 18s)      kubelet          Node oc-vm3 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     18s (x2 over 18s)      kubelet          Node oc-vm3 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  18s                    kubelet          Updated Node Allocatable limit across pods
  Warning  Rebooted                 18s                    kubelet          Node oc-vm3 has been rebooted, boot id: 583dc021-bcda-4f75-a08c-4d7156b8364e
  Warning  InvalidDiskCapacity      18s                    kubelet          invalid capacity 0 on image filesystem

Introduce a proper Operation custom resource (unbounded-kube.io/v1alpha3)
for managing machine operations, replacing the temporary ConfigMap-based
approach. The Operation CRD supports SoftReboot (agent-executed) and
HardReboot (reserved for machina controller) types with status subresource,
owner references for GC, and TTL-based cleanup of completed operations.

- Add Operation CRD types, deepcopy, and generated CRD YAML
- Rewrite agent operation watcher to watch Operation CRs (was ConfigMaps)
- Replace reconciler_softrestart with reconciler_operation dispatching by type
- Update action queue to use ActionOperation instead of ActionSoftRestart
- Rewrite kubectl soft-reboot to create Operation CR with owner ref and TTL
- Update RBAC from ConfigMap rules to operations and operations/status
- Add 7 operation reconciler tests covering all phases and TTL cleanup
- Remove opshim.go, opshim_test.go, reconciler_softrestart.go and its tests
bcho changed the title from "POC: ConfigMap-based operation controller with soft-reboot" to "Replace ConfigMap-based operation shim with Operation CRD" on Apr 21, 2026
bcho changed the title from "Replace ConfigMap-based operation shim with Operation CRD" to "Add Operation CRD and agent daemon operation controller with soft-reboot support" on Apr 21, 2026
Align the Operation CRD with the MachineOperation design from PR #46:
- Rename Operation CR to MachineOperation (shortName: mop)
- Rename spec.type to spec.operationName (enum-as-string)
- Rename SoftReboot/HardReboot to Reboot/PowerCycle plus Shutdown,
  PowerOff, PowerOn, RestartService placeholders
- Rename Completed phase to Complete
- Add spec.parameters map[string]string for operation arguments
- Add unbounded-kube.io/machine label on every MachineOperation for
  label-selector-based informer scoping in the agent
- Update agent opwatch to use label selector instead of client-side filter
- Update RBAC from operations to machineoperations
- Update kubectl soft-reboot, tests, and e2e
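After this rename, a MachineOperation carrying the new fields would look roughly like the following (the name and the empty parameters map are illustrative):

```yaml
apiVersion: unbounded-kube.io/v1alpha3
kind: MachineOperation              # shortName: mop
metadata:
  name: reboot-example              # illustrative name
  labels:
    # enables label-selector-based informer scoping in the agent
    unbounded-kube.io/machine: agent
spec:
  machineRef: agent
  operationName: Reboot             # enum-as-string
  parameters: {}                    # optional operation arguments
```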

bcho commented Apr 21, 2026

E2E: Kubernetes Version Upgrade via MachineConfiguration (v1.34.3 -> v1.35.1)

Cluster: AKS bahe-test-nodes (southcentralus, v1.34.3)
VM: oc-vm3 (Ubuntu 24.04, nspawn blue/green)

Flow

  1. kubectl unbounded config create upgrade-test --kubernetes-version=v1.35.1 --node-labels=...
  2. Controller auto-creates upgrade-test-v1 MCV
  3. kubectl unbounded config assign upgrade-test agent --version=1
  4. Bump spec.operations.repaveCounter to trigger repave
  5. Agent resolves MCV, detects version drift, performs full blue/green node update

Agent Logs

[I] daemon starting [machine_cr=agent] [nspawn_machine=kube1] [applied_version=1.34.3]
[I] operation counter drift detected [current_version=1.34.3] [desired_version=v1.35.1] [mcv=upgrade-test-v1]
[I] starting node update [old_machine=kube1] [new_machine=kube2] [old_version=1.34.3] [new_version=1.35.1]
[I] pulling OCI image [image=ghcr.io/azure/agent-ubuntu2404:v20260409]
[I] downloading kubernetes binary [binary=kubelet] [url=https://dl.k8s.io/v1.35.1/bin/linux/amd64/kubelet]
[I] downloading kubernetes binary [binary=kubectl] [url=https://dl.k8s.io/v1.35.1/bin/linux/amd64/kubectl]
[I] downloading kubernetes binary [binary=kube-proxy] [url=https://dl.k8s.io/v1.35.1/bin/linux/amd64/kube-proxy]
[I] stopping machine [machine=kube1]
[I] kubelet is active [machine=kube2]
[I] removing machine rootfs [machine=kube1]
[I] node update completed [active_machine=kube2] [version=1.35.1]
[I] reconciliation completed [new_version=1.35.1] [mcv=upgrade-test-v1]

Controller Logs

INFO  creating initial MachineConfigurationVersion  {"name": "upgrade-test", "version": 1}

Machine CR Status (after)

status:
  phase: Joining
  message: node update completed
  configuration:
    name: upgrade-test
    version: 1
    versionName: upgrade-test-v1
  conditions:
  - type: NodeUpdated
    status: "True"
    reason: Succeeded
    message: node update completed
  operations:
    repaveCounter: 2

MCV Status

spec:
  version: 1
  template:
    kubernetes:
      version: v1.35.1
      nodeLabels:
        kubernetes.azure.com/cluster: bahe-test-nodes
        kubernetes.azure.com/managed: "false"

Node (after)

NAME     STATUS   ROLES    AGE     VERSION
oc-vm3   Ready    <none>   3h17m   v1.35.1

Summary

  • Total repave time: ~16 seconds (rootfs provision 13.4s + stop/start/cleanup 2.6s)
  • Blue/green swap: kube1 (v1.34.3) -> kube2 (v1.35.1)
  • Node rejoined cluster as Ready with v1.35.1
  • Machine status.configuration correctly records the applied MCV
  • RBAC fix: added machineconfigurationversions (get/list/watch) to bootstrapper ClusterRole

kubectl plugin commands tested

kubectl unbounded config create upgrade-test --kubernetes-version=v1.35.1 --node-labels=...
kubectl unbounded config get
kubectl unbounded config get upgrade-test
kubectl unbounded config versions upgrade-test
kubectl unbounded config assign upgrade-test agent --version=1

…ller, agent MCV resolution, and kubectl config commands

Introduce a Deployment/ReplicaSet-style versioning model for machine
configuration. MachineConfiguration acts as the mutable profile;
edits automatically create or update MachineConfigurationVersion
snapshots (immutable once deployed).

Agent changes:
- reconcileUpdateMachine resolves MCV from Machine.spec.configurationRef
  instead of reading config from Machine.Spec.Kubernetes/Agent directly
- Fails if configurationRef is missing (no fallback)
- Records applied MCV in Machine.status.configuration after success

kubectl unbounded config commands:
- config create: creates a MachineConfiguration with k8s version, agent
  image, node labels, taints, and update strategy
- config get: lists MachineConfigurations or shows detail for one
- config versions: lists MCVs for a MachineConfiguration
- config assign: sets configurationRef on a Machine with optional
  version pin

Also adds machineconfigurationversions RBAC to bootstrapper ClusterRole.

E2E validated on oc-vm3: v1.34.3 -> v1.35.1 upgrade via blue/green
repave in ~16 seconds, node rejoined as Ready.

bcho commented Apr 21, 2026

E2E: Node Delete as Repave Trigger

Validated the Node-deletion repave signal on oc-vm3. The agent now watches the Kubernetes Node object by hostname and triggers a forced repave (bypassing operation counter drift check) when the Node is deleted.

Resource Setup

MachineConfiguration (upgrade-test):

apiVersion: unbounded-kube.io/v1alpha3
kind: MachineConfiguration
metadata:
  name: upgrade-test
spec:
  priority: 0
  revisionHistoryLimit: 10
  template:
    kubernetes:
      version: v1.35.1
      nodeLabels:
        kubernetes.azure.com/cluster: bahe-test-nodes
        kubernetes.azure.com/managed: "false"

MachineConfigurationVersion v1 (upgrade-test-v1) - initial version matching applied:

apiVersion: unbounded-kube.io/v1alpha3
kind: MachineConfigurationVersion
metadata:
  name: upgrade-test-v1
  labels:
    unbounded-kube.io/configuration: upgrade-test
spec:
  version: 1
  template:
    kubernetes:
      version: v1.35.1
      nodeLabels:
        kubernetes.azure.com/cluster: bahe-test-nodes
        kubernetes.azure.com/managed: "false"

MachineConfigurationVersion v2 (upgrade-test-v2) - target version for repave:

apiVersion: unbounded-kube.io/v1alpha3
kind: MachineConfigurationVersion
metadata:
  name: upgrade-test-v2
  labels:
    unbounded-kube.io/configuration: upgrade-test
spec:
  version: 2
  template:
    kubernetes:
      version: v1.34.3
      nodeLabels:
        kubernetes.azure.com/cluster: bahe-test-nodes
        kubernetes.azure.com/managed: "false"

Machine CR (agent) - assigned to MCV v2:

apiVersion: unbounded-kube.io/v1alpha3
kind: Machine
metadata:
  name: agent
spec:
  configurationRef:
    name: upgrade-test
    version: 2
  kubernetes:
    bootstrapTokenRef:
      name: bootstrap-token-ftbv20
    nodeLabels:
      kubernetes.azure.com/cluster: bahe-test-nodes
      kubernetes.azure.com/managed: "false"

Test 1: Node delete with no config drift

Assigned MCV v1 (v1.35.1, same as applied), deleted Node oc-vm3:

22:44:17 [I] Node deleted, enqueuing repave [watcher=node] [node=oc-vm3]
22:44:17 [I] Node deleted, forcing repave [action=NodeDeleted] [source=oc-vm3]
                [current_version=1.35.1] [desired_version=v1.35.1] [mcv=upgrade-test-v1]
22:44:17 [I] no config drift detected, skipping node update [action=NodeDeleted] [source=oc-vm3]
22:44:17 [I] reconciliation completed [new_version=1.35.1] [mcv=upgrade-test-v1] [action=NodeDeleted]

Result: Agent detected Node deletion, bypassed operation counter check (forceRepave=true), but updateNode correctly found no config drift and skipped the expensive repave. No unnecessary work.

Test 2: Node delete with version change (v1.35.1 -> v1.34.3)

Assigned MCV v2 (v1.34.3), deleted Node oc-vm3:

22:49:24 [I] Node deleted, enqueuing repave [watcher=node] [node=oc-vm3]
22:49:24 [I] Node deleted, forcing repave [action=NodeDeleted] [source=oc-vm3]
                [current_version=1.35.1] [desired_version=v1.34.3] [mcv=upgrade-test-v2]
22:49:24 [I] starting node update [old_machine=kube2] [new_machine=kube1]
                [old_version=1.35.1] [new_version=1.34.3]
22:49:24 [I] pulling OCI image [image=ghcr.io/azure/agent-ubuntu2404:v20260409]
22:49:26 [I] OCI image extraction complete                              (2.0s)
22:49:28 [I] downloaded kube binaries (v1.34.3)                         (2.0s)
22:49:30 [I] stopping machine [machine=kube2]
22:49:32 [I] [stop-node] completed                                      (1.9s)
22:49:32 [I] [start-nspawn-machine] started (kube1)
22:49:33 [I] kubelet is active [machine=kube1]
22:49:33 [I] removing machine rootfs [machine=kube2]
22:49:34 [I] node update completed [active_machine=kube1] [version=1.34.3]
22:49:34 [I] reconciliation completed [new_version=1.34.3] [mcv=upgrade-test-v2]

Total repave time: ~10 seconds (22:49:24 -> 22:49:34). Blue/green: kube2 -> kube1.

Post-repave state

Machine status after repave:

status:
  phase: Joining
  message: node update completed
  configuration:
    name: upgrade-test
    version: 2
    versionName: upgrade-test-v2
  conditions:
  - type: NodeUpdated
    status: "True"
    reason: Succeeded
    message: node update completed
    observedGeneration: 12

Node re-registered with new version:

NAME     STATUS   ROLES    AGE   VERSION   INTERNAL-IP
oc-vm3   Ready    <none>   9m    v1.34.3   10.1.0.4

Agent watchers

On startup, agent now runs three concurrent watch loops:

22:43:30 [I] daemon starting [machine_cr=agent] [nspawn_machine=kube2] [applied_version=1.35.1]
22:43:30 [I] Node watch starting [watcher=node] [node=oc-vm3]
22:43:30 [I] watching Node [name=oc-vm3]
22:43:30 [I] watching MachineOperation CRs [machineRef=agent]
22:43:30 [I] watching Machine CR [name=agent]

RBAC

Added to 07-bootstrapper-rbac.yaml:

- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]

New and updated files

  • cmd/agent/internal/daemon/nodewatch.go - Node watch by hostname, enqueues ActionNodeDeleted on delete
  • cmd/agent/internal/daemon/nodewatch_test.go - 2 tests (delete enqueue, hostname error)
  • Updated reconciler.go dispatch for ActionNodeDeleted
  • Updated reconcileUpdateMachine with forceRepave bool parameter (skips operation counter drift check)
  • 2 new reconciler tests (ForceRepaveSkipsDriftCheck, NoDrift_NormalPath)
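The repave decision described above can be sketched roughly as follows. This is illustrative pseudologic in shell, not the actual Go reconciler: forceRepave only bypasses the operation-counter drift check, and a no-op repave is still skipped when the applied and desired versions already match (as Test 1 showed).

```shell
# Illustrative sketch of the repave decision (not the actual Go code).
# force=1 corresponds to forceRepave=true (the Node was deleted);
# drift=1 corresponds to operation-counter drift.
should_repave() {
  local force="$1" drift="$2" current="$3" desired="$4"
  # Without forceRepave, only operation-counter drift triggers a check.
  if [ "$force" -ne 1 ] && [ "$drift" -ne 1 ]; then
    echo skip; return
  fi
  # Even when forced, skip the expensive repave if versions match.
  if [ "$current" = "$desired" ]; then
    echo skip
  else
    echo repave
  fi
}

should_repave 1 0 v1.35.1 v1.35.1   # Test 1: forced, no drift    -> skip
should_repave 1 0 v1.35.1 v1.34.3   # Test 2: forced, new version -> repave
```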


description: OperationName is the operation to perform on the target machine.
enum:
- Reboot
Contributor:

The CRD definition is using the values I suggested (Reboot/PowerCycle) but I think the POC is using SoftRestart and HardRestart. I personally prefer Reboot/PowerCycle as I think they're clearer, but either way, can we be consistent?

Collaborator:

I kind of prefer Soft/Hard Restart (or Reboot) myself. PowerCycle implies something very specific, which may or may not be happening depending on the provider implementation below us.

Contributor:

OK, I'm certainly open to what others think is clearer!

@plombardi89 (Collaborator):

I feel like the unbounded config should move to unbounded machine config or if we want a top-level command unbounded machine-config to express the relationship better. The current relationship unbounded config makes me think it's a high-level system configuration.


bcho commented Apr 22, 2026

> I feel like the unbounded config should move to unbounded machine config or if we want a top-level command unbounded machine-config to express the relationship better. The current relationship unbounded config makes me think it's a high-level system configuration.

But the config is actually assigned to multiple machines; I'm not sure whether putting it as a subcommand under machine would cause confusion. @phealy wdyt?

bcho added 2 commits April 22, 2026 10:11
Align naming with PR #59 review feedback (phealy/plombardi89):
- OperationPowerCycle -> OperationHardReboot in API types and CRD
- softRestart -> softReboot in agent executor interface and reconciler
- Update all tests and log messages accordingly

bcho commented Apr 22, 2026

E2E Test Results for eacfce0 (PowerCycle → HardReboot / softRestart → softReboot rename)

All 3 tests passed on oc-vm3 with the latest agent binary.

Test                    Result                 Duration   Details
Node-delete repave      PASS                   ~10s       v1.34.3 → v1.35.1, kube1 → kube2, MCV v2 → v3
HardReboot operation    PASS (expected fail)   <1s        Correctly rejected: "handled by machina controller, not the agent"
Soft reboot operation   PASS                   ~1s        kube2 soft rebooted, node re-registered Ready

Command Flows

1. MachineConfiguration + MCV lifecycle

# Create a MachineConfiguration
kubectl apply -f - <<YAML
apiVersion: unbounded-kube.io/v1alpha3
kind: MachineConfiguration
metadata:
  name: upgrade-test
spec:
  template:
    kubernetes:
      version: "v1.34.3"
      nodeLabels:
        kubernetes.azure.com/cluster: bahe-test-nodes
YAML

# Create versioned snapshots (MCVs)
kubectl apply -f - <<YAML
apiVersion: unbounded-kube.io/v1alpha3
kind: MachineConfigurationVersion
metadata:
  name: upgrade-test-v1
  labels:
    unbounded-kube.io/configuration: upgrade-test
spec:
  version: 1
  template:
    kubernetes:
      version: "v1.34.3"
      nodeLabels:
        kubernetes.azure.com/cluster: bahe-test-nodes
YAML

# List configurations and versions
kubectl get machineconfigurations
kubectl get machineconfigurationversions

2. Assign a config version to a Machine

# Point Machine to a specific MCV
kubectl patch machine agent --type=merge \
  -p '{"spec":{"configurationRef":{"name":"upgrade-test","version":3}}}'

# Verify assignment
kubectl get machine agent -o jsonpath='{.spec.configurationRef}'
kubectl get machine agent -o jsonpath='{.status.configuration}'

3. Trigger repave via Node delete (OnDelete strategy)

# Agent detects drift but waits for Node delete signal.
# Delete the Node to trigger repave:
kubectl delete node oc-vm3

# Agent detects deletion -> repaves to target MCV version.
# Node re-registers automatically after repave (~10-15s).
kubectl get node oc-vm3

4. MachineOperations

# Soft reboot (handled by in-VM agent)
kubectl apply -f - <<YAML
apiVersion: unbounded-kube.io/v1alpha3
kind: MachineOperation
metadata:
  name: reboot-1
  labels:
    unbounded-kube.io/machine: agent
spec:
  machineRef: agent
  operationName: Reboot
YAML

# HardReboot (rejected by agent - handled by machina controller)
kubectl apply -f - <<YAML
apiVersion: unbounded-kube.io/v1alpha3
kind: MachineOperation
metadata:
  name: hardreboot-1
  labels:
    unbounded-kube.io/machine: agent
spec:
  machineRef: agent
  operationName: HardReboot
YAML

# Check operation status
kubectl get machineoperations -o wide

5. kubectl unbounded plugin commands

# Create a configuration
kubectl unbounded config create my-config --k8s-version v1.35.1

# List configurations
kubectl unbounded config get

# List versions for a configuration
kubectl unbounded config versions upgrade-test

# Assign a version to a machine
kubectl unbounded config assign upgrade-test --version 3 --machine agent

bcho added 6 commits April 22, 2026 10:44
Remove Shutdown, PowerOff, PowerOn, and RestartService from the
OperationName enum. Agent now silently ignores operations it does not
handle (leaving status untouched for the machina controller) instead
of marking them Failed.
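The dispatch behavior this describes can be sketched as follows (illustrative shell, not the actual Go reconciler): the agent executes the operations it owns and leaves anything else untouched, status included, for the machina controller.

```shell
# Sketch only: dispatch by operationName; unhandled operations are
# ignored rather than marked Failed, leaving their status untouched.
handle_operation() {
  case "$1" in
    Reboot) echo "executing soft reboot" ;;
    *)      echo "ignoring $1 (not handled by agent)" ;;
  esac
}

handle_operation Reboot       # -> executing soft reboot
handle_operation HardReboot   # -> ignoring HardReboot (not handled by agent)
```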
# Conflicts:
#	api/machina/v1alpha3/machine_types.go
#	cmd/agent/internal/daemon/daemon.go
#	cmd/agent/internal/daemon/update.go
#	cmd/agent/internal/phases/nodestart/persist_config.go
#	hack/agent/e2e-kind/e2e.py
…ersion CRDs

Port custom resource type definitions from hbc/agent-op-cr-poc:
- MachineOperation: discrete operations (Reboot, HardReboot) on machines
- MachineConfiguration: deployment-like config profiles with update strategies
- MachineConfigurationVersion: immutable versioned snapshots of configurations
- Machine CR additions: configurationRef, configuration status, NodeUpdated
  condition, and configuration version annotation
The reconciler requires spec.configurationRef to resolve a
MachineConfigurationVersion. Update install_machine_crd to install
MachineConfiguration and MachineConfigurationVersion CRDs, and
update trigger_upgrade to create an MCV CR and set configurationRef
on the Machine CR before bumping the repaveCounter.
bcho added 8 commits April 28, 2026 11:54
Adopt CRD type definitions from #96 (MachineOperation, MachineConfiguration,
MachineConfigurationVersion). Update implementations:

- Rename OperationName -> OperationKind, OperationReboot -> OperationSoftReboot
- Convert RegisterWithTaints from []string to []corev1.Taint in MCV overlay
- Add taint parse/format helpers for kubectl-unbounded
- Remove old operation_types.go (superseded by machineoperation_types.go)
- Add MachineConditionConfigurationPending constant
…tus update

- Skip reconciliation gracefully when Machine CR has no configurationRef
  (e.g. during initial bootstrap before configuration is assigned)
- Re-read Machine CR before final status updates to avoid resourceVersion
  conflicts from concurrent reconciliation triggered by Provisioning phase
  change events
The condition-setting code (condStatus, condReason, SetStatusCondition)
was accidentally removed in a previous edit. Without it, the NodeUpdated
condition stayed at InProgress even after a successful update, causing
the e2e validation to fail.
Move pkg/agent/utilexec back to pkg/agent/internal/utilexec to restore
proper encapsulation. To eliminate the cross-boundary import from
cmd/agent/, expand the executor interface with machineRun and
systemctlRestart methods and implement them on defaultExecutor with
local exec helpers.

Also restore nspawn.conf from origin/main.