Goal
Write an operator's guide to boot interface selection under docs/, explaining the behind-the-scenes flows that decide which NIC a machine boots from -- and why -- across the explored, predicted, and managed stages. Include a flow diagram.
This is the doc-facing capstone of epic #2660 (boot-interface standardization). It should be accurate to the finished epic, so it is scheduled after #2659 and #2668 land (both change selection logic -- see Dependencies).
Audience & scope
For operators / on-call: enough to reason about "why did this machine pick this NIC to boot from, and how do I steer it?" Not an internal design doc -- favor the observable model, the declaration knobs, and the admin endpoints.
Outline (draft)
Core model
MachineBootInterface = (MAC + Redfish interface_id); the "full pair."
primary_interface is the boot interface by construction (no separate is_boot flag); derived projections.
Store A (explored_endpoints, the pre-ownership explored default) vs Store B (machine_interfaces, authoritative once a machine owns the endpoint).
The selection lifecycle (the spine of the doc)
Explore (site-explorer): the explored default -- fetch_host_primary_interface_mac (declared ExpectedHostNic.primary > lowest-PCI DPU host-PF), complete_boot_interfaces, and last-known-good / retained behavior.
Predict (predicted_machine_interfaces, the pre-first-lease window): pick_boot_prediction; how predictions are minted and the admin-primary invariant.
Own (handoff): predicted -> managed; promotion onto machine_interfaces; one_primary_interface_per_machine partial unique index; the NULL-ownership window.
pick_boot_interface precedence -- declared > DPU-takeover > lowest-MAC non-underlay.
site-explorer <-> machine-controller interactions -- who computes what, when; how the explored default feeds preingestion actions and how the controller takes over post-ownership.
Retained boot interfaces -- what "retained" means, when it is kept vs cleared (and the force-delete / re-ingest interaction; power-cycle from #2367, "feat(site-explorer): power cycle [not just Dell] to apply a queued NIC mode change").
Admin endpoints -- resolve_admin_boot_interface_target, machine_setup, set_dpu_first_boot_order / set_host_boot_order, BIOS/boot-order config, and how an operator overrides a pick.
Declared primary precedence (the #2657 "Honor a host's declared primary interface when picking its boot device" / #2658 "Resolve the boot interface from predictions in the machine-controller" / #2662 "Honor a declared primary interface when computing the explored boot default" work) -- ExpectedHostNic.primary wins across all three stores.
DPU mode effects -- DpuMode / NicMode / NoDpu (zero-DPU) and how each changes selection, including booting a declared integrated NIC while DPUs stay managed (#2668, "Boot from a declared integrated NIC while keeping its DPUs managed").
Flow diagram -- explored -> predicted -> managed across the actors (site-explorer, machine-controller, admin API), likely mermaid.
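For the guide itself, the managed-stage precedence listed under the selection lifecycle (declared > DPU-takeover > lowest-MAC non-underlay) could be illustrated with a small Rust sketch like the one below. It is not the real pick_boot_interface: the Candidate struct, its fields, and pick_boot_candidate are hypothetical names invented for illustration. It also assumes that "DPU-takeover" means preferring a DPU host-facing interface and that ties within a tier break on the lowest MAC. Both assumptions should be checked against merged code before anything like this lands in docs/.

```rust
// Hypothetical model for illustration only; these names do not mirror the real codebase.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    mac: [u8; 6],
    /// Assumed to reflect a declared ExpectedHostNic.primary.
    declared_primary: bool,
    /// Assumed: the interface is a DPU's host-facing PF.
    dpu_host_pf: bool,
    /// Assumed: the interface sits on an underlay segment.
    underlay: bool,
}

/// Sketch of the stated precedence:
/// declared > DPU-takeover > lowest-MAC non-underlay.
fn pick_boot_candidate(candidates: &[Candidate]) -> Option<&Candidate> {
    // Tier 1: a declared primary wins outright.
    if let Some(c) = candidates.iter().find(|c| c.declared_primary) {
        return Some(c);
    }
    // Tier 2: DPU takeover. Breaking ties on lowest MAC is an assumption.
    if let Some(c) = candidates
        .iter()
        .filter(|c| c.dpu_host_pf)
        .min_by_key(|c| c.mac)
    {
        return Some(c);
    }
    // Tier 3: lowest MAC among interfaces that are not on the underlay.
    candidates
        .iter()
        .filter(|c| !c.underlay)
        .min_by_key(|c| c.mac)
}
```

Under these assumptions, a host with a declared integrated NIC and a managed DPU resolves to the declared NIC, a host with no declaration resolves to the DPU interface, and a zero-DPU host with no declaration falls through to the lowest-MAC non-underlay interface.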
Dependencies (why this is scheduled last)
nic_type -> expected_network_segment_type): changes how a host NIC's segment type is declared, which the non-underlay filter in selection reads. The doc's "declaration knobs" section must reflect the final shape.
Refresh the architecture context first (nico_architecture/boot_provisioning_setup + expected_explored_managed, which predate this epic's PRs), then write against verified current code.
Done when
A reviewed docs/ guide covers items 1-7 with a flow diagram, accurate to merged epic code, and an operator can answer "which NIC will this machine boot from, and how do I change it?" from the doc alone.
Part of #2660.