Skip to content

feat: honor a declared primary NIC in the explored boot default#2721

Merged
chet merged 1 commit into
NVIDIA:mainfrom
chet:explored-default-declared-primary
Jun 22, 2026
Merged

feat: honor a declared primary NIC in the explored boot default#2721
chet merged 1 commit into
NVIDIA:mainfrom
chet:explored-default-declared-primary

Conversation

@chet

@chet chet commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

What

The Store-A / site-explorer counterpart to #2657: a declared ExpectedHostNic.primary now wins in the explored boot default too -- the boot interface site-explorer records in explored_endpoints before any machine owns the endpoint, which a preingestion admin action (machine_setup, set_dpu_first_boot_order, Restore) reads.

fetch_host_primary_interface_mac (api-model/src/site_explorer/mod.rs) filtered NICs to DPU host-PF MACs, then picked lowest-PCI -- so it structurally couldn't name an integrated NIC. A host declaring an integrated NIC as primary got a DPU NIC (or nothing) as its explored default.

Now the picker takes a declared_primary: Option<MacAddress> and returns it when this report has that NIC as a full pair (reusing the existing find_interface_id_for_mac, which scans every system ethernet interface -- integrated NICs included), whatever the NIC type. The site-explorer caller (lib.rs) feeds it the matched expected machine's declared primary via ExpectedMachineData::declared_primary_mac() (the resolver from #2657). Absent a declaration, today's lowest-PCI DPU-PF pick stands, byte-for-byte.

Net: the explored default and the managed store (#2657) agree on the declaration across the ownership handoff, completing "declared primary is load-bearing everywhere."

Notes

  • Independent of Honor a host's declared primary interface when picking its boot device #2657 (different code path); narrow impact -- pre-ownership / preingestion admin actions only.
  • Loosens fetch_host_primary_interface_mac from its DPU-only assumption; the only caller is lib.rs (the exploration loop) and the existing test, both updated. The no-declaration path is unchanged, so DPU-PF-assuming behavior is preserved.

Test plan

  • cargo clippy --all-features --tests clean over the touched crates; cargo +nightly-2026-06-16 fmt --all.
  • Extended test_fetch_host_primary_interface_mac: declared DPU host-PF wins over the lowest-PCI pick; declared integrated NIC becomes the default (the headline case the DPU-only pick could never produce); a declared MAC absent from the report falls back to the DPU pick; no declaration → unchanged.

Part of #2662 (epic #2660).

NVIDIA#2657 made a declared ExpectedHostNic.primary authoritative in
machine_interfaces, the managed store every real boot path reads. This
closes the matching gap in the explored default: the boot interface
site-explorer records in explored_endpoints before any machine owns the
endpoint, which a preingestion admin action (machine_setup, Restore) reads.
That default came from an automatic, DPU-centric pick that filters to DPU
host-PF MACs, so it could never name an integrated NIC -- a host that
declared an integrated NIC as primary got a DPU NIC (or nothing) as its
explored default. Now the declaration wins there too.

fetch_host_primary_interface_mac takes the declared primary and returns it
when this report has that NIC as a full pair (any type -- integrated or DPU
host-PF); absent a declaration, today's lowest-PCI DPU-PF pick stands. The
site-explorer caller feeds it the matched expected machine's declared
primary, reusing NVIDIA#2657's declared_primary_mac(). The explored default and
the managed store now agree on the declaration across the ownership handoff.

- fetch_host_primary_interface_mac gains a declared_primary arg; a declared
  NIC present in the report wins whatever its type, via the existing
  find_interface_id_for_mac (which scans every system ethernet interface).
- The exploration loop passes the matched expected machine's declared
  primary; the explored-default capture (id resolve / last-known-good) is
  untouched.
- No declaration -> the explored default is unchanged from today.

Tests extend test_fetch_host_primary_interface_mac: a declared DPU host-PF
wins over the lowest-PCI pick, a declared integrated NIC becomes the default,
a declared MAC absent from the report falls back, and the no-declaration
pick is unchanged.

Part of NVIDIA#2662 (epic NVIDIA#2660).

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 21, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a9a8e625-c2bf-4d3e-8791-7b8945347ea0

📥 Commits

Reviewing files that changed from the base of the PR and between 611e96b and 796aa33.

📒 Files selected for processing (3)
  • crates/api-model/src/site_explorer/mod.rs
  • crates/site-explorer/src/lib.rs
  • crates/site-explorer/tests/site_explorer.rs

Summary by CodeRabbit

  • Improvements
    • Host boot interface selection now respects operator-declared primary MAC addresses when available
    • System validates declared addresses against available interfaces and falls back to automatic selection if unavailable
    • Enhanced test coverage ensures reliable behavior across multiple selection scenarios

Walkthrough

fetch_host_primary_interface_mac gains a declared_primary: Option<MacAddress> parameter and an early-return branch that returns the declared MAC when find_interface_id_for_mac confirms a valid Redfish interface for it. identify_managed_hosts is updated to pass declared_primary_mac() from the matched expected machine into that method. Tests are expanded to cover four selection scenarios.

Changes

Declared Primary MAC Override in Boot Interface Selection

Layer / File(s) Summary
fetch_host_primary_interface_mac signature and early-return branch
crates/api-model/src/site_explorer/mod.rs
Adds declared_primary: Option<MacAddress> parameter; inserts an early-return path that calls find_interface_id_for_mac and returns the declared MAC when a complete Redfish interface is found, bypassing PCI-path selection.
identify_managed_hosts wiring
crates/site-explorer/src/lib.rs
Derives declared_primary from declared_primary_mac() on the matched expected machine and threads it into the updated fetch_host_primary_interface_mac call site.
Multi-scenario test suite
crates/site-explorer/tests/site_explorer.rs
Expands the single no-declaration assertion to four cases: automatic lowest-PCI DPU host-PF, declared DPU host-PF override, declared integrated NIC override, and an absent declared MAC reverting to automatic selection.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/infra-controller#2690: Introduces ExpectedMachineData::declared_primary_mac(), which this PR consumes in identify_managed_hosts to populate the new declared_primary argument.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the primary change: honoring declared primary NICs in the explored boot default.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the problem, solution, and implementation details with appropriate context and references.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@chet

chet commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai PTAL, thank you!

@chet

chet commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh pr view 2721 --repo NVIDIA/infra-controller --json title,body,files,additions,deletions 2>/dev/null | head -100 || echo "Cannot access repo"
gh pr diff 2721 --repo NVIDIA/infra-controller 2>/dev/null | head -600 || echo "Cannot fetch diff"

Length of output: 7900


🏁 Script executed:

#!/bin/bash
# Fetch PR files list
gh pr view 2721 --repo NVIDIA/infra-controller --json files --jq '.files[].path' 2>/dev/null || echo "Cannot list files"

Length of output: 317


🏁 Script executed:

#!/bin/bash
# Read find_interface_id_for_mac implementation and any other relevant context
rg -n "find_interface_id_for_mac" --type rust -C5

Length of output: 6800



@chet — clean, focused change. The implementation is correct and the test suite covers all four branches thoroughly. One observation worth considering:

find_interface_id_for_mac imposes a "full pair" requirement (MAC + non-empty Redfish ID)

// find_interface_id_for_mac (line 169-174)
.find(|e| e.mac_address == Some(mac))
.and_then(|e| e.id.as_deref().filter(|id| !id.is_empty()))  // empty id → None

So in fetch_host_primary_interface_mac, a declared MAC that is present in ethernet_interfaces but whose Redfish interface ID is missing or empty will silently fall back to the DPU-PF pick. The inline comment says "as a full pair" but the public doc comment on fetch_host_primary_interface_mac doesn't expose this nuance — a caller reading only the doc comment would expect any in-report MAC to win.

This is deliberately correct — lib.rs calls find_interface_id_for_mac again immediately after (line 1341) and needs the ID for MachineBootInterface, so requiring the ID upfront avoids a two-step failure. But the public doc could be tightened:

/// A declared `ExpectedHostNic.primary` wins when this report has that NIC
/// *as a full pair* (MAC present in `ethernet_interfaces` with a non-empty
/// Redfish interface id), whatever its type ...

Everything else looks production-ready: the if-let chain is idiomatic, the fallback path is byte-for-byte preserved, and the signature change is fully contained within the updated callers. No blocking concerns.

🐇

@chet chet marked this pull request as ready for review June 21, 2026 16:20
@chet chet requested a review from a team as a code owner June 21, 2026 16:20
@github-actions

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
boot-artifacts-aarch64 3 0 0 3 0 0
boot-artifacts-x86_64 3 0 0 3 0 0
forge-admin-cli-x86_64 264 6 23 99 6 130
machine-validation-runner 704 34 183 258 35 194
machine_validation 704 34 183 258 35 194
nvmetal-carbide 704 34 183 258 35 194
TOTAL 2382 108 572 879 111 712

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@chet chet merged commit 461797d into NVIDIA:main Jun 22, 2026
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants