mana_driver: vf reconfiguration revokes vtl0 vf faster #3164

Open
erfrimod wants to merge 2 commits into microsoft:main from erfrimod:erfrimod/eqe-135-fast-timeout

Contributor

@erfrimod erfrimod commented Mar 31, 2026

When VF reconfiguration attempts to revoke the VTL0 VF, it can get stuck sending HWC commands that have no chance of succeeding. This causes try_notify_guest_and_revoke_vtl0_vf() to time out, leaving the guest in an inconsistent state that can cause the reconfiguration to fail to restore the VTL0 VF.

  • During VF reconfiguration, set hwc_failure to signal that the HWC channel is gone. This avoids waiting for timeouts on HWC commands.
  • Update the test_gdma_reconfig_vf test to exercise the new hwc_failure logic.
  • Modify resource destroy to skip teardown of HWC resources when hwc_failure is set. The requests would fail anyway, but this way there is only one info trace instead of many ignorable error traces.
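The fail-fast idea in the first bullet can be sketched as follows. This is a minimal illustration, not the driver's code: only the name hwc_failure comes from this PR, while the Driver struct, the HwcError type, and the request signature are hypothetical stand-ins.

```rust
use std::time::Duration;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum HwcError {
    /// The channel was already marked as failed; nothing was sent.
    PreviousHardwareFailure,
    /// The request was sent but no response arrived within the timeout.
    Timeout(Duration),
}

/// Hypothetical stand-in for the driver state. Only `hwc_failure` is a
/// name taken from the PR; the other fields are assumptions.
struct Driver {
    hwc_failure: bool,
    hwc_timeout: Duration,
    requests_sent: u32,
}

impl Driver {
    /// Called when the VF reconfiguration EQE arrives: the channel is gone.
    fn on_vf_reconfig_event(&mut self) {
        self.hwc_failure = true;
    }

    /// Sends an HWC request. `device_responds` simulates the hardware.
    fn request(&mut self, device_responds: bool) -> Result<(), HwcError> {
        // Fail fast: do not send (and wait on) a request that cannot succeed.
        if self.hwc_failure {
            return Err(HwcError::PreviousHardwareFailure);
        }
        self.requests_sent += 1;
        if device_responds {
            Ok(())
        } else {
            // A real driver would block here for up to `hwc_timeout`.
            self.hwc_failure = true;
            Err(HwcError::Timeout(self.hwc_timeout))
        }
    }
}
```

In this sketch, once on_vf_reconfig_event() has run, every later request returns immediately without touching the channel, which is the behavior the revoke path relies on to avoid stacking timeouts.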

@erfrimod erfrimod requested a review from a team as a code owner March 31, 2026 00:06
Copilot AI review requested due to automatic review settings March 31, 2026 00:06
Contributor Author

erfrimod commented Mar 31, 2026

Traces from my lab machine. Note that the time between the EQE and the VF arriving in the guest is now about 1.39888 seconds, and a full 1 second of that is the VF_DEVICE_DELAY that exists for older Linux guests.

[20.388549] mana_driver::gdma_driver: INFO  HWC VF reconfiguration event
[20.388797] underhill_core::emuplat::netvsp: INFO  VTL2 VF reconfiguration requested vtl2_vfid=0x1b22c9a5
[20.388870] underhill_core::emuplat::netvsp: WARN  VTL0 VF being removed as a result of VF Reconfiguration. vtl2_vfid=0x1b22c9a5 vtl0_vfid=0x9025b48f
[20.394127] mana_driver::bnic_driver: ERROR  error=Previous hardware failure
[20.394196] netvsp: WARN  Failed setting data path back to synthetic after guest VF was removed. err=Previous hardware failure
[20.394322] netvsp: INFO  sending VF association message available=false
[20.422803] vmbus_client: INFO  received rescind state=Connected channel_id=0xa key={44c4f61d-4444-4400-9d52-802e27ede19f}-{9025b48f-410e-4af3-9674-630776ec0769}-0
[20.423306] vmbus_server: INFO  revoking channel id.offer_id=OfferId(11) key={44c4f61d-4444-4400-9d52-802e27ede19f}-{9025b48f-410e-4af3-9674-630776ec0769}-0
[20.423420] vmbus_server::channels: INFO  rescinding channel from guest channel_id=0x12
[20.423959] vmbus_client: INFO  releasing channel channel_id=0xa key={44c4f61d-4444-4400-9d52-802e27ede19f}-{9025b48f-410e-4af3-9674-630776ec0769}-0
[20.432086] netvsp: INFO  Query data path state
[20.432335] underhill_core::emuplat::netvsp: INFO disconnecting all endpoints{ vtl2_vfid=0x1b22c9a5 num_endpoints=0x1}: Network endpoint disconnected vtl2_vfid=0x1b22c9a5
[20.435049] mana_driver::bnic_driver: ERROR disconnecting all endpoints{ vtl2_vfid=0x1b22c9a5 num_endpoints=0x1}:  error=Previous hardware failure
[20.435140] net_mana: WARN disconnecting all endpoints{ vtl2_vfid=0x1b22c9a5 num_endpoints=0x1}:  failed to stop rx error=Previous hardware failure
[20.435227] mana_driver::resources: INFO disconnecting all endpoints{ vtl2_vfid=0x1b22c9a5 num_endpoints=0x1}:  skipping HWC resource teardown after hardware failure count=0x20
[20.438416] mana_driver::gdma_driver: INFO shutdown vtl2 device{ vtl2_vfid=0x1b22c9a5 keep_vf_alive=false}:  dropping gdma driver self.state_saved=false self.hwc_failure=true
[20.438638] underhill_core::emuplat::netvsp: WARN  Destroying MANA device vtl2_vfid=0x1b22c9a5 error=Previous hardware failure
[20.438737] underhill_core::emuplat::netvsp: INFO  Attempt to reset device via FLR on next teardown. vtl2_vfid=0x1b22c9a5
[20.456663] vmbus_server::channels: INFO  client released channel dropped_ratelimited=0xc offer_id=OfferId(11) key={44c4f61d-4444-4400-9d52-802e27ede19f}-{9025b48f-410e-4af3-9674-630776ec0769}-0
[20.550302] vfio-pci f4f2:00:00.0: All device reset methods disabled by user
[20.551057] user_driver::vfio: INFO new_mana_vfio_device{ vtl2_vfid=0x1b22c9a5 pci_id="f4f2:00:00.0"}:  device arrived pci_id="f4f2:00:00.0" keepalive=false
[20.561731] vfio-pci f4f2:00:00.0: vfio-noiommu device opened by user (tp:51)
[20.563593] underhill_core::emuplat::netvsp: INFO  Creating MANA device vtl2_vfid=0x1b22c9a5 pci_id="f4f2:00:00.0"
[20.569159] mana_driver::gdma_driver: INFO new_mana_device{ vtl2_vfid=0x1b22c9a5 pci_id="f4f2:00:00.0"}:new_gdma_driver:  created HWC eq_id=0x10 msix=0
[20.569404] mana_driver::gdma_driver: INFO new_mana_device{ vtl2_vfid=0x1b22c9a5 pci_id="f4f2:00:00.0"}:new_gdma_driver:  Max VF resources: GdmaQueryMaxResourcesResp { status: 0, max_sq: 4096, max_rq: 4096, max_cq: 4096, max_eq: 256, max_db: 4096, max_mst: 16384, max_cq_mod_ctx: 2, max_mod_cq: 16, max_msix: 6 }
[20.569773] mana_driver::gdma_driver: INFO new_mana_device{ vtl2_vfid=0x1b22c9a5 pci_id="f4f2:00:00.0"}:  GDMA PF capability flags gdma_protocol_ver=0x1 pf_cap_flags1=0x1d pf_cap_flags2=0x0 pf_cap_flags3=0x0 pf_cap_flags4=0x0
[20.570132] mana_driver::mana: INFO new_mana_device{ vtl2_vfid=0x1b22c9a5 pci_id="f4f2:00:00.0"}:  mana_dev_config=ManaQueryDeviceCfgResp { pf_cap_flags1: BasicNicDriverFlags { query_link_status: 1, ethertype_enforcement: 1, query_filter_state: 1, reserved: 1 }, pf_cap_flags2: 0, pf_cap_flags3: 0, pf_cap_flags4: 0, max_num_vports: 1, reserved: 0, max_num_eqs: ffffffff }
[20.570871] underhill_core::emuplat::netvsp: INFO connecting endpoints{ vtl2_vfid=0x1b22c9a5 num_endpoints=0x1}:  Network endpoint connected vtl2_vfid=0x1b22c9a5 mac_address=02-04-2f-0f-3b-e1 adapter_index=1
[20.571059] underhill_core::emuplat::netvsp: INFO  VTL2 device restarted after VF reconfiguration vtl2_vfid=0x1b22c9a5 attempts=0x1
[20.581122] netvsp: INFO  Query data path state is_data_path_switched=false
[20.582643] mana_driver::gdma_driver: INFO  retargeting EQ 1 to cpu: 3
[20.582826] mana_driver::gdma_driver: INFO  retargeting EQ 2 to cpu: 0
[20.584201] mana_driver::gdma_driver: INFO  retargeting EQ 3 to cpu: 1
[20.585701] mana_driver::gdma_driver: INFO  retargeting EQ 0 to cpu: 2
[20.585893] netvsp: INFO  sending VF association message available=true serial_number=0x9025b48f
[21.586765] underhill_core::emuplat::netvsp: INFO  Adding VF to VTL0 vtl2_vfid=0x1b22c9a5 vtl0_vfid=0x9025b48f
[21.588662] vmbus_client: INFO  received offer state=Connected channel_id=0xa interface_id=44c4f61d-4444-4400-9d52-802e27ede19f instance_id=9025b48f-410e-4af3-9674-630776ec0769 subchannel_index=0x0
[21.588867] vmbus_server::channels: INFO  sending offer to guest channel_id=0x12 connection_id=0x2012 key={44c4f61d-4444-4400-9d52-802e27ede19f}-{9025b48f-410e-4af3-9674-630776ec0769}-0
[21.588967] vmbus_server::channels: INFO  new channel offer_id=OfferId(11) key={44c4f61d-4444-4400-9d52-802e27ede19f}-{9025b48f-410e-4af3-9674-630776ec0769}-0 confidential_ring_buffer=false confidential_external_memory=false
[21.616085] vmbus_client: INFO  opening channel on host channel_id=0xa key={44c4f61d-4444-4400-9d52-802e27ede19f}-{9025b48f-410e-4af3-9674-630776ec0769}-0
[21.616515] vmbus_server::channels: INFO  opened channel dropped_ratelimited=0x8 offer_id=0x11 channel_id=0x12 key={44c4f61d-4444-4400-9d52-802e27ede19f}-{9025b48f-410e-4af3-9674-630776ec0769}-0 result=0
[21.787429] mana_driver::mana: INFO  switch data path for mac mac_address=02-04-2f-0f-3b-e1 direction_to_vtl0=0x1 hwc_activity_id=0x7b530030
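The timing quoted before the traces can be checked from the timestamps. The sketch below assumes the interval runs from the "HWC VF reconfiguration event" line to the "switch data path" line, and that the gap between "sending VF association message available=true" and "Adding VF to VTL0" corresponds to VF_DEVICE_DELAY; both mappings are a reading of the log, not something the trace states. The helper names are hypothetical.

```rust
/// Converts a trace timestamp such as "20.388549" to whole microseconds.
/// Integer math avoids floating-point rounding in the subtraction.
fn to_micros(ts: &str) -> Option<u64> {
    let (secs, frac) = ts.split_once('.')?;
    if frac.len() != 6 {
        return None; // this sketch only handles six fractional digits
    }
    let secs: u64 = secs.parse().ok()?;
    let frac: u64 = frac.parse().ok()?;
    Some(secs * 1_000_000 + frac)
}

/// Returns (total, delay, remainder) in microseconds for the trace above.
fn reconfig_timing() -> Option<(u64, u64, u64)> {
    let eqe = to_micros("20.388549")?; // HWC VF reconfiguration event
    let assoc = to_micros("20.585893")?; // VF association, available=true
    let add_vf = to_micros("21.586765")?; // Adding VF to VTL0
    let switched = to_micros("21.787429")?; // switch data path
    let total = switched - eqe;
    let delay = add_vf - assoc;
    Some((total, delay, total - delay))
}
```

Under these assumptions the total is 1,398,880 µs, of which 1,000,872 µs is the delay before the VF is added to VTL0, leaving roughly 0.4 seconds for everything else.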

Contributor

Copilot AI left a comment

Pull request overview

Improves VF Reconfiguration handling for the MANA driver / Underhill NetVSP path by treating the HWC channel as unavailable immediately after the VF reconfig EQE, avoiding long teardown timeouts and reducing noisy failing teardown attempts.

Changes:

  • Set hwc_timeout_in_ms to 0 on VF reconfiguration EQE to make subsequent HWC waits fail fast, and gate timeout reporting to avoid extra work when timeout is 0.
  • Update driver teardown behavior to skip HWC resource teardown after hwc_failure is detected.
  • Update test_gdma_reconfig_vf to validate the new “timeout becomes 0 after EQE 135” behavior and that teardown fails fast.
  • In VF reconfig handling, remove the VTL0 VF via remove_vtl0_vf() and clear saved datapath/filter state.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vm/devices/net/mana_driver/src/tests.rs | Extends VF reconfig test to assert timeout transitions to 0 and hwc_failure behavior during deregister. |
| vm/devices/net/mana_driver/src/resources.rs | Skips HWC teardown commands when hwc_failure is set to avoid repeated failing requests/log spam. |
| vm/devices/net/mana_driver/src/gdma_driver.rs | Sets timeout to 0 on VF reconfig EQE; adds early timeout exit; preserves 0 timeout through deregister; exposes hwc_failure() and test getter. |
| openhcl/underhill_core/src/emuplat/netvsp.rs | Switches VF reconfig VTL0 VF removal path to remove_vtl0_vf() and clears saved direction_to_vtl0 state. |

Comment on lines +66 to +68
// When HWC has already failed, skip sending teardown commands for HWC resources:
// DmaRegion, Eq, BnicQueue. HWC requests all fail: "Previous hardware failure".
// Device should reclaim resources on its own reset.
Contributor

I would recommend moving this comment down to the '_ if skip_hwc ...' arm. That one line is easy to miss when skimming the code, so the comment will help draw attention to it.

Contributor

@ben-zen ben-zen left a comment

Looks good to me with the change Brian suggested to the comment location. That's a subtle choice, and calling it out with a comment makes the logic clear.
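A minimal sketch of the teardown loop under discussion, with the comment placed directly on the skipping arm as suggested. The Resource variant names follow the review comments; the destroy function, its signature, and the assumption that MemoryBlock is released locally without the HWC are illustrative, not the code in resources.rs.

```rust
/// Hypothetical, simplified resource list. Variant names follow the
/// review comments; the real variants carry data.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Resource {
    MemoryBlock,
    DmaRegion,
    Eq,
    BnicQueue,
}

/// Returns the resources for which a teardown action was actually taken.
fn destroy(resources: &[Resource], hwc_failure: bool) -> Vec<Resource> {
    let skip_hwc = hwc_failure;
    let mut torn_down = Vec::new();
    for r in resources {
        match r {
            // Assumption: host memory is freed locally and needs no HWC request.
            Resource::MemoryBlock => torn_down.push(*r),
            // When HWC has already failed, skip sending teardown commands for
            // HWC resources (DmaRegion, Eq, BnicQueue). The requests would all
            // fail, and the device should reclaim the resources on reset.
            _ if skip_hwc => continue,
            // Healthy channel: each of these would issue an HWC request.
            Resource::DmaRegion | Resource::Eq | Resource::BnicQueue => torn_down.push(*r),
        }
    }
    torn_down
}
```

Because the guard arm sits between the local-only arm and the HWC-backed arms, its position carries the logic, which is why a comment right on that line helps readers who skim the match.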

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +68 to +69
tracing::info!(
count = self.resources.len(),

Copilot AI Mar 31, 2026

The log field count = self.resources.len() includes MemoryBlock entries, but the message says "skipping HWC resource teardown". Consider either counting only the HWC-backed resources or renaming the field/message so the telemetry can be interpreted accurately during failures.

Suggested change
-    tracing::info!(
-        count = self.resources.len(),
+    let hwc_resource_count = self
+        .resources
+        .iter()
+        .filter(|r| {
+            matches!(
+                r,
+                Resource::DmaRegion { .. }
+                    | Resource::Eq { .. }
+                    | Resource::BnicQueue { .. }
+            )
+        })
+        .count();
+    tracing::info!(
+        hwc_resource_count = hwc_resource_count,

self.report_hwc_timeout(wait_failed, interrupt_loss, eqe_wait_result.elapsed as u32)
// Don't report the timeout once VF reconfiguration is pending,
// since the SoC will not respond.
if !self.vf_reconfiguration_pending {
Member

It would be good to keep a consistent check. In other places we are checking for hwc_failure. We should do the same here.

Member

And, if there is hwc_failure, why not return an error?

Contributor Author

@erfrimod erfrimod Apr 3, 2026

We would lose reports in the case where the wait times out (hwc_failure is set), then the EQE is found, and wait_failed is false. The SoC is alive, but slow. This case is reflected in the check at the end of the function: if a reconfig is not pending, hwc_failure is set back to false. On one hand, making the change is entirely safe, because all we lose is a log sent to the SoC. On the other hand, I think the intent of the log is to help the SoC diagnose when responses are slow and timing out.

Edit: It's a little frustrating that hwc_failure is set to true and then back to false. A stronger design might have multiple states, or maybe a bool to track hwc_timeout that could get cleared...
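One way the "multiple states" idea could look is sketched below. This is purely illustrative and not part of the PR: the HwcState enum and all of its methods are hypothetical names, chosen to show how a recoverable timeout and a pending reconfiguration could be kept apart instead of sharing one bool.

```rust
/// Hypothetical replacement for the single `hwc_failure` bool.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HwcState {
    /// Requests are sent and answered normally.
    Healthy,
    /// A wait timed out; the SoC may be slow rather than gone.
    TimedOut,
    /// VF reconfiguration EQE seen; the channel is gone until restart.
    ReconfigPending,
}

impl HwcState {
    /// A wait timed out. A pending reconfiguration is never downgraded.
    fn on_timeout(self) -> Self {
        match self {
            HwcState::ReconfigPending => self,
            _ => HwcState::TimedOut,
        }
    }

    /// The EQE showed up after all: only a plain timeout recovers.
    fn on_late_response(self) -> Self {
        match self {
            HwcState::TimedOut => HwcState::Healthy,
            other => other,
        }
    }

    /// The VF reconfiguration EQE arrived.
    fn on_reconfig_event(self) -> Self {
        HwcState::ReconfigPending
    }

    /// Slow-SoC timeouts are still worth reporting; reconfig ones are not.
    fn should_report_timeout(self) -> bool {
        self != HwcState::ReconfigPending
    }

    /// New requests are only worth sending on a healthy channel.
    fn can_send(self) -> bool {
        self == HwcState::Healthy
    }
}
```

With states like these, the "slow but alive" case keeps its timeout report while the reconfiguration case stays sticky, so no flag has to be set and then cleared again.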

// When HWC has already failed, skip sending teardown commands for HWC resources:
// DmaRegion, Eq, BnicQueue. HWC requests all fail: "Previous hardware failure".
// Device should reclaim resources on its own reset.
_ if skip_hwc => continue,
Member

I am not sure we need to do anything here. The GDMA driver already has logic for the case where the HWC has been marked as failed (during reconfig) and will handle this. For example, when this code calls to disable an EQ or DMA region, the GDMA driver will error out if hwc_failure is true (which it will be during reconfig).

Contributor Author

When testing on my lab machine, this check removed a dozen or so ignorable "previous hardware failure" traces. They are expected, but I would prefer future failure triage doesn't see them.
