Skip to content

burette: add TAP-backed network throughput test#3174

Draft
jstarks wants to merge 4 commits intomicrosoft:mainfrom
jstarks:perf_tests_tap
Draft

burette: add TAP-backed network throughput test#3174
jstarks wants to merge 4 commits intomicrosoft:mainfrom
jstarks:perf_tests_tap

Conversation

@jstarks
Copy link
Copy Markdown
Member

@jstarks jstarks commented Apr 1, 2026

The Consomme userspace networking stack is convenient but does not exercise the kernel datapath, making it a poor predictor of real-world VM networking performance. Add a TAP backend so burette can measure iperf3 throughput over a kernel TAP device in an isolated user/network namespace, giving a more realistic comparison point without requiring root privileges.

The TAP backend runs an iperf3 helper as a mesh child process that calls unshare(CLONE_NEWUSER | CLONE_NEWNET) while still single-threaded, creates and configures the TAP device inside the namespace, then serves iperf3 server instances via mesh RPC. The guest side detects NICs by MAC address rather than relying on device enumeration order, since VMBus GUID ordering is nondeterministic.

TapHandle in net_backend_resources is changed from carrying a device name string to a pre-opened OwnedFd, resolving an existing TODO and letting the opener (openvmm_entry or the burette helper) control namespace and permission details. The tap module is gated on cfg(unix) accordingly.

Select the backend with --backend consomme|tap alongside the existing --nic vmbus|virtio-net flag.

Copilot AI review requested due to automatic review settings April 1, 2026 18:50
@github-actions github-actions bot added the unsafe Related to unsafe code label Apr 1, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2026

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a TAP-backed networking mode to the burette iperf3 throughput benchmark so it can exercise the host kernel datapath (via a TAP device in an isolated user+net namespace), and updates the TAP resource plumbing to pass a pre-opened fd instead of a device-name string.

Changes:

  • Switch TapHandle (net backend resources) to carry a pre-opened OwnedFd, and update net_tap resolver + openvmm_entry parsers to open TAP fds at the caller.
  • Extend burette’s network test to support --backend consomme|tap, launching an iperf3 mesh helper subprocess (with a Linux unshare variant) and adding TAP NIC wiring + MAC-based NIC detection in the guest.
  • Add new burette helper module + Linux-only deps to support namespace/TAP setup and mesh RPC control of iperf3 servers.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
vm/devices/net/net_tap/src/resolver.rs Updates TAP endpoint resolver to wrap a pre-opened TAP fd.
vm/devices/net/net_backend_resources/src/lib.rs Changes TAP handle to OwnedFd and gates TAP module behind cfg(unix).
petri/burette/src/tests/network.rs Adds NetBackend selection, helper-driven iperf3 server execution, and Linux TAP backend VM/guest setup.
petri/burette/src/main.rs Adds helper subprocess entrypoints and exposes --backend CLI flag.
petri/burette/src/iperf_helper.rs New mesh RPC helper subprocess for iperf3 server management + Linux TAP namespace setup.
petri/burette/Cargo.toml Adds mesh + Linux-only networking/TAP-related dependencies.
openvmm/openvmm_entry/src/ttrpc/mod.rs Opens TAP device by name and passes the resulting fd via TapHandle.
openvmm/openvmm_entry/src/lib.rs Opens TAP device by name for CLI endpoint config and passes the fd via TapHandle.
openvmm/openvmm_entry/Cargo.toml Adds net_tap dependency on Unix.
Cargo.lock Updates lockfile for new dependency edges.

Comment on lines +360 to +363

// Brief delay to let the server bind.
std::thread::sleep(std::time::Duration::from_millis(500));

Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_iperf3 is async but uses std::thread::sleep, which blocks the executor thread and can skew perf timings or stall unrelated async work. Use an async delay (e.g., pal_async::timer::PolledTimer with a stored/cloned DefaultDriver) or another non-blocking readiness mechanism (poll for port open) instead of blocking sleep.

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +107
IperfRequest::SetupTap(rpc) => {
rpc.handle(async |config| {
linux::setup_tap_device(&config.name, &config.cidr)
.expect("failed to set up TAP device")
})
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TAP setup RPC handler uses expect("failed to set up TAP device"), which will panic the helper process on common failures (permissions, /dev/net/tun missing, ioctl errors). Return an error to the caller instead so the parent can surface a useful message and tear down cleanly.

Copilot uses AI. Check for mistakes.
@jstarks jstarks marked this pull request as ready for review April 1, 2026 21:53
@jstarks jstarks requested a review from a team as a code owner April 1, 2026 21:53
Copilot AI review requested due to automatic review settings April 1, 2026 21:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment on lines +489 to +492
c.with_custom_config(|config| match nic {
super::NicBackend::Vmbus => add_tap_nic(config, tap_fd),
super::NicBackend::VirtioNet => add_virtio_tap_nic(config, tap_fd),
})
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tap_fd (OwnedFd) is moved into both match arms inside the with_custom_config closure, which won’t compile because the compiler can’t prove only one branch executes. Restructure so the fd is consumed exactly once (e.g., pick the closure/function based on nic outside the match, or use an if/match that moves tap_fd in a single expression).

Suggested change
c.with_custom_config(|config| match nic {
super::NicBackend::Vmbus => add_tap_nic(config, tap_fd),
super::NicBackend::VirtioNet => add_virtio_tap_nic(config, tap_fd),
})
let add_nic = match nic {
super::NicBackend::Vmbus => add_tap_nic,
super::NicBackend::VirtioNet => add_virtio_tap_nic,
};
c.with_custom_config(move |config| add_nic(config, tap_fd))

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +41
guid.workspace = true
libc.workspace = true
net_backend_resources.workspace = true
net_tap.workspace = true
netvsp_resources.workspace = true
virtio_resources.workspace = true
vm_resource.workspace = true
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guid, virtio_resources, and vm_resource are already in [dependencies], but they’re repeated under [target.'cfg(target_os = "linux")'.dependencies]. This duplication is unnecessary and can be removed (or, if the intent is to make them Linux-only, they should be removed from the non-target section instead).

Suggested change
guid.workspace = true
libc.workspace = true
net_backend_resources.workspace = true
net_tap.workspace = true
netvsp_resources.workspace = true
virtio_resources.workspace = true
vm_resource.workspace = true
libc.workspace = true
net_backend_resources.workspace = true
net_tap.workspace = true
netvsp_resources.workspace = true

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

Copilot AI review requested due to automatic review settings April 2, 2026 00:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

}
#[cfg(target_os = "linux")]
NetBackend::Tap => {
let tap_fd = tap_fd.unwrap();
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tap_fd.unwrap() will panic if TAP setup failed or if logic changes in the future. Since this code is part of the test harness setup path, return a structured error instead of panicking (e.g., convert the Option to an error with context).

Suggested change
let tap_fd = tap_fd.unwrap();
let tap_fd = tap_fd
.context("TAP backend selected but TAP fd was not provided")?;

Copilot uses AI. Check for mistakes.
@jstarks jstarks marked this pull request as draft April 2, 2026 02:14
jstarks added 4 commits April 2, 2026 16:35
TapHandle in net_backend_resources carried a device name string and
the resolver opened the TAP device. Change it to carry a pre-opened
OwnedFd so the opener (openvmm_entry or a test harness) controls
namespace and permission details. This resolves a long-standing TODO
and enables fd passing from Kata and similar environments.

Also tighten both TapHandle and net_tap from cfg(unix) to
cfg(target_os = "linux") since TAP uses Linux-specific /dev/net/tun
ioctls.
The Consomme userspace networking stack is convenient but does not
exercise the kernel datapath, making it a poor predictor of real-world
VM networking performance.  Add a TAP backend so burette can measure
iperf3 throughput over a kernel TAP device in an isolated user/network
namespace, giving a more realistic comparison point without requiring
root privileges.

The TAP backend runs an iperf3 helper as a mesh child process that
calls unshare(CLONE_NEWUSER | CLONE_NEWNET) while still single-threaded,
creates and configures the TAP device inside the namespace, then serves
iperf3 server instances via mesh RPC.  The guest side detects NICs by
MAC address rather than relying on device enumeration order, since VMBus
GUID ordering is nondeterministic.

TapHandle in net_backend_resources is changed from carrying a device
name string to a pre-opened OwnedFd, resolving an existing TODO and
letting the opener (openvmm_entry or the burette helper) control
namespace and permission details.  The tap module is gated on
cfg(unix) accordingly.

Select the backend with `--backend consomme|tap` alongside the existing
`--nic vmbus|virtio-net` flag.
- Replace panicking expect/assert with error propagation via
  FailableRpc and handle_failable for iperf3 spawn failures,
  TAP setup errors, and interface name validation
- Fix PCIe port name from 'rp1' to 's0rc0rp1' to match
  with_pcie_root_topology naming scheme
- Tighten TapHandle cfg gate from cfg(unix) to
  cfg(target_os = "linux") since it uses Linux-specific ioctls
The TAP path was still using apk to install iperf3, but the VM now
boots with linux_direct (not Alpine). Attach the erofs image in the
TAP builder and use chroot for iperf3, same as Consomme. This also
removes the use_chroot flag since both backends now use chroot.
Copilot AI review requested due to automatic review settings April 2, 2026 17:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Comment on lines +205 to +206
// Consomme uses this for iperf3; TAP currently uses Alpine + apk
// since the TAP NIC setup needs networking tools not in the erofs.
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the TAP backend “uses Alpine + apk”, but this test still boots linux_direct (build_firmware), and the TAP guest networking setup uses ip/ifconfig/udhcpc from the initrd shell. Update/remove this comment so it reflects the actual guest image/tooling path to avoid misleading future maintainers.

Suggested change
// Consomme uses this for iperf3; TAP currently uses Alpine + apk
// since the TAP NIC setup needs networking tools not in the erofs.
// Consomme uses this for iperf3. The TAP backend still boots via
// `linux_direct` (`build_firmware`) and uses networking tools
// (ip/ifconfig/udhcpc) from the initrd shell rather than from this erofs.

Copilot uses AI. Check for mistakes.
Comment on lines +99 to +105
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
return Err(format!(
"iperf3 exited with {}: {}",
output.status,
stderr.trim()
));
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper treats any non-zero iperf3 server exit status as a hard error and discards stdout, but the previous in-process implementation explicitly noted that the server may exit non-zero while still producing valid JSON results. To avoid flakiness, consider parsing stdout when it’s non-empty (and log/warn on the status) instead of failing solely on !status.success().

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants