Skip to content

[Testing] Use diag_client to get OpenHCL logs#3179

Open
tjones60 wants to merge 9 commits intomicrosoft:mainfrom
tjones60:newwmi_diagclient
Open

[Testing] Use diag_client to get OpenHCL logs#3179
tjones60 wants to merge 9 commits intomicrosoft:mainfrom
tjones60:newwmi_diagclient

Conversation

@tjones60
Copy link
Copy Markdown
Contributor

@tjones60 tjones60 commented Apr 2, 2026

See how much faster it is to use diag_client instead of COM3

@tjones60 tjones60 requested a review from a team as a code owner April 2, 2026 07:01
Copilot AI review requested due to automatic review settings April 2, 2026 07:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Petri’s VM backends to collect OpenHCL kmsg logs via diag_client (instead of relying on serial/COM3), and refactors the Hyper-V backend to create VMs using a new high-performance New-CustomVM PowerShell cmdlet.

Changes:

  • Add OpenHCL kmsg log streaming via diag_client over hybrid-vsock in the OpenVMM backend.
  • Refactor Hyper-V VM creation to use New-CustomVM with a new Rust argument model (HyperVNewCustomVMArgs), including guest state + storage/controller setup.
  • Extend VMGS helpers to expose the backing Disk and encryption policy directly.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
petri/src/vm/openvmm/construct.rs Spawns an openhcl-log task using diag_client over the VTL2 hybrid-vsock path; serial setup adjusted.
petri/src/vm/mod.rs Adds Firmware::openhcl_firmware() helper and expands PetriVmgsResource accessors (vmgs(), disk(), encryption_policy()).
petri/src/vm/hyperv/vm.rs Changes Hyper-V VM construction to accept HyperVNewCustomVMArgs and relies on New-CustomVM behavior for COM pipe setup.
petri/src/vm/hyperv/powershell.rs Introduces HyperVNewCustomVMArgs + compatibility logic, and adds run_new_customvm() plus supporting enums/flags.
petri/src/vm/hyperv/mod.rs Reworks Hyper-V backend to build HyperVNewCustomVMArgs, wire up diag-based OpenHCL logging, and use COM pipe paths.
petri/src/vm/hyperv/hyperv.psm1 Adds New-CustomVM implementation and helper conversions (IMC + VTL2 settings + drive resources).
petri/Cargo.toml Adds bitfield-struct dependency for Hyper-V management VTL feature flags.
Cargo.lock Locks bitfield-struct inclusion.

Comment on lines +616 to 621
// manually overrride for testing
let use_serial2 = false;

let serial2 = if use_serial2 {
let (serial2_host, serial2) = self
.create_serial_stream()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use_serial2 test override hard-codes serial2 (OpenHCL serial logging) off. If serial2 logging is intentionally being removed in favor of diag_client, please delete this override plumbing entirely; otherwise wire it to a real runtime/config flag (or restore the previous self.firmware.is_openhcl() gating) so behavior isn’t permanently changed by a local testing constant.

Copilot uses AI. Check for mistakes.
serial_tasks.push(serial0_task);

let serial2 = if self.firmware.is_openhcl() {
// manually overrride for testing
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: “overrride” should be “override”.

Suggested change
// manually overrride for testing
// manually override for testing

Copilot uses AI. Check for mistakes.
Comment on lines +278 to 303
// Attempt to enable COM3 and use that to get KMSG logs, otherwise
// fall back to use diag_client.
let _supports_com3 = {
// Hyper-V VBS VMs don't work with COM3 enabled.
// Hypervisor support is needed for this to work.
let is_not_vbs = !matches!(config.firmware.isolation(), Some(IsolationType::Vbs));

// The Hyper-V serial device for ARM doesn't support additional
// serial ports yet.
let is_x86 = matches!(config.arch, MachineArch::X86_64);

// The registry key to enable additional COM ports is only
// available in newer builds of Windows.
let current_winver = windows_version::OsVersion::current();
tracing::debug!(?current_winver, "host windows version");
// This is the oldest working build used in CI
// TODO: determine the actual minimum version
const COM3_MIN_WINVER: u32 = 27813;
let is_supported_winver = current_winver.build >= COM3_MIN_WINVER;

properties.is_openhcl && is_not_vbs && is_x86 && is_supported_winver
};

// TODO: only increase VTL2 memory on debug builds
vm.set_openhcl_firmware(
&igvm_file,
// don't increase VTL2 memory on CVMs
!matches!(
guest_state_isolation_type,
powershell::HyperVGuestStateIsolationType::Vbs
| powershell::HyperVGuestStateIsolationType::Snp
| powershell::HyperVGuestStateIsolationType::Tdx
),
)
.await?;
// manually overrride for testing
let supports_com3 = false;

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_supports_com3 computes host/version capability, but the result is discarded and then supports_com3 is hard-coded to false. Please either use the computed _supports_com3 value (and remove the testing override), or delete the capability detection entirely; as-is this adds extra host queries/logging and permanently disables the COM3 fast-path.

Copilot uses AI. Check for mistakes.
),
)
.await?;
// manually overrride for testing
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: “overrride” should be “override”.

Suggested change
// manually overrride for testing
// manually override for testing

Copilot uses AI. Check for mistakes.
Comment on lines +338 to +360
let hyperv_args = {
let mut args = HyperVNewCustomVMArgs::from_config(&config, &properties)?;
args.firmware_file = igvm_file.clone();
args.firmware_parameters = openhcl_command_line;
args.guest_state_path = guest_state_path;
args.scsi_controllers = scsi_controllers;
args.ide_controllers = ide_controllers;
args.com_3 = supports_com3;
args.imc_hiv = imc_hiv;
args.management_vtl_settings = management_vtl_settings;
args
};

// The registry key to enable additional COM ports is only
// available in newer builds of Windows.
let current_winver = windows_version::OsVersion::current();
tracing::debug!(?current_winver, "host windows version");
// This is the oldest working build used in CI
// TODO: determine the actual minimum version
const COM3_MIN_WINVER: u32 = 27813;
let is_supported_winver = current_winver.build >= COM3_MIN_WINVER;
let vm = HyperVVM::new(hyperv_args, log_source.clone(), driver.clone()).await?;

is_not_vbs && is_x86 && is_supported_winver
};
if properties.is_openhcl {
// Copy the IGVM file locally, since it may not be accessible by
// Hyper-V (e.g., if it is in a WSL filesystem).
let local_path = igvm_file.as_ref().unwrap();
fs_err::copy(config.firmware.openhcl_firmware().unwrap(), local_path)
.context("failed to copy igvm file")?;
acl_read_for_vm(local_path, Some(*vm.vmid()))
.context("failed to set ACL for igvm file")?;
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HyperVNewCustomVMArgs is populated with firmware_file = igvm_file.clone() and passed into HyperVVM::new(...) (which calls New-CustomVM) before the IGVM is actually copied into igvm_file (the copy happens afterwards). This means New-CustomVM may be given a non-existent firmware path, which can fail VM creation or leave the VM pointing at a missing file. Copy the IGVM (and set ACLs) before calling HyperVVM::new, or defer setting FirmwareFile until after the file exists.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 2, 2026 16:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment on lines +64 to +71
futures::future::join_all(
self.inner
.resources
.log_stream_tasks
.into_iter()
.map(|t| t.cancel()),
)
.await;
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

teardown now cancels all log_stream_tasks and discards their outputs, which means errors from the logging pipelines will no longer surface (previously t.await? propagated failures). If the intent is to avoid teardown failures, consider still inspecting the cancel().await results and at least logging non-cancellation errors (or explicitly documenting why errors are ignored).

Suggested change
futures::future::join_all(
self.inner
.resources
.log_stream_tasks
.into_iter()
.map(|t| t.cancel()),
)
.await;
let cancel_futs = self
.inner
.resources
.log_stream_tasks
.into_iter()
.map(|t| t.cancel());
let cancel_results = futures::future::join_all(cancel_futs).await;
for res in cancel_results {
if let Err(e) = res {
tracing::warn!(?e, "log stream task did not cancel cleanly");
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +616 to 622
// let use_serial2 = self.firmware.is_openhcl();
// manually overrride for testing
let use_serial2 = false;

let serial2 = if use_serial2 {
let (serial2_host, serial2) = self
.create_serial_stream()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serial2/OpenHCL logging is hard-disabled via a "manually overrride for testing" block. This changes runtime behavior for all OpenHCL VMs and leaves test-only scaffolding (including a misspelling) in production code. If serial2 is being disabled to avoid double-writing the same openhcl log file now that kmsg_log_task is used, gate it cleanly (e.g., choose exactly one log source based on config) and remove the test override/comments.

Copilot uses AI. Check for mistakes.
Comment on lines +201 to +215
// For certain configurations, we need to override the override
// in new_underhill_vm.
//
// TODO: remove this (and OpenHCL override) once host changes
// are saturated.
if !properties.is_isolated
&& !*secure_boot_enabled
&& config.tpm.is_none()
&& !*default_boot_always_attempt
{
append_cmdline(
&mut openhcl_command_line,
format!(
"HCL_DEFAULT_BOOT_ALWAYS_ATTEMPT={}",
if *default_boot_always_attempt { 1 } else { 0 }
),
"HCL_DEFAULT_BOOT_ALWAYS_ATTEMPT=0",
);
};
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openhcl_command_line starts as None for non-OpenHCL UEFI VMs, but this block can still call append_cmdline, which will create a non-empty FirmwareParameters string and pass it to New-CustomVM. That looks OpenHCL-specific and could misconfigure or break plain UEFI VMs. Please gate this override on properties.is_openhcl/config.firmware.is_openhcl() (or only mutate the cmdline when openhcl_command_line is already Some).

Copilot uses AI. Check for mistakes.

/// get the encryption policy of the vmgs
pub fn encryption_policy(&self) -> Option<GuestStateEncryptionPolicy> {
self.vmgs().map(|vmgs| vmgs.encryption_policy)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PetriVmgsResource::encryption_policy() now returns None for Ephemeral, whereas previously Hyper-V setup treated ephemeral VMGS as GuestStateEncryptionPolicy::None(true) (strict) when deriving flags/CLI. This is a behavioral change that can silently disable strict_encryption_policy and omit GuestStateEncryptionPolicy for ephemeral OpenHCL VMs. Consider returning Some(GuestStateEncryptionPolicy::None(true)) for Ephemeral, or provide an explicit API that encodes the intended default for ephemeral guest state.

Suggested change
self.vmgs().map(|vmgs| vmgs.encryption_policy)
match self {
PetriVmgsResource::Disk(vmgs)
| PetriVmgsResource::ReprovisionOnFailure(vmgs)
| PetriVmgsResource::Reprovision(vmgs) => Some(vmgs.encryption_policy),
PetriVmgsResource::Ephemeral => Some(GuestStateEncryptionPolicy::None(true)),
}

Copilot uses AI. Check for mistakes.
Comment on lines 26 to +33
[[profile.default.overrides]]
# use fuzzy-matching for the package() to allow out-of-tree tests to use the
# same profile
filter = 'package(~vmm_tests)'
# Mark VMM tests as heavy and requiring more threads due to their memory and CPU
# usage. For local dev runs, you may need to manually restrict the number of
# threads running via the -j cli arg.
threads-required = 3
threads-required = 2
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes the dedicated heavy/very_heavy overrides for vmm_tests and lowers threads-required for all vmm_tests to 2. That changes scheduling semantics substantially and may increase parallelism for the heaviest tests (risking CI instability/OOM) or under-provision threads for tests that expect larger threads-required. Please clarify the motivation and consider keeping the tiered overrides (or adjusting them) rather than deleting them entirely.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants