[Testing] Use diag_client to get OpenHCL logs#3179
[Testing] Use diag_client to get OpenHCL logs#3179tjones60 wants to merge 9 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates Petri’s VM backends to collect OpenHCL kmsg logs via diag_client (instead of relying on serial/COM3), and refactors the Hyper-V backend to create VMs using a new high-performance New-CustomVM PowerShell cmdlet.
Changes:
- Add OpenHCL kmsg log streaming via
diag_clientover hybrid-vsock in the OpenVMM backend. - Refactor Hyper-V VM creation to use
New-CustomVMwith a new Rust argument model (HyperVNewCustomVMArgs), including guest state + storage/controller setup. - Extend VMGS helpers to expose the backing
Diskand encryption policy directly.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| petri/src/vm/openvmm/construct.rs | Spawns an openhcl-log task using diag_client over the VTL2 hybrid-vsock path; serial setup adjusted. |
| petri/src/vm/mod.rs | Adds Firmware::openhcl_firmware() helper and expands PetriVmgsResource accessors (vmgs(), disk(), encryption_policy()). |
| petri/src/vm/hyperv/vm.rs | Changes Hyper-V VM construction to accept HyperVNewCustomVMArgs and relies on New-CustomVM behavior for COM pipe setup. |
| petri/src/vm/hyperv/powershell.rs | Introduces HyperVNewCustomVMArgs + compatibility logic, and adds run_new_customvm() plus supporting enums/flags. |
| petri/src/vm/hyperv/mod.rs | Reworks Hyper-V backend to build HyperVNewCustomVMArgs, wire up diag-based OpenHCL logging, and use COM pipe paths. |
| petri/src/vm/hyperv/hyperv.psm1 | Adds New-CustomVM implementation and helper conversions (IMC + VTL2 settings + drive resources). |
| petri/Cargo.toml | Adds bitfield-struct dependency for Hyper-V management VTL feature flags. |
| Cargo.lock | Locks bitfield-struct inclusion. |
| // manually overrride for testing | ||
| let use_serial2 = false; | ||
|
|
||
| let serial2 = if use_serial2 { | ||
| let (serial2_host, serial2) = self | ||
| .create_serial_stream() |
There was a problem hiding this comment.
The use_serial2 test override hard-codes serial2 (OpenHCL serial logging) off. If serial2 logging is intentionally being removed in favor of diag_client, please delete this override plumbing entirely; otherwise wire it to a real runtime/config flag (or restore the previous self.firmware.is_openhcl() gating) so behavior isn’t permanently changed by a local testing constant.
| serial_tasks.push(serial0_task); | ||
|
|
||
| let serial2 = if self.firmware.is_openhcl() { | ||
| // manually overrride for testing |
There was a problem hiding this comment.
Typo in comment: “overrride” should be “override”.
| // manually overrride for testing | |
| // manually override for testing |
| // Attempt to enable COM3 and use that to get KMSG logs, otherwise | ||
| // fall back to use diag_client. | ||
| let _supports_com3 = { | ||
| // Hyper-V VBS VMs don't work with COM3 enabled. | ||
| // Hypervisor support is needed for this to work. | ||
| let is_not_vbs = !matches!(config.firmware.isolation(), Some(IsolationType::Vbs)); | ||
|
|
||
| // The Hyper-V serial device for ARM doesn't support additional | ||
| // serial ports yet. | ||
| let is_x86 = matches!(config.arch, MachineArch::X86_64); | ||
|
|
||
| // The registry key to enable additional COM ports is only | ||
| // available in newer builds of Windows. | ||
| let current_winver = windows_version::OsVersion::current(); | ||
| tracing::debug!(?current_winver, "host windows version"); | ||
| // This is the oldest working build used in CI | ||
| // TODO: determine the actual minimum version | ||
| const COM3_MIN_WINVER: u32 = 27813; | ||
| let is_supported_winver = current_winver.build >= COM3_MIN_WINVER; | ||
|
|
||
| properties.is_openhcl && is_not_vbs && is_x86 && is_supported_winver | ||
| }; | ||
|
|
||
| // TODO: only increase VTL2 memory on debug builds | ||
| vm.set_openhcl_firmware( | ||
| &igvm_file, | ||
| // don't increase VTL2 memory on CVMs | ||
| !matches!( | ||
| guest_state_isolation_type, | ||
| powershell::HyperVGuestStateIsolationType::Vbs | ||
| | powershell::HyperVGuestStateIsolationType::Snp | ||
| | powershell::HyperVGuestStateIsolationType::Tdx | ||
| ), | ||
| ) | ||
| .await?; | ||
| // manually overrride for testing | ||
| let supports_com3 = false; | ||
|
|
There was a problem hiding this comment.
_supports_com3 computes host/version capability, but the result is discarded and then supports_com3 is hard-coded to false. Please either use the computed _supports_com3 value (and remove the testing override), or delete the capability detection entirely; as-is this adds extra host queries/logging and permanently disables the COM3 fast-path.
| ), | ||
| ) | ||
| .await?; | ||
| // manually overrride for testing |
There was a problem hiding this comment.
Typo in comment: “overrride” should be “override”.
| // manually overrride for testing | |
| // manually override for testing |
| let hyperv_args = { | ||
| let mut args = HyperVNewCustomVMArgs::from_config(&config, &properties)?; | ||
| args.firmware_file = igvm_file.clone(); | ||
| args.firmware_parameters = openhcl_command_line; | ||
| args.guest_state_path = guest_state_path; | ||
| args.scsi_controllers = scsi_controllers; | ||
| args.ide_controllers = ide_controllers; | ||
| args.com_3 = supports_com3; | ||
| args.imc_hiv = imc_hiv; | ||
| args.management_vtl_settings = management_vtl_settings; | ||
| args | ||
| }; | ||
|
|
||
| // The registry key to enable additional COM ports is only | ||
| // available in newer builds of Windows. | ||
| let current_winver = windows_version::OsVersion::current(); | ||
| tracing::debug!(?current_winver, "host windows version"); | ||
| // This is the oldest working build used in CI | ||
| // TODO: determine the actual minimum version | ||
| const COM3_MIN_WINVER: u32 = 27813; | ||
| let is_supported_winver = current_winver.build >= COM3_MIN_WINVER; | ||
| let vm = HyperVVM::new(hyperv_args, log_source.clone(), driver.clone()).await?; | ||
|
|
||
| is_not_vbs && is_x86 && is_supported_winver | ||
| }; | ||
| if properties.is_openhcl { | ||
| // Copy the IGVM file locally, since it may not be accessible by | ||
| // Hyper-V (e.g., if it is in a WSL filesystem). | ||
| let local_path = igvm_file.as_ref().unwrap(); | ||
| fs_err::copy(config.firmware.openhcl_firmware().unwrap(), local_path) | ||
| .context("failed to copy igvm file")?; | ||
| acl_read_for_vm(local_path, Some(*vm.vmid())) | ||
| .context("failed to set ACL for igvm file")?; |
There was a problem hiding this comment.
HyperVNewCustomVMArgs is populated with firmware_file = igvm_file.clone() and passed into HyperVVM::new(...) (which calls New-CustomVM) before the IGVM is actually copied into igvm_file (the copy happens afterwards). This means New-CustomVM may be given a non-existent firmware path, which can fail VM creation or leave the VM pointing at a missing file. Copy the IGVM (and set ACLs) before calling HyperVVM::new, or defer setting FirmwareFile until after the file exists.
| futures::future::join_all( | ||
| self.inner | ||
| .resources | ||
| .log_stream_tasks | ||
| .into_iter() | ||
| .map(|t| t.cancel()), | ||
| ) | ||
| .await; |
There was a problem hiding this comment.
teardown now cancels all log_stream_tasks and discards their outputs, which means errors from the logging pipelines will no longer surface (previously t.await? propagated failures). If the intent is to avoid teardown failures, consider still inspecting the cancel().await results and at least logging non-cancellation errors (or explicitly documenting why errors are ignored).
| futures::future::join_all( | |
| self.inner | |
| .resources | |
| .log_stream_tasks | |
| .into_iter() | |
| .map(|t| t.cancel()), | |
| ) | |
| .await; | |
| let cancel_futs = self | |
| .inner | |
| .resources | |
| .log_stream_tasks | |
| .into_iter() | |
| .map(|t| t.cancel()); | |
| let cancel_results = futures::future::join_all(cancel_futs).await; | |
| for res in cancel_results { | |
| if let Err(e) = res { | |
| tracing::warn!(?e, "log stream task did not cancel cleanly"); | |
| } | |
| } |
| // let use_serial2 = self.firmware.is_openhcl(); | ||
| // manually overrride for testing | ||
| let use_serial2 = false; | ||
|
|
||
| let serial2 = if use_serial2 { | ||
| let (serial2_host, serial2) = self | ||
| .create_serial_stream() |
There was a problem hiding this comment.
The serial2/OpenHCL logging is hard-disabled via a "manually overrride for testing" block. This changes runtime behavior for all OpenHCL VMs and leaves test-only scaffolding (including a misspelling) in production code. If serial2 is being disabled to avoid double-writing the same openhcl log file now that kmsg_log_task is used, gate it cleanly (e.g., choose exactly one log source based on config) and remove the test override/comments.
| // For certain configurations, we need to override the override | ||
| // in new_underhill_vm. | ||
| // | ||
| // TODO: remove this (and OpenHCL override) once host changes | ||
| // are saturated. | ||
| if !properties.is_isolated | ||
| && !*secure_boot_enabled | ||
| && config.tpm.is_none() | ||
| && !*default_boot_always_attempt | ||
| { | ||
| append_cmdline( | ||
| &mut openhcl_command_line, | ||
| format!( | ||
| "HCL_DEFAULT_BOOT_ALWAYS_ATTEMPT={}", | ||
| if *default_boot_always_attempt { 1 } else { 0 } | ||
| ), | ||
| "HCL_DEFAULT_BOOT_ALWAYS_ATTEMPT=0", | ||
| ); | ||
| }; | ||
| } |
There was a problem hiding this comment.
openhcl_command_line starts as None for non-OpenHCL UEFI VMs, but this block can still call append_cmdline, which will create a non-empty FirmwareParameters string and pass it to New-CustomVM. That looks OpenHCL-specific and could misconfigure or break plain UEFI VMs. Please gate this override on properties.is_openhcl/config.firmware.is_openhcl() (or only mutate the cmdline when openhcl_command_line is already Some).
|
|
||
| /// get the encryption policy of the vmgs | ||
| pub fn encryption_policy(&self) -> Option<GuestStateEncryptionPolicy> { | ||
| self.vmgs().map(|vmgs| vmgs.encryption_policy) |
There was a problem hiding this comment.
PetriVmgsResource::encryption_policy() now returns None for Ephemeral, whereas previously Hyper-V setup treated ephemeral VMGS as GuestStateEncryptionPolicy::None(true) (strict) when deriving flags/CLI. This is a behavioral change that can silently disable strict_encryption_policy and omit GuestStateEncryptionPolicy for ephemeral OpenHCL VMs. Consider returning Some(GuestStateEncryptionPolicy::None(true)) for Ephemeral, or provide an explicit API that encodes the intended default for ephemeral guest state.
| self.vmgs().map(|vmgs| vmgs.encryption_policy) | |
| match self { | |
| PetriVmgsResource::Disk(vmgs) | |
| | PetriVmgsResource::ReprovisionOnFailure(vmgs) | |
| | PetriVmgsResource::Reprovision(vmgs) => Some(vmgs.encryption_policy), | |
| PetriVmgsResource::Ephemeral => Some(GuestStateEncryptionPolicy::None(true)), | |
| } |
| [[profile.default.overrides]] | ||
| # use fuzzy-matching for the package() to allow out-of-tree tests to use the | ||
| # same profile | ||
| filter = 'package(~vmm_tests)' | ||
| # Mark VMM tests as heavy and requiring more threads due to their memory and CPU | ||
| # usage. For local dev runs, you may need to manually restrict the number of | ||
| # threads running via the -j cli arg. | ||
| threads-required = 3 | ||
| threads-required = 2 |
There was a problem hiding this comment.
This removes the dedicated heavy/very_heavy overrides for vmm_tests and lowers threads-required for all vmm_tests to 2. That changes scheduling semantics substantially and may increase parallelism for the heaviest tests (risking CI instability/OOM) or under-provision threads for tests that expect larger threads-required. Please clarify the motivation and consider keeping the tiered overrides (or adjusting them) rather than deleting them entirely.
See how much faster it is to use diag_client instead of COM3