diff --git a/README.md b/README.md index adf5654d..55916ec5 100644 --- a/README.md +++ b/README.md @@ -215,6 +215,37 @@ wxc-exec.exe --debug config.json See [docs/diagnostics.md](docs/diagnostics.md) for full diagnostics reference. +## Telemetry (Experimental) + +MXC supports optional TraceLogging ETW telemetry for execution observability. When enabled, structured events (`MXC.Execution` and `MXC.Error`) are emitted to the local ETW subsystem via the Rust [`tracelogging`](https://crates.io/crates/tracelogging) crate. Every event includes common fields (Version, Channel, IsDebugging, `UTCReplace_AppSessionGuid`) as Part C custom event data. + +Telemetry is **experimental** and requires: +1. The `--experimental` CLI flag +2. `"experimental": { "telemetry": { "enabled": true } }` in the JSON config + +On non-Windows platforms, all telemetry functions are no-ops. + +### Data Collection + +The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices. + +#### How to turn telemetry off + +Telemetry is **off by default**. MXC emits telemetry only when **both** of the following are set, so no action is required to keep it disabled: + +1. The `--experimental` CLI flag is passed, **and** +2. 
`"experimental": { "telemetry": { "enabled": true } }` is present in the JSON config. + +Omitting either (the default) turns telemetry off entirely. On non-Windows platforms all telemetry functions are no-ops. + +#### What official builds send + +Official/shipped Microsoft builds set a TraceLogging provider group GUID at build time and route `MXC.Execution` and `MXC.Error` events to Microsoft through the UTC pipeline when telemetry is enabled. **Local and open-source builds send nothing to Microsoft by default** — the public source ships without a provider group GUID, so events are emitted to the local ETW subsystem only and are not routed to any Microsoft collection pipeline. Internal builds that set the `MXC_TELEMETRY_PROVIDER_GROUP_GUID` environment variable at build time enable the Microsoft-routed path. + +No PII is collected. Events contain only execution metrics (duration, backend type, exit code) and a bounded error category (`error_type`). Free-form error message text is never emitted, so paths, usernames, and credentials cannot leak through telemetry. If you use the SDK to build applications, you are responsible for providing appropriate telemetry notices to your own users. + +Privacy information can be found at https://privacy.microsoft.com and in the Microsoft privacy statement at https://go.microsoft.com/fwlink/?LinkID=824704. + ## Documentation | Document | Description | @@ -231,6 +262,7 @@ See [docs/diagnostics.md](docs/diagnostics.md) for full diagnostics reference. 
| [docs/macos-support/seatbelt-backend.md](docs/macos-support/seatbelt-backend.md) | Seatbelt backend (macOS) | | [docs/windows-sandbox/windows-sandbox.md](docs/windows-sandbox/windows-sandbox.md) | Windows Sandbox backend | | [docs/state-aware-lifecycle/mxc-state-aware-sandbox-api.md](docs/state-aware-lifecycle/mxc-state-aware-sandbox-api.md) | State-aware sandbox lifecycle API | +| [docs/telemetry/telemetry.md](docs/telemetry/telemetry.md) | TraceLogging telemetry architecture | ## Contributing @@ -238,4 +270,4 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines. ## License -See [LICENSE.md](LICENSE.md) for details. \ No newline at end of file +See [LICENSE.md](LICENSE.md) for details. diff --git a/docs/schema.md b/docs/schema.md index 49ebd302..2ea3187f 100644 --- a/docs/schema.md +++ b/docs/schema.md @@ -85,6 +85,9 @@ production configs and the dev schema when working on experimental features: "launchMethod": "exec", // "exec" or "open" (LaunchServices, for Apple-constrained apps) "nestedPty": true, // Allow inner process to allocate its own pty (posix_openpt) "keychainAccess": false // Allow Keychain via securityd / trustd / cfprefsd / lsd.* + }, + "telemetry": { // Telemetry (experimental, Windows only) + "enabled": true // Emit TraceLogging ETW events via pure Rust tracelogging crate } } } diff --git a/docs/telemetry/telemetry.md b/docs/telemetry/telemetry.md new file mode 100644 index 00000000..acdadf34 --- /dev/null +++ b/docs/telemetry/telemetry.md @@ -0,0 +1,197 @@ +# MXC Telemetry — Pure Rust TraceLogging Architecture + +MXC uses the Rust [`tracelogging`](https://crates.io/crates/tracelogging) crate +(published by Microsoft) for TraceLogging ETW telemetry. No C++ shim, WIL, or +FFI is required. 
+ +## Overview + +``` +┌──────────────────────────────────────────────────────┐ +│ wxc_common::telemetry │ +│ (Rust — config resolution, sanitisation, types) │ +│ │ +│ init() / log_execution() / log_error() / shutdown() │ +└───────────────┬──────────────────────────────────────┘ + │ Direct Rust function calls + ▼ +┌──────────────────────────────────────────────────────┐ +│ mxc_telemetry (Rust crate) │ +│ src/lib.rs — define_provider! + write_event! │ +│ │ +│ Windows: ETW events via tracelogging crate │ +│ Linux/macOS: no-op stubs │ +└──────────────────────────────────────────────────────┘ +``` + +## Why the Rust `tracelogging` Crate (Not WIL C++ Shim) + +An earlier design used a WIL C++ shim compiled via the `cc` crate. PR review +feedback correctly noted that the WIL dependency added C++ compilation, NuGet +download, FFI unsafety, and blocked non-Windows contributors from building the +crate. The Rust `tracelogging` crate provides the core ETW primitives needed, +and the small set of WIL features MXC actually uses can be replicated with +Rust constants and `write_event!` struct fields. + +### Feature comparison + +| Feature | WIL (`wil/TraceLogging.h`) | Rust `tracelogging` crate | MXC approach | +|---|---|---|---| +| **Provider group GUID** | `TraceLoggingOptionMicrosoftTelemetry()` | `group_id("...")` in `define_provider!` | `build.rs` generates `provider_def.rs` with/without `group_id` based on env var | +| **Sampling keywords** | `MICROSOFT_KEYWORD_MEASURES` named constant | Raw `u64` in `keyword(...)` | `const MICROSOFT_KEYWORD_MEASURES: u64 = 0x0000_4000_0000_0000` | +| **Common event fields** | `_GENERIC_PARTB_FIELDS_ENABLED` pattern | `struct("Name", { ... 
})` in `write_event!` | `struct("COMMON_MXC_PARAMS", { Version, Channel, IsDebugging, UTCReplace_AppSessionGuid })` | +| **Provider lifecycle** | `IMPLEMENT_TRACELOGGING_CLASS` singleton | `define_provider!` static + `register()`/`unregister()` | `OnceLock` for version/channel, manual lifecycle | +| **Privacy Data Tags** | `TelemetryPrivacyDataTag(PDT_*)` | `u64("PartA_PrivTags", &val)` field | `PDT_PRODUCT_AND_SERVICE_USAGE` on all events | +| **Activity tracking** | `DEFINE_TELEMETRY_ACTIVITY` | Manual `Opcode` | Not needed for current events | + +The remaining gap (activity tracking) is not needed for current events. +If needed later, it can be added incrementally. + +## Common Event Fields (Part C) + +Every MXC telemetry event includes a `COMMON_MXC_PARAMS` struct grouping +shared Part C custom event fields: + +| Field | Type | Description | +|-------|------|-------------| +| `Version` | string | MXC crate version from `CARGO_PKG_VERSION` | +| `Channel` | string | `"dev"` for debug builds, `"release"` for release | +| `IsDebugging` | bool | `cfg!(debug_assertions)` — true for debug builds | +| `UTCReplace_AppSessionGuid` | bool | Always `true` — tells UTC to replace the app session GUID with a per-session identifier for privacy | + +## Events + +### MXC.Execution + +Emitted when a one-shot execution completes (success or failure). It is also +emitted on early-exit failures in the one-shot executors — configuration, +policy, and backend-init failures that terminate before a runner produces a +result (with `mxc.exit_code` = 1 and `mxc.outcome` = `failure`). + +> **Note:** The state-aware lifecycle (`provision` / `start` / `exec` / +> `stop` / `deprovision`) is not yet instrumented; only the one-shot path +> emits telemetry. 
+ +| Field | Type | Description | +|-------|------|-------------| +| `mxc.backend` | string | Containment backend name | +| `mxc.exit_code` | int32 | Process exit code | +| `mxc.outcome` | string | `"success"` or `"failure"` | +| `mxc.duration_ms` | uint64 | Total execution time | +| `mxc.failure_reason` | string | Failure category (if applicable) | + +### MXC.Error + +Emitted on execution errors. + +| Field | Type | Description | +|-------|------|-------------| +| `mxc.backend` | string | Containment backend name | +| `mxc.error_type` | string | Error category (`config_error`, `process_error`, etc.) | +| `mxc.exit_code` | int32 | Process exit code | + +> **No free-form error text is emitted.** Error messages can contain paths, +> usernames, or credentials, so `MXC.Error` deliberately carries only the +> bounded `error_type` category and the numeric `exit_code` — never the +> message string itself. + +## Cross-Platform Behaviour + +| Platform | Behaviour | +|----------|-----------| +| Windows | Full ETW telemetry via `tracelogging` crate | +| Linux | No-op — all telemetry functions return immediately | +| macOS | No-op — all telemetry functions return immediately | + +## Private GUID Substitution (Internal Builds) + +MXC supports an optional Microsoft telemetry group GUID for internal builds. +The mechanism is public; only the GUID value is private. + +### How it works + +``` +build.rs execution flow +======================== + +1. Check MXC_TELEMETRY_PROVIDER_GROUP_GUID env var + ├── NOT set → generate: define_provider!(MXC_PROVIDER, "Microsoft.MXC"); + └── SET → generate: define_provider!(MXC_PROVIDER, "Microsoft.MXC", + group_id("{guid}")); + +2. lib.rs includes the generated provider_def.rs via include!() +``` + +The provider GUID is **not** specified in either branch. 
The `tracelogging` +crate's `define_provider!` macro derives it deterministically from the provider +name using the standard ETW name-hash algorithm (the same algorithm used by +``, WIL's `IMPLEMENT_TRACELOGGING_CLASS`, and .NET's +`EventSource`). For `"Microsoft.MXC"` the derived GUID is +`{7f10def4-a258-5fea-510e-2c3bb976687f}`. Keeping the name and GUID in lockstep +this way prevents drift and avoids hard-coding a literal that could collide +with another team's GUID. + +### CI pipeline steps + +Internal Microsoft builds set `MXC_TELEMETRY_PROVIDER_GROUP_GUID` to the real +Microsoft telemetry group GUID before `cargo build` on Windows, so events route +through the telemetry pipeline. Community forks that lack access to the private +GUID do not set this variable — the provider is registered without a group GUID +(plain ETW only). + +> **Follow-up:** The provider group GUID is now provided by a secret variable +> on the official Windows build pipeline, so official builds can route events +> through the telemetry pipeline. The build has always honored the variable +> (see *Local developer testing* below); public builds and community forks, +> which do not have access to the variable, continue to register the provider +> without a group GUID (plain ETW only). + +### Local developer testing + +```powershell +# Test with a dummy group GUID (not the real one) +$env:MXC_TELEMETRY_PROVIDER_GROUP_GUID = '00000000-1111-2222-3333-444444444444' +cargo build -p mxc_telemetry + +# Test without (public build) +Remove-Item Env:\MXC_TELEMETRY_PROVIDER_GROUP_GUID +cargo build -p mxc_telemetry +``` + +### What's public vs. private + +| Item | Public? 
| Why | +|------|---------|-----| +| Provider name `"Microsoft.MXC"` | ✅ | Standard ETW naming | +| Provider GUID `{7f10def4-a258-5fea-510e-2c3bb976687f}` | ✅ | Derived from the name; identifies the provider, harmless | +| `build.rs` env var mechanism | ✅ | Mechanism is public | +| `MXC_TELEMETRY_PROVIDER_GROUP_GUID` env var name | ✅ | Key is public; value is private | +| Actual Microsoft telemetry group GUID | ❌ | Private — set in CI only | + +## SDK License Override (EULA for npm Package) + +The public GitHub repo ships `sdk/LICENSE.md` as a plain MIT license. For +internal npm publishes, a separate EULA containing a **Section 2 — DATA** +clause (covering telemetry disclosure, opt-out, and GDPR) is substituted for it at +pack/publish time. + +### How it works + +``` +1. CI pipeline (or local script) sets MXC_LICENSE_OVERRIDE env var + pointing to the markdown file of the EULA including additional telemetry language. + Note that the new EULA will include language outlining what data can be collected but + will otherwise remain MIT licensed. + +2. A license-override script (added in a follow-up build-integration PR) runs: + ├── MXC_LICENSE_OVERRIDE is set: + │ ├── Back up sdk/LICENSE.md → sdk/LICENSE.md.public + │ └── Copy new EULA over sdk/LICENSE.md + └── MXC_LICENSE_OVERRIDE is NOT set: + └── Restore sdk/LICENSE.md from .public backup (if exists) + +3. npm pack / npm publish picks up the new EULA as the LICENSE.md + in the published package (sdk/package.json "files" includes LICENSE.md). + +4. After publish, the revert path restores the original sdk/LICENSE.md. +``` diff --git a/schemas/dev/mxc-config.schema.0.8.0-dev.json b/schemas/dev/mxc-config.schema.0.8.0-dev.json index d0ed5ffc..25c50033 100644 --- a/schemas/dev/mxc-config.schema.0.8.0-dev.json +++ b/schemas/dev/mxc-config.schema.0.8.0-dev.json @@ -161,6 +161,17 @@ ], "description": "Seatbelt backend config (pre-promotion alias)." 
}, + "telemetry": { + "anyOf": [ + { + "$ref": "#/definitions/Telemetry" + }, + { + "type": "null" + } + ], + "description": "Telemetry configuration." + }, "test": { "anyOf": [ { @@ -749,6 +760,19 @@ }, "type": "object" }, + "Telemetry": { + "description": "Telemetry configuration (`experimental.telemetry`).", + "properties": { + "enabled": { + "description": "Explicit telemetry override. `true` = force on, `false` = force off, omitted = disabled (default off).", + "type": [ + "boolean", + "null" + ] + } + }, + "type": "object" + }, "TestFeature": { "description": "Placeholder experimental feature.", "properties": { diff --git a/sdk/src/generated/wire.ts b/sdk/src/generated/wire.ts index 5d1486c7..4e0026ed 100644 --- a/sdk/src/generated/wire.ts +++ b/sdk/src/generated/wire.ts @@ -59,6 +59,10 @@ export interface Experimental { * Seatbelt backend config (pre-promotion alias). */ seatbelt?: Seatbelt | null; + /** + * Telemetry configuration. + */ + telemetry?: Telemetry | null; /** * Placeholder feature for testing experimental infrastructure. */ @@ -357,6 +361,17 @@ export interface Seatbelt { profileOverride?: string | null; } +/** + * Telemetry configuration (`experimental.telemetry`). + */ +export interface Telemetry { + /** + * Explicit telemetry override. `true` = force on, `false` = force off, omitted = disabled (default off). + */ + enabled?: boolean | null; + [k: string]: unknown; +} + /** * Placeholder experimental feature. */ diff --git a/sdk/src/types.ts b/sdk/src/types.ts index 65a0d7c7..d4b1caa8 100644 --- a/sdk/src/types.ts +++ b/sdk/src/types.ts @@ -256,6 +256,17 @@ export interface PortMapping { protocol?: 'tcp'; } +/** + * Telemetry configuration for experimental TraceLogging ETW support. + */ +export interface TelemetryConfig { + /** + * Explicit telemetry override. `true` = force on, `false` = force off, + * `undefined` = off (default). 
+ */ + enabled?: boolean; +} + /** * Main WXC configuration */ @@ -292,6 +303,8 @@ export interface ContainerConfig { experimental?: { /** WSLC SDK configuration for Linux containers from Windows */ wslc?: WslcConfig; + /** Telemetry configuration for experimental TraceLogging ETW support */ + telemetry?: TelemetryConfig; }; /** macOS Seatbelt sandbox configuration (macOS only) */ seatbelt?: SeatbeltConfig; diff --git a/src/Cargo.lock b/src/Cargo.lock index 0c693c46..95df47f1 100644 --- a/src/Cargo.lock +++ b/src/Cargo.lock @@ -1443,6 +1443,14 @@ dependencies = [ "wxc_common", ] +[[package]] +name = "mxc_telemetry" +version = "0.7.0" +dependencies = [ + "tracelogging", + "uuid", +] + [[package]] name = "nanvix_binaries" version = "0.7.0" @@ -2226,6 +2234,21 @@ dependencies = [ "syn", ] +[[package]] +name = "tracelogging" +version = "1.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c0015caf14cad7613b7bbbb9ee44399ad9a694307be545d8af4e2711178e547e" +dependencies = [ + "tracelogging_macros", +] + +[[package]] +name = "tracelogging_macros" +version = "1.2.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "95e2d891464ff33bc1814c4cbbb251bae7800458b1efdb6ac8b7c01ee6382563" + [[package]] name = "tracing" version = "0.1.44" @@ -2944,6 +2967,7 @@ dependencies = [ "base64", "getrandom 0.2.17", "libc", + "mxc_telemetry", "nanvix_common", "schemars", "semver", diff --git a/src/Cargo.toml b/src/Cargo.toml index 84e7ba36..390e3b80 100644 --- a/src/Cargo.toml +++ b/src/Cargo.toml @@ -32,6 +32,7 @@ members = [ "testing/wxc_ui_probe", "tools/mxc_diagnostic_console", "tools/mxc_schema_gen", + "mxc_telemetry", ] exclude = ["testing/fuzz"] resolver = "3" @@ -52,6 +53,10 @@ edition = "2021" license = "MIT" [workspace.dependencies] +# Pure-Rust ETW TraceLogging provider used by `mxc_telemetry` (Windows-only; +# the crate compiles to no-ops on other targets). 
Pinned at the workspace +# level so every consumer resolves the same version. +tracelogging = "1.2" windows = { version = "0.62", features = [ "Win32_Foundation", "Win32_Security", @@ -109,6 +114,7 @@ isolation_session_bindings = { path = "backends/isolation_session/bindings" } mxc_pty = { path = "core/mxc_pty" } flatbuffers = "25" sandbox_spec = { path = "core/generated/base_container_specification" } +mxc_telemetry = { path = "mxc_telemetry" } widestring = "1" url = "2" winreg = "0.55" diff --git a/src/core/lxc/src/main.rs b/src/core/lxc/src/main.rs index cee7d9d4..5357b713 100644 --- a/src/core/lxc/src/main.rs +++ b/src/core/lxc/src/main.rs @@ -10,6 +10,7 @@ use wxc_common::config_parser::load_request; use wxc_common::logger::{Logger, Mode}; use wxc_common::models::{ContainmentBackend, ExecutionRequest, ScriptResponse}; use wxc_common::script_runner::{handle_dry_run_exit, ScriptRunner}; +use wxc_common::telemetry; #[cfg(target_os = "linux")] use bwrap_common::bwrap_runner::BubblewrapScriptRunner; @@ -215,6 +216,18 @@ fn main() { request.testing_features_enabled = cli.allow_testing_features; request.dry_run = cli.dry_run; + // ── Telemetry init (experimental) ─────────────────────────────── + let telemetry_active = if request.experimental_enabled { + request + .experimental + .telemetry + .as_ref() + .map(|c| telemetry::init(c, &mut logger)) + .unwrap_or(false) + } else { + false + }; + log_request(&request, &mut logger); // Dispatch by containment backend. 
On Linux, Bubblewrap is now the @@ -242,6 +255,11 @@ fn main() { eprintln!( "Error: Hyperlight backend requires x86_64 (Hyperlight needs KVM or WHP)" ); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } } @@ -257,6 +275,11 @@ fn main() { #[cfg(not(feature = "microvm"))] { eprintln!("Error: MicroVM backend not compiled in (build with --features microvm)"); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } } @@ -268,6 +291,11 @@ fn main() { #[cfg(not(target_os = "linux"))] { eprintln!("Error: Bubblewrap backend is only available on Linux"); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } } @@ -298,6 +326,14 @@ fn main() { display_script_results(&response, &mut logger); + // ── Telemetry emit (experimental) ─────────────────────────────── + telemetry::emit_completion( + telemetry_active, + &request.containment, + &response, + run_elapsed, + ); + print!("{}", response.standard_out); eprint!("{}", response.standard_err); process::exit(response.exit_code); diff --git a/src/core/wxc/src/main.rs b/src/core/wxc/src/main.rs index 9ae68a28..97991b03 100644 --- a/src/core/wxc/src/main.rs +++ b/src/core/wxc/src/main.rs @@ -32,6 +32,7 @@ use wxc_common::script_runner::{handle_dry_run_exit, ScriptRunner}; use wxc_common::state_aware_dispatch::dispatch_state_aware; use wxc_common::state_aware_dispatch::{resolve_backend, run_state_aware, DispatchOutcome}; use wxc_common::state_aware_request::{MxcRequest, ParsedStateAwareRequest, Phase}; +use wxc_common::telemetry; #[derive(Parser)] #[command(name = "wxc-exec", about = "Windows Container Executor")] @@ -706,6 +707,18 @@ fn main() { request.testing_features_enabled = cli.allow_testing_features; request.dry_run = cli.dry_run; + // ── Telemetry init (experimental) 
─────────────────────────────── + let telemetry_active = if request.experimental_enabled { + request + .experimental + .telemetry + .as_ref() + .map(|c| telemetry::init(c, &mut logger)) + .unwrap_or(false) + } else { + false + }; + // Apply the CLI command-line override to one-shot requests. State-aware // exec is handled above before dispatch. let command_override = match command_override_from_cli( @@ -716,6 +729,11 @@ fn main() { Err(e) => { eprintln!("Request error\ninvalid CLI command override: {e}"); eprint!("{}", logger.get_buffer()); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::ConfigError, + ); process::exit(1); } }; @@ -728,6 +746,11 @@ fn main() { "Error: no command to run. Provide `process.commandLine` in the policy or pass the command as arguments after the config path." ); eprint!("{}", logger.get_buffer()); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::ConfigError, + ); process::exit(1); } @@ -857,6 +880,11 @@ fn main() { } } eprint!("{}", logger.get_buffer()); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } } @@ -885,23 +913,48 @@ fn main() { #[cfg(not(feature = "wslc"))] { eprintln!("Error: WSLC backend not compiled. 
Rebuild with --features wslc."); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } } ContainmentBackend::Lxc => { eprintln!("Error: LXC backend not available on Windows"); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } ContainmentBackend::Bubblewrap => { eprintln!("Error: Bubblewrap backend not available on Windows"); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } ContainmentBackend::Seatbelt => { eprintln!("Error: Seatbelt backend is only available on macOS (use mxc-exec-mac)"); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } ContainmentBackend::Vm => { eprintln!("Error: VM backend not yet implemented"); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } ContainmentBackend::MicroVm => { @@ -916,6 +969,11 @@ fn main() { #[cfg(not(feature = "microvm"))] { eprintln!("Error: MicroVM backend not compiled in (build with --features microvm)"); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } } @@ -936,6 +994,11 @@ fn main() { eprintln!( "Error: Hyperlight backend requires x86_64 (Hyperlight needs KVM or WHP)" ); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } } @@ -970,6 +1033,11 @@ fn main() { eprintln!( "Error: IsolationSession backend not compiled. Rebuild with --features isolation_session." 
); + telemetry::emit_early_exit( + telemetry_active, + &request.containment, + telemetry::FailureReason::InitError, + ); process::exit(1); } } @@ -996,6 +1064,14 @@ fn main() { display_script_results(&response, &mut logger); + // ── Telemetry emit (experimental) ─────────────────────────────── + telemetry::emit_completion( + telemetry_active, + &request.containment, + &response, + run_elapsed, + ); + // Close diagnostic pipe. logger.close_diagnostics(); diff --git a/src/core/wxc_common/Cargo.toml b/src/core/wxc_common/Cargo.toml index bca51a80..09f2e3ee 100644 --- a/src/core/wxc_common/Cargo.toml +++ b/src/core/wxc_common/Cargo.toml @@ -21,6 +21,7 @@ semver = "1" schemars = { version = "0.8", optional = true } nanvix_common = { path = "../../backends/nanvix/common", optional = true } uuid = { workspace = true, optional = true } +mxc_telemetry = { workspace = true } [target.'cfg(target_os = "windows")'.dependencies] windows = { workspace = true } diff --git a/src/core/wxc_common/src/config_parser.rs b/src/core/wxc_common/src/config_parser.rs index 56b25257..115aa646 100644 --- a/src/core/wxc_common/src/config_parser.rs +++ b/src/core/wxc_common/src/config_parser.rs @@ -10,8 +10,8 @@ use crate::logger::Logger; use crate::models::{ ContainerPolicy, ContainmentBackend, ExecutionRequest, ExperimentalConfig, IsolationSessionConfig, LifecycleConfig, LxcConfig, NetworkEnforcementMode, NetworkPolicy, - PortMapping, ProxyAddress, ProxyConfig, SeatbeltConfig, TestFeatureConfig, UiPolicy, - WindowsSandboxConfig, WslcConfig, + PortMapping, ProxyAddress, ProxyConfig, SeatbeltConfig, TelemetryConfig, TestFeatureConfig, + UiPolicy, WindowsSandboxConfig, WslcConfig, }; use crate::mxc_error::MxcError; use crate::state_aware_request::{MxcRequest, ParsedStateAwareRequest, Phase}; @@ -973,11 +973,15 @@ fn convert_wire_config( logger.log_line(&msg); return Err(WxcError::ConfigParse(msg)); } + let telemetry = raw_exp.telemetry.map(|raw_t| TelemetryConfig { + enabled: raw_t.enabled, + }); 
ExperimentalConfig { test, windows_sandbox, wslc, isolation_session, + telemetry, } } else { ExperimentalConfig::default() @@ -3873,4 +3877,45 @@ mod tests { load_request(&encoded, &mut logger, true).unwrap(); } + + // ── Telemetry ──────────────────────────────────────────────────── + + #[test] + fn telemetry_not_set() { + let json = r#"{"process":{"commandLine":"echo hi"}}"#; + let encoded = base64_encode(json.as_bytes()); + let mut logger = test_logger(); + let req = load_request(&encoded, &mut logger, true).unwrap(); + assert!(req.experimental.telemetry.is_none()); + } + + #[test] + fn telemetry_enabled_true() { + let json = r#"{"process":{"commandLine":"echo hi"},"experimental":{"telemetry":{"enabled":true}}}"#; + let encoded = base64_encode(json.as_bytes()); + let mut logger = test_logger(); + let req = load_request(&encoded, &mut logger, true).unwrap(); + let telem = req.experimental.telemetry.expect("telemetry should be set"); + assert_eq!(telem.enabled, Some(true)); + } + + #[test] + fn telemetry_enabled_false() { + let json = r#"{"process":{"commandLine":"echo hi"},"experimental":{"telemetry":{"enabled":false}}}"#; + let encoded = base64_encode(json.as_bytes()); + let mut logger = test_logger(); + let req = load_request(&encoded, &mut logger, true).unwrap(); + let telem = req.experimental.telemetry.expect("telemetry should be set"); + assert_eq!(telem.enabled, Some(false)); + } + + #[test] + fn telemetry_empty_object() { + let json = r#"{"process":{"commandLine":"echo hi"},"experimental":{"telemetry":{}}}"#; + let encoded = base64_encode(json.as_bytes()); + let mut logger = test_logger(); + let req = load_request(&encoded, &mut logger, true).unwrap(); + let telem = req.experimental.telemetry.expect("telemetry should be set"); + assert_eq!(telem.enabled, None); + } } diff --git a/src/core/wxc_common/src/lib.rs b/src/core/wxc_common/src/lib.rs index cbb6382d..5c264bfd 100644 --- a/src/core/wxc_common/src/lib.rs +++ b/src/core/wxc_common/src/lib.rs @@ -20,6 
+20,7 @@ pub mod script_runner; pub mod state_aware_backend; pub mod state_aware_dispatch; pub mod state_aware_request; +pub mod telemetry; pub mod ui_policy; pub mod validator; diff --git a/src/core/wxc_common/src/models.rs b/src/core/wxc_common/src/models.rs index 9ccee184..6b82c07e 100644 --- a/src/core/wxc_common/src/models.rs +++ b/src/core/wxc_common/src/models.rs @@ -654,6 +654,17 @@ pub struct ExperimentalConfig { /// Isolation Session backend (experimental). #[serde(rename = "isolation_session")] pub isolation_session: Option<IsolationSessionConfig>, + /// Telemetry configuration (experimental). + pub telemetry: Option<TelemetryConfig>, +} + +/// Telemetry configuration parsed from the JSON config `experimental.telemetry` section. +#[derive(Debug, Clone, Default, Serialize, Deserialize)] +#[serde(default)] +pub struct TelemetryConfig { + /// Explicit telemetry override. + /// `Some(true)` = force on, `Some(false)` = force off, `None` = disabled (default off). + pub enabled: Option<bool>, } #[derive(Debug, Clone, Default, Serialize, Deserialize)] diff --git a/src/core/wxc_common/src/telemetry/events.rs b/src/core/wxc_common/src/telemetry/events.rs new file mode 100644 index 00000000..77ab9a17 --- /dev/null +++ b/src/core/wxc_common/src/telemetry/events.rs @@ -0,0 +1,89 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +//! TraceLogging ETW event emission for MXC telemetry. +//! +//! Event-specific data types and emission functions. The actual ETW +//! write is delegated to the `mxc_telemetry` crate, which adds +//! common fields automatically. + +/// Bounded set of failure categories for error classification. +/// Prevents free-form strings that could contain PII. 
+#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum FailureReason { + ConfigError, + PolicyError, + ProcessError, + Timeout, + InitError, + Unknown, +} + +impl FailureReason { + pub fn as_str(&self) -> &'static str { + match self { + Self::ConfigError => "config_error", + Self::PolicyError => "policy_error", + Self::ProcessError => "process_error", + Self::Timeout => "timeout", + Self::InitError => "init_error", + Self::Unknown => "unknown", + } + } +} + +impl std::fmt::Display for FailureReason { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.write_str(self.as_str()) + } +} + +/// Data for an MXC.Execution ETW event. +pub struct ExecutionEvent<'a> { + pub backend: &'a str, + pub exit_code: i32, + pub outcome: &'a str, + pub duration_ms: u64, + pub failure_reason: Option<FailureReason>, +} + +/// Log an MXC.Execution ETW event. +/// +/// Delegates to the `mxc_telemetry` provider which adds common fields +/// (Version, Channel, IsDebugging, UTCReplace_AppSessionGuid). +pub fn log_execution(event: &ExecutionEvent<'_>) { + let failure_str = event.failure_reason.map(|r| r.as_str()).unwrap_or(""); + + mxc_telemetry::log_execution( + event.backend, + event.exit_code, + event.outcome, + event.duration_ms, + failure_str, + ); +} + +/// Log an MXC.Error ETW event. +/// +/// To avoid leaking PII (paths, usernames, credentials embedded in error +/// strings), MXC deliberately does **not** emit the free-form error message. +/// The event carries only the bounded `error_type` category and the process +/// `exit_code`. 
+pub fn log_error(backend: &str, error_type: FailureReason, exit_code: i32) { + mxc_telemetry::log_error(backend, error_type.as_str(), exit_code); +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn failure_reason_as_str() { + assert_eq!(FailureReason::ConfigError.as_str(), "config_error"); + assert_eq!(FailureReason::PolicyError.as_str(), "policy_error"); + assert_eq!(FailureReason::ProcessError.as_str(), "process_error"); + assert_eq!(FailureReason::Timeout.as_str(), "timeout"); + assert_eq!(FailureReason::InitError.as_str(), "init_error"); + assert_eq!(FailureReason::Unknown.as_str(), "unknown"); + } +} diff --git a/src/core/wxc_common/src/telemetry/mod.rs b/src/core/wxc_common/src/telemetry/mod.rs new file mode 100644 index 00000000..e05ef10a --- /dev/null +++ b/src/core/wxc_common/src/telemetry/mod.rs @@ -0,0 +1,219 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +//! TraceLogging ETW telemetry for MXC. +//! +//! Provides structured event emission for execution observability +//! and adoption metrics. Events are emitted to the local ETW subsystem +//! via the `mxc_telemetry` crate (pure Rust, using the `tracelogging` +//! crate). Every event includes common fields (Version, Channel, +//! IsDebugging, UTCReplace_AppSessionGuid) as Part C custom event data. +//! +//! On non-Windows platforms, all telemetry functions are no-ops. + +pub mod events; + +use std::time::Duration; + +use crate::logger::Logger; +use crate::models::{ContainmentBackend, FailurePhase, ScriptResponse, TelemetryConfig}; + +pub use events::{log_error, log_execution, ExecutionEvent, FailureReason}; + +/// MXC version string, set at compile time. +const MXC_VERSION: &str = env!("CARGO_PKG_VERSION"); + +/// Build channel — `"dev"` for debug builds, `"release"` for release builds. +#[cfg(debug_assertions)] +const MXC_CHANNEL: &str = "dev"; +#[cfg(not(debug_assertions))] +const MXC_CHANNEL: &str = "release"; + +/// Returns the MXC version string. 
+pub fn version() -> &'static str { + MXC_VERSION +} + +/// Resolve whether telemetry is enabled for this invocation. +/// +/// Resolution: +/// - `experimental.telemetry.enabled` in JSON config — explicit override. +/// - Default: off (telemetry requires explicit opt-in). +/// +/// Note: Consent is the SDK consumer's responsibility. MXC does not implement +/// consent prompts or persistent consent storage. +pub fn is_enabled(config: &TelemetryConfig) -> bool { + config.enabled.unwrap_or(false) +} + +/// Initialize the TraceLogging ETW provider. +/// +/// If telemetry is enabled, registers the `Microsoft.MXC` provider with ETW. +/// Returns `true` if telemetry was activated, `false` if disabled or if +/// registration failed. +/// +/// Registration failures never affect execution: they are logged as a +/// diagnostic via the supplied [`Logger`] (so the failure is visible on the +/// console when running with diagnostics) and otherwise swallowed — the caller +/// simply proceeds with telemetry inactive. ETW is Windows-only; on other +/// platforms `mxc_telemetry::init` is a no-op stub that always returns `false`, +/// which is expected rather than a failure, so no diagnostic is emitted there. +pub fn init(config: &TelemetryConfig, logger: &mut Logger) -> bool { + if !is_enabled(config) { + return false; + } + + let activated = mxc_telemetry::init(MXC_VERSION, MXC_CHANNEL); + if !activated && cfg!(target_os = "windows") { + logger + .log_line("telemetry: ETW provider registration failed; continuing without telemetry"); + } + activated +} + +/// Unregister the TraceLogging ETW provider. +/// +/// Should be called before process exit if `init()` returned `true`. +/// On early-exit paths where `shutdown()` cannot be called, the OS +/// will clean up the provider registration at process termination. +pub fn shutdown() { + mxc_telemetry::shutdown(); +} + +/// Classify a failed execution into a bounded [`FailureReason`]. 
+fn classify_failure(phase: &FailurePhase) -> FailureReason { + match phase { + FailurePhase::LaunchFailed | FailurePhase::BackendUnavailable => FailureReason::InitError, + FailurePhase::Timeout => FailureReason::Timeout, + FailurePhase::ProcessExited | FailurePhase::None => FailureReason::ProcessError, + } +} + +/// Emit completion telemetry for a finished execution and shut the provider +/// down. No-op when `active` is `false`. +/// +/// This is the single shared emit path for the `wxc` and `lxc` executors: +/// it records an `MXC.Execution` event and, for failures that carry an error +/// message, an `MXC.Error` event (category + exit code only — never the +/// message text), then calls [`shutdown`]. +pub fn emit_completion( + active: bool, + containment: &ContainmentBackend, + response: &ScriptResponse, + elapsed: Duration, +) { + if !active { + return; + } + + let backend = containment.wire_name(); + let failed = response.exit_code != 0; + let outcome = if failed { "failure" } else { "success" }; + let failure_reason = failed.then(|| classify_failure(&response.failure_phase)); + + log_execution(&ExecutionEvent { + backend, + exit_code: response.exit_code, + outcome, + duration_ms: elapsed.as_millis() as u64, + failure_reason, + }); + + // The presence of an error message signals an infrastructure error (as + // opposed to a script that merely exited non-zero). We use it only as a + // boolean signal — the message text itself is never emitted. + if failed && !response.error_message.is_empty() { + log_error( + backend, + classify_failure(&response.failure_phase), + response.exit_code, + ); + } + + shutdown(); +} + +/// Emit failure telemetry for an early-exit path that terminates **before** a +/// runner produces a [`ScriptResponse`], then shut the provider down. No-op +/// when `active` is `false`. 
+/// +/// One-shot executors validate configuration and select a backend before +/// running; failures there call `process::exit` directly and would otherwise +/// bypass [`emit_completion`] entirely. This records an `MXC.Execution` event +/// (exit code 1, `failure` outcome) plus an `MXC.Error` event carrying the +/// bounded `reason` category and exit code, so config/policy/init failures are +/// observable. `duration_ms` is reported as `0` because no execution occurred. +pub fn emit_early_exit(active: bool, containment: &ContainmentBackend, reason: FailureReason) { + if !active { + return; + } + + let backend = containment.wire_name(); + + log_execution(&ExecutionEvent { + backend, + exit_code: 1, + outcome: "failure", + duration_ms: 0, + failure_reason: Some(reason), + }); + + log_error(backend, reason, 1); + + shutdown(); +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn is_enabled_explicit_true() { + let config = TelemetryConfig { + enabled: Some(true), + }; + assert!(is_enabled(&config)); + } + + #[test] + fn is_enabled_explicit_false() { + let config = TelemetryConfig { + enabled: Some(false), + }; + assert!(!is_enabled(&config)); + } + + #[test] + fn is_enabled_default_off() { + let config = TelemetryConfig::default(); + assert!(!is_enabled(&config)); + } + + #[test] + fn version_is_not_empty() { + assert!(!version().is_empty()); + } + + #[test] + fn classify_failure_maps_all_phases() { + // Backend/launch failures classify as init errors. + assert_eq!( + classify_failure(&FailurePhase::LaunchFailed), + FailureReason::InitError + ); + assert_eq!( + classify_failure(&FailurePhase::BackendUnavailable), + FailureReason::InitError + ); + // A process that ran and exited (or an unclassified failure) is a + // process error. 
+ assert_eq!( + classify_failure(&FailurePhase::ProcessExited), + FailureReason::ProcessError + ); + assert_eq!( + classify_failure(&FailurePhase::None), + FailureReason::ProcessError + ); + } +} diff --git a/src/core/wxc_common/src/wire.rs b/src/core/wxc_common/src/wire.rs index 4e36cc49..f6eeab02 100644 --- a/src/core/wxc_common/src/wire.rs +++ b/src/core/wxc_common/src/wire.rs @@ -385,6 +385,18 @@ pub struct Experimental { /// Seatbelt backend config (pre-promotion alias). #[serde(alias = "macos_sandbox")] pub seatbelt: Option, + /// Telemetry configuration. + pub telemetry: Option<Telemetry>, +} + +/// Telemetry configuration (`experimental.telemetry`). +#[derive(Debug, Clone, Serialize, Deserialize)] +#[cfg_attr(feature = "schema-gen", derive(schemars::JsonSchema))] +#[serde(rename_all = "camelCase")] +pub struct Telemetry { + /// Explicit telemetry override. `true` = force on, `false` = force off, + /// omitted = disabled (default off). + pub enabled: Option<bool>, } /// Placeholder experimental feature. diff --git a/src/mxc_telemetry/Cargo.toml b/src/mxc_telemetry/Cargo.toml new file mode 100644 index 00000000..5b3e29e6 --- /dev/null +++ b/src/mxc_telemetry/Cargo.toml @@ -0,0 +1,18 @@ +[package] +name = "mxc_telemetry" +version.workspace = true +edition.workspace = true +description = "Pure Rust TraceLogging ETW telemetry for MXC" + +[target.'cfg(windows)'.dependencies] +tracelogging = { workspace = true } + +# `uuid` is used only by the provider-GUID validation in `provider_codegen.rs`, +# which is `include!()`'d into `build.rs` (build-dependency) and the crate's +# `#[cfg(test)]` module (dev-dependency). It is intentionally NOT a normal +# runtime dependency.
+[build-dependencies] +uuid = { workspace = true } + +[dev-dependencies] +uuid = { workspace = true } diff --git a/src/mxc_telemetry/build.rs b/src/mxc_telemetry/build.rs new file mode 100644 index 00000000..b42375ec --- /dev/null +++ b/src/mxc_telemetry/build.rs @@ -0,0 +1,46 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +//! Build script for `mxc_telemetry`. +//! +//! Generates `provider_def.rs` containing the `define_provider!` invocation. +//! The `MXC_TELEMETRY_PROVIDER_GROUP_GUID` environment variable controls +//! whether a `group_id(...)` option is included — internal Microsoft builds +//! set this to the Microsoft telemetry group GUID so events route through the +//! telemetry pipeline, while public/OSS builds omit it (plain ETW only). +//! +//! The provider GUID itself is **not** specified here. The `tracelogging` +//! crate derives it deterministically from the provider name +//! (`"Microsoft.MXC"`) using the standard ETW name-hash algorithm — the same +//! algorithm used by ``, WIL's +//! `IMPLEMENT_TRACELOGGING_CLASS`, and .NET's `EventSource`. For +//! `"Microsoft.MXC"` the derived GUID is +//! `{7f10def4-a258-5fea-510e-2c3bb976687f}`. Keeping the name and GUID in +//! lockstep this way prevents drift and avoids hard-coding a literal. +//! +//! The pure code-generation logic lives in `provider_codegen.rs` so it can be +//! unit-tested from `lib.rs` (Cargo never runs build-script test modules). + +include!("provider_codegen.rs"); + +fn main() { + println!("cargo::rerun-if-env-changed=MXC_TELEMETRY_PROVIDER_GROUP_GUID"); + + let out = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap()); + + // The `tracelogging` provider only emits on Windows; on every other target + // the crate compiles to no-ops. Honor (and validate) the group GUID only + // for Windows builds so a stray or malformed environment value cannot break + // cross-platform builds — e.g. 
a CI host that exports the variable globally + // while cross-compiling the Linux/macOS binaries. + let target_os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default(); + let group_guid = if target_os == "windows" { + std::env::var("MXC_TELEMETRY_PROVIDER_GROUP_GUID").ok() + } else { + None + }; + + let provider_def = generate_provider_def(group_guid.as_deref()); + + std::fs::write(out.join("provider_def.rs"), provider_def).unwrap(); +} diff --git a/src/mxc_telemetry/provider_codegen.rs b/src/mxc_telemetry/provider_codegen.rs new file mode 100644 index 00000000..cbb27486 --- /dev/null +++ b/src/mxc_telemetry/provider_codegen.rs @@ -0,0 +1,70 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +// Pure provider-definition code generation shared between `build.rs` and the +// crate's unit tests. +// +// Cargo never runs `#[cfg(test)]` modules inside a build script, so the logic +// lives here and is pulled into both `build.rs` and a `#[cfg(test)]` module in +// `lib.rs` via `include!`. That keeps the GUID validation and code-generation +// behaviour unit-testable with `cargo test`. + +/// Parses `s` as a strict, canonical hyphenated GUID +/// (`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`) and returns its lowercase canonical +/// form. +/// +/// Validation is delegated to the `uuid` crate. Because `uuid`'s parser is +/// lenient (it also accepts braced `{...}`, `urn:uuid:`, and unhyphenated +/// 32-hex forms), we additionally require the input to already be in the +/// canonical hyphenated shape (case-insensitively). This keeps the accepted +/// grammar identical to the original hand-rolled validator and guarantees the +/// returned string is a bare hyphenated GUID — safe to interpolate into the +/// generated Rust source that is `include!()`'d. 
+fn canonicalize_guid(s: &str) -> Option<String> { + let canonical = uuid::Uuid::try_parse(s).ok()?.as_hyphenated().to_string(); + s.eq_ignore_ascii_case(&canonical).then_some(canonical) +} + +/// Validates that `s` is a well-formed, canonical hyphenated GUID +/// (`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`). Prevents code injection via the +/// environment variable since the value is interpolated into generated Rust +/// source that is `include!()`'d. +/// +/// Only referenced from the unit tests; `generate_provider_def` calls +/// `canonicalize_guid` directly. `allow(dead_code)` keeps the build script +/// (which `include!`s this file but never calls the helper) warning-clean. +#[allow(dead_code)] +fn is_valid_guid(s: &str) -> bool { + canonicalize_guid(s).is_some() +} + +/// Generate the `tracelogging::define_provider!` invocation that is written to +/// `provider_def.rs`. +/// +/// When `group_guid` is a non-empty, well-formed GUID the provider joins that +/// ETW provider group (internal Microsoft builds route through the telemetry +/// pipeline); otherwise a plain provider definition is produced (public/OSS +/// builds, local ETW only). The GUID is emitted in its canonical lowercase +/// hyphenated form. +/// +/// # Panics +/// +/// Panics if `group_guid` is `Some(non-empty)` but not a valid GUID, so a +/// malformed value fails the build rather than emitting invalid generated +/// source.
+fn generate_provider_def(group_guid: Option<&str>) -> String { + match group_guid { + Some(guid) if !guid.is_empty() => { + let canonical = canonicalize_guid(guid) + .expect("MXC_TELEMETRY_PROVIDER_GROUP_GUID is not a valid GUID"); + format!( + "tracelogging::define_provider!(\ + MXC_PROVIDER, \"Microsoft.MXC\", \ + group_id(\"{canonical}\"));\n" + ) + } + _ => "tracelogging::define_provider!(\ + MXC_PROVIDER, \"Microsoft.MXC\");\n" + .to_string(), + } +} diff --git a/src/mxc_telemetry/src/lib.rs b/src/mxc_telemetry/src/lib.rs new file mode 100644 index 00000000..93e552a7 --- /dev/null +++ b/src/mxc_telemetry/src/lib.rs @@ -0,0 +1,345 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +//! Pure Rust TraceLogging ETW telemetry for MXC. +//! +//! This crate provides the `Microsoft.MXC` ETW TraceLogging provider using the +//! [`tracelogging`](https://crates.io/crates/tracelogging) crate; no C++ shim +//! or WIL dependency required. +//! +//! # Platform behaviour +//! +//! - **Windows**: Events are emitted via the ETW `EventWriteTransfer` API. +//! Each event includes a `COMMON_MXC_PARAMS` struct containing Version, +//! Channel, IsDebugging, and `UTCReplace_AppSessionGuid`. +//! - **Non-Windows**: All functions are no-op stubs defined in this crate; the +//! `tracelogging` dependency is only pulled in for Windows targets. +//! +//! # Thread safety +//! +//! Provider state (version/channel strings) is stored in a `OnceLock` and is +//! immutable after first initialisation. The `tracelogging` provider is +//! internally synchronised.
+ +// --------------------------------------------------------------------------- +// Windows provider implementation +// --------------------------------------------------------------------------- + +#[cfg(target_os = "windows")] +mod provider { + use std::sync::Mutex; + use std::sync::OnceLock; + + // Provider definition generated by build.rs: name-derived GUID, with + // `group_id(...)` added only for internal Microsoft builds. + include!(concat!(env!("OUT_DIR"), "/provider_def.rs")); + + /// Sampling keyword for Measures-level telemetry. + /// Same value as WIL's `traceloggingconfig.h` `MICROSOFT_KEYWORD_MEASURES`. + pub(crate) const MICROSOFT_KEYWORD_MEASURES: u64 = 0x0000_4000_0000_0000; + + /// Privacy data tag for Product and Service Usage. + /// Same value as WIL's `MicrosoftTelemetry.h` `PDT_ProductAndServiceUsage`. + /// Applied via a `PartA_PrivTags` field per the `TelemetryPrivacyDataTag` pattern. + pub(crate) const PDT_PRODUCT_AND_SERVICE_USAGE: u64 = 0x0000_0000_0200_0000; + + /// Cached provider state, set once at init and read on every event. + struct ProviderState { + version: String, + channel: String, + } + + static STATE: OnceLock<ProviderState> = OnceLock::new(); + + /// Tracks whether the ETW provider is currently registered. + /// + /// A `Mutex` (rather than an `AtomicBool`) is used so the wrapper's view of + /// registration is updated **under the same lock that serialises the + /// underlying `register`/`unregister` ETW calls**. This prevents a + /// concurrent `init`/`shutdown` race from leaving the wrapper flag and the + /// provider's real state out of sync (e.g. flag set to registered after a + /// `register()` that actually failed, or a double register/unregister). + static REGISTERED: Mutex<bool> = Mutex::new(false); + + /// Register the ETW provider and cache version/channel strings. + /// + /// Returns `true` if the provider is registered on return.
A `true` return + /// means the provider is registered — it does *not* guarantee an ETW + /// session is actively listening. Returns `false` if the underlying + /// `register()` call reported a non-zero Win32 status, in which case the + /// provider is left unregistered so a later call may retry and callers + /// treat telemetry as inactive. + /// + /// **Note:** Version and channel are captured on the **first** call only. + /// Subsequent calls (even after [`shutdown`]) reuse the original values + /// because the backing `OnceLock` cannot be reset. + pub fn init(version: &str, channel: &str) -> bool { + STATE.get_or_init(|| ProviderState { + version: version.to_owned(), + channel: channel.to_owned(), + }); + + let mut registered = REGISTERED.lock().unwrap_or_else(|e| e.into_inner()); + if *registered { + // Already registered — the tracelogging crate panics on double + // register, so we must not call `register()` again. + return true; + } + + // SAFETY: MXC_PROVIDER is a process-lifetime static. MXC is an + // executable (not a DLL), so unload ordering is not a concern. + let status = unsafe { MXC_PROVIDER.register() }; + if status != 0 { + // Registration failed; leave `*registered` false so the wrapper + // state matches reality and a later attempt can retry. + return false; + } + + *registered = true; + true + } + + /// Unregister the ETW provider. + pub fn shutdown() { + let mut registered = REGISTERED.lock().unwrap_or_else(|e| e.into_inner()); + if *registered { + MXC_PROVIDER.unregister(); + *registered = false; + } + } + + /// Emit an `MXC.Execution` ETW event. + pub fn log_execution( + backend: &str, + exit_code: i32, + outcome: &str, + duration_ms: u64, + failure_reason: &str, + ) { + let state = match STATE.get() { + Some(s) => s, + None => return, + }; + + // Compile-time flag — true for debug builds, false for release. + // This is NOT a runtime IsDebuggerPresent() check; it mirrors + // the MXC_CHANNEL logic for cross-platform consistency. 
+ let is_debug_build = cfg!(debug_assertions); + + tracelogging::write_event!( + MXC_PROVIDER, + "MXC.Execution", + // Informational: every completion (success or failure) is a routine + // "what happened" record, not a fault. Severity is reserved for + // MXC.Error (Warning) and any future provider malfunction. + level(Informational), + keyword(MICROSOFT_KEYWORD_MEASURES), + u64("PartA_PrivTags", &PDT_PRODUCT_AND_SERVICE_USAGE), + struct("COMMON_MXC_PARAMS", { + str8("Version", &state.version), + str8("Channel", &state.channel), + bool8("IsDebugging", &is_debug_build), + bool8("UTCReplace_AppSessionGuid", &true), + }), + str8("mxc.backend", backend), + i32("mxc.exit_code", &exit_code), + str8("mxc.outcome", outcome), + u64("mxc.duration_ms", &duration_ms), + str8("mxc.failure_reason", failure_reason), + ); + } + + /// Emit an `MXC.Error` ETW event. + /// + /// By design this event carries **no free-form error text** — only the + /// bounded `error_type` category and the process `exit_code`. This keeps + /// telemetry free of PII (paths, usernames, credentials) that error + /// strings can contain. + pub fn log_error(backend: &str, error_type: &str, exit_code: i32) { + let state = match STATE.get() { + Some(s) => s, + None => return, + }; + + let is_debug_build = cfg!(debug_assertions); + + tracelogging::write_event!( + MXC_PROVIDER, + "MXC.Error", + // Warning, not Error/Critical: this reports an expected operational + // failure of a *sandboxed run* (e.g. the user's script failed, a + // backend was unavailable, a missing/rejected config) — not a + // defect in MXC itself. Downstream pipelines treat Error/Critical as + // product faults that feed reliability alerting, so those levels are + // reserved for genuine provider/telemetry malfunctions. 
+ level(Warning), + keyword(MICROSOFT_KEYWORD_MEASURES), + u64("PartA_PrivTags", &PDT_PRODUCT_AND_SERVICE_USAGE), + struct("COMMON_MXC_PARAMS", { + str8("Version", &state.version), + str8("Channel", &state.channel), + bool8("IsDebugging", &is_debug_build), + bool8("UTCReplace_AppSessionGuid", &true), + }), + str8("mxc.backend", backend), + str8("mxc.error_type", error_type), + i32("mxc.exit_code", &exit_code), + ); + } +} + +// --------------------------------------------------------------------------- +// Non-Windows no-op stubs +// --------------------------------------------------------------------------- + +#[cfg(not(target_os = "windows"))] +mod provider { + pub fn init(_version: &str, _channel: &str) -> bool { + false + } + + pub fn shutdown() {} + + pub fn log_execution( + _backend: &str, + _exit_code: i32, + _outcome: &str, + _duration_ms: u64, + _failure_reason: &str, + ) { + } + + pub fn log_error(_backend: &str, _error_type: &str, _exit_code: i32) {} +} + +pub use provider::*; + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +#[cfg(test)] +mod tests { + use super::*; + use std::sync::Mutex; + + /// The provider uses global state (`OnceLock`), so tests that call + /// `init`/`shutdown` must not run concurrently. 
+ static TEST_LOCK: Mutex<()> = Mutex::new(()); + + #[test] + fn init_shutdown_roundtrip() { + let _lock = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner()); + let ok = init("0.0.0-test", "dev"); + if cfg!(target_os = "windows") { + assert!(ok, "init should succeed on Windows"); + } else { + assert!(!ok, "init should be a no-op on non-Windows"); + } + shutdown(); + } + + #[test] + fn double_init_is_safe() { + let _lock = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner()); + let _ = init("0.0.0-test", "dev"); + let _ = init("0.0.0-test", "dev"); + shutdown(); + } + + #[test] + fn shutdown_without_init() { + let _lock = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner()); + shutdown(); + } + + #[test] + fn log_execution_after_init() { + let _lock = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner()); + let _ = init("0.0.0-test", "dev"); + log_execution("test_backend", 0, "success", 100, ""); + shutdown(); + } + + #[test] + fn log_error_after_init() { + let _lock = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner()); + let _ = init("0.0.0-test", "dev"); + log_error("test_backend", "config_error", 1); + shutdown(); + } + + #[test] + fn log_without_init() { + let _lock = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner()); + log_execution("test_backend", 0, "success", 50, "none"); + log_error("test_backend", "unknown", 1); + } + + #[test] + fn log_after_shutdown() { + let _lock = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner()); + let _ = init("0.0.0-test", "dev"); + shutdown(); + log_execution("test_backend", 1, "failure", 200, "timeout"); + log_error("test_backend", "process_error", 1); + } + + #[test] + fn handles_empty_strings() { + let _lock = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner()); + let _ = init("", ""); + log_execution("", 0, "", 0, ""); + log_error("", "", 0); + shutdown(); + } +} + +// --------------------------------------------------------------------------- +// Provider code-generation tests +// 
--------------------------------------------------------------------------- +// +// The build script's GUID validation and `define_provider!` code generation +// live in `provider_codegen.rs`. Cargo never runs `#[cfg(test)]` modules inside +// a build script, so we `include!` the same source here to exercise it under +// `cargo test`. +#[cfg(test)] +mod provider_codegen_tests { + include!("../provider_codegen.rs"); + + #[test] + fn absent_guid_yields_plain_provider() { + let def = generate_provider_def(None); + assert!(def.contains("\"Microsoft.MXC\"")); + assert!(!def.contains("group_id")); + } + + #[test] + fn empty_guid_yields_plain_provider() { + let def = generate_provider_def(Some("")); + assert!(!def.contains("group_id")); + } + + #[test] + fn valid_guid_adds_group_id() { + let def = generate_provider_def(Some("7f10def4-a258-5fea-510e-2c3bb976687f")); + assert!(def.contains("group_id(\"7f10def4-a258-5fea-510e-2c3bb976687f\")")); + } + + #[test] + #[should_panic(expected = "not a valid GUID")] + fn invalid_guid_panics() { + let _ = generate_provider_def(Some("not-a-valid-guid")); + } + + #[test] + fn guid_validation_accepts_and_rejects() { + assert!(is_valid_guid("7f10def4-a258-5fea-510e-2c3bb976687f")); + // Missing dashes. + assert!(!is_valid_guid("7f10def4a2585fea510e2c3bb976687f")); + // Non-hex characters. + assert!(!is_valid_guid("zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz")); + // Wrong segment lengths. + assert!(!is_valid_guid("7f10def-a258-5fea-510e-2c3bb976687f")); + } +} diff --git a/src/testing/wxc_e2e_tests/src/lib.rs b/src/testing/wxc_e2e_tests/src/lib.rs index 0f390b24..6839fb28 100644 --- a/src/testing/wxc_e2e_tests/src/lib.rs +++ b/src/testing/wxc_e2e_tests/src/lib.rs @@ -438,7 +438,7 @@ fn command_result(label: &str, output: Output, wall_time_ms: u128) -> CommandRes } } -/// Run `wxc-exec.exe` with the supplied config file and extra arguments. +/// Run `wxc-exec.exe` with a config file from `tests/configs/` and extra arguments. 
pub fn run_wxc_config(config_file: &str, extra_args: &[&str]) -> CommandResult { let exe = find_binary("wxc-exec.exe").expect("wxc-exec.exe should be available"); let config = test_configs_dir().join(config_file); @@ -448,6 +448,16 @@ pub fn run_wxc_config(config_file: &str, extra_args: &[&str]) -> CommandResult { run_executable(config_file, &exe, args) } +/// Run `wxc-exec.exe` with a config file from `tests/examples/` and extra arguments. +pub fn run_wxc_example(config_file: &str, extra_args: &[&str]) -> CommandResult { + let exe = find_binary("wxc-exec.exe").expect("wxc-exec.exe should be available"); + let config = examples_dir().join(config_file); + let mut args: Vec<String> = extra_args.iter().map(|arg| (*arg).to_string()).collect(); + args.push(config.display().to_string()); + + run_executable(config_file, &exe, args) +} + /// Run `wxc-exec.exe` with a state-aware request envelope. The JSON value is /// serialised, base64-encoded, and passed via `--config-base64`. Used by the /// state-aware smoke tests.
diff --git a/src/testing/wxc_e2e_tests/tests/e2e_windows.rs b/src/testing/wxc_e2e_tests/tests/e2e_windows.rs index 29840994..ef91e103 100644 --- a/src/testing/wxc_e2e_tests/tests/e2e_windows.rs +++ b/src/testing/wxc_e2e_tests/tests/e2e_windows.rs @@ -18,7 +18,7 @@ use wxc_e2e_tests::{ assert_exit, assert_pwsh, assert_python, assert_success, assert_success_or_skip_missing_prerequisite, examples_dir, find_binary, has_daemon, has_hyperlight_snapshot, has_nanvix_binaries, has_test_driver, has_windows_sandbox_feature, - has_wxc_exe, repo_root, run_test_driver, run_wxc_config, run_wxc_config_value, + has_wxc_exe, repo_root, run_test_driver, run_wxc_config, run_wxc_config_value, run_wxc_example, run_wxc_state_aware, test_configs_dir, TempDirs, }; @@ -442,6 +442,41 @@ fn test_on_repeat() { }); } +// --------------------------------------------------------------------------- +// Telemetry tests +// --------------------------------------------------------------------------- + +fn telemetry_enabled() { + let result = run_wxc_example("28_telemetry_enabled.json", &["--debug", "--experimental"]); + assert_success_or_skip_missing_prerequisite(&result); +} + +fn telemetry_disabled() { + // Run a basic config without telemetry — verifies the disabled path doesn't + // regress when telemetry code is linked in. 
+ assert_wxc_success("basic_processcontainer.json", &["--debug"]); +} + +#[test] +#[ignore] // Requires AppContainer support +fn test_telemetry_enabled() { + if !cached_has_wxc_exe() { + return; + } + assert_python(); + with_test_lock(telemetry_enabled); +} + +#[test] +#[ignore] // Requires AppContainer support +fn test_telemetry_disabled() { + if !cached_has_wxc_exe() { + return; + } + assert_python(); + with_test_lock(telemetry_disabled); +} + // --------------------------------------------------------------------------- // Windows Sandbox suite // --------------------------------------------------------------------------- diff --git a/tests/examples/28_telemetry_enabled.json b/tests/examples/28_telemetry_enabled.json new file mode 100644 index 00000000..7ac84537 --- /dev/null +++ b/tests/examples/28_telemetry_enabled.json @@ -0,0 +1,12 @@ +{ + "$schema": "../../schemas/dev/mxc-config.schema.0.8.0-dev.json", + "containment": "processcontainer", + "process": { + "commandLine": "cmd.exe /c echo Hello from telemetry-enabled sandbox" + }, + "experimental": { + "telemetry": { + "enabled": true + } + } +} diff --git a/tests/scripts/run_telemetry_etw_smoke_test.ps1 b/tests/scripts/run_telemetry_etw_smoke_test.ps1 new file mode 100644 index 00000000..7c024394 --- /dev/null +++ b/tests/scripts/run_telemetry_etw_smoke_test.ps1 @@ -0,0 +1,213 @@ +<# +.SYNOPSIS + ETW capture smoke test for MXC telemetry. + +.DESCRIPTION + Starts an ETW trace session targeting the MXC public provider GUID, + runs wxc-exec with telemetry enabled, stops the session, and verifies + that at least one event was captured. + + This test uses the PUBLIC provider GUID (already in the open-source + code) — it does NOT depend on or reveal the private telemetry group GUID. + + Requires: Administrator privileges (for ETW session creation), + wxc-exec.exe built, logman.exe (ships with Windows). + + Run from the repo root. 
+#>
+
+[CmdletBinding()]
+param(
+    [switch]$SkipClean
+)
+
+$ErrorActionPreference = 'Stop'
+
+# MXC public provider name. The provider GUID is derived deterministically from
+# this name by `tracelogging::define_provider!` using the standard ETW name-hash
+# algorithm (the same algorithm used by WIL's IMPLEMENT_TRACELOGGING_CLASS and
+# .NET's EventSource). We compute the GUID from the name here rather than
+# hard-coding a literal, so the test stays in lockstep with the provider name
+# and never embeds a magic constant.
+$providerName = 'Microsoft.MXC'
+
+function Get-TraceLoggingProviderGuid {
+    param([Parameter(Mandatory)][string]$Name)
+
+    # EventSource/TraceLogging name->GUID: SHA1 over a fixed namespace seed
+    # followed by the UTF-16BE bytes of the upper-cased name; first 16 bytes of
+    # the digest become the GUID with the version nibble forced to 5.
+    $seed = [byte[]]@(
+        0x48, 0x2C, 0x2D, 0xB2, 0xC3, 0x90, 0x47, 0xC8,
+        0x87, 0xF8, 0x1A, 0x15, 0xBF, 0xC1, 0x30, 0xFB
+    )
+    $nameBytes = [System.Text.Encoding]::BigEndianUnicode.GetBytes($Name.ToUpperInvariant())
+    $buffer = New-Object byte[] ($seed.Length + $nameBytes.Length)
+    [Array]::Copy($seed, 0, $buffer, 0, $seed.Length)
+    [Array]::Copy($nameBytes, 0, $buffer, $seed.Length, $nameBytes.Length)
+
+    $sha1 = [System.Security.Cryptography.SHA1]::Create()
+    try {
+        $hash = $sha1.ComputeHash($buffer)
+    } finally {
+        $sha1.Dispose()
+    }
+
+    $guidBytes = New-Object byte[] 16
+    [Array]::Copy($hash, 0, $guidBytes, 0, 16)
+    $guidBytes[7] = ($guidBytes[7] -band 0x0F) -bor 0x50
+    return '{' + ([guid]::new($guidBytes)).ToString() + '}'
+}
+
+$providerGuid = Get-TraceLoggingProviderGuid -Name $providerName
+$sessionName = 'MxcTelemetryTest'
+$repoRoot = Split-Path -Parent (Split-Path -Parent $PSScriptRoot)
+
+Write-Host "=== MXC ETW Capture Smoke Test ===" -ForegroundColor Cyan
+Write-Host "Provider: $providerName $providerGuid"
+
+# ---------------------------------------------------------------------------
+# Pre-flight: elevation check
+# ---------------------------------------------------------------------------
+$identity = [Security.Principal.WindowsIdentity]::GetCurrent()
+$principal = New-Object Security.Principal.WindowsPrincipal($identity)
+if (-not $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)) {
+    Write-Host "SKIPPED: this test requires Administrator privileges for ETW session creation." -ForegroundColor Yellow
+    exit 0
+}
+
+# ---------------------------------------------------------------------------
+# Pre-flight: locate wxc-exec.exe
+# ---------------------------------------------------------------------------
+$srcDir = Join-Path $repoRoot 'src'
+$candidates = @(
+    (Join-Path $srcDir 'target\debug\wxc-exec.exe'),
+    (Join-Path $srcDir 'target\release\wxc-exec.exe'),
+    (Join-Path $srcDir 'target\x86_64-pc-windows-msvc\debug\wxc-exec.exe'),
+    (Join-Path $srcDir 'target\x86_64-pc-windows-msvc\release\wxc-exec.exe'),
+    (Join-Path $srcDir 'target\aarch64-pc-windows-msvc\debug\wxc-exec.exe'),
+    (Join-Path $srcDir 'target\aarch64-pc-windows-msvc\release\wxc-exec.exe')
+)
+$wxcExe = $candidates | Where-Object { Test-Path $_ } | Select-Object -First 1
+if (-not $wxcExe) {
+    Write-Host "SKIPPED: wxc-exec.exe not found. Build first with build.bat." -ForegroundColor Yellow
+    exit 0
+}
+Write-Host "Using wxc-exec: $wxcExe"
+
+# ---------------------------------------------------------------------------
+# Pre-flight: locate telemetry example config
+# ---------------------------------------------------------------------------
+$configFile = Join-Path $repoRoot 'tests\examples\28_telemetry_enabled.json'
+if (-not (Test-Path $configFile)) {
+    throw "Config not found: $configFile"
+}
+
+# ---------------------------------------------------------------------------
+# Setup: ETL output path
+# ---------------------------------------------------------------------------
+$etlDir = Join-Path $env:TEMP 'mxc_etw_test'
+if (Test-Path $etlDir) { Remove-Item -Recurse -Force $etlDir }
+New-Item -ItemType Directory -Path $etlDir -Force | Out-Null
+$etlFile = Join-Path $etlDir 'mxc_trace.etl'
+
+# ---------------------------------------------------------------------------
+# Step 1: Start ETW trace session
+# ---------------------------------------------------------------------------
+Write-Host "`n--- Starting ETW trace session '$sessionName' ---" -ForegroundColor Yellow
+
+# Remove any stale session from a previous interrupted run.
+logman stop $sessionName -ets 2>$null | Out-Null
+logman delete $sessionName -ets 2>$null | Out-Null
+
+logman create trace $sessionName -ets -o "$etlFile" -p $providerGuid 2>&1 | Out-Host
+if ($LASTEXITCODE -ne 0) {
+    throw "Failed to create ETW trace session"
+}
+Write-Host "ETW session started, writing to $etlFile"
+
+# ---------------------------------------------------------------------------
+# Step 2: Run wxc-exec with telemetry enabled
+# ---------------------------------------------------------------------------
+Write-Host "`n--- Running wxc-exec with telemetry ---" -ForegroundColor Yellow
+
+try {
+    # Run with --experimental to enable the telemetry section. The provider is
+    # registered during init (before execution); the MXC.Execution / MXC.Error
+    # events are emitted on completion, after the runner returns. The sandbox
+    # itself may fail (e.g. AppContainer prerequisites), but completion
+    # telemetry still fires for the failure, so events should be captured.
+    $proc = Start-Process -FilePath $wxcExe `
+        -ArgumentList "--debug", "--experimental", $configFile `
+        -PassThru -NoNewWindow -Wait
+    Write-Host "wxc-exec exited with code $($proc.ExitCode)"
+} catch {
+    Write-Host "wxc-exec failed to run: $_" -ForegroundColor Yellow
+    # Continue: even a crash after init may have emitted events.
+}
+
+# Brief pause for ETW buffers to flush.
+Start-Sleep -Seconds 2
+
+# ---------------------------------------------------------------------------
+# Step 3: Stop ETW trace session
+# ---------------------------------------------------------------------------
+Write-Host "`n--- Stopping ETW trace session ---" -ForegroundColor Yellow
+logman stop $sessionName -ets 2>&1 | Out-Host
+
+# ---------------------------------------------------------------------------
+# Step 4: Validate captured events
+# ---------------------------------------------------------------------------
+Write-Host "`n--- Validating captured events ---" -ForegroundColor Yellow
+
+if (-not (Test-Path $etlFile)) {
+    throw "ETL file not found: $etlFile"
+}
+
+$etlSize = (Get-Item $etlFile).Length
+Write-Host "ETL file size: $etlSize bytes"
+
+if ($etlSize -eq 0) {
+    Write-Host "FAILED: ETL file is empty; no events captured." -ForegroundColor Red
+    Write-Host "All prerequisites were met (admin, wxc-exec present, ETW session created)," -ForegroundColor Red
+    Write-Host "so the provider should have emitted at least one completion event." -ForegroundColor Red
+    exit 1
+}
+
+# Convert .etl to XML for inspection.
+$xmlFile = Join-Path $etlDir 'mxc_trace.xml'
+tracerpt "$etlFile" -o "$xmlFile" -of XML -y 2>&1 | Out-Host
+
+if (-not (Test-Path $xmlFile)) {
+    throw "tracerpt failed to produce XML output"
+}
+
+$xmlContent = Get-Content -Path $xmlFile -Raw
+$eventCount = ([regex]::Matches($xmlContent, '