Skip to content

fix(backends): stop swallowing WinRT/decode failures that corrupt sandbox behavior#570

Open
MGudgin wants to merge 2 commits into
mainfrom
user/gudge/fix-backend-silent-failures
Open

fix(backends): stop swallowing WinRT/decode failures that corrupt sandbox behavior#570
MGudgin wants to merge 2 commits into
mainfrom
user/gudge/fix-backend-silent-failures

Conversation

@MGudgin

@MGudgin MGudgin commented Jun 25, 2026

Copy link
Copy Markdown
Member

Summary

Fixes three silent failure-swallowing bugs in the IsolationSession and Windows
Sandbox backends, surfaced by an idiomatic-axis audit. In each case a failed
WinRT accessor or a non-UTF-8 byte was coerced to a default, turning a real
failure into a false success or lost output.

Details

  • isolation_session/manager.rsIsError() failures treated as success.
    The three err.IsError().unwrap_or(false) lifecycle checks (AddUserAsync /
    AddUserAsync2 / RunProcessWithOptionsAsync) now propagate the WinRT
    accessor failure via map_err(lifecycle_err)? instead of coercing it to
    false and continuing down the success path (which then tried to use
    AgentUserName() / Process() anyway).
  • isolation_session/manager.rs — process handles silently disabling relays.
    OutputHandle() / ErrorHandle() / InputHandle() now propagate getter
    failures instead of .unwrap_or(0). Downstream code treats 0 as "stream
    absent", so a getter failure previously skipped the corresponding stdout/
    stderr/stdin relay, making a process appear to run while producing no output.
    A genuinely returned 0 still means "absent" and is preserved.
  • isolation_session/error.rs — fabricated HRESULT: 0 losing stale-id
    classification.
    format_iso_error now reads Code() first and propagates
    its failure honestly, rather than defaulting to 0, which skipped the
    ERROR_NOT_FOUND check and downgraded a stale sandbox to a generic
    backend_error. Message/Remediation remain best-effort (cosmetic).
  • windows_sandbox_runner.rs — daemon output dropped on non-UTF-8. Decode
    stdout/stderr with String::from_utf8_lossy(...).into_owned() instead of
    from_utf8(...).unwrap_or_default(), so a single invalid byte no longer
    discards the entire stream.

Tests

  • Added parse_daemon_response_preserves_non_utf8_output, asserting the lossy
    decode keeps the surrounding bytes (and a U+FFFD replacement char) instead of
    emptying the stream.
  • cargo test -p isolation_session_common -p windows_sandbox_common — 107 + 17
    pass.
  • cargo clippy -p isolation_session_common -p windows_sandbox_common --all-targets -- -D warnings and cargo fmt -p isolation_session_common -p windows_sandbox_common -- --check both clean. Both crates are Windows-only
    and were verified on a Windows host.

A follow-up PR will address the non-behavioral idiomatic findings from the same
audit (RAII/ownership wrappers, typed errors over String, PathBuf over
String for paths, etc.).

Microsoft Reviewers: Open in CodeFlow

…dbox behavior

This PR fixes three silent failure-swallowing bugs in the IsolationSession and
Windows Sandbox backends where a failed accessor or a non-UTF-8 byte was coerced
to a default, turning a real failure into a false success or lost output.

Details

* isolation_session/manager.rs: the three `err.IsError().unwrap_or(false)`
  lifecycle checks (AddUserAsync / AddUserAsync2 / RunProcessWithOptionsAsync)
  now propagate the WinRT accessor failure via `map_err(lifecycle_err)?` instead
  of coercing it to `false` and proceeding down the success path.
* isolation_session/manager.rs: the `OutputHandle()`/`ErrorHandle()`/
  `InputHandle()` getters now propagate failures instead of `.unwrap_or(0)`,
  which downstream treats as "no handle" and silently skips the stdio relay. A
  genuinely returned 0 still means "absent stream" and is preserved.
* isolation_session/error.rs: `format_iso_error` reads `Code()` first and
  propagates its failure honestly rather than fabricating `code = 0`, which
  skipped the `ERROR_NOT_FOUND` check and downgraded a stale sandbox to a
  generic backend error. `Message`/`Remediation` stay best-effort (cosmetic).
* windows_sandbox_runner.rs: daemon stdout/stderr are decoded with
  `String::from_utf8_lossy` instead of `from_utf8(...).unwrap_or_default()`, so
  a single invalid byte no longer discards the entire stream.

Tests

* Added `parse_daemon_response_preserves_non_utf8_output` asserting the lossy
  decode keeps surrounding bytes (+ U+FFFD) instead of emptying the stream.
* `cargo test -p isolation_session_common -p windows_sandbox_common` — 107 + 17
  pass.
* `cargo clippy -p isolation_session_common -p windows_sandbox_common
  --all-targets -- -D warnings` and `cargo fmt ... -- --check` both clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Generated-with: claude-opus-4.8
@MGudgin MGudgin requested a review from a team as a code owner June 25, 2026 17:51
Copilot AI review requested due to automatic review settings June 25, 2026 17:51

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens error handling in the Windows Sandbox and IsolationSession backends by removing “false success” paths caused by swallowing WinRT accessor failures and non‑UTF‑8 output bytes, preserving correct failure semantics and daemon output.

Changes:

  • Windows Sandbox daemon response parsing now uses lossy UTF‑8 decoding so invalid bytes don’t drop entire stdout/stderr streams.
  • IsolationSession lifecycle checks now propagate WinRT IsError() getter failures instead of coercing them to “no error”.
  • IsolationSession process stdio handle getters and IsoSessionError::Code() handling now propagate accessor failures instead of silently defaulting (which previously disabled relays / misclassified stale IDs).
Show a summary per file
File Description
src/backends/windows_sandbox/common/src/windows_sandbox_runner.rs Preserve daemon stdout/stderr even when non‑UTF‑8 bytes are present; adds a regression test.
src/backends/isolation_session/common/src/manager.rs Stop coercing WinRT accessor failures (including handle getters) into defaults that mask backend failures.
src/backends/isolation_session/common/src/error.rs Avoid fabricating HRESULT 0; ensure stale-id classification isn’t silently lost on accessor failure.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 4

Comment on lines +349 to +351
let stdout_handle_val = process
.OutputHandle()
.map_err(|e| lifecycle_err(format!("get OutputHandle failed: {}", e)))?;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3fccb46 — prefixed with the op name: RunProcessWithOptionsAsync: get OutputHandle failed (and Error/Input), matching the other error strings in this function.

Comment on lines +352 to +354
let stderr_handle_val = process
.ErrorHandle()
.map_err(|e| lifecycle_err(format!("get ErrorHandle failed: {}", e)))?;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3fccb46 (same fix applied to all three getters).

Comment on lines +355 to +357
let stdin_handle_val = process
.InputHandle()
.map_err(|e| lifecycle_err(format!("get InputHandle failed: {}", e)))?;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3fccb46 (same fix applied to all three getters).

Comment on lines +69 to +77
let code = match err.Code() {
Ok(c) => c.0 as u32,
Err(e) => {
return IsolationSessionError::Lifecycle(format!(
"{} failed: could not read error code (HRESULT accessor failed: {})",
op, e
));
}
};

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3fccb46 — the Code()-failure path now folds in the best-effort Message() text. I left Remediation() out of this degraded path on purpose: it's the least useful of the three for a code-read failure and adds another fallible getter call when we're already in a failure path; Message() carries the signal.

…) on Code() failure

Addresses Copilot review feedback on PR #570.

* Prefix the OutputHandle/ErrorHandle/InputHandle getter error strings with the operation name (RunProcessWithOptionsAsync) to match the other error strings in the function.

* On a Code() accessor failure in format_iso_error, fold in the best-effort Message() text so the failure stays diagnosable. Remediation() is intentionally left out of this degraded path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Generated-with: claude-opus-4.8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants