feat: native OS window screenshot for Tauri and Electron #262

Closed
goosewobbler wants to merge 32 commits into main from feat/native-screenshots

@goosewobbler
Contributor

Summary

  • Adds browser.tauri.nativeScreenshot() and browser.electron.nativeScreenshot() — captures the full OS window including title bar, close/minimize buttons, and window decorations (not just the webview content)
  • Tauri: implemented via a new POST /wdio/native-screenshot endpoint in the embedded WebDriver server using the xcap crate (macOS + Windows; unsupported_operation on Linux)
  • Electron: implemented via screencapture -R (macOS) and PowerShell PrintWindow (Windows) — no native Node module, no desktopCapturer (avoids macOS Screen Recording permission in CI)
  • Returns Buffer of PNG bytes; Tauri requires the embedded driver provider

E2E verification — three-layer approach

| Layer | Runs when | Catches |
| --- | --- | --- |
| 1. PNG structural + dimensions | every PR | broken capture / webview-only |
| 2. OCR via tesseract.js | every PR | wrong window / missing fixture content |
| 3. Vision LLM (Ollama, OLLAMA_API_KEY) | merge to main only | subtle rendering regressions tied to runtime state |

Layer 3 requires OLLAMA_API_KEY set as a CI secret (gate via github.event_name == 'push' && github.ref == 'refs/heads/main'). The visionEnabled() guard ensures specs pass on PR runs without the key.

Test plan

  • cargo check passes on macOS (aarch64-apple-darwin) — verified locally
  • pnpm --filter @wdio/tauri-service typecheck — clean
  • pnpm --filter @wdio/electron-service typecheck — clean
  • pnpm --filter e2e typecheck — clean
  • Run e2e/test/tauri/native-screenshot.spec.ts locally on macOS — layers 1+2 pass without API key
  • Run e2e/test/electron/native-screenshot.spec.ts locally on macOS — layers 1+2 pass without API key
  • Add OLLAMA_API_KEY + OLLAMA_BASE_URL to merge-to-main CI job for layer 3

🤖 Generated with Claude Code

goosewobbler and others added 4 commits May 6, 2026 11:40
Add POST /wdio/native-screenshot to the embedded WebDriver server.
Uses xcap 0.9 to capture the full OS window (title bar + decorations)
on macOS and Windows; returns unsupported_operation on Linux.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add nativeScreenshot() to DirectEvalClient, wire it through the
provider-gated command, inject onto the browser object, and declare
it on TauriServiceAPI. Only works with the embedded driver provider.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add nativeScreenshot() using screencapture (macOS) and PowerShell
PrintWindow (Windows). Wired into getElectronAPI() requiring CDP.
Type declared on ElectronServiceAPI in native-types.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add screenshotChecks.ts (PNG structural + tesseract.js OCR) and
visionAssert.ts (Ollama-compatible vision LLM, merge-to-main only).
Specs for both Tauri and Electron assert chrome was captured and that
screenshot content matches known fixture text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Release Preview — no release

No bump label detected.
Reason: No release labels found (need bump:* or release:stable)
Note: Add bump:patch, bump:minor, or bump:major to trigger a release.


Updated automatically by ReleaseKit

… in nativeScreenshot

The Electron main process supports ESM, so use await import() rather
than require(). Relies on awaitPromise: true in the CDP callFunctionOn
call so the async callback result is awaited before returning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

Adds browser.tauri.nativeScreenshot() and browser.electron.nativeScreenshot() commands that capture the full OS window including title bar and decorations. All previously identified blocking issues — HWND little-endian byte order, GetHdc/ReleaseHdc pairing, PowerShell path escaping, spawnSync exit-code checking, incomplete visionEnabled() guard, and the ambiguous title-only window lookup — have been addressed in this revision.

  • Tauri (Rust): new POST /wdio/native-screenshot endpoint using xcap, gated to macOS/Windows at the Cargo level; Linux returns unsupported_operation; window lookup now filters by PID first, then title.
  • Electron (Node): screencapture -R on macOS and a PowerShell PrintWindow/BitBlt path on Windows, with correct little-endian HWND reading, ReleaseHdc before $b.Save, forward-slash path normalization, and bounded spawnSync timeouts.
  • E2E tests: three-layer verification (PNG structure → OCR via tesseract.js → vision LLM behind visionEnabled()) with appropriate platform/CI skips.

Confidence Score: 5/5

Safe to merge; all previously identified blocking issues in the Electron Windows capture path have been fixed and the Tauri xcap path is well-scoped to macOS/Windows.

The prior round resolved every critical bug: HWND byte order, GetHdc pairing, path escaping, spawnSync error propagation, and the incomplete visionEnabled guard. What remains is a minor cleanup edge case in the finally block.

packages/electron-service/src/commands/nativeScreenshot.ts — the finally/unlinkSync edge case noted above; otherwise all changed files look correct.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/electron-service/src/commands/nativeScreenshot.ts | New command implementing native Electron window screenshot; previous issues with HWND byte order, GetHdc release, path escaping, and spawnSync error handling are addressed; minor cleanup concern in finally block |
| packages/tauri-plugin-webdriver/src/platform/macos.rs | Implements xcap-based native screenshot for macOS, correctly filtering by PID before title, with a fallback to first process-owned window |
| packages/tauri-plugin-webdriver/src/platform/windows.rs | Implements xcap-based native screenshot for Windows with PID-first matching, title-substring fallback for CI, and detailed diagnostic error output when no window is found |
| packages/tauri-service/src/commands/nativeScreenshot.ts | Routes nativeScreenshot to the embedded Tauri WebDriver server; enforces the embedded-provider requirement and caches the DirectEvalClient per browser instance |
| packages/tauri-plugin-webdriver/src/server/handlers/native_screenshot.rs | New Axum handler for POST /wdio/native-screenshot; window-not-found returns 404 with available labels, unsupported platform returns error via WebDriverErrorResponse::into_response |
| e2e/lib/visionAssert.ts | visionEnabled() now requires both the API key and OLLAMA_BASE_URL; regex tightened to /^(YES |
| e2e/lib/screenshotChecks.ts | Full 8-byte PNG magic validation, OCR helpers via tesseract.js, and assertCapturesChrome; OCR worker lazily initialised and correctly terminated |

Sequence Diagram

sequenceDiagram
    participant Test as E2E Test
    participant Service as electron/tauri Service
    participant App as Electron/Tauri App
    participant OS as OS (screencapture / xcap)

    Test->>Service: browser.electron.nativeScreenshot()
    Service->>App: CDP execute: getBounds() + getNativeWindowHandle()
    App-->>Service: "{ bounds, nativeHandle, gpuCompositing }"
    alt macOS
        Service->>OS: spawnSync screencapture -R x,y,w,h out.png
        OS-->>Service: PNG file written
    else Windows (GPU)
        Service->>OS: spawnSync powershell PrintWindow(PW_RENDERFULLCONTENT)
        OS-->>Service: PNG file written
    else Windows (no GPU)
        Service->>OS: spawnSync powershell BitBlt(GetWindowDC)
        OS-->>Service: PNG file written (may be blank under WARP)
    end
    Service->>Service: readFileSync(out) + unlinkSync(out)
    Service-->>Test: Buffer (PNG bytes)

    Test->>Service: browser.tauri.nativeScreenshot()
    Service->>App: "POST /wdio/native-screenshot {window_label}"
    App->>OS: xcap::Window::capture_image()
    OS-->>App: RgbaImage
    App->>App: encode to PNG bytes
    App-->>Service: image/png response
    Service-->>Test: Buffer (PNG bytes)

Reviews (15): Last reviewed commit: "test(e2e): skip tauri native-screenshot ..."

Comment thread packages/electron-service/src/commands/nativeScreenshot.ts Outdated
Comment thread packages/tauri-plugin-webdriver/src/platform/macos.rs
Comment thread e2e/lib/screenshotChecks.ts Outdated
Comment thread e2e/lib/visionAssert.ts Outdated
- Check full 8-byte PNG signature instead of first 2 bytes
- Use strict YES/NO regex and exact equality in visionAssert
- Validate spawnSync exit code and error before reading output
- Match windows by PID first, then title, to avoid cross-process capture

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
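
The full-signature check from the first bullet above could look roughly like the sketch below. The helper names `isPng` and `pngDimensions` are hypothetical and are not taken from `screenshotChecks.ts`; the byte offsets follow the PNG file layout (8-byte signature, then the IHDR chunk carrying width and height as big-endian 32-bit integers).

```typescript
import { Buffer } from 'node:buffer';

// The 8-byte signature that opens every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

interface Dimensions {
  width: number;
  height: number;
}

// Hypothetical helper: true only when all 8 signature bytes match,
// rather than just the first two.
function isPng(buf: Buffer): boolean {
  return buf.length >= 8 && buf.subarray(0, 8).equals(PNG_SIGNATURE);
}

// Hypothetical helper: read width and height from the IHDR chunk.
// Layout: signature (0-7), chunk length (8-11), chunk type "IHDR" (12-15),
// width (16-19), height (20-23), both big-endian.
function pngDimensions(buf: Buffer): Dimensions {
  if (!isPng(buf) || buf.length < 24) {
    throw new Error('not a PNG, or header is truncated');
  }
  if (buf.toString('ascii', 12, 16) !== 'IHDR') {
    throw new Error('first chunk is not IHDR');
  }
  return { width: buf.readUInt32BE(16), height: buf.readUInt32BE(20) };
}
```

A two-byte check would accept any buffer that merely starts with `0x89 0x50`; comparing all eight bytes rejects those false positives before the dimensions are read.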
Comment thread packages/electron-service/src/commands/nativeScreenshot.ts Outdated
goosewobbler and others added 2 commits May 6, 2026 14:14
- Revert dynamic import() to require() in Electron nativeScreenshot:
  CDP callFunctionOn evaluates code in Electron's CJS main-process context
  where ESM dynamic import() is unavailable. Added biome-ignore comments.
- Add embedded-only guard to Tauri native screenshot spec:
  nativeScreenshot() only works with the embedded provider; skip cleanly
  when running with official or CrabNebula providers.
- Remove assertCapturesChrome from Tauri spec:
  Tauri on macOS uses fullSizeContentView so the native screenshot and
  webview screenshot share the same pixel dimensions. OCR is the reliable
  verification layer on Tauri.
- Fix Windows PowerShell path escaping:
  out.replace(/\\/g, '\\\\') inside a PS single-quoted string produced
  double backslashes. Use forward slashes instead (valid on Windows).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
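
The path-escaping fix in the last bullet above can be illustrated with a small sketch. Backslash is not an escape character inside a PowerShell single-quoted string, so doubling it only yields literal double backslashes; converting to forward slashes sidesteps the issue. The function names here are hypothetical, and the single-quote doubling in `psSingleQuoted` is extra hardening that the PR does not describe.

```typescript
// Hypothetical helper: convert a Windows path to forward slashes so it can
// sit inside a PowerShell single-quoted string with no backslash handling.
function toPowerShellPath(winPath: string): string {
  return winPath.replace(/\\/g, '/');
}

// Assumption, not from the PR: a single quote inside a PowerShell
// single-quoted string is written as two single quotes.
function psSingleQuoted(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Example: build the Save() statement for a temp file path.
const out = toPowerShellPath('C:\\Users\\runner\\AppData\\Local\\Temp\\shot.png');
const saveStatement = `$b.Save(${psSingleQuoted(out)})`;
```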
- Assert nativePng !== webviewPng in Tauri spec: xcap and WebKit's
  DevTools screenshot protocol produce different PNG bytes even when
  capturing the same content, so byte equality would only hold if
  nativeScreenshot incorrectly re-emits the webview screenshot.
- Add 30s timeout to visionAssert API call: without it a slow or
  unreachable Ollama endpoint would hang the merge-to-main job
  indefinitely.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread packages/electron-service/src/commands/nativeScreenshot.ts Outdated
goosewobbler and others added 17 commits May 6, 2026 16:05
$g.GetHdc() suspends GDI+'s internal state for the Graphics object.
Calling $b.Save() while the HDC is still checked out causes GDI+ to
throw "A generic error occurred in GDI+". Release the HDC and dispose
the Graphics object before saving the PNG.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- electron-service: nativeScreenshot CDP callback now returns only
  WindowInfo (bounds + hex HWND); all screencapture/PowerShell I/O
  runs in the WDIO process where ESM imports are available. Fixes
  "ReferenceError: require is not defined" in all Electron CI jobs.

- tauri-plugin-webdriver: when xcap::Window::all() PID filter returns
  empty (common in Windows CI virtual display sessions), fall back to
  matching by window title. Fixes "no window found for this process".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…path

toString('hex') emits raw LE bytes and BigInt('0x...') then
re-interprets them as BE, producing a wrong handle value.
Use readBigUInt64LE(0).toString() to decode the LE Buffer to
the actual HWND integer, pass the decimal string directly to
PowerShell's [IntPtr] cast — no further conversion needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
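
The byte-order bug described in this commit is easy to reproduce in isolation. The sketch below assumes an 8-byte handle Buffer (as on 64-bit Windows) and uses a made-up handle value; the function names are hypothetical.

```typescript
import { Buffer } from 'node:buffer';

// Buggy decoding: toString('hex') emits the bytes in storage (little-endian)
// order, and BigInt('0x...') then reads that string as big-endian.
function decodeHwndWrong(handle: Buffer): string {
  return BigInt('0x' + handle.toString('hex')).toString();
}

// Fixed decoding: read the 8 bytes as a little-endian unsigned integer and
// emit the decimal string, which PowerShell's [IntPtr] cast accepts.
function decodeHwnd(handle: Buffer): string {
  return handle.readBigUInt64LE(0).toString();
}

// Simulate a native handle Buffer holding the made-up value 0x1A2B3C.
const sampleHandle = Buffer.alloc(8);
sampleHandle.writeBigUInt64LE(BigInt(0x1a2b3c), 0);

const wrong = decodeHwndWrong(sampleHandle); // byte-swapped, far too large
const right = decodeHwnd(sampleHandle); // "1715004"
```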
- e2e/lib/visionAssert.ts: visionEnabled() now requires OLLAMA_BASE_URL
  in addition to an API key. Previously a job with only OLLAMA_API_KEY
  set would pass the guard, hit https://ollama.com/v1 (not an API
  server), and throw — failing the spec instead of silently skipping
  Layer 3.

- electron-service/nativeScreenshot.ts: add spawnSync timeouts.
  screencapture gets 10 s; PowerShell gets 30 s to account for Add-Type
  JIT compilation on first call.

- e2e/test/tauri/native-screenshot.spec.ts: Layer 1 now calls
  assertCapturesChrome(nativeDims, webviewDims) so a blank or webview-
  only capture that happens to be a valid PNG is caught by the height
  comparison, not just the byte-equality check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
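
A minimal sketch of the tightened guard from the first bullet above. It assumes the API key variable is `OLLAMA_API_KEY` (as named in the PR description) and, unlike the real `visionEnabled()`, takes the environment as an explicit argument so the behaviour can be shown without touching `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Sketch only: Layer 3 runs only when BOTH variables are set and non-empty.
// With just the key present, the guard returns false and the spec skips
// Layer 3 instead of calling a default endpoint and throwing.
function visionEnabled(env: Env): boolean {
  return Boolean(env.OLLAMA_API_KEY) && Boolean(env.OLLAMA_BASE_URL);
}
```

In a spec this would be called as `visionEnabled(process.env)` before any vision assertion.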
- electron/native-screenshot.spec.ts: OCR was asserting 'e2e test app'
  but the fixture heading is '🚀 Electron Builder E2E' — Tesseract
  reliably finds 'electron' and 'builder' from that text. Change
  assertion to those two tokens.

- tauri/native-screenshot.spec.ts: assertCapturesChrome (native height
  > webview height) fails on macOS because Tauri v2 uses
  fullSizeContentView by default, making the title bar an overlay rather
  than adding height. Gate the check to Windows only, where the title
  bar genuinely adds height above the content area.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
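
The height comparison and its platform gate might be sketched as follows. The `assertCapturesChrome(nativeDims, webviewDims)` signature comes from the commit messages above; the body, the error text, and the `tauriChromeHeightCheckApplies` helper are assumptions made for illustration.

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Sketch: a capture that includes the title bar should be strictly taller
// than the webview-only screenshot.
function assertCapturesChrome(nativeDims: Dimensions, webviewDims: Dimensions): void {
  if (nativeDims.height <= webviewDims.height) {
    throw new Error(
      `native capture (${nativeDims.height}px) is not taller than the webview ` +
        `(${webviewDims.height}px); window chrome is probably missing`,
    );
  }
}

// Hypothetical gate for the Tauri spec: on macOS the title bar overlays the
// content (fullSizeContentView), so the comparison only makes sense on Windows.
function tauriChromeHeightCheckApplies(platform: string): boolean {
  return platform === 'win32';
}
```

A spec could then wrap the call as `if (tauriChromeHeightCheckApplies(process.platform)) assertCapturesChrome(nativeDims, webviewDims);`.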
readFileSync after a successful spawnSync could throw (e.g. disk full,
race condition), leaving the temp PNG on disk. Wrap in try/finally so
unlinkSync runs regardless.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
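
Taken together, the exit-code validation, the spawnSync timeouts, and this try/finally cleanup could be sketched as below. `captureToBuffer` and `tempPngPath` are hypothetical names. The sketch also swaps `unlinkSync` for `rmSync(..., { force: true })`, which does not throw if the file was never written; that is a substitution made here, not what the PR does.

```typescript
import { spawnSync } from 'node:child_process';
import { readFileSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: a unique temp path for the capture output.
function tempPngPath(): string {
  return join(tmpdir(), `native-screenshot-${process.pid}-${Date.now()}.png`);
}

// Run a capture command that writes to outPath, then return the file bytes.
function captureToBuffer(command: string, args: string[], outPath: string, timeoutMs: number): Buffer {
  const result = spawnSync(command, args, { timeout: timeoutMs });

  // result.error is set when the process could not start or timed out.
  if (result.error) {
    throw new Error(`${command} failed to run: ${result.error.message}`);
  }
  // A non-zero (or null) status means the command did not finish cleanly.
  if (result.status !== 0) {
    throw new Error(`${command} exited with status ${result.status}: ${String(result.stderr)}`);
  }

  try {
    return readFileSync(outPath);
  } finally {
    // Runs whether or not readFileSync threw, so the temp file never lingers.
    rmSync(outPath, { force: true });
  }
}
```

With the timeouts mentioned earlier in the thread, a macOS call would pass `10_000` and a PowerShell call `30_000` as `timeoutMs`.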
- Updated the openai package from version 4.104.0 to 6.34.0 in e2e/package.json.
- Refactored OCR worker management in screenshotChecks.ts to use a promise for worker initialization, ensuring proper handling of the worker's lifecycle.
- Updated the OCR assertion in native-screenshot.spec.ts to dynamically match the app name from the APP environment variable, ensuring accurate recognition for different Electron applications.
- Cleaned up PowerShell DllImport syntax in nativeScreenshot.ts for consistency and clarity.
…omScreen

- Replaced PrintWindow with CopyFromScreen for capturing screenshots on Windows to avoid deadlocks on CI runners.
- Improved PowerShell script clarity by updating DllImport syntax and ensuring proper bitmap dimensions are calculated before saving the image.
… compositing disabled

- Introduced a conditional argument `--disable-gpu-compositing` for Electron appArgs when running in CI environments to ensure reliable screenshot capturing.
- Updated the native screenshot method to return GPU compositing status, allowing for better handling of screenshot capture based on the environment.
- Enhanced documentation to address common issues related to black window captures on Windows CI/virtual machines.
…pture

PrintWindow(WM_PRINT=0) produces a blank capture for Chromium windows
even with --disable-gpu-compositing because Chromium's HWND procedure
paints via BeginPaint/EndPaint and ignores the WM_PRINT HDC. Switch to
CopyFromScreen (GDI framebuffer read) when the flag is detected; the
software compositor BitBlts rendered frames directly to the GDI screen
buffer, making it readable via CopyFromScreen. BringWindowToTop +
SetForegroundWindow + 200ms sleep ensure the last frame has flushed
before the read.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ow before CopyFromScreen

BringWindowToTop + SetForegroundWindow triggered a DWM recomposition
cycle on the Hyper-V virtual adapter. CopyFromScreen called during that
cycle blocks waiting for DWM (running on WARP/software D3D) to flush the
frame — which can take 30+ seconds on CI. Removing those calls leaves the
display in a stable state so CopyFromScreen completes immediately.

Also switch from GetWindowRect P/Invoke to Electron's win.getBounds()
(already in windowInfo.bounds) to eliminate the Add-Type -TypeDefinition
compilation step, which avoids antivirus scanning overhead on CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s CI

CopyFromScreen returned a valid PNG but blank content on GitHub Actions
Windows runners. The desktop redirection surface read by GetDC(NULL) is
not fully backed on Hyper-V virtual display adapters without a hardware
GPU, so the desktop framebuffer doesn't reflect window content.

Switch to GetWindowDC(hwnd) + BitBlt: with --disable-gpu-compositing,
Chromium's SoftwareOutputDeviceWin presents each frame via BitBlt to the
window's own HDC, which lives in the per-window DWM redirection bitmap.
Reading from GetWindowDC bypasses the desktop framebuffer and captures
the actual rendered content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PowerShell parser rejects 0x00CC0020u with a ParserError at char 592 —
the trailing u is C# syntax for unsigned int and is not valid PowerShell.
PowerShell coerces the plain literal 0x00CC0020 to uint when passing to
the [DllImport]-declared BitBlt(uint dwRop) parameter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Modern Chromium presents frames via DirectComposition even with
--disable-gpu-compositing, so the content never reaches any GDI
surface — every GDI-based capture (CopyFromScreen / GetDC(NULL),
PrintWindow(WM_PRINT), BitBlt from GetWindowDC) returns a blank
PNG on Hyper-V/WARP environments like GitHub Actions.

Switch the no-GPU branch to ffmpeg's ddagrab capture source, which uses
the DXGI Desktop Duplication API to read what DWM is actually
presenting. This works on WARP (software D3D), captures DComp
output, and is preinstalled on the GitHub Actions windows-2022
runner image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ffmpeg is not preinstalled on the GitHub Actions windows-2022 image,
so nativeScreenshot's ddagrab capture path (used when Chromium runs
with --disable-gpu-compositing under WARP) failed with ENOENT. Install
via chocolatey before running the e2e suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ffmpeg rejected -frames:v as an input option. -framerate is an input
option (configures the ddagrab device); -frames:v and -vf are output
options and must come after -i.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
goosewobbler and others added 7 commits May 7, 2026 02:26
The plain chocolatey 'ffmpeg' package ships BtbN's essentials build,
which omits the ddagrab DXGI duplication input device. ffmpeg-full
wraps the GPL build which includes ddagrab.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both chocolatey 'ffmpeg' (essentials) and 'ffmpeg-full' (years-stale,
pre-FFmpeg-6.1) ship without the ddagrab DXGI duplication input device.
Download BtbN's master GPL build directly which is verified to include
ddagrab. Also dump '-version' and '-f ddagrab -h' so the job logs prove
the indev is present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BtbN's GPL build cross-compiled with mingw doesn't include
--enable-d3d11va, which ddagrab depends on, so the indev was missing.
GyanD's 'full' build has d3d11va enabled and ships ddagrab. Distributed
as 7z; 7-Zip is preinstalled on the windows-2022 runner image.

Also fail fast at install time if ddagrab is missing from the device
list, so we don't have to drill through e2e logs to diagnose this again.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
No widely-available Windows ffmpeg build (BtbN GPL, Gyan full,
chocolatey ffmpeg/ffmpeg-full) ships with the ddagrab indev — none
of them configure with --enable-d3d11va. Rather than building ffmpeg
ourselves on every CI run, force Chromium to use a GDI presentation
path via --disable-direct-composition, which puts the rendered
frames in the per-window DWM redirection bitmap where BitBlt from
GetWindowDC can read them.

- e2e: add --disable-direct-composition alongside --disable-gpu-compositing
  in ciAppArgs so all Windows CI runs share the same flag set.
- electron-service: revert the no-GPU branch from ffmpeg ddagrab back
  to BitBlt(GetWindowDC). Document why both flags are required.
- ci: remove the ffmpeg install step (no longer needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After exhausting GDI capture paths and ffmpeg-with-ddagrab alternatives
without finding one that works on WARP, accept the limitation: there is
no reliable nativeScreenshot capture method for Hyper-V VMs without a
real GPU. The feature still works on macOS and on Windows machines with
hardware GPU (developer machines, self-hosted runners with graphics).

- e2e: skip the spec on Windows CI alongside the existing Linux skip.
- e2e config: revert --disable-direct-composition (caused PowerShell
  ETIMEDOUT — Chromium gets stuck in a paint-pending state and BitBlt
  blocks waiting for the paint).
- electron-service: keep the GetWindowDC+BitBlt fallback so the API
  doesn't crash if anyone hits this path, but document that it returns
  blank under WARP.
- docs: explain every method we tried and why each fails, so future
  contributors don't redo this investigation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kup failure

The Windows xcap-based handler was returning a bare 'no window found
for this process' with no context, which makes CI failures impossible
to diagnose without local Windows access. Two changes:

1. Title-fallback now matches case-insensitively after trimming, and
   uses substring containment instead of exact equality. CI environments
   sometimes append marker text to the visible title (e.g. an automation
   suffix) that Tauri's own .title() doesn't include.

2. The 'no window found' error now includes our PID, the title we were
   looking for, and the (pid, title) of every window xcap returned —
   so the next CI failure tells us exactly why the match failed instead
   of forcing another round of guesswork.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
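
The lookup order and diagnostics described in this commit are implemented in Rust; the TypeScript sketch below only illustrates the logic. The `WindowInfo` shape, the `findWindow` name, and the exact fallback ordering (title match among our own windows, then any window we own, then a title match across all windows) are assumptions pieced together from the commit messages in this thread.

```typescript
interface WindowInfo {
  pid: number;
  title: string;
}

function findWindow(windows: WindowInfo[], ourPid: number, wantedTitle: string): WindowInfo {
  // Case-insensitive substring match after trimming, so a suffix appended
  // to the visible title does not break the lookup.
  const needle = wantedTitle.trim().toLowerCase();
  const titleMatches = (w: WindowInfo): boolean =>
    needle.length > 0 && w.title.trim().toLowerCase().includes(needle);

  // PID first, to avoid capturing another process's window.
  const owned = windows.filter((w) => w.pid === ourPid);
  const match = owned.find(titleMatches) ?? owned[0] ?? windows.find(titleMatches);
  if (match) {
    return match;
  }

  // Diagnostic error: our PID, the wanted title, and everything that was seen.
  const seen = windows.map((w) => `(${w.pid}, ${JSON.stringify(w.title)})`).join(', ');
  throw new Error(
    `no window found for this process: pid=${ourPid}, ` +
      `title=${JSON.stringify(wantedTitle)}, windows seen: [${seen}]`,
  );
}
```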
Diagnostic output from the previous run confirms xcap (EnumWindows) only
sees 2 windows on the GitHub Actions Windows runner — the agent itself
and one nameless window owned by a different PID. The Tauri WebView2
window is not enumerable in this VM environment, so the embedded
provider's nativeScreenshot endpoint can't find anything to capture.

Same root cause as the Electron skip: no real graphical session, just
Hyper-V/WARP. The feature still works on macOS and on Windows machines
with a real desktop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@goosewobbler
Contributor Author

Closing in favour of using @wdio/visual-service — see VRT spike findings and PR #123 on the example repo.

The spike validated that @wdio/visual-service works out of the box across all three Electron variants and all three Tauri providers (with documented exclusions for crabnebula+macOS-CI and official+linux). It covers the in-app UI surface — which is what 99% of users actually want to test — and the existing mock APIs cover assertions about native UI behaviour (menu/tray/dialog creation). CrabNebula's macOS driver also captures the OS window with native chrome for free.

The remaining "native chrome only" use cases this PR was originally chasing (multi-window-in-one-capture, OS-level menu/tray pixel diffs) turned out to be either niche or already covered. Combined with the inability to make the capture work on Windows CI (Hyper-V/WARP can't satisfy any GDI / desktop-duplication path), the feature isn't worth the carrying cost.
