fix(build): honor --ep; require explicit device+ep in resolve_device; support cross-compile#947
Conversation
…ilability `winml build` dropped --ep when auto-generating a config — it passed only device/precision to generate_build_config, so the requested EP never reached config generation. With an explicit --device that isn't present, config gen then resolved the device with ep=None and failed immediately with a device-availability error, silently ignoring --ep (e.g. `winml build -m microsoft/resnet-50 --ep openvino --device npu --compile`). - Forward `ep` into generate_build_config so the EP shapes the generated config. - Add an early, compile-gated EP-availability check after config validation: a compile build physically instantiates the EP, so an unavailable EP/device now fails fast (before export) instead of deep in the compile stage. A no-compile build only produces a portable, analyzed ONNX and may target a device absent on the build machine (cross-compile), so it is allowed to proceed. - Stop the optimize-stage device resolution from availability-failing: it only needs a concrete device name for the analyzer's rule-data lookup, so it no longer passes ep or rejects an absent explicit device. Verified end-to-end: `--compile --device npu --ep openvino` (no NPU) now fails before export with the EP respected; `--no-compile` proceeds to export the portable model. Related to the EP-threading family (#931).
resolve_device(device, *, ep) now takes both device and ep as required args, so an omission is a call error instead of a silent ep=None / device="auto". Several callers still relied on the old defaults (latent TypeErrors) — pass ep=None explicitly where no EP filter is intended: - analyzer: auto-device resolution for rule-file selection - build: the ep-is-None auto-EP resolution - eval: device resolution - compile stage: device resolution (EP comes from ep_config) - test_device.py: resolve_device call sites Also keeps the build command forwarding --ep to generate_build_config, and drops the compile-availability gate / optimize-stage tweak from earlier on this branch — requiring explicit device+ep is the cleaner, centralized enforcement.
build resolved device/EP via resolve_device(), which availability-cross-checks: an explicit device absent on the machine (e.g. --device npu on a non-NPU box) raised even for --no-compile — which only produces a portable, analyzed ONNX and never needs the device present. Switch both call sites to resolve_check_device_ep(): with an explicit device+ep it validates statically (no availability probe), so a cross-target --no-compile build now runs export -> optimize -> analyze -> quantize to completion. The auto-select (ep is None) block still availability-checks (resolve_check_device_ep routes to resolve_device when ep is None) and now gets device + available EPs from a single call. Verified: `winml build -m microsoft/resnet-50 --ep openvino --device npu --no-compile` (no NPU present) completes end-to-end.
timenick
left a comment
There was a problem hiding this comment.
Solid, well-scoped fix. The root cause (build dropping --ep into generate_build_config) is addressed and the surrounding cleanup is consistent: making device + ep required on resolve_device turns the silent ep=None / device="auto" defaults into call errors, and every caller that relied on those defaults is updated to pass both explicitly. I checked all resolve_device / resolve_check_device_ep call sites at this head — they all forward ep, so there’s no latent TypeError from a missed caller. The auto-EP block’s available_eps[0] is safe since resolve_device only returns a device with ≥ 1 available EP (else it raises), and routing build through resolve_check_device_ep for the static device+ep validation is the right call for the cross-compile scenarios. Nice regression test for the --ep forwarding. LGTM.
🤖 Generated with GitHub Copilot CLI
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall this is a well-motivated three-part fix: (1) forward --ep into generate_build_config, (2) make both params required in resolve_device/resolve_check_device_ep to turn silent bugs into call errors, (3) route the optimize stage through resolve_check_device_ep so cross-target --no-compile builds work. All 138 tests pass.
General: In the static path of resolve_check_device_ep (explicit device + explicit ep), the second tuple element is list(EP_SUPPORTED_DEVICES[ep_name]) -- the EP's statically declared supported devices. In the dynamic path it returns runtime-filtered devices. All callers use _ so no active bug, but future callers will see inconsistent semantics. Consider documenting or unifying the two paths.
|
|
||
| def resolve_check_device_ep( | ||
| *, device: str = "auto", ep: EPNameOrAlias | None = None | ||
| *, device: str, ep: EPNameOrAlias | None |
There was a problem hiding this comment.
The docstring says 'Ideal for commands that do not need the device + ep actually exists on the system.' But static validation (no availability probe) only applies when BOTH device is explicit (not auto) AND ep is non-None. When either is absent the function delegates to resolve_device which does a full availability cross-check. A caller passing --device npu without --ep on a machine without NPU will still get a ValueError. The docstring should clarify this for users relying on this for cross-compile.
|
|
||
| try: | ||
| resolved_device, _ = _resolve_device(device=device) | ||
| resolved_device, _, available_eps = resolve_check_device_ep(device=device, ep=ep) |
There was a problem hiding this comment.
Minor nit: inside the 'if ep is None:' block, passing ep=ep (which is provably None here) is technically correct but slightly opaque. Writing ep=None explicitly would remove any ambiguity.
| """--ep forwarding into config generation + the compile EP-availability gate.""" | ||
|
|
||
| def _base_args(self, cfg: str, tmp_path: Path) -> list[str]: | ||
| return ["-c", cfg, "-m", "microsoft/resnet-50", "-o", str(tmp_path / "out")] |
There was a problem hiding this comment.
_base_args is never called by any test in this class -- dead helper. If intended for future tests add a TODO comment; otherwise remove it. The args it builds also mix -c and -m which would not match the auto-config path the class is testing.
| ["-m", "microsoft/resnet-50", "--ep", "openvino", "-o", str(tmp_path / "out")] | ||
| ) | ||
| assert result.exit_code == 0, result.output | ||
| assert mock_gen.call_args.kwargs["ep"] == "openvino" |
There was a problem hiding this comment.
Missing regression test for the cross-compile static path: the PR's headline feature (--device npu --ep openvino --no-compile completing on a machine without NPU) is verified empirically but has no unit test. A test that mocks resolve_check_device_ep to return a static result and confirms the optimize stage proceeds would guard against future regressions (e.g. accidentally restoring resolve_device in the optimize path).
Summary
Several related EP/device-handling fixes in
winml build, found afterwinml build --ep openvino --device npu --compilesilently ignored--ep.1.
winml buildnow honors--epThe command passed only
device/precisiontogenerate_build_config— neverep— so the requested EP was silently dropped (and, with an explicit absent--device, it failed for the wrong reason). Nowepis forwarded.2.
resolve_devicerequires explicitdevice+epBoth are now required (no
device="auto"default;epwas already required), so an omission is a call error instead of a silentep=None/device="auto". Several callers relied on the old defaults (latentTypeErrors + 8 failingtest_device.pytests) — fixed to pass both explicitly:ep is None) blockThis closes the class of bug behind #931.
3. Cross-compile support via
resolve_check_device_epwinml buildnow resolves throughresolve_check_device_ep, which validates an explicit device+ep statically (no availability cross-check). A build can target a device that isn't on the build machine; the outcome depends on whether compilation is actually needed:--no-compile, target device absent--compile, EP has no EPContext (e.g. MIGraphX, DML)--compile, EP has EPContext, device absent (e.g. OpenVINO+NPU)Verification
Empirical builds of
microsoft/resnet-50on a machine without an NPU confirm all three rows (exit 0 / 0 / 1, failing only at the compile stage for row 3). Unit suites green:tests/unit/{commands,sysinfo,compiler,analyze,config}(2500+ passed);test_build.py+test_device.pyupdated for the new resolver contract.Follow-up
--compile-on-absent-device case fail fast (before export) instead of at the compile stage. A gate was intentionally deferred so it can be conditioned correctly onconfig.compile(must not regress cross-compile / no-EPContext builds).Related: #931 (fixed in #941).