[UR][L0] Restrict USM residency to peers with enabled P2P access #21889
ldorau wants to merge 4 commits into intel:sycl
Pull request overview
Implements peer-access-driven memory residency management for the Level Zero v2 adapter: ext_oneapi_enable_peer_access and ext_oneapi_disable_peer_access are wired through UR so that they update USM pool residency, and SYCL's peer-access API is adjusted to reject cross-platform usage.
Changes:
- Add L0 v2 peer-access implementation that toggles per-device peer state and propagates residency updates to all tracked contexts.
- Extend USM pool/provider plumbing to support runtime resident-device changes and add pool-manager iteration helpers.
- Update SYCL peer-access enable/disable to validate platforms; add initial (currently placeholder) UR tests.
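The residency model described above can be sketched roughly as follows. This is a minimal illustration and not the adapter's code: `PeerStatus`, `Device`, and `residentDevices` are hypothetical names, and the direction of the peer relation (a peer that enabled access to the owner gets the owner's allocations made resident) is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; none of these names come from the adapter sources.
enum class PeerStatus { NoConnection, Disabled, Enabled };

struct Device {
  std::size_t id;
  // peers[i] describes this device's access to memory owned by device i
  // (assumed direction, see the lead-in).
  std::vector<PeerStatus> peers;
};

// Returns the ids of the devices on which an allocation owned by `owner`
// would be made resident: the owner itself plus every device that currently
// has peer access to it enabled.
std::vector<std::size_t> residentDevices(const Device &owner,
                                         const std::vector<Device> &all) {
  std::vector<std::size_t> result{owner.id};
  for (const Device &d : all) {
    if (d.id != owner.id && d.peers[owner.id] == PeerStatus::Enabled)
      result.push_back(d.id);
  }
  return result;
}
```

Recomputing this list whenever a peer's status changes is one way to picture the runtime residency update for existing pools; the real change propagates the update through the tracked contexts.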
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp | Adds multi-device residency tests for peer-access (currently placeholders). |
| unified-runtime/source/common/ur_pool_manager.hpp | Adds descriptor helpers and pool-manager iteration with descriptor access. |
| unified-runtime/source/common/backtrace_lin.cpp | Introduces a constant for max backtrace frames. |
| unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp | New L0 v2 implementation for peer access enable/disable/info and context propagation. |
| unified-runtime/source/adapters/level_zero/v2/usm.hpp | Exposes USM pool API to change resident devices. |
| unified-runtime/source/adapters/level_zero/v2/usm.cpp | Updates provider creation to use peer-enabled residency model and adds residency-change plumbing. |
| unified-runtime/source/adapters/level_zero/v2/memory.cpp | Switches P2P eligibility check to the new “enabled peers” model. |
| unified-runtime/source/adapters/level_zero/v2/context.hpp | Adds APIs to query enabled peer relationships and to propagate residency changes. |
| unified-runtime/source/adapters/level_zero/v2/context.cpp | Removes precomputed P2P tables; tracks contexts; adds peer-access query helpers and residency propagation. |
| unified-runtime/source/adapters/level_zero/usm_p2p.cpp | Updates v1 behavior to log enable/disable as ignored (always enabled). |
| unified-runtime/source/adapters/level_zero/platform.hpp | Updates platform comment to reflect v2 peer-access usage of tracked contexts. |
| unified-runtime/source/adapters/level_zero/platform.cpp | Initializes per-device peer tables based on L0 P2P capability/properties. |
| unified-runtime/source/adapters/level_zero/device.hpp | Adds peer-status table to devices and stream operators. |
| unified-runtime/source/adapters/level_zero/device.cpp | Implements stream operators for device id and peer status. |
| unified-runtime/source/adapters/level_zero/context.cpp | Minor comment adjustment around context tracking in v1. |
| unified-runtime/source/adapters/level_zero/CMakeLists.txt | Moves usm_p2p.cpp into the v2 adapter build. |
| sycl/source/device.cpp | Adds same-platform validation for enable/disable peer access calls. |
| .github/copilot-instructions.md | Expands repository instructions/documentation for Copilot usage. |
Please review @intel/llvm-reviewers-runtime @intel/unified-runtime-reviewers @intel/unified-runtime-reviewers-level-zero
- Skip peers with disabled P2P in makeProvider (USM pool creation)
- Add urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp
- Track per-device peer status in ur_device_handle_t_::peers[]
- Update existing USM pool residency on P2P enable/disable

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>

- Fill in three placeholder multi-device tests in memory_residency.cpp
- Tests verify P2P-driven residency: absent-on-peer without P2P, enable/disable state machine checks, end-to-end data transfer

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>

Please review @lslusarczyk

@intel/llvm-gatekeepers please consider merging

Do not merge yet; waiting for review from @lslusarczyk.

An additional SYCL e2e test is in a separate PR: #21944

@intel/llvm-gatekeepers please consider merging

lslusarczyk left a comment:
A few cosmetic comments; apply or ignore them as you wish.
The change is OK.
Both functions differ only in which UR call they make.
Please create a helper function template, parameterized by detail::UrApiKind::urUsmP2PDisablePeerAccessExp / detail::UrApiKind::urUsmP2PEnablePeerAccessExp,
and use it in both functions.
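The suggested refactoring could look roughly like the sketch below. It is only an illustration: the `UrApiKind` enum here is a local stand-in for `detail::UrApiKind`, and `Dev`, `callUr`, and the exception type are hypothetical. Only the two enumerator names and the same-platform check are taken from this PR.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for detail::UrApiKind; only the two enumerator names come from the
// review comment, everything else here is hypothetical.
enum class UrApiKind {
  urUsmP2PEnablePeerAccessExp,
  urUsmP2PDisablePeerAccessExp
};

// Hypothetical device record: an id plus the platform it belongs to.
struct Dev {
  int id;
  int platform;
};

// Stand-in for dispatching the UR entry point selected by Kind. It returns a
// description of the call instead of talking to a real adapter.
template <UrApiKind Kind>
std::string callUr(const Dev &self, const Dev &peer) {
  const char *name =
      Kind == UrApiKind::urUsmP2PEnablePeerAccessExp ? "enable" : "disable";
  return std::string(name) + "(" + std::to_string(self.id) + "," +
         std::to_string(peer.id) + ")";
}

// Shared logic: validate once, then dispatch on the template parameter.
template <UrApiKind Kind>
std::string p2pAccessHelper(const Dev &self, const Dev &peer) {
  if (self.platform != peer.platform)
    throw std::invalid_argument("devices belong to different platforms");
  return callUr<Kind>(self, peer);
}

// The two public functions reduce to one-line wrappers.
std::string enablePeerAccess(const Dev &self, const Dev &peer) {
  return p2pAccessHelper<UrApiKind::urUsmP2PEnablePeerAccessExp>(self, peer);
}

std::string disablePeerAccess(const Dev &self, const Dev &peer) {
  return p2pAccessHelper<UrApiKind::urUsmP2PDisablePeerAccessExp>(self, peer);
}
```

Making the API kind a non-type template parameter keeps the choice of call at compile time, so the wrappers carry no runtime branching of their own.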
There is no reason to leave *phContext = nullptr outside the try block, and no reason to keep the other statements that precede ZE2UR_CALL inside it.
Please either move try to the beginning of the function or move it to just before ZE2UR_CALL.
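The first option (try at the very beginning) might be sketched as follows. The code is a self-contained illustration under stated assumptions: `Result`, `Context`, `nativeCreate`, and `contextCreate` are hypothetical names, and `nativeCreate` merely stands in for whatever fallible work the real function performs around ZE2UR_CALL.

```cpp
#include <stdexcept>

// Hypothetical result codes and context type for this sketch only.
enum class Result { Success, ErrorUnknown };

struct Context {
  int id;
};

// Stand-in for the fallible part of the real function; throws on request.
Context *nativeCreate(bool fail) {
  if (fail)
    throw std::runtime_error("native failure");
  return new Context{1};
}

// The try block opens at the top of the body, so the out-parameter reset and
// every later statement are covered uniformly.
Result contextCreate(bool fail, Context **phContext) {
  try {
    *phContext = nullptr;
    *phContext = nativeCreate(fail);
    return Result::Success;
  } catch (...) {
    return Result::ErrorUnknown;
  }
}
```

With this layout the caller observes a null handle on every failure path, and no statement's placement inside or outside the block needs separate justification.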
Extract common logic from ext_oneapi_enable_peer_access and ext_oneapi_disable_peer_access into a templated p2pAccessHelper function to avoid code duplication.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>

The disablePeerAccessStateMachineAndSourceAllocationPersists test was failing intermittently because deferred frees from the preceding test complete asynchronously, causing UR_DEVICE_INFO_GLOBAL_MEM_FREE to report more free memory than the baseline captured at the start of the test. Remove the unreliable source-device free-memory assertion and the allocation it required, keeping only the state-machine checks (disable succeeds, double-disable returns UR_RESULT_ERROR_INVALID_OPERATION). The source-device allocation property is already covered by allocatingDeviceMemoryWillResultInOOM which runs first in isolation.
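The state-machine checks that remain in the test can be pictured with the sketch below. It is a toy model, not the adapter or the test itself: `Result`, `LinkState`, and `PeerLink` are hypothetical, `Result::InvalidOperation` merely mirrors UR_RESULT_ERROR_INVALID_OPERATION, and rejecting a double enable symmetrically is an assumption (the commit message only states the double-disable case).

```cpp
// Hypothetical result codes; InvalidOperation mirrors
// UR_RESULT_ERROR_INVALID_OPERATION from the commit message.
enum class Result { Success, InvalidOperation };

enum class LinkState { Disabled, Enabled };

// Toy model of the peer-access state between one pair of devices.
struct PeerLink {
  LinkState state = LinkState::Disabled;

  // Assumption: enabling an already enabled link is rejected, by symmetry
  // with the double-disable case.
  Result enable() {
    if (state == LinkState::Enabled)
      return Result::InvalidOperation;
    state = LinkState::Enabled;
    return Result::Success;
  }

  // Disabling succeeds once; a second disable is an invalid operation.
  Result disable() {
    if (state == LinkState::Disabled)
      return Result::InvalidOperation;
    state = LinkState::Disabled;
    return Result::Success;
  }
};
```

Checks of this kind depend only on the return codes, which is why they stay deterministic even when deferred frees from an earlier test are still completing.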

Do not merge it yet, please

Co-authored-by: Łukasz Ślusarczyk <lukasz.slusarczyk at intel.com>
Co-authored-by: Lukasz Dorau <lukasz.dorau at intel.com>