[UR][L0] Restrict USM residency to peers with enabled P2P access #21889

Open
ldorau wants to merge 4 commits into intel:sycl from ldorau:URL0_Memory_resident_limit_to_enabled_peers

@ldorau ldorau commented Apr 28, 2026

  • Skip peers with disabled P2P in makeProvider (USM pool creation)
  • Add urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp
  • Track per-device peer status in ur_device_handle_t_::peers[]
  • Update existing USM pool residency on P2P enable/disable
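
The first and third bullets can be pictured with a minimal sketch. It assumes a hypothetical three-state `PeerStatus` enum and a simplified `Device` struct; neither is taken from the PR, and the direction in which `peers[]` is indexed is also an assumption. Only peers whose status is enabled are added to the resident set at pool creation:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these names are illustrative, not the adapter's.
enum class PeerStatus { NoConnection, Disabled, Enabled };

struct Device {
  std::size_t id;                // index of this device within its platform
  std::vector<PeerStatus> peers; // peers[i]: status of device i as a peer
                                 // (indexing direction is an assumption)
};

// Devices on which memory owned by `owner` would be made resident:
// the owner itself plus every peer whose P2P access is currently enabled.
std::vector<std::size_t> residentDevices(const Device &owner) {
  std::vector<std::size_t> resident{owner.id};
  for (std::size_t i = 0; i < owner.peers.size(); ++i) {
    if (i == owner.id)
      continue; // a device is not its own peer
    if (owner.peers[i] == PeerStatus::Enabled)
      resident.push_back(i); // Disabled and NoConnection peers are skipped
  }
  return resident;
}
```

With this model, a peer that is physically connected but not yet enabled is skipped in the same way as a peer with no connection at all.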

Co-authored-by: Łukasz Ślusarczyk <lukasz.slusarczyk at intel.com>
Co-authored-by: Lukasz Dorau <lukasz.dorau at intel.com>

@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 7e50ca5 to d0b4788 Compare April 28, 2026 07:23
@ldorau ldorau requested a review from Copilot April 28, 2026 07:25

Copilot AI left a comment

Pull request overview

Implements peer-access–driven memory residency management for the Level Zero v2 adapter, wiring ext_oneapi_enable_peer_access and ext_oneapi_disable_peer_access through UR to update USM pool residency, and adjusts SYCL's peer-access API to validate that both devices belong to the same platform.

Changes:

  • Add L0 v2 peer-access implementation that toggles per-device peer state and propagates residency updates to all tracked contexts.
  • Extend USM pool/provider plumbing to support runtime resident-device changes and add pool-manager iteration helpers.
  • Update SYCL peer-access enable/disable to validate platforms; add initial (currently placeholder) UR tests.
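
As a rough illustration of the propagation step in the first two bullets, the sketch below walks every tracked context and updates each pool owned by the peer device. `Pool`, `Context`, and `changeResidency` are hypothetical names chosen for this example and do not reflect the adapter's actual classes or its Level Zero calls:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of the data involved; not the adapter's real types.
struct Pool {
  std::size_t ownerDevice;             // device whose memory the pool manages
  std::vector<std::size_t> residentOn; // peer devices the memory is resident on
};

struct Context {
  std::vector<Pool> pools;
};

// Apply a peer-access change to every pool in every tracked context.
// `commandDevice` gains (enable == true) or loses (enable == false)
// residency of memory owned by `peerDevice`.
void changeResidency(std::vector<Context> &tracked, std::size_t commandDevice,
                     std::size_t peerDevice, bool enable) {
  for (Context &ctx : tracked) {
    for (Pool &pool : ctx.pools) {
      if (pool.ownerDevice != peerDevice)
        continue; // pools owned by other devices are unaffected
      auto it = std::find(pool.residentOn.begin(), pool.residentOn.end(),
                          commandDevice);
      if (enable && it == pool.residentOn.end())
        pool.residentOn.push_back(commandDevice);
      else if (!enable && it != pool.residentOn.end())
        pool.residentOn.erase(it);
    }
  }
}
```

The sketch makes repeated enables idempotent at the pool level; whether the real implementation does the same is not stated in this conversation.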

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp | Adds multi-device residency tests for peer-access (currently placeholders). |
| unified-runtime/source/common/ur_pool_manager.hpp | Adds descriptor helpers and pool-manager iteration with descriptor access. |
| unified-runtime/source/common/backtrace_lin.cpp | Introduces a constant for max backtrace frames. |
| unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp | New L0 v2 implementation for peer access enable/disable/info and context propagation. |
| unified-runtime/source/adapters/level_zero/v2/usm.hpp | Exposes USM pool API to change resident devices. |
| unified-runtime/source/adapters/level_zero/v2/usm.cpp | Updates provider creation to use peer-enabled residency model and adds residency-change plumbing. |
| unified-runtime/source/adapters/level_zero/v2/memory.cpp | Switches P2P eligibility check to the new “enabled peers” model. |
| unified-runtime/source/adapters/level_zero/v2/context.hpp | Adds APIs to query enabled peer relationships and to propagate residency changes. |
| unified-runtime/source/adapters/level_zero/v2/context.cpp | Removes precomputed P2P tables; tracks contexts; adds peer-access query helpers and residency propagation. |
| unified-runtime/source/adapters/level_zero/usm_p2p.cpp | Updates v1 behavior to log enable/disable as ignored (always enabled). |
| unified-runtime/source/adapters/level_zero/platform.hpp | Updates platform comment to reflect v2 peer-access usage of tracked contexts. |
| unified-runtime/source/adapters/level_zero/platform.cpp | Initializes per-device peer tables based on L0 P2P capability/properties. |
| unified-runtime/source/adapters/level_zero/device.hpp | Adds peer-status table to devices and stream operators. |
| unified-runtime/source/adapters/level_zero/device.cpp | Implements stream operators for device id and peer status. |
| unified-runtime/source/adapters/level_zero/context.cpp | Minor comment adjustment around context tracking in v1. |
| unified-runtime/source/adapters/level_zero/CMakeLists.txt | Moves usm_p2p.cpp into the v2 adapter build. |
| sycl/source/device.cpp | Adds same-platform validation for enable/disable peer access calls. |
| .github/copilot-instructions.md | Expands repository instructions/documentation for Copilot usage. |
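
The same-platform validation described for sycl/source/device.cpp might look roughly like the sketch below. `Platform`, `Device`, and `checkSamePlatform` are hypothetical stand-ins, and `std::invalid_argument` is used only to keep the example self-contained; the exception type and message used by the actual change are not shown in this conversation:

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the SYCL platform and device objects.
struct Platform {
  int id;
};

struct Device {
  const Platform *platform; // platform this device belongs to
};

// Reject peer-access requests between devices from different platforms
// before anything is forwarded to the runtime.
void checkSamePlatform(const Device &self, const Device &peer) {
  if (self.platform != peer.platform)
    throw std::invalid_argument(
        "peer access requires both devices to be on the same platform");
}
```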

Comment thread sycl/source/device.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/source/common/backtrace_lin.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from d0b4788 to 24eaab3 Compare April 28, 2026 07:53
@ldorau ldorau requested a review from Copilot April 28, 2026 07:54

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread unified-runtime/source/common/ur_pool_manager.hpp
Comment thread unified-runtime/source/adapters/level_zero/usm_p2p.cpp
Comment thread sycl/source/device.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/memory.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch 2 times, most recently from 174e3f1 to e2e57bf Compare April 28, 2026 09:58
@ldorau ldorau requested a review from Copilot April 28, 2026 10:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/platform.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/device.hpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from e2e57bf to 8346b35 Compare April 28, 2026 12:47
@ldorau ldorau requested a review from Copilot April 28, 2026 12:49
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 8346b35 to 9e50ee8 Compare April 28, 2026 12:52

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp Outdated
Comment thread unified-runtime/source/common/backtrace.hpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 9e50ee8 to 65c706f Compare April 28, 2026 13:34
@ldorau ldorau requested a review from Copilot April 28, 2026 13:44

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 65c706f to a2b54f4 Compare April 28, 2026 14:24
@ldorau ldorau requested a review from Copilot April 28, 2026 14:25

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from a2b54f4 to 08a08ab Compare April 28, 2026 14:50
@ldorau ldorau requested a review from Copilot April 28, 2026 14:51

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 1c04c0f to b3434ed Compare April 29, 2026 12:23
@ldorau ldorau requested a review from Copilot April 29, 2026 12:35

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from b3434ed to efb4316 Compare April 29, 2026 13:23

ldorau commented Apr 29, 2026

Please review @intel/llvm-reviewers-runtime @intel/unified-runtime-reviewers @intel/unified-runtime-reviewers-level-zero

@againull againull left a comment

sycl/* changes LGTM.

Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/usm_p2p.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
@ldorau ldorau marked this pull request as draft May 4, 2026 09:19
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch 2 times, most recently from 1d79218 to 948b3b7 Compare May 6, 2026 14:14
@ldorau ldorau requested a review from kswiecicki May 6, 2026 14:15
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch 2 times, most recently from e06653a to ff8d598 Compare May 7, 2026 08:47
- Skip peers with disabled P2P in makeProvider (USM pool creation)
- Add urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp
- Track per-device peer status in ur_device_handle_t_::peers[]
- Update existing USM pool residency on P2P enable/disable

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
- Fill in three placeholder multi-device tests in memory_residency.cpp
- Tests verify P2P-driven residency: absent-on-peer without P2P,
  enable/disable state machine checks, end-to-end data transfer

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
@ldorau ldorau marked this pull request as ready for review May 8, 2026 09:27

ldorau commented May 8, 2026

Please review @lslusarczyk

github-actions Bot commented May 8, 2026

@intel/llvm-gatekeepers please consider merging

ldorau commented May 8, 2026

Do not merge yet - waiting for review from @lslusarczyk.

ldorau commented May 8, 2026

An additional SYCL e2e test is in a separate PR: #21944

github-actions Bot commented

@intel/llvm-gatekeepers please consider merging

@lslusarczyk lslusarczyk left a comment

A few cosmetic comments; apply or ignore them as you wish.

The change is OK.

Comment thread sycl/source/device.cpp Outdated

Both functions differ only in which UR call they make.
Please create a template helper function, templated on detail::UrApiKind::urUsmP2PDisablePeerAccessExp / detail::UrApiKind::urUsmP2PEnablePeerAccessExp,

and use that helper in both functions.
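
One possible shape for such a helper is sketched below. The `UrApiKind` enum here is a local stand-in for `detail::UrApiKind`, the `Device` struct and its `peerEnabled` flag are hypothetical, and the body only simulates the UR call; the helper name `p2pAccessHelper` follows the later commit message:

```cpp
#include <stdexcept>

// Local stand-in for detail::UrApiKind; only the two relevant kinds are modeled.
enum class UrApiKind {
  urUsmP2PEnablePeerAccessExp,
  urUsmP2PDisablePeerAccessExp
};

// Hypothetical device model used only for this sketch.
struct Device {
  int platform;     // identifier of the owning platform
  bool peerEnabled; // simulated result of the UR call
};

// Shared logic: validate once, then perform the call selected at compile time.
template <UrApiKind Kind>
void p2pAccessHelper(Device &self, const Device &peer) {
  if (self.platform != peer.platform)
    throw std::invalid_argument("devices must be on the same platform");
  // In real code this is where the UR entry point named by Kind would be called.
  constexpr bool enable = (Kind == UrApiKind::urUsmP2PEnablePeerAccessExp);
  self.peerEnabled = enable;
}

// The two public functions now differ only in the template argument.
void enablePeerAccess(Device &self, const Device &peer) {
  p2pAccessHelper<UrApiKind::urUsmP2PEnablePeerAccessExp>(self, peer);
}

void disablePeerAccess(Device &self, const Device &peer) {
  p2pAccessHelper<UrApiKind::urUsmP2PDisablePeerAccessExp>(self, peer);
}
```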

Done

There is no reason to leave *phContext = nullptr outside the 'try {' block, and no reason to keep the other statements that precede ZE2UR_CALL inside it.

Please either move 'try' to the beginning of the function or move it to just before ZE2UR_CALL.
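
The first option (moving 'try' to the beginning) could look like the following sketch. `Result`, `Context`, and `contextCreate` are hypothetical and simplified; the real function uses ZE2UR_CALL and the UR result codes, which are not reproduced here:

```cpp
#include <new>

// Hypothetical result codes and context type for this sketch only.
enum class Result { Success, InvalidValue, OutOfHostMemory, Unknown };

struct Context {
  int deviceCount;
};

// The whole body sits inside one try block, so the output-pointer reset,
// the argument checks, and the allocation share the same handlers.
Result contextCreate(int deviceCount, Context **phContext) {
  try {
    *phContext = nullptr; // reset the output before anything can fail
    if (deviceCount <= 0)
      return Result::InvalidValue;
    *phContext = new Context{deviceCount}; // stands in for the call that may throw
    return Result::Success;
  } catch (const std::bad_alloc &) {
    return Result::OutOfHostMemory;
  } catch (...) {
    return Result::Unknown;
  }
}
```

Either placement suggested in the comment gives the block a single, consistent scope; this sketch simply picks the wider one.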

Done

same comment

Done

ldorau added 2 commits May 11, 2026 13:07
Extract common logic from ext_oneapi_enable_peer_access and
ext_oneapi_disable_peer_access into a templated p2pAccessHelper
function to avoid code duplication.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The disablePeerAccessStateMachineAndSourceAllocationPersists test was
failing intermittently because deferred frees from the preceding test
complete asynchronously, causing UR_DEVICE_INFO_GLOBAL_MEM_FREE to
report more free memory than the baseline captured at the start of the
test.

Remove the unreliable source-device free-memory assertion and the
allocation it required, keeping only the state-machine checks (disable
succeeds, double-disable returns UR_RESULT_ERROR_INVALID_OPERATION).
The source-device allocation property is already covered by
allocatingDeviceMemoryWillResultInOOM which runs first in isolation.
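
The state-machine behaviour that the remaining checks rely on can be modeled as below. The enum is a local stand-in that mirrors the UR result names, `PeerStatus` is hypothetical, and treating a double enable as an error symmetrically with the double disable is an assumption; the commit message only confirms the double-disable case:

```cpp
// Local stand-in mirroring two UR result names; not the real ur_result_t.
enum Result { UR_RESULT_SUCCESS, UR_RESULT_ERROR_INVALID_OPERATION };

// Hypothetical two-state model of one peer relationship.
enum class PeerStatus { Disabled, Enabled };

// Enabling an already-enabled peer is rejected (assumed, by symmetry).
Result enablePeerAccess(PeerStatus &status) {
  if (status == PeerStatus::Enabled)
    return UR_RESULT_ERROR_INVALID_OPERATION;
  status = PeerStatus::Enabled;
  return UR_RESULT_SUCCESS;
}

// Disabling an already-disabled peer is rejected (stated in the commit message).
Result disablePeerAccess(PeerStatus &status) {
  if (status == PeerStatus::Disabled)
    return UR_RESULT_ERROR_INVALID_OPERATION;
  status = PeerStatus::Disabled;
  return UR_RESULT_SUCCESS;
}
```

Checks of this kind depend only on the recorded state, not on free-memory readings, which is why they are not affected by deferred frees completing asynchronously.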

ldorau commented May 11, 2026

Do not merge it yet, please

5 participants