Pre-submission Checklist
GPU Hardware
Intel Arc Pro B70
DRI Devices Information
0 crw-rw----+ 1 root video 226, 0 May 2 23:05 /dev/dri/card0
0 crw-rw----+ 1 root video 226, 1 May 2 22:31 /dev/dri/card1
0 crw-rw-rw- 1 root render 226, 128 May 2 22:31 /dev/dri/renderD128
0 crw-rw-rw- 1 root render 226, 129 May 2 22:31 /dev/dri/renderD129
/dev/dri/by-path:
total 0
0 lrwxrwxrwx 1 root root 8 May 2 22:31 pci-0000:43:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 May 2 22:31 pci-0000:43:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root 8 May 2 22:31 pci-0000:47:00.0-card -> ../card0
0 lrwxrwxrwx 1 root root 13 May 2 22:31 pci-0000:47:00.0-render -> ../renderD129
GPU Detailed Information (lspci output)
47:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1701
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 60
NUMA node: 1
Region 0: Memory at 7e800000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at 7e000000000 (64-bit, prefetchable) [size=32G]
Expansion ROM at 82000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
DDROCCAP+ OCEn- DDRWrtVrefEn+ DDR3LEn+
CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
DDR3MaxFreq=2932MHz LPDDR3En-
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee1e000 Data: 0020
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [110 v1] Null
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 32GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [220 v1] Virtual Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq+ IntMsgNum 0
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
IOVSta: Migration-
Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: e223
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
Region 2: Memory at 0000000000000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: xe
Kernel modules: xe
43:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1701
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 58
NUMA node: 1
Region 0: Memory at 7f800000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at 7f000000000 (64-bit, prefetchable) [size=32G]
Expansion ROM at 82400000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
DDROCCAP+ OCEn- DDRWrtVrefEn+ DDR3LEn+
CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
DDR3MaxFreq=2932MHz LPDDR3En-
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee1c000 Data: 0020
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [110 v1] Null
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 32GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [220 v1] Virtual Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq+ IntMsgNum 0
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
IOVSta: Migration-
Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: e223
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
Region 2: Memory at 0000000000000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: xe
Kernel modules: xe
Driver Version
26.05.37020.3-2
Installed GPU Driver Packages
No response
Driver Installation Details
Kernel driver (xe): ships in-kernel with Linux 6.19. Installed via Manjaro's kernel manager:
sudo mhwd-kernel -i linux619
# Reboot, select 6.19 from GRUB
Userspace runtime + Level Zero (Manjaro official extra repo, Arch packages):
sudo pacman -S intel-compute-runtime level-zero-loader vulkan-intel \
intel-media-driver intel-gpu-tools linux-firmware
Resulting versions:
│ Package │ Version │
│ intel-compute-runtime (NEO) │ 26.05.37020.3-2 │
│ level-zero-loader │ 1.27.0-2 │
│ vulkan-intel (Mesa) │ 26.0.2-1 │
│ linux-firmware-meta (incl. bmg_guc_70.bin, bmg_huc.bin) │ 20260309-1 │
│ linux619 kernel │ 6.19.8-1-MANJARO │
oneAPI toolkit: installed via Intel' s official offline installer (not the Arch package, which is stuck
at 2025.0.4):
cd /tmp
wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/99f4837a-25b7-425d-a897-60af022676ea/
intel-oneapi-base-toolkit-2025.3.2.21_offline.sh
sudo sh intel-oneapi-base-toolkit-2025.3.2.21_offline.sh -a --silent --cli --eula accept
- Installed to /opt/intel/oneapi/2025.3/ (alongside an earlier 2025.0/ from pacman -S
intel-oneapi-basekit)
- Activated per-shell via source /opt/intel/oneapi/setvars.sh (setvars.sh auto-resolves latest →
2025.3)
- which icpx → /opt/intel/oneapi/compiler/2025.3/bin/icpx (version 2025.3.3.20260319)
Verification after install:
$ uname -r
6.19.8-1-MANJARO
$ lspci -k | grep -A1 -i battlemage
0000:43:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics]
Kernel driver in use: xe
0000:47:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics]
Kernel driver in use: xe
$ sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics
[0xe223] 20.2.0 [1.14.37020]
[level_zero:gpu][level_zero:1] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics
[0xe223] 20.2.0 [1.14.37020]
No PPAs, custom builds, or out-of-tree modules. No DKMS. Stock Manjaro repos for the KMD/NEO/loader
stack + Intel' s official offline installer for oneAPI 2025.3.
Linux Distribution
Arch Linux
Other Linux Distribution
No response
Kernel Version & Boot Parameters
6.19.8-1-MANJARO
BOOT_IMAGE=/@/boot/vmlinuz-6.19-x86_64 root=UUID=3c85b363-532c-4253-bf74-194699d38c4e rw rootflags=subvol=@ quiet splash apparmor=1 security=apparmor udev.log_priority=3
xe 4222976 59
intel_vsec 28672 2 pmt_telemetry,xe
drm_ttm_helper 20480 1 xe
ttm 126976 2 drm_ttm_helper,xe
i2c_algo_bit 24576 2 igb,xe
drm_suballoc_helper 16384 1 xe
drm_buddy 32768 1 xe
video 81920 1 xe
gpu_sched 69632 1 xe
drm_gpuvm 57344 1 xe
drm_exec 12288 2 drm_gpuvm,xe
drm_gpusvm_helper 40960 1 xe
drm_display_helper 286720 1 xe
cec 98304 2 drm_display_helper,xe
Actual Behavior
Constructing a SYCL Level Zero context that spans both discrete Arc Pro B70 GPUs throws
UR_RESULT_ERROR_UNKNOWN (2147483646) and terminates. Single-device context construction succeeds,
and sycl-ls enumerates both cards correctly. Symptom shape matches #916 (regression in NEO 26.x for
multi-device L0 contexts), but on Battlemage with a different error code, suggesting failure happens
earlier — during urContextCreate itself rather than first USM allocation. Cross-reference:
intel/llm-scaler#382 (same hardware, downstream symptom inside the Intel-published llm-scaler-vllm
container).
sycl::_V1::exception: level_zero backend failed with error: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
sycl::_V1::context::context(std::vectorsycl::device, ...)
+0x32f783 (throw site, libsycl.so.8)
std::terminate
UR_LOADER_DEBUG=1 trace shows both adapters load and libze_intel_gpu.so.1 loads; no zeDevice* or
zeContext* calls appear in the trace before the abort — failure occurs inside the v2 adapter's
urContextCreate path before it issues a Level Zero call.
[INFO]: loaded adapter (libur_adapter_level_zero.so.0) from .../libur_adapter_level_zero.so.0
[INFO]: loaded adapter (libur_adapter_level_zero_v2.so.0) from .../libur_adapter_level_zero_v2.so.0
zeInit with flags value of 1
[ze_loader] Loader Version 1.27.0
[ze_loader] Loading Driver libze_intel_gpu.so.1 succeeded
[crash]
Expected Behavior
Multi-device context constructs without error; subsequent allocations / queues work, as sycl-ls
already shows both devices are valid.
Reproduction Rate
Always reproduces - 100%
Steps to Reproduce
Build llama.cpp with the SYCL backend (commit d05fe1d or newer).
With both B70s visible to the runtime, run ./llama-ls-sycl-device.
(Equivalent) inside intel/llm-scaler-vllm:latest, run python -c "import torch; torch.xpu.device_count()".
(Equivalent, minimal) construct sycl::context(std::vector<sycl::device>{dev0, dev1}) from any
SYCL hello-world.
Is this a regression?
Last Known Working Driver Version
No response
First Known Failing Driver Version
No response
API Call Logs
No response
strace Logs
No response
System Logs / dmesg Output
No response
Backtrace (if crash or hang occurred)
No response
Source Code / Reproducer
No response
Command Line / Application Details
No response
oneAPI Version (if applicable)
No response
Screenshots / Video
No response
Additional Notes
Workarounds tried that do NOT help (all crash identically)
Variable / change
Effect
ZE_P2P_DISABLE=1
no change
VLLM_SKIP_P2P_CHECK=1
no change
NEOReadDebugKeys=1 CreateMultipleRootDevices=2
no change
ONEAPI_DEVICE_SELECTOR=*:*
no change
ZE_FLAT_DEVICE_HIERARCHY={FLAT,COMPOSITE,COMBINED}
no change
UR_LOADER_USE_LEVEL_ZERO_V2=0
not recognised by loader
UR_L0_ADAPTER_VERSION=1
not recognised
SYCL_UR_USE_LEVEL_ZERO_V2=0
no change (still crashes via v2 path)
IOMMU disabled in BIOS
no change
Move card to different PCIe slot (NUMA + IOMMU groups change)
no change
All of the above combined
no change
Single-device workaround that works
ONEAPI_DEVICE_SELECTOR=level_zero:0 — application sees one card, no multi-device context
constructed, runs cleanly. Identical workaround to #916 .
Notes
Same crash reproduces inside the Intel-published intel/llm-scaler-vllm:latest container, so it is
not local environment contamination — see also Unable to run Qwen3.6-35B-A3B FP8 on 2x B70s using llm-scaler-vllm:0.14.0-b8.2 llm-scaler#382
(UR_RESULT_ERROR_OUT_OF_RESOURCES on the same 2×B70 setup, plausibly the same root cause surfacing
one call later).
Pattern (single-device OK, ONEAPI_DEVICE_SELECTOR=level_zero:0 works, broke in 26.x) matches
[GSD-12641] USM device allocation fails with UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY when Level Zero context spans 2 discrete Arc A770 GPUs (regression from 25.40 to 26.09) #916 (2× Arc A770), but the error code here is UR_RESULT_ERROR_UNKNOWN
rather than OUT_OF_DEVICE_MEMORY, suggesting the failure is in urContextCreate itself on
Battlemage, before the first USM allocation that [GSD-12641] USM device allocation fails with UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY when Level Zero context spans 2 discrete Arc A770 GPUs (regression from 25.40 to 26.09) #916 fails on.
oneAPI 2025.0.4 (Arch repo intel-oneapi-basekit) does not crash, because that release ships
only the v1 UR L0 adapter — strongly implicating the v2 adapter / NEO multi-device-context path. (A
binary built against 2025.3 against the 2025.0.4 v1 adapter has its own ABI mismatch in the alloc
path; not requesting that be fixed, just noting.)
SYCL_UR_USE_LEVEL_ZERO_V2=0 does not fall back to the v1 adapter on Xe2 — the v2 adapter is
selected unconditionally on BMG. If a real fallback knob exists, documenting it would itself be a
workaround.
Ask
Pre-submission Checklist
GPU Hardware
Intel Arc Pro B70
DRI Devices Information
0 crw-rw----+ 1 root video 226, 0 May 2 23:05 /dev/dri/card0
0 crw-rw----+ 1 root video 226, 1 May 2 22:31 /dev/dri/card1
0 crw-rw-rw- 1 root render 226, 128 May 2 22:31 /dev/dri/renderD128
0 crw-rw-rw- 1 root render 226, 129 May 2 22:31 /dev/dri/renderD129
/dev/dri/by-path:
total 0
0 lrwxrwxrwx 1 root root 8 May 2 22:31 pci-0000:43:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 May 2 22:31 pci-0000:43:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root 8 May 2 22:31 pci-0000:47:00.0-card -> ../card0
0 lrwxrwxrwx 1 root root 13 May 2 22:31 pci-0000:47:00.0-render -> ../renderD129
GPU Detailed Information (lspci output)
47:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1701
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 60
NUMA node: 1
Region 0: Memory at 7e800000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at 7e000000000 (64-bit, prefetchable) [size=32G]
Expansion ROM at 82000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
DDROCCAP+ OCEn- DDRWrtVrefEn+ DDR3LEn+
CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
DDR3MaxFreq=2932MHz LPDDR3En-
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee1e000 Data: 0020
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [110 v1] Null
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 32GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [220 v1] Virtual Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq+ IntMsgNum 0
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
IOVSta: Migration-
Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: e223
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
Region 2: Memory at 0000000000000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: xe
Kernel modules: xe
43:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1701
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 58
NUMA node: 1
Region 0: Memory at 7f800000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at 7f000000000 (64-bit, prefetchable) [size=32G]
Expansion ROM at 82400000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
DDROCCAP+ OCEn- DDRWrtVrefEn+ DDR3LEn+
CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
DDR3MaxFreq=2932MHz LPDDR3En-
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee1c000 Data: 0020
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [110 v1] Null
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 32GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [220 v1] Virtual Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq+ IntMsgNum 0
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
IOVSta: Migration-
Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: e223
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
Region 2: Memory at 0000000000000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: xe
Kernel modules: xe
Driver Version
26.05.37020.3-2
Installed GPU Driver Packages
No response
Driver Installation Details
Kernel driver (
xe): ships in-kernel with Linux 6.19. Installed via Manjaro's kernel manager:Linux Distribution
Arch Linux
Other Linux Distribution
No response
Kernel Version & Boot Parameters
6.19.8-1-MANJARO
BOOT_IMAGE=/@/boot/vmlinuz-6.19-x86_64 root=UUID=3c85b363-532c-4253-bf74-194699d38c4e rw rootflags=subvol=@ quiet splash apparmor=1 security=apparmor udev.log_priority=3
xe 4222976 59
intel_vsec 28672 2 pmt_telemetry,xe
drm_ttm_helper 20480 1 xe
ttm 126976 2 drm_ttm_helper,xe
i2c_algo_bit 24576 2 igb,xe
drm_suballoc_helper 16384 1 xe
drm_buddy 32768 1 xe
video 81920 1 xe
gpu_sched 69632 1 xe
drm_gpuvm 57344 1 xe
drm_exec 12288 2 drm_gpuvm,xe
drm_gpusvm_helper 40960 1 xe
drm_display_helper 286720 1 xe
cec 98304 2 drm_display_helper,xe
Actual Behavior
Constructing a SYCL Level Zero context that spans both discrete Arc Pro B70 GPUs throws
UR_RESULT_ERROR_UNKNOWN(2147483646) and terminates. Single-device context construction succeeds,and
sycl-lsenumerates both cards correctly. Symptom shape matches #916 (regression in NEO 26.x formulti-device L0 contexts), but on Battlemage with a different error code, suggesting failure happens
earlier — during
urContextCreateitself rather than first USM allocation. Cross-reference:intel/llm-scaler#382 (same hardware, downstream symptom inside the Intel-published
llm-scaler-vllmcontainer).
sycl::_V1::exception: level_zero backend failed with error: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
sycl::_V1::context::context(std::vectorsycl::device, ...)
+0x32f783 (throw site, libsycl.so.8)
std::terminate
UR_LOADER_DEBUG=1 trace shows both adapters load and
libze_intel_gpu.so.1loads; nozeDevice*orzeContext*calls appear in the trace before the abort — failure occurs inside the v2 adapter'surContextCreatepath before it issues a Level Zero call.[INFO]: loaded adapter (libur_adapter_level_zero.so.0) from .../libur_adapter_level_zero.so.0
[INFO]: loaded adapter (libur_adapter_level_zero_v2.so.0) from .../libur_adapter_level_zero_v2.so.0
zeInit with flags value of 1
[ze_loader] Loader Version 1.27.0
[ze_loader] Loading Driver libze_intel_gpu.so.1 succeeded
[crash]
Expected Behavior
Multi-device context constructs without error; subsequent allocations / queues work, as
sycl-lsalready shows both devices are valid.
Reproduction Rate
Always reproduces - 100%
Steps to Reproduce
d05fe1dor newer)../llama-ls-sycl-device.intel/llm-scaler-vllm:latest, runpython -c "import torch; torch.xpu.device_count()".sycl::context(std::vector<sycl::device>{dev0, dev1})from anySYCL hello-world.
Is this a regression?
Last Known Working Driver Version
No response
First Known Failing Driver Version
No response
API Call Logs
No response
strace Logs
No response
System Logs / dmesg Output
No response
Backtrace (if crash or hang occurred)
No response
Source Code / Reproducer
No response
Command Line / Application Details
No response
oneAPI Version (if applicable)
No response
Screenshots / Video
No response
Additional Notes
Workarounds tried that do NOT help (all crash identically)
ZE_P2P_DISABLE=1VLLM_SKIP_P2P_CHECK=1NEOReadDebugKeys=1 CreateMultipleRootDevices=2ONEAPI_DEVICE_SELECTOR=*:*ZE_FLAT_DEVICE_HIERARCHY={FLAT,COMPOSITE,COMBINED}UR_LOADER_USE_LEVEL_ZERO_V2=0UR_L0_ADAPTER_VERSION=1SYCL_UR_USE_LEVEL_ZERO_V2=0Single-device workaround that works
ONEAPI_DEVICE_SELECTOR=level_zero:0— application sees one card, no multi-device contextconstructed, runs cleanly. Identical workaround to #916.
Notes
intel/llm-scaler-vllm:latestcontainer, so it isnot local environment contamination — see also Unable to run Qwen3.6-35B-A3B FP8 on 2x B70s using
llm-scaler-vllm:0.14.0-b8.2llm-scaler#382(
UR_RESULT_ERROR_OUT_OF_RESOURCESon the same 2×B70 setup, plausibly the same root cause surfacingone call later).
ONEAPI_DEVICE_SELECTOR=level_zero:0works, broke in 26.x) matches[GSD-12641] USM device allocation fails with UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY when Level Zero context spans 2 discrete Arc A770 GPUs (regression from 25.40 to 26.09) #916 (2× Arc A770), but the error code here is
UR_RESULT_ERROR_UNKNOWNrather than
OUT_OF_DEVICE_MEMORY, suggesting the failure is inurContextCreateitself onBattlemage, before the first USM allocation that [GSD-12641] USM device allocation fails with UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY when Level Zero context spans 2 discrete Arc A770 GPUs (regression from 25.40 to 26.09) #916 fails on.
intel-oneapi-basekit) does not crash, because that release shipsonly the v1 UR L0 adapter — strongly implicating the v2 adapter / NEO multi-device-context path. (A
binary built against 2025.3 against the 2025.0.4 v1 adapter has its own ABI mismatch in the alloc
path; not requesting that be fixed, just noting.)
SYCL_UR_USE_LEVEL_ZERO_V2=0does not fall back to the v1 adapter on Xe2 — the v2 adapter isselected unconditionally on BMG. If a real fallback knob exists, documenting it would itself be a
workaround.
Ask
a distinct Battlemage path.
multi-device codepath on BMG would unblock dual-B70 vLLM users today.