Multi-BMG Level Zero device enumeration broken by deferred internal-engine init (regression NEO 25.40→25.44, Arc Pro B50/B60/B70) #921

@F3zz1k

Description

Pre-submission Checklist

  • I am using the latest GPU driver version (releases)
  • I have searched for similar issues and found none

GPU Hardware

Intel Arc Pro B70

DRI Devices Information

0 crw-rw----+ 1 root video 226, 0 May 2 23:05 /dev/dri/card0
0 crw-rw----+ 1 root video 226, 1 May 2 22:31 /dev/dri/card1
0 crw-rw-rw- 1 root render 226, 128 May 2 22:31 /dev/dri/renderD128
0 crw-rw-rw- 1 root render 226, 129 May 2 22:31 /dev/dri/renderD129

/dev/dri/by-path:
total 0
0 lrwxrwxrwx 1 root root 8 May 2 22:31 pci-0000:43:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 May 2 22:31 pci-0000:43:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root 8 May 2 22:31 pci-0000:47:00.0-card -> ../card0
0 lrwxrwxrwx 1 root root 13 May 2 22:31 pci-0000:47:00.0-render -> ../renderD129
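
Note that the card numbering is inverted relative to PCI order (43:00.0 is card1, 47:00.0 is card0). A small sketch for turning the listing above into a PCI-address-to-node map, so a failing device index can be tied back to a physical slot. The function name and regular expression are illustrative, not part of any Intel tool:

```python
import re

# Listing copied from the report above (ls -l /dev/dri/by-path).
BY_PATH_LISTING = """\
0 lrwxrwxrwx 1 root root 8 May 2 22:31 pci-0000:43:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 May 2 22:31 pci-0000:43:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root 8 May 2 22:31 pci-0000:47:00.0-card -> ../card0
0 lrwxrwxrwx 1 root root 13 May 2 22:31 pci-0000:47:00.0-render -> ../renderD129
"""

# Matches "pci-<address>-<card|render> -> ../<node>".
LINK_RE = re.compile(r"pci-([0-9a-f:.]+)-(card|render) -> \.\./(\S+)")


def map_pci_to_nodes(listing):
    """Return {pci_address: {"card": node, "render": node}} (hypothetical helper)."""
    nodes = {}
    for match in LINK_RE.finditer(listing):
        pci_addr, kind, node = match.groups()
        nodes.setdefault(pci_addr, {})[kind] = node
    return nodes


if __name__ == "__main__":
    for addr, kinds in sorted(map_pci_to_nodes(BY_PATH_LISTING).items()):
        print(addr, kinds)
```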

GPU Detailed Information (lspci output)

47:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1701
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 60
NUMA node: 1
Region 0: Memory at 7e800000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at 7e000000000 (64-bit, prefetchable) [size=32G]
Expansion ROM at 82000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
DDROCCAP+ OCEn- DDRWrtVrefEn+ DDR3LEn+
CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
DDR3MaxFreq=2932MHz LPDDR3En-
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee1e000 Data: 0020
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [110 v1] Null
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 32GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [220 v1] Virtual Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq+ IntMsgNum 0
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
IOVSta: Migration-
Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: e223
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
Region 2: Memory at 0000000000000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: xe
Kernel modules: xe

43:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1701
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 58
NUMA node: 1
Region 0: Memory at 7f800000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at 7f000000000 (64-bit, prefetchable) [size=32G]
Expansion ROM at 82400000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
DDROCCAP+ OCEn- DDRWrtVrefEn+ DDR3LEn+
CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
DDR3MaxFreq=2932MHz LPDDR3En-
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee1c000 Data: 0020
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [110 v1] Null
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 32GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [220 v1] Virtual Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq+ IntMsgNum 0
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
IOVSta: Migration-
Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: e223
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
Region 2: Memory at 0000000000000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: xe
Kernel modules: xe

Driver Version

26.05.37020.3-2

Installed GPU Driver Packages

No response

Driver Installation Details

Kernel driver (xe): ships in-kernel with Linux 6.19. Installed via Manjaro's kernel manager:

sudo mhwd-kernel -i linux619
# Reboot, select 6.19 from GRUB

Userspace runtime + Level Zero (Manjaro official extra repo, Arch packages):
sudo pacman -S intel-compute-runtime level-zero-loader vulkan-intel \
               intel-media-driver intel-gpu-tools linux-firmware

Resulting versions:

│                         Package                         │     Version      │
│ intel-compute-runtime (NEO)                             │ 26.05.37020.3-2  │
│ level-zero-loader                                       │ 1.27.0-2         │
│ vulkan-intel (Mesa)                                     │ 26.0.2-1         │
│ linux-firmware-meta (incl. bmg_guc_70.bin, bmg_huc.bin) │ 20260309-1       │
│ linux619 kernel                                         │ 6.19.8-1-MANJARO │
         
                                                                                                    
oneAPI toolkit: installed via Intel's official offline installer (not the Arch package, which is stuck at 2025.0.4):
cd /tmp
wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/99f4837a-25b7-425d-a897-60af022676ea/intel-oneapi-base-toolkit-2025.3.2.21_offline.sh
sudo sh intel-oneapi-base-toolkit-2025.3.2.21_offline.sh -a --silent --cli --eula accept
- Installed to /opt/intel/oneapi/2025.3/ (alongside an earlier 2025.0/ from pacman -S intel-oneapi-basekit)
- Activated per-shell via source /opt/intel/oneapi/setvars.sh (setvars.sh auto-resolves latest → 2025.3)
- which icpx → /opt/intel/oneapi/compiler/2025.3/bin/icpx (version 2025.3.3.20260319)
                                                                                                    
Verification after install:                                                                           
$ uname -r                                                                                          
6.19.8-1-MANJARO                                                                                      
                                                                                                      
$ lspci -k | grep -A1 -i battlemage
0000:43:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics]             
    Kernel driver in use: xe                                                                        
0000:47:00.0 VGA compatible controller: Intel Corporation Battlemage G31 [Intel Graphics]             
    Kernel driver in use: xe                                                                        
                                                                                                      
$ sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics [0xe223] 20.2.0 [1.14.37020]
[level_zero:gpu][level_zero:1] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics [0xe223] 20.2.0 [1.14.37020]

No PPAs, custom builds, or out-of-tree modules. No DKMS. Stock Manjaro repos for the KMD/NEO/loader
stack + Intel's official offline installer for oneAPI 2025.3.
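
To make the "enumeration works" half of this report checkable by script, here is a rough sketch that counts the Level Zero GPUs in sycl-ls output. It assumes each device is printed on a single unwrapped line in the format shown above. The helper name is hypothetical:

```python
import re

# Sample taken from the sycl-ls output above, one device per line.
SYCL_LS_OUTPUT = """\
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics [0xe223] 20.2.0 [1.14.37020]
[level_zero:gpu][level_zero:1] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics [0xe223] 20.2.0 [1.14.37020]
"""

# Captures the device index and the first bracketed hex device ID on the line.
DEVICE_RE = re.compile(
    r"^\[level_zero:gpu\]\[level_zero:(\d+)\].*?\[(0x[0-9a-fA-F]+)\]",
    re.MULTILINE,
)


def level_zero_gpus(text):
    """Return [(index, device_id), ...] for Level Zero GPU lines (hypothetical helper)."""
    return [(int(index), device_id) for index, device_id in DEVICE_RE.findall(text)]


if __name__ == "__main__":
    for index, device_id in level_zero_gpus(SYCL_LS_OUTPUT):
        print(f"level_zero:{index} -> {device_id}")
```

On a live system the same function could be fed the captured stdout of sycl-ls instead of the embedded sample.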

Linux Distribution

Arch Linux

Other Linux Distribution

No response

Kernel Version & Boot Parameters

6.19.8-1-MANJARO

BOOT_IMAGE=/@/boot/vmlinuz-6.19-x86_64 root=UUID=3c85b363-532c-4253-bf74-194699d38c4e rw rootflags=subvol=@ quiet splash apparmor=1 security=apparmor udev.log_priority=3

xe 4222976 59
intel_vsec 28672 2 pmt_telemetry,xe
drm_ttm_helper 20480 1 xe
ttm 126976 2 drm_ttm_helper,xe
i2c_algo_bit 24576 2 igb,xe
drm_suballoc_helper 16384 1 xe
drm_buddy 32768 1 xe
video 81920 1 xe
gpu_sched 69632 1 xe
drm_gpuvm 57344 1 xe
drm_exec 12288 2 drm_gpuvm,xe
drm_gpusvm_helper 40960 1 xe
drm_display_helper 286720 1 xe
cec 98304 2 drm_display_helper,xe

Actual Behavior

Constructing a SYCL Level Zero context that spans both discrete Arc Pro B70 GPUs throws
UR_RESULT_ERROR_UNKNOWN (2147483646), and the process terminates. Single-device context construction
succeeds, and sycl-ls enumerates both cards correctly. The symptom resembles #916 (regression in
NEO 26.x for multi-device L0 contexts), but on Battlemage and with a different error code, which
suggests the failure happens earlier: during urContextCreate itself rather than at the first USM
allocation. Cross-reference: intel/llm-scaler#382 (same hardware, downstream symptom inside the
Intel-published llm-scaler-vllm container).

sycl::_V1::exception: level_zero backend failed with error: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
sycl::_V1::context::context(std::vector<sycl::device>, ...)
+0x32f783 (throw site, libsycl.so.8)
std::terminate
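
For log triage, the numeric code in the exception text can be pulled out and shown in hex (2147483646 is 0x7ffffffe, i.e. 2^31 - 2). A minimal sketch, assuming the message keeps the exact "failed with error: <code> (<name>)" shape shown above. The helper name is made up for illustration:

```python
import re

# Exception text copied from the report above.
MESSAGE = (
    "sycl::_V1::exception: level_zero backend failed with error: "
    "2147483646 (UR_RESULT_ERROR_UNKNOWN)"
)

ERROR_RE = re.compile(r"failed with error: (\d+) \((\w+)\)")


def parse_backend_error(message):
    """Return (code, hex_code, name), or None if the text does not match."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, hex(code), match.group(2)


if __name__ == "__main__":
    print(parse_backend_error(MESSAGE))
```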

The UR_LOADER_DEBUG=1 trace shows that both adapters load and that libze_intel_gpu.so.1 loads. No
zeDevice* or zeContext* calls appear in the trace before the abort, which suggests the failure occurs
inside the v2 adapter's urContextCreate path before it issues any device- or context-level Level
Zero call.

[INFO]: loaded adapter (libur_adapter_level_zero.so.0) from .../libur_adapter_level_zero.so.0
[INFO]: loaded adapter (libur_adapter_level_zero_v2.so.0) from .../libur_adapter_level_zero_v2.so.0
zeInit with flags value of 1
[ze_loader] Loader Version 1.27.0
[ze_loader] Loading Driver libze_intel_gpu.so.1 succeeded
[crash]
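
The "no zeDevice* or zeContext* calls" observation can be checked mechanically on a saved trace. The sketch below scans the trace text for loaded adapters and for any zeDevice*/zeContext* identifiers. It assumes the plain-text log format shown above, and the function name is hypothetical:

```python
import re

# Trace lines copied from the report above (the "[crash]" marker is omitted).
TRACE = """\
[INFO]: loaded adapter (libur_adapter_level_zero.so.0) from .../libur_adapter_level_zero.so.0
[INFO]: loaded adapter (libur_adapter_level_zero_v2.so.0) from .../libur_adapter_level_zero_v2.so.0
zeInit with flags value of 1
[ze_loader] Loader Version 1.27.0
[ze_loader] Loading Driver libze_intel_gpu.so.1 succeeded
"""

ADAPTER_RE = re.compile(r"loaded adapter \((\S+)\)")
# Only device- and context-level entry points; zeInit is deliberately excluded.
ZE_CALL_RE = re.compile(r"\bze(?:Device|Context)\w+")


def summarize_trace(trace):
    """Return the adapters loaded and any zeDevice*/zeContext* names seen."""
    return {
        "adapters": ADAPTER_RE.findall(trace),
        "ze_device_or_context_calls": sorted(set(ZE_CALL_RE.findall(trace))),
    }


if __name__ == "__main__":
    print(summarize_trace(TRACE))
```

An empty call list is consistent with the failure occurring before any device or context query reaches the driver, though it does not prove it, since the trace only contains what the loader chooses to log.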

Expected Behavior

The multi-device context constructs without error, and subsequent allocations and queues work;
sycl-ls already shows that both devices are valid.

Reproduction Rate

Always reproduces - 100%

Steps to Reproduce

  1. Build llama.cpp with the SYCL backend (commit d05fe1d or newer).
  2. With both B70s visible to the runtime, run ./llama-ls-sycl-device.
  3. (Equivalent) inside intel/llm-scaler-vllm:latest, run python -c "import torch; torch.xpu.device_count()".
  4. (Equivalent, minimal) construct sycl::context(std::vector<sycl::device>{dev0, dev1}) from any
    SYCL hello-world.

Is this a regression?

  • Yes, this is a regression - functionality that previously worked is now broken

Last Known Working Driver Version

No response

First Known Failing Driver Version

No response

API Call Logs

No response

strace Logs

No response

System Logs / dmesg Output

No response

Backtrace (if crash or hang occurred)

No response

Source Code / Reproducer

No response

Command Line / Application Details

No response

oneAPI Version (if applicable)

No response

Screenshots / Video

No response

Additional Notes

Workarounds tried that do NOT help (all crash identically)

│ Variable / change                                             │ Effect                                │
│ ZE_P2P_DISABLE=1                                              │ no change                             │
│ VLLM_SKIP_P2P_CHECK=1                                         │ no change                             │
│ NEOReadDebugKeys=1 CreateMultipleRootDevices=2                │ no change                             │
│ ONEAPI_DEVICE_SELECTOR=*:*                                    │ no change                             │
│ ZE_FLAT_DEVICE_HIERARCHY={FLAT,COMPOSITE,COMBINED}            │ no change                             │
│ UR_LOADER_USE_LEVEL_ZERO_V2=0                                 │ not recognised by loader              │
│ UR_L0_ADAPTER_VERSION=1                                       │ not recognised                        │
│ SYCL_UR_USE_LEVEL_ZERO_V2=0                                   │ no change (still crashes via v2 path) │
│ IOMMU disabled in BIOS                                        │ no change                             │
│ Move card to different PCIe slot (NUMA + IOMMU groups change) │ no change                             │
│ All of the above combined                                     │ no change                             │
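
The environment-variable rows above can be re-run as a matrix so that results are reproducible across driver versions. This is a rough sketch rather than the reporter's actual procedure: the reproducer command is passed on the command line (for example ./llama-ls-sycl-device from the steps to reproduce), the crash check simply looks for a non-zero exit status or the error name on stderr, and the BIOS and PCIe slot rows cannot be scripted.

```python
import os
import subprocess
import sys

# Environment settings from the table above; each dict is one trial.
TRIALS = [
    {"ZE_P2P_DISABLE": "1"},
    {"VLLM_SKIP_P2P_CHECK": "1"},
    {"NEOReadDebugKeys": "1", "CreateMultipleRootDevices": "2"},
    {"ONEAPI_DEVICE_SELECTOR": "*:*"},
    {"ZE_FLAT_DEVICE_HIERARCHY": "FLAT"},
    {"ZE_FLAT_DEVICE_HIERARCHY": "COMPOSITE"},
    {"ZE_FLAT_DEVICE_HIERARCHY": "COMBINED"},
    {"SYCL_UR_USE_LEVEL_ZERO_V2": "0"},
]


def run_trial(command, extra_env, timeout=120):
    """Run command with extra_env layered over the current environment."""
    env = {**os.environ, **extra_env}
    proc = subprocess.run(
        command, env=env, capture_output=True, text=True, timeout=timeout
    )
    # Heuristic: treat a non-zero exit or the error name on stderr as a crash.
    crashed = proc.returncode != 0 or "UR_RESULT_ERROR_UNKNOWN" in proc.stderr
    return {"env": extra_env, "returncode": proc.returncode, "crashed": crashed}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is arbitrary): python matrix.py ./llama-ls-sycl-device
    for extra_env in TRIALS:
        result = run_trial(sys.argv[1:], extra_env)
        status = "crash" if result["crashed"] else "ok"
        print(f"{extra_env} -> {status} (exit {result['returncode']})")
```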

Single-device workaround that works

ONEAPI_DEVICE_SELECTOR=level_zero:0 — application sees one card, no multi-device context
constructed, runs cleanly. Identical workaround to #916.

Notes

Ask

Metadata

Assignees

No one assigned

    Labels

    OS: Linux, Type: Bug
