Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces an IBV_DIRECT build/runtime switch intended to support resolving libibverbs symbols either via direct linkage or via dlopen/dlsym, and adds a runtime check to block NIC executor usage when verbs aren’t available.
Changes:
- Added
IBV_DIRECTmode plumbing (Makefile + CMake option) to toggle direct symbol binding vsdlsymresolution. - Introduced
pfn_*function pointers and updated some verbs call sites/macros to call through them. - Added
System::IbvLoaded()and a guard to error out when NIC executor is requested but verbs aren’t loaded.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/header/TransferBench.hpp | Adds libibverbs dynamic-loading infrastructure (pfn_*, Ibvdl()), runtime loaded-state tracking, and uses pfn_* in some verbs calls. |
| Makefile | Adds DISABLE_IBV_DIRECT flag and sets -DIBV_DIRECT=1 by default when NIC exec is enabled. |
| CMakeLists.txt | Adds ENABLE_IBV_DIRECT option and defines IBV_DIRECT=1 when enabled. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| namespace { | ||
| IBV_FN(ibv_alloc_pd, ibv_pd*, (ibv_context*)); | ||
| IBV_FN(ibv_close_device, int, (ibv_context*)); | ||
| IBV_FN(ibv_create_cq, ibv_cq*, (ibv_context*, int, void*, ibv_comp_channel*, int)); | ||
| IBV_FN(ibv_create_qp, ibv_qp*, (ibv_pd*, ibv_qp_init_attr*)); | ||
| IBV_FN(ibv_dealloc_pd, int, (ibv_pd*)); | ||
| IBV_FN(ibv_dereg_mr, int, (ibv_mr*)); | ||
| IBV_FN(ibv_destroy_cq, int, (ibv_cq*)); | ||
| IBV_FN(ibv_destroy_qp, int, (ibv_qp*)); | ||
| IBV_FN(ibv_free_device_list, void, (ibv_device**)); | ||
| IBV_FN(ibv_get_device_list, ibv_device**, (int*)); | ||
| IBV_FN(ibv_get_device_name, const char*, (ibv_device*)); | ||
| IBV_FN(ibv_modify_qp, int, (ibv_qp*, ibv_qp_attr*, int)); | ||
| IBV_FN(ibv_open_device, ibv_context*, (ibv_device*)); | ||
| IBV_FN(ibv_poll_cq, int, (ibv_cq*, int, ibv_wc*)); | ||
| IBV_FN(ibv_post_send, int, (ibv_qp*, ibv_send_wr*, ibv_send_wr**)); | ||
| IBV_FN(ibv_query_device, int, (ibv_context*, ibv_device_attr*)); | ||
| IBV_FN(ibv_query_gid, int, (ibv_context*, uint8_t, int, ibv_gid*)); | ||
| IBV_FN(ibv_query_port, int, (ibv_context*, uint8_t, ibv_port_attr*)); | ||
| #ifdef HAVE_DMABUF_SUPPORT | ||
| IBV_FN(ibv_reg_dmabuf_mr, ibv_mr*, (ibv_pd*, uint64_t, size_t, uint64_t, int, int)); | ||
| #endif | ||
| IBV_FN(ibv_reg_mr, ibv_mr*, (ibv_pd*, void*, size_t, int)); | ||
| } |
There was a problem hiding this comment.
The new IBV_FN() function-pointer declarations are not guarded by NIC_EXEC_ENABLED, but IBV_FN (and ibv_* types) only exist when NIC_EXEC_ENABLED includes <infiniband/verbs.h>. As-is, builds with NIC executor disabled will fail to compile at this block. Wrap the IBV_FN declarations (and any dependent IBV_* call macros) in #ifdef NIC_EXEC_ENABLED, or provide a no-op/alternative definition when NIC support is off.
| // Should only be called with IBV_DIRECT guard | ||
| static void* Ibvdl() { | ||
| static void* ibvLibHandle = nullptr; | ||
| if (ibvLibHandle) return ibvLibHandle; | ||
|
|
||
| void *handle = dlopen("libibverbs.so.1", RTLD_NOW); | ||
| if (handle != nullptr) { |
There was a problem hiding this comment.
Ibvdl() uses dlopen/dlsym/dlclose but is compiled unconditionally under NIC_EXEC_ENABLED. When IBV_DIRECT=1, <dlfcn.h> is not included, so these calls will be undeclared and compilation will fail. Guard Ibvdl() itself (and any dl* usage) with #if !IBV_DIRECT (or include <dlfcn.h> regardless of IBV_DIRECT).
| for (size_t i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) { | ||
| *symbols[i].ppfn = dlsym(handle, symbols[i].name); | ||
| if (*symbols[i].ppfn == nullptr) { | ||
| // Log("[WARN] Failed to load symbol %s", symbols[i].name); | ||
| dlclose(handle); | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
Ibvdl() currently treats partial symbol resolution as success: on a dlsym failure it dlclose()s the handle but still leaves handle non-null and later assigns it to the cached ibvLibHandle. This can return a closed handle and leave some pfn_* pointers non-null/invalid. Only mark the library loaded if all symbols are resolved; otherwise set handle to nullptr and reset any already-assigned function pointers.
| for (size_t i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) { | |
| *symbols[i].ppfn = dlsym(handle, symbols[i].name); | |
| if (*symbols[i].ppfn == nullptr) { | |
| // Log("[WARN] Failed to load symbol %s", symbols[i].name); | |
| dlclose(handle); | |
| break; | |
| } | |
| } | |
| auto resetSymbols = [&symbols]() { | |
| for (size_t i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) { | |
| *symbols[i].ppfn = nullptr; | |
| } | |
| }; | |
| bool loadedAllSymbols = true; | |
| for (size_t i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) { | |
| *symbols[i].ppfn = dlsym(handle, symbols[i].name); | |
| if (*symbols[i].ppfn == nullptr) { | |
| // Log("[WARN] Failed to load symbol %s", symbols[i].name); | |
| loadedAllSymbols = false; | |
| break; | |
| } | |
| } | |
| if (!loadedAllSymbols) { | |
| resetSymbols(); | |
| dlclose(handle); | |
| handle = nullptr; | |
| } |
| static vector<IbvDevice> ibvDeviceList = {}; | ||
|
|
||
| #if !defined(IBV_DIRECT) | ||
| if (ibvLibHandle == nullptr) { |
There was a problem hiding this comment.
GetIbvDeviceList() checks ibvLibHandle == nullptr, but ibvLibHandle here refers to no visible variable (the System member isn’t accessible from this free function, and the Ibvdl() local static isn’t in scope). This is a compile error. Use a reachable source of truth (e.g., System::Get().IbvLoaded(), store the handle in a shared internal static with external linkage in this anonymous namespace, or call Ibvdl() and cache its result) instead of referencing an out-of-scope identifier.
| if (ibvLibHandle == nullptr) { | |
| if (Ibvdl() == nullptr) { |
| if (ibvLibHandle) { | ||
| dlclose(ibvLibHandle); | ||
| ibvLibHandle = nullptr; | ||
| } |
There was a problem hiding this comment.
System::~System() calls dlclose(ibvLibHandle) unconditionally. When IBV_DIRECT=1, <dlfcn.h> is not included, so dlclose will be undeclared and compilation can fail. Also, Ibvdl() keeps its own internal static handle, so closing only the System member can leave Ibvdl() returning a stale (already-closed) handle on future calls. Guard dlclose with #if !IBV_DIRECT and keep a single owner for the handle (either System or Ibvdl, but not both).
| if (ibvLibHandle) { | |
| dlclose(ibvLibHandle); | |
| ibvLibHandle = nullptr; | |
| } | |
| #ifdef NIC_EXEC_ENABLED | |
| #if !IBV_DIRECT | |
| // Ibvdl() owns the dynamically loaded libibverbs handle. Avoid closing a | |
| // cached copy here so future Ibvdl() calls cannot observe a stale handle. | |
| ibvLibHandle = nullptr; | |
| #endif | |
| #endif |
| LDFLAGS += -libverbs | ||
| NIC_ENABLED = 1 | ||
| ifneq ($(DISABLE_IBV_DIRECT),1) | ||
| COMMON_FLAGS += -DIBV_DIRECT=1 |
There was a problem hiding this comment.
When DISABLE_IBV_DIRECT=1 (dlsym path), the build still unconditionally links -libverbs. Any remaining direct ibv_* calls (and even an unused -libverbs without --as-needed) can keep libibverbs as a hard runtime dependency, undermining the goal of optional runtime loading. Consider omitting -libverbs when using the dlsym path (and ensure all ibv_* usage goes through pfn_*), and add -ldl for toolchains/glibc versions where dlopen is not in libc.
| LDFLAGS += -libverbs | |
| NIC_ENABLED = 1 | |
| ifneq ($(DISABLE_IBV_DIRECT),1) | |
| COMMON_FLAGS += -DIBV_DIRECT=1 | |
| NIC_ENABLED = 1 | |
| ifneq ($(DISABLE_IBV_DIRECT),1) | |
| COMMON_FLAGS += -DIBV_DIRECT=1 | |
| LDFLAGS += -libverbs | |
| else | |
| LDFLAGS += -ldl |
| target_link_libraries(TransferBench PRIVATE ${IBVERBS_LIBRARY}) | ||
| target_compile_definitions(TransferBench PRIVATE NIC_EXEC_ENABLED) | ||
| if(ENABLE_IBV_DIRECT) |
There was a problem hiding this comment.
ENABLE_IBV_DIRECT=OFF claims to resolve symbols via dlsym at runtime, but the target still links against ${IBVERBS_LIBRARY} unconditionally when IBVERBS_FOUND. This keeps libibverbs as a hard dependency and defeats optional loading. If runtime optionality is intended, avoid linking ${IBVERBS_LIBRARY} when ENABLE_IBV_DIRECT is OFF and ensure all ibv_* usage is through the loaded function pointers.
| target_link_libraries(TransferBench PRIVATE ${IBVERBS_LIBRARY}) | |
| target_compile_definitions(TransferBench PRIVATE NIC_EXEC_ENABLED) | |
| if(ENABLE_IBV_DIRECT) | |
| target_compile_definitions(TransferBench PRIVATE NIC_EXEC_ENABLED) | |
| if(ENABLE_IBV_DIRECT) | |
| target_link_libraries(TransferBench PRIVATE ${IBVERBS_LIBRARY}) |
| // Query the number of IBV devices | ||
| int numIbvDevices = 0; | ||
| ibv_device** deviceList = ibv_get_device_list(&numIbvDevices); | ||
| ibv_device** deviceList = pfn_ibv_get_device_list(&numIbvDevices); | ||
|
|
There was a problem hiding this comment.
Within GetIbvDeviceList(), most verbs calls were converted to pfn_* (e.g., pfn_ibv_get_device_list/pfn_ibv_open_device/etc.), but the function still calls ibv_free_device_list(deviceList) directly later in the same function. In the dlsym path this reintroduces a direct libibverbs symbol reference and breaks the “all calls through pfn_*” model. Switch that cleanup call to pfn_ibv_free_device_list as well.
| // Should only be called with IBV_DIRECT guard | ||
| static void* Ibvdl() { |
There was a problem hiding this comment.
The comment above Ibvdl() says it “should only be called with IBV_DIRECT guard”, but Ibvdl() is the dlsym/dlopen path and should only be used when IBV_DIRECT is disabled. Update the comment (and ideally add a compile-time guard) to avoid confusion about which mode this helper supports.
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist