Skip to content

[CI][Packaging] Cibuildwheel-based multi-platform wheel publishing#19641

Closed
tlopex wants to merge 43 commits into
apache:mainfrom
tlopex:pypi
Closed

[CI][Packaging] Cibuildwheel-based multi-platform wheel publishing#19641
tlopex wants to merge 43 commits into
apache:mainfrom
tlopex:pypi

Conversation

@tlopex
Copy link
Copy Markdown
Member

@tlopex tlopex commented May 29, 2026

This adds a reproducible cibuildwheel-based pipeline to build and publish TVM binary wheels across platforms, with LLVM linked statically and an optional CUDA runtime, so users can pip install a working TVM without a source build.

tlopex and others added 13 commits May 29, 2026 14:39
Linking the full LLVM static archives into libtvm_compiler.so on Linux
produces a library large enough to trigger a GNU ld (binutils)
R_X86_64_GOTPCRELX relaxation bug. conda-forge LLVM is built with
-fno-plt, so calls are emitted as indirect GOT calls; ld relaxes one of
them to a direct call with an incorrect displacement that lands in
read-only data instead of the intended function. The result is a runtime
SIGSEGV inside llvm::X86Subtarget while building the code-generation pass
pipeline (e.g. tvm.compile(target="llvm")).

This reproduces with a freshly linked, unrepaired manylinux build
(gcc-toolset-14 + GNU ld 2.41), so it is the linker relaxation -- not
auditwheel, LLVM 22, or any TVM source change. The same objects linked by
a different ld (or with relaxation disabled) work correctly.

Pass -Wl,--no-relax when linking tvm_compiler with LLVM on Linux to keep
the GOT-indirect sequences and avoid the miscompilation. It is harmless
when LLVM is linked dynamically. See binutils bug ld/25754.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the debugging instrumentation added while diagnosing the Linux wheel
SIGSEGV now that the root cause is fixed (linker --no-relax):

- verify_tvm_linux.sh: remove the gdb backtrace wrapper and the readelf/nm
  symbol dumps; it now just clears dev library-path overrides and runs the
  cross-platform verifier directly.
- publish workflow + wheel action: remove the debug_symbols option (which
  produced unstripped -g wheels and is unsuitable for publishing).

The legitimate verification (import, runtime-from-wheel and static-LLVM
checks, and the TIRX/Relax compile smoke tests) is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Replace the select_build_matrix job (which generated the matrix from
  embedded Python and the build_target input) with a static
  strategy.matrix.include in build_wheels. The workflow always builds all
  four platform wheels, which is what publishing requires anyway, so the
  dynamic-matrix indirection and the build_target input/validation are
  removed.
- Single-source the LLVM toolchain version: it is now set once in
  LLVM_VERSION and referenced by the cache key and the conda install steps,
  instead of being hardcoded in three places.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Windows repair command (rewrite_wheel.py) passed --distribution-name
but not --distribution-version, so Windows wheels kept the pyproject
version while Linux/macOS wheels got the override. When publishing to
TestPyPI with a version override this produced a mismatched, stale-version
Windows wheel that collided with a previously used filename and failed the
upload. Pass --distribution-version too (empty is a no-op).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
find_include_path() only searched source-tree-relative locations
(<pkg>/../include, ...), so it could not find the headers bundled in an
installed wheel (at <pkg>/include), and it looked for the tvm-ffi/dlpack
headers under 3rdparty/ which wheels do not ship. As a result target="c"
and Module.export_library() (compiling generated C/C++ sources) failed
from an installed wheel with "Cannot find the source directory" or a
missing <tvm/ffi/c_api.h>.

Search "." as well (the wheel layout), and fall back to the include
directories reported by the installed apache-tvm-ffi package
(tvm_ffi.libinfo) for the tvm-ffi and dlpack headers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_transform_lower_tirx.py imported and called Simplify, but the pass
was renamed to StmtSimplify (the only simplify pass exported from
tvm.tirx.transform). The stale name caused a collection-time ImportError.
Use StmtSimplify; the file now collects and all 32 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tlopex tlopex marked this pull request as draft May 29, 2026 19:24
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new wheel packaging flow for TVM, adding GitHub composite actions and helper scripts to build, repair, validate, and verify LLVM-enabled and CUDA-enabled Python wheels. It also updates CMake configurations to handle relative RPATHs and work around a linker relaxation bug, and adjusts TVM's Python library and RPC server to better support wheel-installed environments. The review feedback suggests copying the create_system attribute in rewrite_wheel.py to preserve ZIP platform metadata, and adding a defensive check in verify_tvm_install.py to ensure the TVM runtime library is loaded correctly before accessing its path.

Comment on lines +56 to +63
def _copy_info(info: zipfile.ZipInfo, filename: str) -> zipfile.ZipInfo:
copied = zipfile.ZipInfo(filename=filename, date_time=info.date_time)
copied.compress_type = info.compress_type
copied.comment = info.comment
copied.extra = info.extra
copied.internal_attr = info.internal_attr
copied.external_attr = info.external_attr
return copied
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _copy_info function does not copy the create_system attribute from the original ZipInfo object. This can cause the rewritten wheel to lose its original system platform metadata (e.g., defaulting to Windows/MS-DOS 0 instead of Unix 3), which can affect file permissions when the wheel is unpacked and makes the build non-reproducible depending on the host OS running the script.

Suggested change
def _copy_info(info: zipfile.ZipInfo, filename: str) -> zipfile.ZipInfo:
copied = zipfile.ZipInfo(filename=filename, date_time=info.date_time)
copied.compress_type = info.compress_type
copied.comment = info.comment
copied.extra = info.extra
copied.internal_attr = info.internal_attr
copied.external_attr = info.external_attr
return copied
def _copy_info(info: zipfile.ZipInfo, filename: str) -> zipfile.ZipInfo:
copied = zipfile.ZipInfo(filename=filename, date_time=info.date_time)
copied.create_system = info.create_system
copied.compress_type = info.compress_type
copied.comment = info.comment
copied.extra = info.extra
copied.internal_attr = info.internal_attr
copied.external_attr = info.external_attr
return copied

def _assert_loaded_runtime_from_wheel(libdir: Path, runtime_candidates: list[Path]) -> None:
import tvm.base as tvm_base # pylint: disable=import-outside-toplevel

loaded_runtime = Path(tvm_base._LIB_RUNTIME._name).resolve() # pylint: disable=protected-access
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To prevent potential AttributeError or TypeError if the TVM runtime library fails to load or is initialized as None, add a defensive check before accessing _LIB_RUNTIME._name. This aligns with the defensive checks used elsewhere in the file (e.g., _log_tvm_ffi_details).

Suggested change
loaded_runtime = Path(tvm_base._LIB_RUNTIME._name).resolve() # pylint: disable=protected-access
if tvm_base._LIB_RUNTIME is None or not hasattr(tvm_base._LIB_RUNTIME, "_name"):
raise RuntimeError("TVM runtime library was not loaded correctly or is missing '_name' attribute")
loaded_runtime = Path(tvm_base._LIB_RUNTIME._name).resolve() # pylint: disable=protected-access

Copy link
Copy Markdown
Contributor

@MasterJH5574 MasterJH5574 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished a round on .github/workflows/publish_wheel.yml

fi

- name: Checkout source
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use tag instead of hash. Same for other places.

default: "none"
type: choice
options:
- none
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "none" mean? If it means a build dryrun, maybe add a brief comment about it.

Comment on lines +180 to +184
permissions:
actions: read
contents: read
id-token: write
attestations: write
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the permissions necessary?

Comment on lines +192 to +217
- name: Check wheel sizes
shell: bash
run: |
set -euo pipefail
limit_bytes=100000000
shopt -s nullglob
wheels=(dist/*.whl)
if [[ "${#wheels[@]}" -eq 0 ]]; then
echo "No wheel artifacts found under dist/" >&2
exit 1
fi
if [[ "${#wheels[@]}" -ne 4 ]]; then
echo "Expected 4 wheel artifacts, found ${#wheels[@]}" >&2
printf '%s\n' "${wheels[@]}" >&2
exit 1
fi
failed=0
for wheel in "${wheels[@]}"; do
size="$(stat -c '%s' "$wheel")"
printf '%s %s bytes\n' "$wheel" "$size"
if (( size > limit_bytes )); then
echo "Wheel exceeds 100 MB PyPI/TestPyPI upload limit: ${wheel}" >&2
failed=1
fi
done
exit "$failed"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to manual check? I assume it will automatically fail at the publish step if the wheel is larger than 100MB?

Comment on lines +219 to +222
- name: Generate artifact attestation for wheels
uses: actions/attest-build-provenance@v1
with:
subject-path: dist/*
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what this is used for.

Comment on lines +224 to +237
- name: Publish package distributions to TestPyPI
if: ${{ inputs.publish_repository == 'testpypi' }}
uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # v1.13.0
with:
attestations: true
verbose: true
repository-url: https://test.pypi.org/legacy/

- name: Publish package distributions to PyPI
if: ${{ inputs.publish_repository == 'pypi' }}
uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # v1.13.0
with:
attestations: true
verbose: true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's possible to merge these two branches.

Comment on lines +248 to +259
- name: Check out source
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.tag }}
submodules: recursive
fetch-depth: 0
fetch-tags: true

- uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
with:
name: tvm-wheel-linux-x86_64-manylinux_2_28
path: wheelhouse
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need checkout and artifact download here?

Comment on lines +266 to +286
- name: Verify package from TestPyPI
if: ${{ inputs.publish_repository == 'testpypi' }}
env:
TVM_PYTHON: python
TVM_TEST_INDEX_URL: https://test.pypi.org/simple/
TVM_EXTRA_INDEX_URL: https://pypi.org/simple
TVM_EXPECT_LLVM_ENABLED: "1"
TVM_EXPECT_STATIC_LLVM: "1"
TVM_EXPECT_CUDA_RUNTIME: "1"
run: ci/scripts/package/tvm_wheel_helper.sh verify-pypi

- name: Verify package from PyPI
if: ${{ inputs.publish_repository == 'pypi' }}
env:
TVM_PYTHON: python
TVM_TEST_INDEX_URL: https://pypi.org/simple/
TVM_EXTRA_INDEX_URL: https://pypi.org/simple
TVM_EXPECT_LLVM_ENABLED: "1"
TVM_EXPECT_STATIC_LLVM: "1"
TVM_EXPECT_CUDA_RUNTIME: "1"
run: ci/scripts/package/tvm_wheel_helper.sh verify-pypi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's possible to merge these two as well.

Comment on lines +144 to +152
- name: Build CUDA runtime
id: build_cuda
uses: ./.github/actions/build-cuda
with:
arch: ${{ matrix.arch }}
linux_image: ${{ matrix.linux_image }}
linux_image_tag: ${{ matrix.linux_image_tag }}
cuda_architectures: ${{ inputs.cuda_architectures }}
include_cuda_runtime: ${{ matrix.include_cuda_runtime }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skip when cuda is not needed.

Comment on lines +37 to +40
include_cuda_runtime:
description: "Set to true to build the CUDA runtime library"
required: false
default: "false"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should always be true? If that's the case, we can remove this param?

Comment on lines +53 to +69
- name: Detect CUDA inputs
id: cuda_inputs
shell: bash -l {0}
env:
INPUT_INCLUDE_CUDA_RUNTIME: ${{ inputs.include_cuda_runtime }}
run: |
set -eux
include_cuda_runtime="$(printf '%s' "${INPUT_INCLUDE_CUDA_RUNTIME}" | tr '[:upper:]' '[:lower:]')"
case "${include_cuda_runtime}" in
1|true|yes|on) include_cuda_runtime=1 ;;
0|false|no|off) include_cuda_runtime=0 ;;
*)
echo "include_cuda_runtime must be a boolean value" >&2
exit 1
;;
esac
echo "include_cuda_runtime=${include_cuda_runtime}" >> "${GITHUB_OUTPUT}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check unnecessary?

Comment on lines +85 to +90
- name: Reject non-Linux CUDA runtime builds
if: runner.os != 'Linux' && steps.cuda_inputs.outputs.include_cuda_runtime == '1'
shell: bash -l {0}
run: |
echo "CUDA runtime wheels are only enabled on Linux in this workflow" >&2
exit 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be dropped?

Comment on lines +92 to +117
- name: Report CUDA runtime output
id: cuda_runtime
shell: bash -l {0}
env:
INCLUDE_CUDA_RUNTIME: ${{ steps.cuda_inputs.outputs.include_cuda_runtime }}
TVM_CUDA_BUILD_DIR: ${{ runner.temp }}/tvm-wheel-cuda
TVM_CUDA_RUNTIME_PATH: ""
run: |
set -eux
if [[ "${INCLUDE_CUDA_RUNTIME}" != "1" ]]; then
echo "path=" >> "${GITHUB_OUTPUT}"
exit 0
fi
cuda_runtime="$(ci/scripts/package/tvm_wheel_helper.sh cuda-path)"
if [[ -z "${cuda_runtime}" ]]; then
echo "CUDA runtime build did not produce libtvm_runtime_cuda.so" >&2
exit 1
fi
case "${cuda_runtime}" in
"${TVM_CUDA_BUILD_DIR}"/*) ;;
*)
echo "CUDA runtime path is outside the expected build directory: ${cuda_runtime}" >&2
exit 1
;;
esac
echo "path=${cuda_runtime}" >> "${GITHUB_OUTPUT}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this report step? Or the "Build CUDA runtime in manylinux" step above already reports necessary info (e.g., success, failure, or failure reason)

@tlopex tlopex closed this May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants