
build(ascend): support custom kernel builds #602

Draft
zhangyue207 wants to merge 1 commit into InfiniTensor:master from zhangyue207:build/ascend-custom-kernels

@zhangyue207
Collaborator

Summary

  • Replace the old BUILD_CUSTOM_KERNEL path with BUILD_ASCEND_CUSTOM and drive custom AscendC kernels through the standalone src/native/ascend/custom/build.sh sub-build.
  • Import the produced libno_workspace_kernel.a and link it into generated Python bindings with --whole-archive.
  • Add shared SOC_VERSION detection in src/native/ascend/custom/cmake/detect_soc.cmake.
  • Route the custom sub-build through a non-hidden source symlink and pass MAIN_SRC_DIR explicitly so CANN can find host objects even when the repo is checked out under .worktrees.
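
The symlink routing in the last bullet can be illustrated with a small Python sketch. It is not the code in `build.sh` (which is a shell script and is not shown here); the helper names `has_hidden_component` and `visible_alias`, the alias name `custom_src`, and the directory layout are all hypothetical and only show the idea of giving the sub-build a source path with no dot-prefixed components.

```python
import os
import tempfile
from pathlib import Path


def has_hidden_component(path: Path) -> bool:
    """Return True if any component of `path` starts with a dot."""
    return any(
        part.startswith(".") and part not in (".", "..") for part in path.parts
    )


def visible_alias(source_dir: Path, alias_root: Path, name: str = "custom_src") -> Path:
    """Return a path to `source_dir` that avoids hidden components.

    If `source_dir` has no hidden component, it is returned unchanged.
    Otherwise a symlink `alias_root / name` pointing at it is created.
    `alias_root` and `name` are assumptions made for this sketch.
    """
    if not has_hidden_component(source_dir):
        return source_dir
    alias_root.mkdir(parents=True, exist_ok=True)
    link = alias_root / name
    if link.is_symlink():
        link.unlink()  # Refresh a stale link left by an earlier build.
    link.symlink_to(source_dir, target_is_directory=True)
    return link


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    # Simulate a checkout that lives under a hidden `.worktrees` directory.
    src = base / ".worktrees" / "feature" / "src" / "native" / "ascend" / "custom"
    src.mkdir(parents=True)
    alias = visible_alias(src, base / "build")
    print(alias, "->", os.path.realpath(alias))
```

The alias resolves to the same files, so a tool that walks the tree from the alias sees the same sources under a path it does not treat as hidden.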

Motivation

Custom AscendC kernels are needed by the upcoming RmsNorm and AddRmsNorm Ascend implementations. Building them through a standalone sub-build avoids the CANN extract_host_stub.py path handling issue seen in scikit-build-core temporary builds while keeping the artifacts under build/build_ascend_custom/.

N/A. No issue is linked.

Type of Change

  • feat — new feature / new operator / new platform
  • fix — bug fix
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • docs — documentation only
  • build / ci — build system or CI configuration
  • chore — tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | N/A | N/A | No NVIDIA build code touched. |
| Iluvatar | N/A | N/A | No Iluvatar build code touched. |
| MetaX | N/A | N/A | No MetaX build code touched. |
| Cambricon | N/A | N/A | No Cambricon build code touched. |
| Moore | N/A | N/A | No Moore build code touched. |
| Ascend | Yes | 36 skipped in 0.07s for tests/test_rms_norm.py | infiniops-ci/ascend:latest container on Ascend 910B4. |
Build and pytest output
python3 -m pip install .[dev] --no-build-isolation \
  -C cmake.define.WITH_ASCEND=ON \
  -C cmake.define.AUTO_DETECT_DEVICES=OFF \
  -C cmake.define.GENERATE_PYTHON_BINDINGS=ON

Successfully built InfiniOps
Successfully installed InfiniOps-0.1.0

pytest tests/test_rms_norm.py --devices ascend -v --tb=short

============================= 36 skipped in 0.07s ==============================

Benchmark / Performance Impact

N/A. This PR changes build plumbing only.

Notes for Reviewers

  • API alignment note: no production src/base/ operator API is changed in this PR.
  • tests/test_rms_norm.py skips because this PR intentionally does not add the Ascend RmsNorm operator wrapper; the operator PR will provide the runnable focused tests.
  • The source symlink is only for the custom sub-build. It avoids a CANN helper script limitation where recursive Python glob skips host object paths containing hidden directory components.
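
The glob behavior behind the last note can be reproduced with the standard library alone. The sketch below does not reproduce the CANN helper script, whose exact pattern is not shown in this PR; `find_objects` and the directory names are hypothetical. It only demonstrates that a recursive `**` pattern in Python's `glob` does not descend into dot-prefixed directories by default.

```python
import glob
import os
import tempfile


def find_objects(root):
    """Recursively collect `*.o` files under `root`, as paths relative to `root`."""
    pattern = os.path.join(root, "**", "*.o")
    return sorted(
        os.path.relpath(path, root) for path in glob.glob(pattern, recursive=True)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # One object file in a normal directory, one under a hidden directory.
        for rel in ("visible/objs/a.o", ".worktrees/objs/b.o"):
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path))
            open(path, "w").close()
        # Only the object under `visible/` is expected to be listed.
        print(find_objects(root))
```

Because wildcard components never match names that start with a dot, an object file located below `.worktrees` is missed whenever the hidden directory falls inside the wildcard part of the pattern. Reaching the same files through a non-hidden symlink path sidesteps this.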

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits.
  • Branch name follows <type>/xxx-yyyy-zzzz.
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit.
  • No stray merge commits from master — the branch is rebased cleanly on top of the current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — only custom AscendC build plumbing and SOC detection are included.
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes are intentional. No production src/base/ public API is changed.

General Code Hygiene

  • The code is self-explanatory; comments were added only where the reason is non-obvious.
  • Every modified or added file ends with a single trailing newline.
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks.
  • All comments and error messages are in English.
  • Comments and error messages are complete sentences unless the language/framework convention says otherwise.

C++ Specific

N/A. No C++ source files are changed.

Python Specific

N/A. No Python source files are changed.

Testing

  • pytest tests/test_rms_norm.py --devices ascend -v --tb=short was run in the project Ascend container.
  • Platforms not tested are marked N/A in the table because their build code is not changed.
  • New functionality is build-system behavior and is covered by the Ascend install/build command above.

Build, CI, and Tooling

  • The project builds cleanly with pip install .[dev] on the affected Ascend configuration.
  • compile_commands.json still regenerates through the build.
  • New backend/device auto-detection is not required.
  • The existing CUDA-like mutual-exclusion check is not changed.
  • git diff --check passes.
  • bash -n src/native/ascend/custom/build.sh passes.
  • No new runtime dependency was added.

Documentation

  • Inline build comments document the custom sub-build path and SOC_VERSION detection.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • Third-party code is license-compatible and attributed.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.
