Skip to content

feat(torch): add generated operator bases#622

Merged
voltjia merged 1 commit into
masterfrom
feat/torch-operator-bases
Jun 3, 2026
Merged

feat(torch): add generated operator bases#622
voltjia merged 1 commit into
masterfrom
feat/torch-operator-bases

Conversation

@voltjia
Copy link
Copy Markdown
Collaborator

@voltjia voltjia commented May 27, 2026

Summary

  • Add generated C++ operator base headers under src/base/, regenerated from the current PyTorch codegen output merged in PR feat(torch): expose optional codegen parameters #619.
  • Keep the generated base files flat under src/base/; no generator, test, wrapper, build-system, or CI files are changed in this PR.
  • Omit generated bases whose ATen schemas are known to vary across installed PyTorch builds, so they can remain generated by the local codegen environment instead of frozen as stable public bases.

Motivation

The PyTorch codegen work needs a checked-in operator-base layer that matches the current generator behavior, including optional-parameter overload support from PR #619. This PR contains only the generated public base headers, making the downstream base layer reviewable separately from the generator changes.

Closes # N/A — no dedicated issue.

Type of Change

  • feat — new feature / new operator / new platform.
  • N/A — fix — bug fix.
  • N/A — perf — performance improvement (no behavioral change).
  • N/A — refactor — code restructuring without behavior change.
  • N/A — test — adding or fixing tests only.
  • N/A — docs — documentation only.
  • N/A — build / ci — build system or CI configuration.
  • N/A — chore — tooling, formatting, or other non-code changes.
  • N/A — Breaking change.

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • N/A — Build system / CMake / CI; no build-system or CI files are changed.
  • Python bindings / user-facing API

Test Results on Supported Platforms

All rows used a full bare python3 -m pytest -v run, without tests/, --devices, or -n. Each build regenerated PyTorch operator sources first, installed with WITH_TORCH=ON, and smoke-checked representative generated PyTorch operators after install. Build times are from the pip install phase recorded by the local validation runner; pytest times are from the timed pytest command; total time is build + pytest.

Platform Built pytest Result Build Pytest Total Notes / Hardware
NVIDIA Yes 9279 passed, 8565 skipped 1029s 357s 1386s Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Iluvatar Yes 7777 passed, 8549 skipped 836s 574s 1410s Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
MetaX Yes 8771 passed, 7555 skipped 1436s 418s 1854s Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Cambricon Yes 5974 passed, 9968 skipped 2418s 1017s 3435s Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Moore Yes 8537 passed, 7807 skipped 2282s 636s 2918s Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Ascend Yes 7453 passed, 8831 skipped 1141s 589s 1730s Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Full `pytest` output (optional)
NVIDIA:    9279 passed, 8565 skipped in 352.70s (0:05:52)
Iluvatar:  7777 passed, 8549 skipped in 569.47s (0:09:29)
MetaX:     8771 passed, 7555 skipped in 400.83s (0:06:40)
Cambricon: 5974 passed, 9968 skipped in 1008.43s (0:16:48)
Moore:     8537 passed, 7807 skipped in 628.43s (0:10:28)
Ascend:    7453 passed, 8831 skipped in 572.66s (0:09:32)

The test counts are expected to match the PyTorch codegen coverage from PR #619 because this PR only checks in generated base headers from that generator. The only observed difference from the latest PR #619 table is on Ascend: one tests/test_torch_ops.py inner case is skipped instead of passed:

tests/test_torch_ops.py::test_op[npu-dtype1-0.01-0.01-13x4-inner]

The generated inner base, binding metadata, and PyTorch backend source are identical between PR #619's generated output and this PR's checked-in base. Ascend still builds successfully, smoke checks show the PyTorch slot active for Ascend, and the full pytest run exits successfully. This is recorded as a non-blocking skip-count drift rather than a build or execution regression.

Benchmark / Performance Impact

N/A — this PR checks in generated base headers only. The table above records build and test wall time for each platform.

Notes for Reviewers

This PR is rebased on the latest master, after PR #619 was merged. The generated base files are intentionally checked in as generator output. File paths are kept flat under src/base/.

The generated bases intentionally omit src/base/all.h, src/base/any.h, and src/base/internal_scaled_mm.h in this PR because their ATen schemas vary across installed PyTorch builds; those forms are better regenerated by the local codegen environment instead of frozen as stable public bases. The table above was rerun after the author-only amend at commit a81406b. At the time of this update, GitHub CI still reports environment-related failures on Iluvatar and Moore: Iluvatar auto-detects multiple mutually exclusive GPU backends during CI configuration, and Moore CI fails while importing infini.ops with an OpenMP runtime symbol lookup error. The manual full-platform runs above build and test successfully on both platforms with the platform-specific validation environment.


Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from master — the branch is rebased cleanly on top of current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes are intentional, documented in this PR, and reflected in affected callers/tests.

General Code Hygiene

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific

  • Code follows the Google C++ Style Guide strictly.
  • clang-format --dry-run --Werror passes on all modified src/base/*.h files.
  • clang-tidy concerns (per .clang-tidy) have been reviewed — no new warnings beyond the existing baseline.
  • Operator parameter order is inputs first, outputs last; attributes are between inputs and outputs; naming follows PyTorch → ONNX → CUDA API precedence (CONTRIBUTING.md §C++).
  • No exceptions are thrown. Error paths use assert with messages that include at least __FILE__, __LINE__, and __func__ (CONTRIBUTING.md §C++).
  • N/A — No new C++ error or warning message was added.
  • N/A — Kernel files are named correctly; this PR adds operator bases, not kernels.
  • N/A — Kernel and kernel launcher separation is unchanged; this PR adds operator bases, not kernels.
  • Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
  • Exactly one blank line between classes, between classes and functions, and between functions (CONTRIBUTING.md §C++).
  • Exactly one blank line between members within a class (CONTRIBUTING.md §C++).
  • Exactly one blank line before and after the contents of a namespace (CONTRIBUTING.md §C++).
  • New operators added via src/base/<op>.h (inheriting Operator<Op>) with generated PyTorch backends provided by PR feat(torch): expose optional codegen parameters #619 (CONTRIBUTING.md §Adding an Operator).
  • No raw new/delete; RAII / smart pointers / existing allocators are used.

Python Specific

  • N/A — This PR does not modify Python files.

Testing

  • pytest was run locally on every supported platform that this PR can affect, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests).
  • N/A — Every supported platform was tested.
  • New functionality is covered by PR feat(torch): expose optional codegen parameters #619's generated PyTorch operator test harness and the all-platform full pytest runs recorded above.
  • N/A — This PR does not add Python tests.
  • N/A — This PR does not add flaky parallel-only tests.
  • N/A — This is not a bug-fix-only PR.

Build, CI, and Tooling

  • The project builds cleanly from a fresh directory on every supported platform listed above.
  • compile_commands.json still regenerates through the existing CMake/scikit-build configuration path.
  • N/A — No new backend / device was added.
  • Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not broken.
  • clang-format.yml and ruff.yml are green; this PR only changes generated C++ headers under src/base/.
  • N/A — No new runtime dependency was added.

Documentation

  • N/A — No user workflow, build flag, or developer workflow documentation changed.
  • New generated operator bases are documented through their checked-in header signatures.
  • N/A — No user-visible breaking change is introduced.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, IP addresses, or personal hardware identifiers have been committed or included in this PR description.
  • N/A — No third-party code was added.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9444f9c to 9864ff2 Compare May 27, 2026 19:51
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 33e537c to ffc3d68 Compare May 27, 2026 19:54
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9864ff2 to c0db647 Compare May 27, 2026 20:27
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from ffc3d68 to d89ce8e Compare May 27, 2026 20:28
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from c0db647 to 3e3e319 Compare May 27, 2026 21:15
@voltjia voltjia force-pushed the feat/torch-operator-bases branch 2 times, most recently from fe50963 to c5a3a38 Compare May 27, 2026 21:51
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from c5a3a38 to 312cd42 Compare May 27, 2026 22:25
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 3e3e319 to 2a5d6af Compare May 27, 2026 23:33
@voltjia voltjia force-pushed the feat/torch-operator-bases branch 2 times, most recently from 34db70e to f5f6a15 Compare May 28, 2026 03:39
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from d41f01d to 9f591db Compare May 28, 2026 03:55
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from f5f6a15 to ee42c3c Compare May 28, 2026 03:56
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9f591db to 70094a1 Compare May 28, 2026 07:41
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from ee42c3c to 9299ffb Compare May 28, 2026 07:44
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 70094a1 to 87e86ab Compare May 28, 2026 08:02
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 9299ffb to 1c61728 Compare May 28, 2026 08:04
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 87e86ab to e62e2b2 Compare June 1, 2026 07:55
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 1c61728 to 63a85dc Compare June 1, 2026 07:55
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from e62e2b2 to 48a3f2c Compare June 1, 2026 08:17
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 63a85dc to 846a477 Compare June 1, 2026 08:17
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 48a3f2c to 5582e8a Compare June 1, 2026 10:53
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 846a477 to 19cb477 Compare June 1, 2026 11:02
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 5582e8a to 4e3cd58 Compare June 2, 2026 08:22
@voltjia voltjia force-pushed the feat/torch-operator-bases branch 3 times, most recently from d343d02 to e0e57a9 Compare June 2, 2026 14:01
@voltjia voltjia changed the base branch from feat/torch-codegen-optional-overloads to master June 2, 2026 14:03
@voltjia voltjia requested a review from a team June 2, 2026 14:03
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from e0e57a9 to a81406b Compare June 2, 2026 23:15
@voltjia
Copy link
Copy Markdown
Collaborator Author

voltjia commented Jun 3, 2026

@crapromer@wooway777 初审,@Ziminli 终审。

Copy link
Copy Markdown

@wooway777 wooway777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image

闭着眼睛点的,说了今天必须搞完。要背锅的时候请告知。谢谢🦀🦀

@voltjia voltjia merged commit 316ad6b into master Jun 3, 2026
14 of 18 checks passed
@voltjia voltjia deleted the feat/torch-operator-bases branch June 3, 2026 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants