feat(torch): add generated operator bases by voltjia · Pull Request #622 · InfiniTensor/InfiniOps

voltjia · 2026-05-27T17:43:18Z

Summary

Add generated C++ operator base headers under src/base/, regenerated from the current PyTorch codegen output merged in PR #619 (feat(torch): expose optional codegen parameters).
Keep the generated base files flat under src/base/; no generator, test, wrapper, build-system, or CI files are changed in this PR.
Omit generated bases whose ATen schemas are known to vary across installed PyTorch builds, so they can remain generated by the local codegen environment instead of frozen as stable public bases.

Motivation

The PyTorch codegen work needs a checked-in operator-base layer that matches the current generator behavior, including optional-parameter overload support from PR #619. This PR contains only the generated public base headers, making the downstream base layer reviewable separately from the generator changes.

Closes # N/A — no dedicated issue.

Type of Change

feat — new feature / new operator / new platform.
N/A — fix — bug fix.
N/A — perf — performance improvement (no behavioral change).
N/A — refactor — code restructuring without behavior change.
N/A — test — adding or fixing tests only.
N/A — docs — documentation only.
N/A — build / ci — build system or CI configuration.
N/A — chore — tooling, formatting, or other non-code changes.
N/A — Breaking change.

Platforms Affected

CPU (WITH_CPU)
NVIDIA (WITH_NVIDIA)
Iluvatar (WITH_ILUVATAR)
MetaX (WITH_METAX)
Cambricon (WITH_CAMBRICON)
Moore (WITH_MOORE)
Ascend (WITH_ASCEND)
PyTorch C++ bindings (WITH_TORCH)
N/A — Build system / CMake / CI; no build-system or CI files are changed.
Python bindings / user-facing API

Test Results on Supported Platforms

Every row comes from a full, bare python3 -m pytest -v run, with no tests/ path argument and no --devices or -n option. Each build first regenerated the PyTorch operator sources, installed with WITH_TORCH=ON, and smoke-checked representative generated PyTorch operators after install. Build times are from the pip install phase recorded by the local validation runner; pytest times are the wall time of the timed pytest command; total time is build + pytest.

Platform	Built	`pytest` Result	Build	Pytest	Total	Notes / Hardware
NVIDIA	Yes	`9279 passed, 8565 skipped`	1029s	357s	1386s	Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Iluvatar	Yes	`7777 passed, 8549 skipped`	836s	574s	1410s	Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
MetaX	Yes	`8771 passed, 7555 skipped`	1436s	418s	1854s	Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Cambricon	Yes	`5974 passed, 9968 skipped`	2418s	1017s	3435s	Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Moore	Yes	`8537 passed, 7807 skipped`	2282s	636s	2918s	Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
Ascend	Yes	`7453 passed, 8831 skipped`	1141s	589s	1730s	Full bare pytest. PyTorch backend compiled and generated torch-op tests were included.
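The "total time is build + pytest" rule stated above can be checked mechanically. The following is a minimal sketch, not part of the PR: the row values are copied from the table, while the names `ROWS` and `inconsistent_rows` are hypothetical and introduced only for illustration.

```python
# Rows copied from the table above: (platform, build_s, pytest_s, total_s).
ROWS = [
    ("NVIDIA", 1029, 357, 1386),
    ("Iluvatar", 836, 574, 1410),
    ("MetaX", 1436, 418, 1854),
    ("Cambricon", 2418, 1017, 3435),
    ("Moore", 2282, 636, 2918),
    ("Ascend", 1141, 589, 1730),
]


def inconsistent_rows(rows):
    """Return the platforms whose total is not build + pytest."""
    return [name for name, build, test, total in rows if build + test != total]


if __name__ == "__main__":
    # An empty list means every total in the table is consistent.
    print(inconsistent_rows(ROWS))
```

An empty result means each recorded total equals the sum of its build and pytest columns.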

Full `pytest` output (optional)

NVIDIA:    9279 passed, 8565 skipped in 352.70s (0:05:52)
Iluvatar:  7777 passed, 8549 skipped in 569.47s (0:09:29)
MetaX:     8771 passed, 7555 skipped in 400.83s (0:06:40)
Cambricon: 5974 passed, 9968 skipped in 1008.43s (0:16:48)
Moore:     8537 passed, 7807 skipped in 628.43s (0:10:28)
Ascend:    7453 passed, 8831 skipped in 572.66s (0:09:32)
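To compare these summary lines against the table, the counts can be extracted with a small parser. This is a sketch under the assumption that every summary line has the "N passed, M skipped in T s" shape shown above; `SUMMARY_RE` and `parse_summary` are hypothetical names, not part of the repository.

```python
import re

# Matches the pytest summary fragment shown in the lines above.
SUMMARY_RE = re.compile(
    r"(?P<passed>\d+) passed, (?P<skipped>\d+) skipped "
    r"in (?P<seconds>\d+(?:\.\d+)?)s"
)


def parse_summary(line):
    """Return (passed, skipped, seconds) parsed from one summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a pytest summary line: {line!r}")
    return (
        int(match["passed"]),
        int(match["skipped"]),
        float(match["seconds"]),
    )


if __name__ == "__main__":
    print(parse_summary("NVIDIA:    9279 passed, 8565 skipped in 352.70s (0:05:52)"))
```

The seconds reported by pytest itself are slightly lower than the table's Pytest column, which records the wall time of the whole timed command.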

The test counts are expected to match the PyTorch codegen coverage from PR #619 because this PR only checks in generated base headers from that generator. The only observed difference from the latest PR #619 table is on Ascend: one tests/test_torch_ops.py inner case is skipped instead of passed:

tests/test_torch_ops.py::test_op[npu-dtype1-0.01-0.01-13x4-inner]

The generated inner base, binding metadata, and PyTorch backend source are identical between PR #619's generated output and this PR's checked-in base. Ascend still builds successfully, smoke checks show the PyTorch slot active for Ascend, and the full pytest run exits successfully. This is recorded as a non-blocking skip-count drift rather than a build or execution regression.

Benchmark / Performance Impact

N/A — this PR checks in generated base headers only. The table above records build and test wall time for each platform.

Notes for Reviewers

This PR is rebased on the latest master, after PR #619 was merged. The generated base files are intentionally checked in as generator output. File paths are kept flat under src/base/.

The generated bases intentionally omit src/base/all.h, src/base/any.h, and src/base/internal_scaled_mm.h in this PR because their ATen schemas vary across installed PyTorch builds; those bases are better regenerated by the local codegen environment than frozen as stable public bases.

The table above was rerun after the author-only amend at commit a81406b. At the time of this update, GitHub CI still reports environment-related failures on Iluvatar and Moore: Iluvatar CI auto-detects multiple mutually exclusive GPU backends during configuration, and Moore CI fails while importing infini.ops with an OpenMP runtime symbol lookup error. The manual full-platform runs above build and test successfully on both platforms in the platform-specific validation environment.
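The omission rule described in these notes amounts to filtering the generator output by file name before checking it in. The sketch below is illustrative only: the three omitted names come from this PR, but `headers_to_check_in` and the example path src/base/add.h are hypothetical, and the real generator may apply the rule differently.

```python
from pathlib import PurePosixPath

# Bases omitted in this PR because their ATen schemas vary across PyTorch builds.
OMITTED = {"all.h", "any.h", "internal_scaled_mm.h"}


def headers_to_check_in(generated_paths):
    """Keep only the generated headers that are not on the omit list."""
    return [
        path
        for path in generated_paths
        if PurePosixPath(path).name not in OMITTED
    ]


if __name__ == "__main__":
    # "src/base/add.h" is a made-up example path, not taken from the PR.
    print(headers_to_check_in(["src/base/add.h", "src/base/all.h"]))
```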

Checklist

Title, Branch, and Commits

PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
Each commit message follows Conventional Commits.
Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
No stray merge commits from master — the branch is rebased cleanly on top of current master.
No fixup! / squash! / wip commits remain.
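The title and branch items above can be approximated with two regular expressions. This is a rough sketch, assuming lowercase types and hyphen-joined lowercase words; the authoritative rules live in CONTRIBUTING.md, which is not reproduced here, and `TITLE_RE`, `BRANCH_RE`, and `title_and_branch_agree` are hypothetical names.

```python
import re

# Conventional Commits header: type, optional (scope), optional "!", then ": subject".
TITLE_RE = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?!?: (?P<subject>.+)$")

# Branch shape assumed from the checklist: <type>/word-word-word.
BRANCH_RE = re.compile(r"^(?P<type>[a-z]+)/[a-z0-9]+(-[a-z0-9]+)*$")


def title_and_branch_agree(title, branch):
    """Return True when both are well-formed and share the same type."""
    title_match = TITLE_RE.match(title)
    branch_match = BRANCH_RE.match(branch)
    if title_match is None or branch_match is None:
        return False
    return title_match["type"] == branch_match["type"]


if __name__ == "__main__":
    print(
        title_and_branch_agree(
            "feat(torch): add generated operator bases",
            "feat/torch-operator-bases",
        )
    )
```

Applied to this PR's own title and branch name, the check passes because both use the feat type.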

Scope and Design

Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
No unrelated formatting churn that would obscure the diff.
Public API changes are intentional, documented in this PR, and reflected in affected callers/tests.

General Code Hygiene

The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
No trailing whitespace, tab/space mixing, or stray BOMs.
Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
All comments and error messages are in English (CONTRIBUTING.md §Code/General).
Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).
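Some of the hygiene items above (single trailing newline, no trailing whitespace, no stray BOM) are easy to verify automatically. The following is a minimal sketch that works on file contents already read as text; `hygiene_problems` is a hypothetical helper, it does not check tab/space mixing, and it is not the project's actual tooling.

```python
def hygiene_problems(text):
    """Return a list of human-readable hygiene problems found in text."""
    problems = []
    if text.startswith("\ufeff"):
        problems.append("stray BOM")
    # Exactly one newline at the end: present, and not doubled.
    if not text.endswith("\n") or text.endswith("\n\n"):
        problems.append("file must end with a single trailing newline")
    for number, line in enumerate(text.split("\n"), start=1):
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
    return problems


if __name__ == "__main__":
    print(hygiene_problems("int x;\n"))
    print(hygiene_problems("int x; \n"))
```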

C++ Specific

Python Specific

N/A — This PR does not modify Python files.

Testing

pytest was run locally on every supported platform that this PR can affect, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests).
N/A — Every supported platform was tested.
New functionality is covered by the generated PyTorch operator test harness from PR #619 (feat(torch): expose optional codegen parameters) and by the all-platform full pytest runs recorded above.
N/A — This PR does not add Python tests.
N/A — This PR does not add flaky parallel-only tests.
N/A — This is not a bug-fix-only PR.

Build, CI, and Tooling

The project builds cleanly from a fresh directory on every supported platform listed above.
compile_commands.json still regenerates through the existing CMake/scikit-build configuration path.
N/A — No new backend / device was added.
Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not broken.
clang-format.yml and ruff.yml are green; this PR only changes generated C++ headers under src/base/.
N/A — No new runtime dependency was added.

Documentation

N/A — No user workflow, build flag, or developer workflow documentation changed.
New generated operator bases are documented through their checked-in header signatures.
N/A — No user-visible breaking change is introduced.

Security and Safety

No secrets, access tokens, internal URLs, customer data, IP addresses, or personal hardware identifiers have been committed or included in this PR description.
N/A — No third-party code was added.
No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

voltjia · 2026-06-03T01:34:31Z

Requesting an initial review from @crapromer or @wooway777, and a final review from @Ziminli.

wooway777

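Approved with my eyes closed; I was told this had to be finished today. Let me know when it is time for me to take the blame. Thanks 🦀🦀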

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9444f9c to 9864ff2 Compare May 27, 2026 19:51

voltjia force-pushed the feat/torch-operator-bases branch from 33e537c to ffc3d68 Compare May 27, 2026 19:54

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9864ff2 to c0db647 Compare May 27, 2026 20:27

voltjia force-pushed the feat/torch-operator-bases branch from ffc3d68 to d89ce8e Compare May 27, 2026 20:28

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from c0db647 to 3e3e319 Compare May 27, 2026 21:15

voltjia force-pushed the feat/torch-operator-bases branch 2 times, most recently from fe50963 to c5a3a38 Compare May 27, 2026 21:51

voltjia mentioned this pull request May 27, 2026

feat(torch): expose optional codegen parameters #619

Merged

56 tasks

voltjia force-pushed the feat/torch-operator-bases branch from c5a3a38 to 312cd42 Compare May 27, 2026 22:25

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 3e3e319 to 2a5d6af Compare May 27, 2026 23:33

voltjia force-pushed the feat/torch-operator-bases branch 2 times, most recently from 34db70e to f5f6a15 Compare May 28, 2026 03:39

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from d41f01d to 9f591db Compare May 28, 2026 03:55

voltjia force-pushed the feat/torch-operator-bases branch from f5f6a15 to ee42c3c Compare May 28, 2026 03:56

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9f591db to 70094a1 Compare May 28, 2026 07:41

voltjia force-pushed the feat/torch-operator-bases branch from ee42c3c to 9299ffb Compare May 28, 2026 07:44

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 70094a1 to 87e86ab Compare May 28, 2026 08:02

voltjia force-pushed the feat/torch-operator-bases branch from 9299ffb to 1c61728 Compare May 28, 2026 08:04

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 87e86ab to e62e2b2 Compare June 1, 2026 07:55

voltjia force-pushed the feat/torch-operator-bases branch from 1c61728 to 63a85dc Compare June 1, 2026 07:55

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from e62e2b2 to 48a3f2c Compare June 1, 2026 08:17

voltjia force-pushed the feat/torch-operator-bases branch from 63a85dc to 846a477 Compare June 1, 2026 08:17

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 48a3f2c to 5582e8a Compare June 1, 2026 10:53

voltjia force-pushed the feat/torch-operator-bases branch from 846a477 to 19cb477 Compare June 1, 2026 11:02

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 5582e8a to 4e3cd58 Compare June 2, 2026 08:22

voltjia force-pushed the feat/torch-operator-bases branch 3 times, most recently from d343d02 to e0e57a9 Compare June 2, 2026 14:01

voltjia changed the base branch from feat/torch-codegen-optional-overloads to master June 2, 2026 14:03

voltjia requested a review from a team June 2, 2026 14:03

feat(torch): add generated operator bases

a81406b

voltjia force-pushed the feat/torch-operator-bases branch from e0e57a9 to a81406b Compare June 2, 2026 23:15

voltjia requested review from Ziminli, crapromer and wooway777 June 3, 2026 01:34

wooway777 approved these changes Jun 3, 2026

Ziminli approved these changes Jun 3, 2026

voltjia merged commit 316ad6b into master Jun 3, 2026
14 of 18 checks passed

voltjia deleted the feat/torch-operator-bases branch June 3, 2026 02:51

voltjia merged 1 commit into master from feat/torch-operator-bases
