
Torch-NPU mainline and upstream community integration testing #127

Open
kerer-ai wants to merge 134 commits into Ascend:master from kerer-ai:dev_master

@kerer-ai

Collaborator
  • Add Dockerfile for pytorch-npu-builder with CANN 9.0.0-beta.2
  • Add build-docker-image.yml for scheduled/manual image build
  • Add _build.yml for PyTorch and torch_npu compilation
  • Add _collect.yml for pytest case collection and sharding
  • Add _test.yml for test execution with subprocess isolation
  • Add npu-full-test.yml as main orchestration workflow
  • Add scripts: collect_all_cases.py, run_npu_test_shard.py, generate_report.py
  • Add CLAUDE.md with complete design documentation

Key features:

  • Docker image pass-through via needs.build.outputs.docker-image
  • Case-level sharding for load balancing
  • Per-case subprocess execution for crash isolation
  • Distributed (serial) vs Regular (32 workers) test execution
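Case-level sharding, as listed above, can be pictured with a minimal sketch. It assumes round-robin assignment over sorted pytest node IDs; the PR text does not say which balancing strategy collect_all_cases.py actually uses, and shard_cases is a hypothetical name.

```python
from typing import List


def shard_cases(cases: List[str], num_shards: int) -> List[List[str]]:
    """Split pytest node IDs into num_shards lists of near-equal size.

    Sorting first makes the assignment deterministic across runs;
    round-robin keeps shard sizes within one case of each other.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    for i, case in enumerate(sorted(cases)):
        shards[i % num_shards].append(case)
    return shards


if __name__ == "__main__":
    nodeids = [
        "test_nn.py::TestNN::test_a",
        "test_nn.py::TestNN::test_b",
        "test_ops.py::TestOps::test_c",
        "test_ops.py::TestOps::test_d",
        "test_ops.py::TestOps::test_e",
    ]
    for idx, shard in enumerate(shard_cases(nodeids, 2)):
        print(idx, shard)
```

Balancing by case count rather than by file avoids one large test file dominating a single shard, although it ignores per-case runtime differences.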

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ascend-robot

CLA Signature Pass

kerer-ai, thanks for your pull request. All authors of the commits have signed the CLA. 👍

- Rename npu-full-test.yml to "PyTorch NPU Full Test(main 分支)"
- Add pull_request trigger to build-docker-image.yml for Dockerfile and workflow changes

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Change runner to ubuntu-22.04-arm (supports Docker)
- Skip login and push for pull_request events (build only, as a test)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Convert repository_owner to lowercase using tr command
- Docker requires all image names to be lowercase
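The commit lowercases the owner with tr in a workflow shell step. The same rule can be illustrated in Python as a minimal sketch; image_repo is a hypothetical helper, and the validation pattern is the repository-name grammar from the OCI distribution specification.

```python
import re

# Repository path grammar (OCI distribution spec): lowercase alphanumeric
# components joined by ".", "_", "__" or runs of "-", separated by "/".
_COMPONENT = r"[a-z0-9]+(?:(?:\.|_|__|-+)[a-z0-9]+)*"
_REPO_PATH = re.compile(rf"{_COMPONENT}(?:/{_COMPONENT})*")


def image_repo(registry: str, owner: str, name: str) -> str:
    """Build '<registry>/<owner>/<name>' with the owner lowercased."""
    path = f"{owner.lower()}/{name}"
    if not _REPO_PATH.fullmatch(path):
        raise ValueError(f"invalid repository path: {path!r}")
    return f"{registry}/{path}"


if __name__ == "__main__":
    print(image_repo("ghcr.io", "Ascend", "pytorch-npu-builder"))
```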

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Change base image from ghcr.io/pytorch/manylinux-builder (private)
  to quay.io/pypa/manylinux_2_28_aarch64 (public, matches PyTorch main)
- Add necessary OS packages matching PyTorch's Dockerfile
- Set Python 3.11 from manylinux as default (PATH=/opt/python/cp311-cp311/bin)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Add gcc-toolset-13 for modern GCC toolchain (required for PyTorch build)
- Add language environment variables (LC_ALL, LANG, LANGUAGE)
- Add git safe.directory config for bind-mounted repos
- Remove pytest dependencies (install at test time, not in build image)
- Match PyTorch's .ci/docker/manywheel/Dockerfile_2_28_aarch64

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The CANN toolkit installation requires a Python environment.
Previously, the Python PATH was set after CANN installation,
causing the cann-ge-compiler install to fail (exit code 4).

Now Python 3.11 PATH is configured before CANN installation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The NNAL package (atb) requires CANN environment variables to be set.
Add 'source /usr/local/Ascend/cann/set_env.sh' before the NNAL install.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Use 'ascend' (lowercase) instead of github.repository_owner
to match the actual image name pushed to ghcr.io.

Docker requires image names to be lowercase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Trigger when workflow files, scripts, or docker files are modified:
- .github/workflows/** (workflow files)
- .github/scripts/** (Python scripts)
- .github/docker/** (Dockerfile)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

In reusable workflows (workflow_call), the env context is not
available in container configuration. Replace env.REGISTRY with
hardcoded ghcr.io value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ghcr.io images are private by default and require authentication.
Add credentials configuration to all reusable workflows that pull
the pytorch-npu-builder image:
- _build.yml
- _collect.yml
- _test.yml

Uses github.actor and GITHUB_TOKEN for authentication.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Also remove the related PR conditional checks, which are no longer needed:
- Remove "if: github.event_name != 'pull_request'" from the login step
- Change "push: ${{ github.event_name != 'pull_request' }}" to "push: true"

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Use the GitHub API to set the container package visibility to public
after pushing the image. This allows anyone to pull the image
without authentication.

PATCH /orgs/{org}/packages/container/{package_name}
with {"visibility":"public"}

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Add pull_request trigger for Dockerfile and workflow changes
- PR builds only test the build and do not push (push: ${{ github.event_name != 'pull_request' }})
- Skip visibility step for PR events

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ascend-robot

CLA Signature Guide

@kerer-ai , thanks for your pull request.

The following commit(s) are not associated with a signed Contributor License Agreement (CLA).

Commit: bd329c89 "Add workflow to verify pulling q..."
Reason: the email used in the commit is not linked to a signed CLA. Please verify that it matches the email you used when signing the CLA.

To sign the CLA, click here.

To check whether your email is configured correctly, refer to the FAQs.

Once you have signed the CLA or updated your email, please comment /check-cla to revalidate the CLA status.

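1. Unify the pip cache configuration across all workflows
   - build workflow: 4 PIP_CACHE_DIR settings
   - collect/regular/dist/custom workflows: add PIP_CACHE_DIR and a Cache pip action
   - All pip install steps use the cache to speed up downloads

2. Add torchvision installation (ignoring the version check)
   - Use --no-deps to work around torchvision's pinned torch version
   - Fixes missing dependencies in the onnx/test_models test series

3. Update the blacklist configuration case_paths_ci.yml
   - Add torch_openreg tests (require a separate build; excluded upstream by default)
   - Add dynamo/test_torchrec (fbgemm-gpu has no ARM64 support)
   - Add onnx/exporter/test_hf_models_e2e (depends on transformers)
   - Add detailed comments explaining why each entry is blacklisted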
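The schema of case_paths_ci.yml is not shown in the PR, so the sketch below assumes the exclusions are a plain list of test paths relative to PyTorch's test/ directory, written without the .py suffix. It uses two of the entries named in this commit; is_excluded and filter_test_files are hypothetical helpers.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed format: entries copied from the commit message, no ".py" suffix.
EXCLUDED = [
    "dynamo/test_torchrec",              # fbgemm-gpu has no ARM64 support
    "onnx/exporter/test_hf_models_e2e",  # depends on transformers
]


def is_excluded(test_file: str, excluded: Iterable[str]) -> bool:
    """True if test_file equals an excluded entry or lives under one."""
    path = PurePosixPath(test_file)
    stem_path = path.with_suffix("") if path.suffix == ".py" else path
    for entry in excluded:
        entry_path = PurePosixPath(entry)
        # Exact file match, or the entry is a directory containing the file.
        if stem_path == entry_path or entry_path in stem_path.parents:
            return True
    return False


def filter_test_files(files: Iterable[str], excluded: Iterable[str]) -> List[str]:
    excluded = list(excluded)
    return [f for f in files if not is_excluded(f, excluded)]


if __name__ == "__main__":
    files = ["dynamo/test_torchrec.py", "dynamo/test_misc.py", "test_nn.py"]
    print(filter_test_files(files, EXCLUDED))
```

Comparing path components instead of raw string prefixes keeps an entry such as dynamo/test_torchrec from accidentally matching a file named dynamo/test_torchrec_extra.py.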

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- run_npu_test_shard.py: remove the --case-paths-config parameter; pass None when calling discover
- _torch-npu-upstream-collect.yml: stop passing --case-paths-config to collect_all_cases.py
- _torch-npu-upstream-test-dist.yml: delete the redundant error-log upload step
- _torch-npu-upstream-test-regular.yml: delete the redundant error-log upload step

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

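Main changes:
- Add the setup-npu-test-env action: wraps the common steps (checkout, cache, installing torch/torch_npu, test dependencies)
- Simplify the 4 sub-workflows: collect/custom/dist/regular all call the action
- Improve collect_all_cases.py logging: compute display_name up front to avoid duplicated logic
- Simplify run_npu_test_shard.py: remove the deprecated shard discovery mode, keeping the cases-json and test-files modes
- Run distributed tests serially via max_workers=1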
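Per-case subprocess isolation with a worker cap (1 for distributed tests, 32 for regular ones, as stated in the PR description) can be sketched as follows. This is a minimal illustration, not the PR's run_tests_with_tasks_concurrent(); CaseResult, pytest_argv, run_case and run_cases are hypothetical names, driving subprocesses from a thread pool is an assumption, and the timeout value is arbitrary.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CaseResult:
    nodeid: str
    returncode: int  # negative on POSIX if the child was killed by a signal
    timed_out: bool = False


def pytest_argv(nodeid: str) -> List[str]:
    # One fresh interpreter and one pytest invocation per node ID.
    return [sys.executable, "-m", "pytest", "-q", nodeid]


def run_case(nodeid: str,
             make_argv: Callable[[str], List[str]] = pytest_argv,
             timeout: float = 1800.0) -> CaseResult:
    """Run a single case in a child process so a crash only affects it."""
    try:
        proc = subprocess.run(make_argv(nodeid), capture_output=True,
                              text=True, timeout=timeout)
        return CaseResult(nodeid, proc.returncode)
    except subprocess.TimeoutExpired:
        return CaseResult(nodeid, -1, timed_out=True)


def run_cases(nodeids: List[str], distributed: bool,
              make_argv: Callable[[str], List[str]] = pytest_argv) -> List[CaseResult]:
    # Distributed cases run one at a time (max_workers=1); regular cases
    # use up to 32 concurrent child processes.
    max_workers = 1 if distributed else 32
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in the returned results.
        return list(pool.map(lambda n: run_case(n, make_argv), nodeids))
```

Because each case runs in its own interpreter, a segfault or abort in one case shows up as that case's return code instead of ending the whole shard.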

Parameter cleanup:
- The action keeps only the parameters actually used: python_version, torch_wheel_artifact, torch_npu_wheel_artifact, pytorch_src_artifact
- Remove the unnecessary pytorch_version, cache_key_prefix, and patch_log_suffix parameters

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

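Problem: the collection logic only checked for the presence of "::", so it wrongly collected lines that are not test cases:
- @torch.library.register_fake("torchvision::nms") contains :: but is not a test case

Fix: add strict filter conditions:
1. Must contain ".py::" (marks a Python test file)
2. Must not start with "@" (decorator/registration text)
3. Must not start with "<" (pytest collection marker)
4. Must not contain "(" (function-call syntax)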
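The four conditions above translate directly into a predicate. This is a minimal sketch; is_test_nodeid is a hypothetical name, not the actual code in collect_cases_for_file().

```python
def is_test_nodeid(line: str) -> bool:
    """Return True if a collection-output line looks like a test node ID."""
    line = line.strip()
    return (
        ".py::" in line               # rule 1: Python test file node ID
        and not line.startswith("@")  # rule 2: decorator / registration text
        and not line.startswith("<")  # rule 3: pytest collection markers
        and "(" not in line           # rule 4: function-call syntax
    )


if __name__ == "__main__":
    lines = [
        '@torch.library.register_fake("torchvision::nms")',
        "test_ops.py::TestOps::test_add",
        "<Module test_ops.py>",
    ]
    print([ln for ln in lines if is_test_nodeid(ln)])
```

Rule 1 alone already rejects the register_fake line, while rule 4 would also drop parametrized node IDs whose parameter IDs contain "(", such as test_x.py::test_f[a(b)].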

Affected files:
- collect_all_cases.py: collect_cases_for_file()
- run_npu_test_shard.py: collect_test_cases()

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

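In pytest --collect-only -q mode, the output has a standard format:
- One complete nodeid per line: test_file.py::TestClass::test_method
- A summary line at the end: "X tests collected"

Simplified parsing rules:
1. Skip empty lines
2. Skip summary lines containing "collected"/"selected"
3. Skip separator lines starting with "="
4. Only check for ".py::" to ensure the line refers to a Python test file

Remove the earlier complex string matching (checks for @, <, (, etc.),
because the -q output is already clean and needs no defensive filtering.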
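The simplified rules can be sketched as a small parser over captured pytest --collect-only -q output. This is a minimal illustration; parse_collect_only is a hypothetical name, and the sample output below is hand-written, not taken from a real run.

```python
from typing import List


def parse_collect_only(output: str) -> List[str]:
    """Extract node IDs from `pytest --collect-only -q` output."""
    nodeids: List[str] = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue  # rule 1: empty line
        if "collected" in line or "selected" in line:
            continue  # rule 2: summary line
        if line.startswith("="):
            continue  # rule 3: separator line
        if ".py::" in line:
            nodeids.append(line)  # rule 4: Python test node ID
    return nodeids


if __name__ == "__main__":
    sample = (
        "test_a.py::TestA::test_x\n"
        "test_a.py::TestA::test_y\n"
        "\n"
        "2 tests collected in 0.01s\n"
    )
    print(parse_collect_only(sample))
```

Because rule 2 is a substring check applied before rule 4, a node ID whose test name happens to contain "collected" or "selected" would also be skipped; checking ".py::" first would avoid that.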

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Problem: collect_test_cases() duplicated the functionality of collect_cases_for_file() in collect_all_cases.py

Changes:
1. Delete the collect_test_cases() function (110 lines)
2. Delete the run_tests_with_concurrent_isolation() function (214 lines)
3. Add import collect_all_cases
4. --test-files mode now:
   - calls collect_all_cases.collect_all_cases() to collect cases
   - builds a list of CaseExecutionTask
   - calls run_tests_with_tasks_concurrent() to execute them

Unified execution flow:
- --cases-json mode: pre-collected cases → run_tests_with_tasks_concurrent()
- --test-files mode: cases collected on the fly → run_tests_with_tasks_concurrent()

Code reduced by 291 lines (1584 → 1293)
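The unified flow, where two input modes feed one task list, can be sketched with argparse. This is a minimal illustration under stated assumptions: the CaseExecutionTask below is a stand-in with hypothetical fields (the real class is not shown), the --cases-json file is assumed to hold a JSON list of node IDs, and the collect callable stands in for collect_all_cases.collect_all_cases(), whose signature is not shown in the PR.

```python
import argparse
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CaseExecutionTask:
    # Stand-in with hypothetical fields.
    nodeid: str
    test_file: str


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--cases-json", help="pre-collected node IDs (assumed JSON list)")
    mode.add_argument("--test-files", nargs="+", help="test files to collect now")
    return parser.parse_args(argv)


def build_tasks(args: argparse.Namespace,
                collect: Callable[[List[str]], List[str]]) -> List[CaseExecutionTask]:
    """Both modes end in the same task list, so one executor serves both."""
    if args.cases_json:
        with open(args.cases_json, encoding="utf-8") as f:
            nodeids: List[str] = json.load(f)
    else:
        nodeids = collect(args.test_files)
    return [CaseExecutionTask(n, n.split("::", 1)[0]) for n in nodeids]


if __name__ == "__main__":
    ns = parse_args(["--test-files", "test_a.py", "test_b.py"])
    fake_collect = lambda files: [f + "::test_x" for f in files]  # hypothetical
    print(build_tasks(ns, fake_collect))
```

Funnelling both modes into one list is what lets a single executor replace the two removed code paths.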

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ascend-robot

Copy link
Copy Markdown

CLA Signature Guide

@kerer-ai , thanks for your pull request.

The following commit(s) are not associated with a signed Contributor License Agreement (CLA).

Commit Reason
bd329c89 Add workflow to verify pulling q... the email used in the commit is not linked to a signed CLA!
please verify that it matches the email you used when signing the CLA.

To sign CLA, click here.

To check if your email is configured correctly, refer to the FAQs.

Once you've signed the CLA or updating your email, please comment /check-cla to revalidate CLA status.
