
test(pt_expt): speed up AOTInductor UT compilation and fix hardcoded device#5394

Open
wanghan-iapcm wants to merge 4 commits into deepmodeling:master from wanghan-iapcm:fix-pt2-default-device

Conversation


@wanghan-iapcm wanghan-iapcm commented Apr 12, 2026

Summary

  • Set fast inductor configs (max_fusion_size=8, epilogue_fusion=False, pattern_matcher=False, package_cpp_only=True, compile_wrapper_opt_level=O0) in conftest.py to reduce .pt2 compile time by ~50% in unit tests. Tests validate correctness only, so runtime performance is irrelevant.
  • Replace hardcoded torch.set_default_device("cuda:9999999") in test_deep_eval.py with torch.get_default_device() save/restore, making the device workaround resilient to changes in the fake device value.

Test plan

  • Existing pt_expt tests pass (.pt2 and .pte inference, spin, fparam/aparam, change-bias, freeze)

Summary by CodeRabbit

  • Tests
    • Improved test infrastructure configuration for AOTInductor/Inductor behavior.
    • Enhanced device handling in test setup to better preserve and restore PyTorch's default device settings.

Note: These are internal testing improvements with no direct impact on end-user functionality.

Han Wang added 2 commits April 12, 2026 17:32
AOTInductor's lowering code creates tensors without explicit device=,
inheriting any active torch.set_default_device. This caused compilation
failures when tests/pt/__init__.py set a fake CUDA device. Move the
set_default_device(None) guard into _deserialize_to_file_pt2 so all
callers (tests, dp freeze, dp compress) are protected, and remove the
12 scattered workarounds from test files.
Set inductor configs in conftest to skip expensive C++ optimizations
during .pt2 compilation: max_fusion_size=8, epilogue_fusion=False,
pattern_matcher=False, package_cpp_only=True, compile_opt_level=O0.
Tests only validate correctness so runtime performance is irrelevant.
Cuts per-model compile time from ~50s to ~30s.
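A minimal `conftest.py` sketch of the overrides this commit describes. The config attribute names are taken from the PR description and commit messages; the `hasattr` guard is my addition, since which options exist varies across PyTorch versions:

```python
# conftest.py -- sketch of the fast-compile overrides from this PR.
import torch._inductor.config as inductor_config


def _set_if_present(obj, name, value):
    # Only override options that exist in the installed PyTorch build.
    if hasattr(obj, name):
        setattr(obj, name, value)


# Skip expensive C++ optimizations during .pt2 test compilation.
_set_if_present(inductor_config, "max_fusion_size", 8)
_set_if_present(inductor_config, "epilogue_fusion", False)
_set_if_present(inductor_config, "pattern_matcher", False)
_set_if_present(inductor_config.aot_inductor, "package_cpp_only", True)
_set_if_present(inductor_config.aot_inductor, "compile_wrapper_opt_level", "O0")
```

These settings trade generated-code speed for compile time, which is the right trade for unit tests that only check numerical correctness.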
@wanghan-iapcm wanghan-iapcm requested a review from njzjz April 12, 2026 14:36

coderabbitai bot commented Apr 12, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 016ea5b and 8050d47.

📒 Files selected for processing (1)
  • source/tests/pt_expt/infer/test_deep_eval.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt_expt/infer/test_deep_eval.py

📝 Walkthrough

Test infrastructure updates: conftest.py adds AOTInductor/Inductor configuration overrides for unit tests; test_deep_eval.py modifies test setup to preserve and restore PyTorch's default device around AOTInductor compilation operations instead of hard-coding device restoration.

Changes

  • AOTInductor Configuration (source/tests/pt_expt/conftest.py):
    Added an import of torch._inductor.config and configured multiple Inductor/AOTInductor settings, including a reduced max_fusion_size, disabled epilogue_fusion and pattern_matcher, and the AOTInductor-specific flags package_cpp_only and compile_wrapper_opt_level.
  • Device Preservation in Tests (source/tests/pt_expt/infer/test_deep_eval.py):
    Updated test setup to capture and restore PyTorch's default device via the prev = torch.get_default_device() / torch.set_default_device(prev) pattern around deserialization and AOTInductor compilation, replacing the hard-coded fake-CUDA-device restoration.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • njzjz
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 54.55%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the PR title mentions speeding up AOTInductor UT compilation and fixing the hardcoded device, which aligns with the main objectives: configuring fast AOTInductor settings and addressing device handling issues.




@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 016ea5b809



codecov bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.35%. Comparing base (345d162) to head (8050d47).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5394   +/-   ##
=======================================
  Coverage   80.35%   80.35%           
=======================================
  Files         819      819           
  Lines       85445    85446    +1     
  Branches     4140     4140           
=======================================
+ Hits        68662    68663    +1     
- Misses      15508    15510    +2     
+ Partials     1275     1273    -2     


Han Wang added 2 commits April 13, 2026 00:09
Centralizing set_default_device(None) in _deserialize_to_file_pt2
re-pushes a stale DeviceContext on restore (torch.get_default_device
returns a stale value after DeviceContext is popped from mode stack),
breaking subsequent training (Adam optimizer creates tensors without
device=). Revert to per-test workarounds which don't have this issue.
…ce()

The device workaround in test_deep_eval.py hardcoded "cuda:9999999" when
restoring the default device after AOTInductor compilation.  Use
torch.get_default_device() to save/restore the actual previous device
instead, making the tests resilient to changes in the fake device value
set by tests/pt/__init__.py.
wanghan-iapcm changed the title from "fix(pt_expt): centralize AOTInductor device guard and speed up UT compilation" to "test(pt_expt): speed up AOTInductor UT compilation and fix hardcoded device" on Apr 12, 2026
