[Pytorch] Make test script generate checkpoints if they don't exist#2650
[Pytorch] Make test script generate checkpoints if they don't exist#2650kainzhong wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Greptile OverviewGreptile SummaryThis PR improves the first-run experience for users running PyTorch tests by automatically generating checkpoint files if they don't exist. The changes address previous review feedback by properly exporting the Key changes:
Impact: Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Script as test.sh
participant Env as Environment
participant FS as File System
participant CheckpointGen as test_checkpoint.py
participant PyTest as pytest
Script->>Env: export NVTE_TEST_CHECKPOINT_ARTIFACT_PATH
Script->>FS: Check if directory exists
alt Directory does not exist
Script->>CheckpointGen: python3 test_checkpoint.py --save-checkpoint all
CheckpointGen->>FS: mkdir(parents=True, exist_ok=True)
loop For each checkpoint type
CheckpointGen->>CheckpointGen: Create module
CheckpointGen->>FS: Save checkpoint file
end
CheckpointGen-->>Script: Success or error_exit
else Directory exists
Script->>Script: Skip generation
end
Script->>PyTest: Run pytest test_checkpoint.py
PyTest->>FS: Load checkpoint files
PyTest->>PyTest: Test module loading
PyTest-->>Script: test_fail or success
|
qa/L0_pytorch_unittest/test.sh
Outdated
| NVTE_TEST_CHECKPOINT_ARTIFACT_PATH=$TE_PATH/artifacts/tests/pytorch/test_checkpoint python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_checkpoint.xml $TE_PATH/tests/pytorch/test_checkpoint.py || test_fail "test_checkpoint.py" | ||
| NVTE_TEST_CHECKPOINT_ARTIFACT_PATH=$TE_PATH/artifacts/tests/pytorch/test_checkpoint | ||
| if [ ! -d "$NVTE_TEST_CHECKPOINT_ARTIFACT_PATH" ]; then | ||
| python3 $TE_PATH/tests/pytorch/test_checkpoint.py --save-checkpoint all |
There was a problem hiding this comment.
NVTE_TEST_CHECKPOINT_ARTIFACT_PATH not exported; checkpoint generation won't use it
| python3 $TE_PATH/tests/pytorch/test_checkpoint.py --save-checkpoint all | |
| NVTE_TEST_CHECKPOINT_ARTIFACT_PATH=$NVTE_TEST_CHECKPOINT_ARTIFACT_PATH python3 $TE_PATH/tests/pytorch/test_checkpoint.py --save-checkpoint all |
5eb12d9 to
63b6471
Compare
qa/L0_pytorch_unittest/test.sh
Outdated
| NVTE_TEST_CHECKPOINT_ARTIFACT_PATH=$TE_PATH/artifacts/tests/pytorch/test_checkpoint python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_checkpoint.xml $TE_PATH/tests/pytorch/test_checkpoint.py || test_fail "test_checkpoint.py" | ||
| export NVTE_TEST_CHECKPOINT_ARTIFACT_PATH=$TE_PATH/artifacts/tests/pytorch/test_checkpoint | ||
| if [ ! -d "$NVTE_TEST_CHECKPOINT_ARTIFACT_PATH" ]; then | ||
| python3 $TE_PATH/tests/pytorch/test_checkpoint.py --save-checkpoint all |
There was a problem hiding this comment.
Missing error handling for checkpoint generation - if this command fails, the script continues and pytest will fail with a less clear error message
| python3 $TE_PATH/tests/pytorch/test_checkpoint.py --save-checkpoint all | |
| python3 $TE_PATH/tests/pytorch/test_checkpoint.py --save-checkpoint all || error_exit "Failed to generate checkpoint files" |
Signed-off-by: Kaining Zhong <kainingz@nvidia.com>
63b6471 to
66aaead
Compare
|
/te-ci |
Description
For users who run pytorch tests for the first time, they are likely to haven't generate checkpoint files and they have to go to
tests/pytorch/test_checkpoint.pyto figure out how to run this file to generate them.Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
tests/pytorch/test_checkpoint.pyto generate all the checkpoint files under that pathcheckpoint_dir.mkdircreate parent directories if they don't exist, which is likely to be the case for people who doexport TE_PATH=$(pwd)before run the testsChecklist: