Skip to content

CI: Added GHA CI workflow#303

Open
leo-amd wants to merge 35 commits intomasterfrom
leo/create-gha-workflow
Open

CI: Added GHA CI workflow#303
leo-amd wants to merge 35 commits intomasterfrom
leo/create-gha-workflow

Conversation

@leo-amd
Copy link
Collaborator

@leo-amd leo-amd commented Feb 19, 2026

No description provided.

pip install dist/apex-*.whl
"

- name: Run L0 tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you run the import extensions test first

python tests/test_extension_import.py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

docker_image:
description: 'Docker image to use'
required: false
default: 'rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default: 'rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1'
default: 'rocm/pytorch:latest'

We don't want to have to update this from time-to-time, so if rocm/pytorch:latest works as a reasonable default, let's use that. If it doesn't, then we need to understand why and make it work.

type: boolean

env:
DOCKER_IMAGE: ${{ inputs.docker_image || 'rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1' }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DOCKER_IMAGE: ${{ inputs.docker_image || 'rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1' }}
DOCKER_IMAGE: ${{ inputs.docker_image || 'rocm/pytorch:latest' }}

Also, to follow the ODR (One Definition Rule), let's remove the default value for the inputs.docker_image since it'll get set from here if it's empty.

jobs:
build:
name: Build Apex Wheel
runs-on: build-only-apex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leo-amd Is this a runner label configured just for Apex builds? I'd assume we would use a more generic runner label. In fact, we should use Github's runners for this, because this is just the build job

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are configured per repository

run: |
docker exec apex-test-container bash -c "
set -e
pip install expecttest onnxscript
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amd-sriram Can these be added to some requirements.txt or requirements-dev.txt file of Apex?

run: |
docker exec apex-test-container bash -c "
set -eo pipefail
export HSA_FORCE_FINE_GRAIN_PCIE=1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amd-sriram RCCL team (Wenkai) told me a while back that HSA_FORCE_FINE_GRAIN_PCIE is not needed anymore: "since ROCm 5.7, no longer required". Can you please confirm if that's true on ROCm7.2, and remove it from here (and any Apex documentation), if so?

docker exec apex-test-container bash -c "
set -eo pipefail
export HSA_FORCE_FINE_GRAIN_PCIE=1
export HSA_ENABLE_SDMA=0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amd-sriram Is this a hack/workaround?

L0_results.log
contrib_results.log
halo_results.log
syncbn_results.log
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leo-amd Can we just do *.log here to reduce maintenance when/if we update any of the test steps or log names?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good idea

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amd-sriram / @leo-amd The changes to this file shouldn't be part of this PR, they belong in a different PR. If they are already merged into master, please rebase this PR branch so it doesn't show the diffs in this file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants