fix(e2e): refresh latest sandbox image for docker runs#1928

Merged
elezar merged 2 commits into main from hicks/push-ymusqoxstpnv on Jun 17, 2026

@krishicks krishicks commented Jun 16, 2026

Collaborator

Summary

This fixes an issue where you run e.g. mise run e2e:python, Python is then upgraded in mise.toml, and subsequent runs of e2e:python fail because the Python version in the locally cached sandbox image is out of sync with the mise-managed Python.

Agent notes:

Details
  • Python 3.14.5 is selected by mise.toml:22.
  • e2e:python runs uv run pytest through tasks/test.toml:74, using that mise-managed Python.
  • The sandbox image was relying on ghcr.io/nvidia/openshell-community/sandboxes/base:latest in e2e/with-docker-gateway.sh:422.
  • The stale-image bug came from pairing that mutable latest tag with image_pull_policy = "IfNotPresent".

I changed the Docker e2e wrapper so:

  • :latest or omitted-tag images use image_pull_policy = "Always".
  • pinned tags and digest refs use IfNotPresent.
  • callers can override with OPENSHELL_E2E_DOCKER_SANDBOX_IMAGE_PULL_POLICY or OPENSHELL_SANDBOX_IMAGE_PULL_POLICY.
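The selection rule above can be sketched as a small shell helper. This is a minimal illustration, not the code in e2e/with-docker-gateway.sh: the function names `image_tag` and `sandbox_image_pull_policy` are hypothetical, and the precedence between the two override variables (the E2E-specific one first) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and override precedence are assumptions.

# Print the tag of an image reference, "latest" when the tag is omitted,
# or nothing at all for a digest reference (name@sha256:...).
image_tag() {
  local ref="$1"
  case "$ref" in
    *@*) return 0 ;;  # digest ref: immutable, no tag to report
  esac
  # Inspect only the last path component so that a registry port
  # (localhost:5000/img) is not mistaken for a tag.
  local last="${ref##*/}"
  case "$last" in
    *:*) printf '%s\n' "${last##*:}" ;;
    *)   printf 'latest\n' ;;
  esac
}

# Print the pull policy for an image reference. An explicit override wins;
# otherwise latest or untagged refs get Always and everything else
# gets IfNotPresent.
sandbox_image_pull_policy() {
  local ref="$1"
  local override="${OPENSHELL_E2E_DOCKER_SANDBOX_IMAGE_PULL_POLICY:-${OPENSHELL_SANDBOX_IMAGE_PULL_POLICY:-}}"
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  elif [ "$(image_tag "$ref")" = "latest" ]; then
    printf 'Always\n'
  else
    printf 'IfNotPresent\n'
  fi
}
```

With no override set, `sandbox_image_pull_policy ghcr.io/nvidia/openshell-community/sandboxes/base:latest` prints `Always`, while a pinned tag or a digest reference prints `IfNotPresent`.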

I chose pull-on-latest over pinning because this path already depends on the community base:latest image tracking the repo toolchain. Pinning a digest would just move the sync burden into the repo every time Python changes.

Related Issue

Changes

  • Changes the ImagePullPolicy to Always when pulling an image with either latest or no tag (which is treated as latest)

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@krishicks krishicks added the test:e2e Requires end-to-end coverage label Jun 16, 2026
copy-pr-bot Bot commented Jun 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@krishicks krishicks marked this pull request as ready for review June 16, 2026 14:25
@krishicks krishicks requested review from a team, derekwaynecarr and mrunalp as code owners June 16, 2026 14:25
@github-actions

Label test:e2e applied for f9899bb. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

TaylorMutch previously approved these changes Jun 16, 2026
elezar previously approved these changes Jun 16, 2026
@elezar elezar force-pushed the hicks/push-ymusqoxstpnv branch from f9899bb to a3bc5bf Compare June 17, 2026 06:47
@elezar elezar requested a review from maxamillion as a code owner June 17, 2026 06:47
@elezar elezar enabled auto-merge (squash) June 17, 2026 06:47
This fixes an issue where you may run e.g. `mise run e2e:python`, then after
Python is upgraded in mise.toml, subsequent runs of `e2e:python` fail because
the Python version is out of sync.

Signed-off-by: Kris Hicks <khicks@nvidia.com>
@elezar elezar dismissed stale reviews from TaylorMutch and themself via 50ad503 June 17, 2026 10:13
@elezar elezar force-pushed the hicks/push-ymusqoxstpnv branch from a3bc5bf to 50ad503 Compare June 17, 2026 10:13
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the hicks/push-ymusqoxstpnv branch from 50ad503 to 2b875e0 Compare June 17, 2026 10:16
elezar commented Jun 17, 2026

Member

@krishicks I split my follow-up into a separate commit on top of your original change so the history shows the intent clearly.

The reason for the follow-up is that image_pull_policy = Always is global for the Docker driver. It refreshes the default :latest sandbox image, but it also affects images built locally by openshell sandbox create --from Dockerfile. Those local builds are tagged like openshell/sandbox-from:<timestamp> and only exist in the local Docker daemon. With Always, the gateway tries to pull that tag from Docker Hub, which caused the rust-docker E2E failure in https://github.com/NVIDIA/OpenShell/actions/runs/27671045158/job/81836989458.

The follow-up keeps the good part of the change: the wrapper refreshes the configured default :latest sandbox image before starting the gateway. It leaves the Docker driver pull policy defaulting to IfNotPresent, so local Dockerfile-built images remain usable. The explicit OPENSHELL_E2E_DOCKER_SANDBOX_IMAGE_PULL_POLICY override still works for cases that really need global Always behavior.
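The follow-up approach described above (refresh the default image once, up front, and leave the driver policy at IfNotPresent) might look roughly like the sketch below. The function names `is_latest_ref` and `refresh_default_sandbox_image` are hypothetical and the merged wrapper may be structured differently; the only external command used is `docker pull`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a pre-start refresh; not the merged implementation.

# Succeed when the reference tracks "latest": an explicit :latest tag or
# no tag at all. Digest refs and pinned tags do not match.
is_latest_ref() {
  local ref="$1"
  case "$ref" in
    *@*) return 1 ;;  # digest ref is immutable
  esac
  # Use the last path component so a registry port is not read as a tag.
  local last="${ref##*/}"
  case "$last" in
    *:latest) return 0 ;;
    *:*)      return 1 ;;
    *)        return 0 ;;
  esac
}

# Pull the configured default sandbox image before the gateway starts,
# but only when it tracks latest. Locally built images such as
# openshell/sandbox-from:<timestamp> are never pulled by this path.
refresh_default_sandbox_image() {
  local ref="$1"
  if is_latest_ref "$ref"; then
    docker pull "$ref"
  fi
}
```

Because the refresh happens in the wrapper rather than through the global driver policy, images that exist only in the local Docker daemon are still resolved with IfNotPresent.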

@elezar elezar merged commit 234e69d into main Jun 17, 2026
40 checks passed
@elezar elezar deleted the hicks/push-ymusqoxstpnv branch June 17, 2026 11:03
Labels

test:e2e Requires end-to-end coverage

3 participants