Skip to content

fix(python): add encoding=utf-8 to file reads and writes in sandbox.py#1912

Open
mesutoezdil wants to merge 6 commits into
NVIDIA:mainfrom
mesutoezdil:fix/python-client-encoding
Open

fix(python): add encoding=utf-8 to file reads and writes in sandbox.py#1912
mesutoezdil wants to merge 6 commits into
NVIDIA:mainfrom
mesutoezdil:fix/python-client-encoding

Conversation

@mesutoezdil

@mesutoezdil mesutoezdil commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

read_text() and fdopen(..., "w") without an explicit encoding use the system locale,
which can be non-UTF-8 on some platforms (e.g. Windows).

4 call sites in sandbox.py read or write UTF-8 config files without specifying encoding:

  • metadata_path.read_text() gateway metadata.json
  • token_path.read_text() oidc_token.json
  • os.fdopen(fd, "w") atomic write of oidc_token.json
  • active_file.read_text() active_gateway name file

All 4 are now explicit with encoding="utf-8".

…ox.py

read_text(), fdopen() without encoding use the system locale, which can
differ on non-UTF-8 systems. All config files (metadata.json,
oidc_token.json, active_gateway) are UTF-8 text.
@mesutoezdil mesutoezdil requested review from a team, derekwaynecarr and mrunalp as code owners June 15, 2026 19:51
@copy-pr-bot

copy-pr-bot Bot commented Jun 15, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@johntmyers johntmyers added the gator:in-review Gator is reviewing or awaiting PR review feedback label Jun 15, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

PR Review Status

Validation: This is a small, concentrated Python SDK bug fix that makes OpenShell's shared gateway config files use an explicit UTF-8 encoding, matching the Rust-side writers for metadata.json, active_gateway, and oidc_token.json.
Head SHA: dba017a42c918bbda280010820f75f0ab4d0811b

Review findings:

  • python/openshell/sandbox.py: _read_oidc_token_bundle() now decodes with explicit UTF-8, but UnicodeDecodeError is not caught. The function contract says unreadable token files return None, so a corrupt/non-UTF-8 oidc_token.json should fail closed like malformed JSON:

    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        return None
  • Please add focused regression coverage for the locale/encoding behavior. Existing tests mostly use locale-default write_text() / read_text(), so they would not catch a future removal of encoding="utf-8" on non-UTF-8 platforms. A small test that writes non-ASCII UTF-8 bytes directly for metadata.json, active_gateway, and oidc_token.json would cover the important paths.

Docs: not needed; this does not change a documented command, config field, or workflow.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test dba017a

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Author Follow-Up Nudge

This PR has been in gator:in-review for more than 48 business hours with unresolved review feedback.

@mesutoezdil, please respond to the review comments or push an update. If this is no longer planned, please say so and a maintainer can close it out.

… encoding tests

Corrupt or non-UTF-8 oidc_token.json now returns None consistently.
Add regression tests for non-ASCII UTF-8 paths in metadata, oidc token,
and active_gateway files.
@mesutoezdil mesutoezdil requested a review from maxamillion as a code owner June 18, 2026 14:51
@mesutoezdil

mesutoezdil commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Fixed in 7ac0153. Added ensure_ascii=False to test dumps. Added from_active_cluster test writing active_gateway and metadata.json as explicit UTF-8 bytes.

@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test c7b876b

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head c7b876bc2a4288aa2c71f76f99387d8080ee9b03 after @mesutoezdil's 2026-06-18 14:52 UTC comment saying the previous review findings were fixed.

Disposition: partially resolved.

Resolved:

  • _read_oidc_token_bundle() now catches UnicodeDecodeError, so corrupt/non-UTF-8 oidc_token.json fails closed as requested.

Remaining items:

  • python/openshell/sandbox_test.py:1359 and python/openshell/sandbox_test.py:1384: the new "non-ASCII UTF-8" tests still use json.dumps() with its default ensure_ascii=True, so the é values are escaped as ASCII \u00e9. Please write the raw bytes with json.dumps(..., ensure_ascii=False).encode("utf-8") and assert the expected UTF-8 byte sequence is present before writing.
  • python/openshell/sandbox_test.py:1355: coverage still only exercises oidc_token.json. Please add one focused regression test that writes active_gateway and metadata.json as explicit UTF-8 bytes and exercises SandboxClient.from_active_cluster().

Docs: still not needed; this is an internal Python SDK compatibility fix with no documented command, config field, or workflow change.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 7ac0153a542edcab30efb2d6ee338001700c0e06 after @mesutoezdil's 2026-06-18 15:08 UTC comment saying this was ready for re-check.

Disposition: partially resolved.

Resolved:

  • _read_oidc_token_bundle() now catches UnicodeDecodeError, so corrupt/non-UTF-8 oidc_token.json fails closed as requested.
  • The OIDC token tests now write raw UTF-8 bytes with ensure_ascii=False.

Remaining items:

  • python/openshell/sandbox_test.py:1393 and python/openshell/sandbox_test.py:1400: the from_active_cluster() regression test writes UTF-8 bytes for active_gateway and metadata.json, but it would not reliably fail if either read path reverted to locale-default decoding. gateway_name = "gw-utf8" is ASCII, and the non-ASCII metadata["note"] value is never observed. Please make the decoded non-ASCII values affect assertions, for example by using a non-ASCII gateway name and endpoint host:

    gateway_name = "gw-é"
    meta = {"gateway_endpoint": "http://tést.example:8080"}
    assert client._cluster_name == gateway_name
    assert client._endpoint == "tést.example:8080"

    Using http is fine for this focused encoding regression and avoids needing TLS fixtures.

Docs: still not needed; this is an internal Python SDK compatibility fix with no documented command, config field, or workflow change.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 0cf3fe8e77d044c7d1f55351e81847a273c5fc6d after @mesutoezdil's 2026-06-18 15:17 UTC comment saying this was ready for re-check.

Disposition: partially resolved.

Resolved:

  • _read_oidc_token_bundle() catches UnicodeDecodeError, so corrupt/non-UTF-8 oidc_token.json fails closed as requested.
  • The OIDC token tests now write raw UTF-8 bytes with ensure_ascii=False.
  • The from_active_cluster() regression test now uses decoded non-ASCII values in assertions for both the active gateway name and endpoint.

Remaining items:

  • python/openshell/sandbox_test.py:1384: the added write_bytes(json.dumps(...).encode(...)) call is not formatted the way ruff format expects, and OpenShell / Branch Checks runs python:format:check. Please wrap it as:

    (gateway_dir / "oidc_token.json").write_bytes(
        json.dumps(bundle, ensure_ascii=False).encode("utf-8")
    )

Docs: still not needed; this is an internal Python SDK compatibility fix with no documented command, config field, or workflow change.

Next state: gator:in-review

@mesutoezdil

Copy link
Copy Markdown
Contributor Author

ready for re-check on eff441b

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head eff441b5303fc2d5b65b4f1b58be598fdb5b47ac after @mesutoezdil's 2026-06-18 15:40 UTC comment saying this was ready for re-check.

Disposition: resolved.

Resolved:

  • _read_oidc_token_bundle() catches UnicodeDecodeError, so corrupt/non-UTF-8 oidc_token.json fails closed as requested.
  • The OIDC token tests write raw UTF-8 bytes with ensure_ascii=False.
  • The from_active_cluster() regression test now uses decoded non-ASCII values in assertions for both the active gateway name and endpoint.
  • The long write_bytes(json.dumps(...).encode(...)) call is wrapped in the format expected by ruff format.

Remaining items:

  • No blocking items remain. The independent review noted optional future coverage for the _write_to_disk() write path, but this is not blocking for this PR.

Docs: still not needed; this is an internal Python SDK compatibility fix with no documented command, config field, or workflow change.

E2E: no test:* label is required for this Python encoding fix.

Next state: gator:watch-pipeline

@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 18, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test eff441b

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

CI Failure Needs Author Update

The latest head eff441b5303fc2d5b65b4f1b58be598fdb5b47ac reached CI, but OpenShell / Branch Checks failed.

Failure:

  • Python (linux-amd64-cpu8) and Python (linux-arm64-cpu8) both failed in mise run python:format:check.
  • The failed command was uv run ruff format --check python/ tasks/scripts/*.py deploy/sbom/*.py.
  • CI reported: Would reformat: python/openshell/sandbox_test.py.

Next action: @mesutoezdil, please run mise run python:format or uv run ruff format python/openshell/sandbox_test.py, push the formatting-only update, and gator will re-check the new head.

Docs: still not needed for this internal Python SDK encoding fix.

Next state: gator:in-review

@johntmyers johntmyers added gator:in-review Gator is reviewing or awaiting PR review feedback and removed gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 18, 2026
@mesutoezdil

Copy link
Copy Markdown
Contributor Author

ready for re-check on a6bb809

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head a6bb8095f4906251f1376b12c16213b9d5c352e0 after @mesutoezdil's 2026-06-18 16:25 UTC comment saying this was ready for re-check.

Disposition: resolved.

Resolved:

  • The formatting-only follow-up landed in a6bb8095; the previously failing sandbox_test.py formatting issue is no longer visible in the latest diff.
  • _read_oidc_token_bundle() catches UnicodeDecodeError, so corrupt/non-UTF-8 oidc_token.json fails closed as requested.
  • The OIDC token tests write raw UTF-8 bytes with ensure_ascii=False.
  • The from_active_cluster() regression test uses decoded non-ASCII values in assertions for both the active gateway name and endpoint.

Remaining items:

  • No blocking items remain. The independent review noted optional future coverage for directly asserting the _write_to_disk() os.fdopen(..., encoding="utf-8") call, but this is not blocking for this PR.

Docs: still not needed; this is an internal Python SDK compatibility fix with no documented command, config field, or workflow change.

E2E: no test:* label is required for this Python encoding fix.

Next state: gator:watch-pipeline

@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 18, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test a6bb809

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Maintainer Approval Needed

Gator validation and PR monitoring are complete.

Validation: This is a small, concentrated Python SDK compatibility fix that makes OpenShell sandbox config reads and writes use explicit UTF-8 encoding.
Review: Previous gator review feedback is resolved at head a6bb8095f4906251f1376b12c16213b9d5c352e0; no blocking findings remain.
Docs: Not needed; this is an internal Python SDK encoding fix with no documented command, config field, or workflow change.
Checks: OpenShell / Branch Checks, OpenShell / Helm Lint, and DCO are passing for the latest head.
E2E: N/A; no test:* label is required for this Python-only encoding fix, and the required E2E gate statuses report that E2E labels are not applied.

Human maintainer approval or merge decision is now required.

@johntmyers johntmyers added gator:approval-needed Gator completed review; maintainer approval needed and removed gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gator:approval-needed Gator completed review; maintainer approval needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants