fix: use ungated llama tokenizer mirrors by eexwhyzee · Pull Request #90 · PrimeIntellect-ai/renderers

eexwhyzee · 2026-06-18T23:08:55Z

Note

Medium Risk
Central tokenizer loading now depends on third-party mirror repos for two production model IDs; a bad mirror change could affect templates/encoding until overrides are reviewed, though trust_remote_code stays off and overrides are narrowly scoped.

Overview
Gated Meta Llama-3.2 Instruct tokenizers can be loaded without Hugging Face license access: load_tokenizer (and offset-tokenizer reloads) now route through audited unsloth mirror repos, while callers still pass the canonical meta-llama/Llama-3.2-*-Instruct IDs.

Adds TOKENIZER_SOURCE_OVERRIDES plus helpers that pick the load repo, apply existing trust/revision policy on the mirror path, and rewrite tokenizer.name_or_path back to the requested Meta ID so MODEL_RENDERER_MAP auto-resolution still picks Llama3Renderer.
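The routing and name-preservation steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the mirror repo IDs are assumed (the description only says "unsloth mirror repos"), the helper names are simplified stand-ins for the private helpers, and `FakeTokenizer` replaces a real Hugging Face tokenizer so the sketch runs offline.

```python
from dataclasses import dataclass

# Assumed mirror IDs; the PR text does not list the exact repos.
TOKENIZER_SOURCE_OVERRIDES = {
    "meta-llama/Llama-3.2-1B-Instruct": "unsloth/Llama-3.2-1B-Instruct",
    "meta-llama/Llama-3.2-3B-Instruct": "unsloth/Llama-3.2-3B-Instruct",
}


def tokenizer_source_for(requested: str) -> str:
    """Return the repo to load from; IDs without an override pass through."""
    return TOKENIZER_SOURCE_OVERRIDES.get(requested, requested)


def preserve_requested_name(tokenizer, requested: str):
    """Rewrite name_or_path so name-based lookups see the requested ID."""
    tokenizer.name_or_path = requested
    return tokenizer


@dataclass
class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer; only name_or_path matters here."""
    name_or_path: str


def load_tokenizer_sketch(requested: str, loader=FakeTokenizer):
    """Load from the mirror (if any), then restore the canonical name."""
    source = tokenizer_source_for(requested)
    tokenizer = loader(source)  # at this point name_or_path is the mirror path
    return preserve_requested_name(tokenizer, requested)
```

Keeping the override table small and explicit matches the "narrowly scoped" framing in the risk note: any model ID not listed in the table is loaded exactly as before.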

Shared test matrices now use the canonical Meta model name with "auto" instead of calling the mirror directly; new unit tests cover mirror selection, name preservation, and offset-tokenizer behavior.
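To illustrate why the test matrices can pass the canonical name with "auto", here is a sketch of a name-keyed renderer lookup. It assumes MODEL_RENDERER_MAP behaves like a dict keyed by model ID and that a miss raises an error; the real map's structure and fallback behavior are not shown in this PR, and `resolve_renderer` is a hypothetical helper.

```python
# Assumed shape: model ID -> renderer name. The real map may differ.
MODEL_RENDERER_MAP = {
    "meta-llama/Llama-3.2-1B-Instruct": "Llama3Renderer",
    "meta-llama/Llama-3.2-3B-Instruct": "Llama3Renderer",
}


def resolve_renderer(name_or_path: str, renderer: str = "auto") -> str:
    """Return an explicit renderer, or look one up by tokenizer name for "auto"."""
    if renderer != "auto":
        return renderer
    try:
        return MODEL_RENDERER_MAP[name_or_path]
    except KeyError:
        raise LookupError(f"no renderer registered for {name_or_path!r}") from None
```

Under these assumptions, a tokenizer whose name_or_path still pointed at the mirror repo would miss the lookup, which is the failure that restoring the requested Meta ID avoids.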

Reviewed by Cursor Bugbot for commit 0dc19a0. Bugbot is set up for automated code reviews on this repo.

Note

Fix tokenizer loading for gated Meta Llama-3.2 models by routing to ungated unsloth mirrors

Adds TOKENIZER_SOURCE_OVERRIDES in renderers/base.py mapping canonical meta-llama/Llama-3.2-1B-Instruct and 3B-Instruct IDs to their ungated unsloth mirrors, so tokenizer loading no longer fails for users without Hugging Face access to the gated repos.
Introduces _tokenizer_source_for and _tokenizer_load_kwargs helpers to apply overrides and compute trust/revision kwargs consistently across load_tokenizer and _get_offset_tokenizer.
Adds _preserve_requested_tokenizer_name to ensure the returned tokenizer's name_or_path always reflects the originally requested canonical model ID, not the mirror path.
Updates tests to use canonical meta-llama/ IDs and adds coverage for mirror routing, name preservation, and offset-tokenizer reload behavior.
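One way to keep trust/revision handling identical across the main load and the offset-tokenizer reload is to compute the keyword arguments in a single helper. The sketch below assumes that shape; the policy tables, function names, and loader signature are hypothetical, since the PR text does not show the repository's actual policy.

```python
from typing import Any, Callable, Dict, FrozenSet

# Hypothetical policy tables, keyed by the repo actually loaded from.
REMOTE_CODE_ALLOWLIST: FrozenSet[str] = frozenset()
PINNED_REVISIONS: Dict[str, str] = {}

Loader = Callable[..., Any]


def tokenizer_load_kwargs(source: str) -> Dict[str, Any]:
    """Build loader kwargs for a repo; remote code stays off unless allowlisted."""
    kwargs: Dict[str, Any] = {"trust_remote_code": source in REMOTE_CODE_ALLOWLIST}
    revision = PINNED_REVISIONS.get(source)
    if revision is not None:
        kwargs["revision"] = revision  # only pass a revision when one is pinned
    return kwargs


def load_main(source: str, loader: Loader) -> Any:
    """Primary load path."""
    return loader(source, **tokenizer_load_kwargs(source))


def load_offset(source: str, loader: Loader) -> Any:
    """Offset-tokenizer reload; reuses the same kwargs so the paths cannot drift."""
    return loader(source, **tokenizer_load_kwargs(source))
```

Because the kwargs are derived from the load source rather than the requested ID, any policy entry for a mirror repo would apply on both paths without extra wiring.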

Macroscope summarized 0dc19a0.

macroscopeapp · 2026-06-18T23:10:28Z

Approvability

Verdict: Needs human review

This PR changes runtime tokenizer loading: Meta Llama-3.2 Instruct tokenizers are now loaded from unsloth mirrors, while the canonical names are preserved for renderer auto-resolution. The new source-override logic and name-preservation mechanism warrant human review to verify correctness.


Use ungated Llama tokenizer mirrors

20d89be

eexwhyzee added 2 commits June 18, 2026 16:20

Format tokenizer alias test

fe1dcda

Classify canonical Llama as preserve-thinking no-op

0dc19a0

eexwhyzee requested a review from hallerite June 19, 2026 00:18

hallerite approved these changes Jun 19, 2026


eexwhyzee merged commit 1933293 into main Jun 19, 2026
11 checks passed

eexwhyzee deleted the fix/unsloth-llama-tokenizer branch June 19, 2026 01:46

eexwhyzee merged 3 commits into main from fix/unsloth-llama-tokenizer
