Context
The Tier 1.5 ONNX classifier has ~30 non-English training samples. The adversarial robustness benchmark (in design) includes a minimal multilingual smoke test to quantify the gap, but full multilingual coverage is deferred to a separate phase.
Scope
- Assess recall on non-English prompt injection across major languages (Chinese, Spanish, Arabic, Russian, Japanese, Korean, German, French at minimum)
- Curate or generate multilingual attack samples across the 9 benchmark categories
- Evaluate whether augmentation or a separate multilingual model head is more effective
- Measure the false positive rate (FPR) on non-English benign content (READMEs, docs, comments in non-English repos)
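
The recall and FPR measurements listed above could be reported per language along the following lines. This is a minimal sketch, not the benchmark's actual harness: the `Sample` record, its field names, and `per_language_metrics` are hypothetical, and it assumes each evaluated sample already carries a language tag, a ground-truth label, and the classifier's binary decision.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical evaluation record; field names are assumptions."""
    lang: str        # language tag, e.g. "zh", "es", "ar"
    is_attack: bool  # ground truth: True for prompt injection, False for benign
    flagged: bool    # classifier decision: True if flagged as injection


def per_language_metrics(samples):
    """Return {lang: {"recall": ..., "fpr": ...}} from labeled decisions.

    A metric is None when a language has no samples of the class it needs
    (no attacks for recall, no benign content for FPR).
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for s in samples:
        if s.is_attack:
            key = "tp" if s.flagged else "fn"
        else:
            key = "fp" if s.flagged else "tn"
        counts[s.lang][key] += 1

    metrics = {}
    for lang, c in counts.items():
        attacks = c["tp"] + c["fn"]
        benign = c["fp"] + c["tn"]
        metrics[lang] = {
            "recall": c["tp"] / attacks if attacks else None,
            "fpr": c["fp"] / benign if benign else None,
        }
    return metrics


if __name__ == "__main__":
    demo = [
        Sample("es", True, True),
        Sample("es", True, False),
        Sample("es", False, False),
        Sample("es", False, True),
        Sample("ja", True, True),
    ]
    for lang, m in sorted(per_language_metrics(demo).items()):
        print(lang, m)
```

Reporting per language rather than a single pooled number keeps a well-covered language from masking a gap in a poorly covered one, which matters when deciding between augmentation and a separate multilingual head.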
Not in scope
- Tier 0 regex patterns for non-English (separate effort, different architecture)
- Real-time translation-based detection
Dependencies
- Adversarial robustness benchmark framework (in design)
- Tier 1.5 model retraining pipeline
Evidence
- docs/MINI-SEMANTIC-MODEL.md
- docs/SECURITY.md