Multilingual attack detection coverage #5

## Context

The Tier 1.5 ONNX classifier was trained on only ~30 non-English samples. The adversarial robustness benchmark (in design) includes a minimal multilingual smoke test to quantify the resulting gap, but full multilingual coverage is deferred to a separate phase.

## Scope

- Assess recall on non-English prompt injection across major languages (Chinese, Spanish, Arabic, Russian, Japanese, Korean, German, French at minimum)
- Curate or generate multilingual attack samples across the 9 benchmark categories
- Evaluate whether augmentation or a separate multilingual model head is more effective
- Measure the false positive rate (FPR) on non-English benign content (READMEs, docs, comments in non-English repos)
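
As a rough illustration of the recall and FPR measurements listed above, here is a minimal sketch of per-language scoring. It assumes classifier predictions have already been collected as `(language, is_attack, flagged)` tuples; the function name `per_language_metrics` and the record layout are hypothetical and not part of the existing benchmark or the Tier 1.5 pipeline.

```python
from collections import defaultdict


def per_language_metrics(records):
    """Compute recall and FPR per language.

    `records` is an iterable of (language, is_attack, flagged) tuples:
    hypothetical layout, where `is_attack` is the ground-truth label and
    `flagged` is the classifier's decision.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for lang, is_attack, flagged in records:
        if is_attack:
            key = "tp" if flagged else "fn"
        else:
            key = "fp" if flagged else "tn"
        counts[lang][key] += 1

    metrics = {}
    for lang, c in counts.items():
        attacks = c["tp"] + c["fn"]
        benign = c["fp"] + c["tn"]
        metrics[lang] = {
            # Recall: share of attack samples that were flagged.
            "recall": c["tp"] / attacks if attacks else None,
            # FPR: share of benign samples that were wrongly flagged.
            "fpr": c["fp"] / benign if benign else None,
        }
    return metrics
```

Reporting the two numbers per language rather than pooled keeps a low-resource language from being masked by a well-covered one. A metric is `None` when a language has no samples of the relevant class, so missing coverage is visible instead of showing up as a misleading zero.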

## Not in scope

- Tier 0 regex patterns for non-English text (separate effort, different architecture)
- Real-time translation-based detection

## Dependencies

- Adversarial robustness benchmark framework (in design)
- Tier 1.5 model retraining pipeline

## Evidence

- Current training data: ~30 non-English samples (documented in `docs/MINI-SEMANTIC-MODEL.md`)
- Known limitation #3 in `docs/SECURITY.md`
