Multilingual attack detection coverage #5

@prodnull

Context

The Tier 1.5 ONNX classifier was trained on only ~30 non-English samples. The adversarial robustness benchmark (in design) includes a minimal multilingual smoke test to quantify the resulting gap, but full multilingual coverage is deferred to a separate phase.

Scope

  • Assess recall on non-English prompt injection across major languages (Chinese, Spanish, Arabic, Russian, Japanese, Korean, German, French at minimum)
  • Curate or generate multilingual attack samples across the 9 benchmark categories
  • Evaluate whether augmentation or a separate multilingual model head is more effective
  • Measure the false positive rate (FPR) on non-English benign content (READMEs, docs, comments in non-English repos)
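
One way the recall and FPR measurements above could be tabulated per language is sketched below. This is an illustration only, not the benchmark's actual harness: the `Sample` record, its fields, and `per_language_metrics` are hypothetical names, and the `flagged` decision is assumed to come from running the Tier 1.5 classifier elsewhere.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One evaluated prompt (hypothetical record layout)."""
    lang: str        # language code, e.g. "zh", "es", "ar"
    is_attack: bool  # ground-truth label
    flagged: bool    # classifier decision, assumed to be computed elsewhere


def per_language_metrics(samples):
    """Return recall and FPR per language from labeled classifier decisions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for s in samples:
        if s.is_attack:
            key = "tp" if s.flagged else "fn"
        else:
            key = "fp" if s.flagged else "tn"
        counts[s.lang][key] += 1

    metrics = {}
    for lang, c in counts.items():
        attacks = c["tp"] + c["fn"]
        benign = c["fp"] + c["tn"]
        metrics[lang] = {
            # None marks a language with no samples of that class,
            # so an empty bucket is not mistaken for a perfect score.
            "recall": c["tp"] / attacks if attacks else None,
            "fpr": c["fp"] / benign if benign else None,
            "n_attack": attacks,
            "n_benign": benign,
        }
    return metrics
```

Reporting `n_attack` and `n_benign` next to each rate makes it visible when a language's numbers rest on only a handful of samples, which matters given how few non-English samples exist today.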

Not in scope

  • Tier 0 regex patterns for non-English (separate effort, different architecture)
  • Real-time translation-based detection

Dependencies

  • Adversarial robustness benchmark framework (in design)
  • Tier 1.5 model retraining pipeline

Evidence

Metadata

Labels: enhancement
