
Update multilingual leaderboard to 300 instances #51

Open
aorwall wants to merge 1 commit into SWE-bench:master from aorwall:fix/multilingual-resolved-rates


aorwall commented Feb 24, 2026

Summary

All 11 multilingual submissions from 20260213 now have exactly 300 instances in per_instance_details (previously 297-298 due to missing trajectories).

Resolved rate changes:

  • Claude 4.5 Haiku: 64.3% → 64.7% (rerun facebook__docusaurus-10309, now resolved)
  • Kimi K2.5: 67.0% → 67.3% (rerun burntsushi__ripgrep-2209, now resolved)
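
Both changes are consistent with one additional resolved instance out of 300, which moves the rate by 1/300, or about 0.33 percentage points. The sketch below checks that arithmetic. The resolved counts (193 → 194 and 201 → 202) are inferred from the published percentages, assuming a denominator of 300 and rounding to one decimal place; they are not stated in this PR.

```python
TOTAL_INSTANCES = 300  # instance count per submission, as stated in the summary


def resolved_rate(resolved: int, total: int = TOTAL_INSTANCES) -> float:
    """Return the resolved rate as a percentage rounded to one decimal place."""
    return round(100 * resolved / total, 1)


# Assumed counts, inferred from the percentages (not given in the PR).
haiku_before, haiku_after = resolved_rate(193), resolved_rate(194)
kimi_before, kimi_after = resolved_rate(201), resolved_rate(202)

print(f"Claude 4.5 Haiku: {haiku_before}% -> {haiku_after}%")
print(f"Kimi K2.5: {kimi_before}% -> {kimi_after}%")
```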

Cost/instance_cost/instance_calls updated for all 11 models to reflect the full 300 instances.

Companion PR: SWE-bench/experiments#421

Test plan

  • All 11 per_instance_details have exactly 300 entries
  • No zero-cost entries
  • Resolved rates match instance counts
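
The three checks in the test plan could be scripted roughly as follows. This is a minimal sketch, not the script used for this PR: it assumes each leaderboard entry is a dict whose `per_instance_details` maps instance IDs to records with hypothetical `cost` and `resolved` fields, and that the entry stores its rate as a percentage under a hypothetical `resolved` key. Adjust the names to the actual schema.

```python
from typing import Any

EXPECTED_INSTANCES = 300


def check_submission(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one leaderboard entry (empty if none)."""
    problems: list[str] = []
    details = entry["per_instance_details"]  # assumed: {instance_id: {...}}

    # Check 1: exactly 300 per-instance entries.
    if len(details) != EXPECTED_INSTANCES:
        problems.append(f"expected {EXPECTED_INSTANCES} instances, found {len(details)}")

    # Check 2: no zero-cost (or missing-cost) entries.
    zero_cost = [iid for iid, d in details.items() if not d.get("cost")]
    if zero_cost:
        problems.append(f"zero-cost entries: {sorted(zero_cost)}")

    # Check 3: the reported rate matches the per-instance resolved count.
    if details:
        resolved_count = sum(1 for d in details.values() if d.get("resolved"))
        expected_rate = round(100 * resolved_count / len(details), 1)
        if abs(expected_rate - entry["resolved"]) > 1e-9:
            problems.append(
                f"reported rate {entry['resolved']} != computed rate {expected_rate}"
            )

    return problems
```

Calling `check_submission` on each of the 11 entries and asserting that every result is empty would cover the whole test plan in one pass.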

aorwall force-pushed the fix/multilingual-resolved-rates branch from c32233b to 9ac5f76 on February 24, 2026 09:28