Publish code-review vanilla baseline (claude-haiku-4-5, repeat=5) #684

Open
gggdttt wants to merge 5 commits into main from leaderboard/code-review/28105886936

Conversation

gggdttt (Collaborator) commented Jun 24, 2026

Publishes the first code-review leaderboard baseline.

Config: GitHub Copilot CLI (claude-haiku-4-5), category=code-review, repeat=5, with no MCP, LSP, instructions, skills, or agent configured. All 5 runs succeeded and were aggregated (leaderboard/code-review/28105886936).

Results (95% CI over 5 runs):

  • micro F1 = 20.2% [18.9, 21.3]
  • macro F1 = 22.8% [21.9, 24.8]
  • micro P/R (precision/recall) = 19.0% / 21.7%; macro P/R = 22.7% / 29.5%
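For readers comparing the micro and macro rows, here is a minimal sketch of how the two averages differ. It assumes the usual definitions (micro pools true-positive/false-positive/false-negative counts across all cases before computing F1; macro computes F1 per case and then averages). The `Counts` class, the `prf` helper, and the per-case numbers are hypothetical and are not taken from the benchmark harness, whose exact aggregation unit this PR does not state.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # predicted findings that match a ground-truth finding
    fp: int  # predicted findings with no match
    fn: int  # ground-truth findings that were missed


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1), using 0.0 when a denominator is zero."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical per-case counts, for illustration only.
cases = [Counts(1, 1, 0), Counts(2, 10, 6), Counts(0, 2, 1)]

# Micro: pool the counts across cases, then compute one F1.
_, _, micro_f1 = prf(
    sum(c.tp for c in cases),
    sum(c.fp for c in cases),
    sum(c.fn for c in cases),
)

# Macro: compute F1 per case, then take the unweighted mean.
macro_f1 = sum(prf(c.tp, c.fp, c.fn)[2] for c in cases) / len(cases)

print(f"micro F1 = {micro_f1:.3f}, macro F1 = {macro_f1:.3f}")
```

Under these definitions, macro F1 is a mean of per-case F1 values, so it need not equal the harmonic mean of the macro precision and macro recall.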

Per-run micro F1: 0.218, 0.183, 0.215, 0.205, 0.189.
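As a quick sanity check, the headline micro F1 can be reproduced as the plain mean of the five per-run values listed above. This is only a sketch: the method used for the 95% CI is not stated in this PR, so the snippet does not attempt to reproduce the interval.

```python
from statistics import mean

# Per-run micro F1 values copied from this PR description.
per_run_micro_f1 = [0.218, 0.183, 0.215, 0.205, 0.189]

avg = mean(per_run_micro_f1)
lo, hi = min(per_run_micro_f1), max(per_run_micro_f1)

print(f"mean micro F1 = {avg:.1%}")          # mean micro F1 = 20.2%
print(f"per-run range = [{lo:.1%}, {hi:.1%}]")  # per-run range = [18.3%, 21.8%]
```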

Establishes the vanilla anchor for measuring skill/instruction improvements.

@gggdttt gggdttt requested a review from haoranpb June 24, 2026 19:00