huggingface / lighteval Public

Fork 429
Star 2.3k

Pull requests: huggingface/lighteval

68 Open 637 Closed

Pull requests list

Upgrade vLLM from 0.10.1.1 to 0.14.1

#1173 opened Feb 19, 2026 by NathanHB

Fix: pass through custom_tasks and enable multilingual in eval command

#1172 opened Feb 19, 2026 by dzautner

2 tasks done

Fix IndexError in LogProbTokenNorm when choices_tokens is shorter than choices_logprob

#1171 opened Feb 18, 2026 by Fridayai700

Add jfinqa: Japanese Financial Numerical Reasoning QA

#1169 opened Feb 17, 2026 by ajtgjmdjp

2 of 3 tasks

fix: restore task list display logic

#1166 opened Feb 10, 2026 by s1eeping-king

fix: Transformers Model no template cast stop_sequences to list

#1165 opened Feb 7, 2026 by mrsndmn

Fix TypeError in aa_omniscience_prompt

#1161 opened Jan 22, 2026 by pjavanrood

Fix split loading error in bigbench

#1159 opened Jan 22, 2026 by pjavanrood

Fix CoQA metric and support multi-doc loading

#1157 opened Jan 22, 2026 by pjavanrood • Draft

Fix RecursionError in imdb_contrastset_prompt

#1155 opened Jan 22, 2026 by pjavanrood

Fix legal_summarization keys and SummaC metric

#1153 opened Jan 22, 2026 by pjavanrood • Draft

Fix non-existent evaluation splits in lextreme

#1151 opened Jan 22, 2026 by pjavanrood

Fix evaluation split config in lsat_qa

#1149 opened Jan 22, 2026 by pjavanrood

Improve NarrativeQA metrics and prompt structure

#1147 opened Jan 22, 2026 by pjavanrood

Fix schema validation in olympiad_bench Doc.specific

#1145 opened Jan 22, 2026 by pjavanrood

Fix key mismatch and context access in PubMedQA

#1143 opened Jan 22, 2026 by pjavanrood

Fix TypeError in real_toxicity_prompts

#1141 opened Jan 22, 2026 by pjavanrood

Fix column mismatch and metric in SimpleQA

#1139 opened Jan 22, 2026 by pjavanrood

Fix subset names in StoryCloze

#1137 opened Jan 22, 2026 by pjavanrood

Fix Doc init and missing metadata in Summarization tasks

#1135 opened Jan 22, 2026 by pjavanrood

Fix hardcoded path in tiny_benchmarks

#1133 opened Jan 22, 2026 by pjavanrood

Fix KeyError in truthful_qa_generative_prompt

#1131 opened Jan 22, 2026 by pjavanrood

Fix MT-Bench multi-turn evaluation logic

#1129 opened Jan 22, 2026 by pjavanrood • Draft

Fix specific error in truthfulqa

#1127 opened Jan 22, 2026 by ChenZiHong-Gavin

Support for retriever-augmented models.

#1125 opened Jan 19, 2026 by akshathmangudi