[pull] main from open-compass:main by pull[bot] · Pull Request #25 · changlan/opencompass

pull · 2025-05-22T13:04:18Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

* 250527 * 250527 * 250527 * 0530 * 0530 * Update srbench_gen.py * Update srbench.py * srbench fix * Update datasets_info.py * Update datasets_info.py --------- Co-authored-by: Myhs-phz <demarcia2014@126.com> Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>

* add LLM as judge setting for matbench * fix reference negative sample gold value missing error * update import * comments trim * fix file and import naming * matbench fix * matbench fix * matbench fix --------- Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk> Co-authored-by: Myhs-phz <demarcia2014@126.com>

* healthbench * fix irrelevant files * first * fix bench * fix bench * fix bench * fix soft link * fix bench * fix bench * healthbench fix * fix bench * fix bench * fix bench * fix bench * fix bench * fix bench * fix bench * fix bench * fix bench --------- Co-authored-by: Myhs-phz <demarcia2014@126.com>

* [Dataset] Add R-Bench (ICML 2025) * fixed lint * format rbench.py by isort * rbench fix * r-bench fix * update --------- Co-authored-by: leoyizhang <leoyizhang@tencent.com> Co-authored-by: Myhs-phz <demarcia2014@126.com> Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>

* update needlebench docs for chinese * update bilingual needlebench docs * update docs typo * update docs * update docs typo * [Docs] fix needlebench examples * Add NeedleBench_V2 * [Fix] Fix pre-commit * remove choice version * [Docs] Update NeedleBench Docs * [Docs] update NeedleBenchV2 Docs * [Docs] Update Default Settings for NeedleBench and ATC Configs * [Fix] Fix precommit * [Minor] fix needlebench summarizer groups * [Minor] Update NeedleBenchV2 dataset-index

* debug rjob runner * optimize concurrent requests by adding max_workers * update * optimize the max_workers for OpenAISDK * optimize the max_workers for OpenAISDK * optimize the max_workers for OpenAISDK * optimize the max_workers for OpenAISDK * Update openai_api.py --------- Co-authored-by: xujun <xujun@pjlab.org.cn> Co-authored-by: nic <nic@yccc.follower> Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>

* fix * Update pr-stage-check.yml * Update pr-stage-check.yml * Initialize average_mfe and retrieved_rfam_family_count Set default values for average_mfe and retrieved_rfam_family_count. * Implement metrics for handling empty predictions * Update pr-stage-check.yml --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>

[ci] update dlc setting (#2112)

c3779eb

pull bot added the ⤵️ pull label May 22, 2025

pull bot had a problem deploying to prod May 22, 2025 13:04 Error

Myhs-phz and others added 27 commits May 27, 2025 19:41

add qwen3 lmdeply (#2126)

6f3c670

[Dataset] Add SuperGPQA subfield configs (#2124)

408f5ca

* update * fix lint * fix lint * update precommit * update precommit * fix lint

[Dataset] Add Smolinstruct configs (#2127)

d572761

* 0-shot Smolinstruct Add 0-shot evaluation and postprocess functions for Smolinstruct * fix acc postprocessor * update 0-shot acc postprocessor * rename 0-shot

[Datasets] Add PHYBench (#2125)

80ec846

* add phybench * phybench fix * update * update --------- Co-authored-by: Myhs-phz <demarcia2014@126.com> Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>

[Update] Minor Updates (#2136)

5720ebf

* update * update * update

[Fix] Delete wrong internvl config (#2135)

8379c4b

[Feature] Add Chem Gaokao & Competition Benchmark (#2134)

1b34aaf

* add gaokao & competition benchmark * fix lint * chem_exam fix --------- Co-authored-by: Myhs-phz <demarcia2014@126.com>

[Feature] Adjust Chem exam verifier (#2142)

da23fd9

* add gaokao & competition benchmark * fix lint * chem_exam fix * update verifier prompt --------- Co-authored-by: Myhs-phz <demarcia2014@126.com>

[Dataset] Earth Silver Benchmark (#2140)

eae3142

* update earth silver benchmark * fix new issues * update * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>

[Update] Update Aime2024 auto-download (#2137)

c29c258

[Feature] Rjob Runner (#2144)

becff4c

* update * update * update * update * update

[Dataset] Update PHYbench postprocess (#2150)

2a1e6e8

* Fix PHYbench * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>

[Dataset] Update Srbench dataset (#2154)

5d75dc2

* 0616 * 0616 * 0616 * update * update * 0616 --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>

[Update] Update SmolInsttruct dataset config (#2157)

978fc36

* update * update

[Update] Handle SRbench non-parseable pred (#2158)

944e90d

* update * update

[Dataset] Extend 256k and 512k data for RULER (#2109)

5fd4899

More stable MBPP evaluation (#2111)

d07bb3d

* timed re.search and _executor made global * TimeOutError exception handling * added missing blank lines * isort import --------- Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>

[Dataset] Fix SuperGPQA postprocess (#2165)

a9fc798

* update * update

fix statis.py (#2170)

87ae8db

[Fix] RJob Runner and max_worker for OpenAISDK (#2171)

bc0480b

* debug rjob runner * optimize concurrent requests by adding max_workers * update --------- Co-authored-by: xujun <xujun@pjlab.org.cn>

[Fix] Fix the systemprompt issue for chem_exam benchmark(#2169)

ea02ee9

(Warning) This PR may introduce performance BC for ChemExam benchmark.

sudanl and others added 30 commits December 17, 2025 14:03

[Dataset] Add ProcessBench dataset and evaluation configuration (#2274)

66a0b7e

* Add ProcessBench dataset and evaluation configuration * Enable ProcessBench subsets * Update ProcessBench.py * new file: opencompass/configs/datasets/ProcessBench/README.md

[Fix] Fix LLM4Mat eval (SciReasoner) (#2366)

26ae808

* fix LLM4Mat regex * fix LLM4Mat regex

[Fix] Fix Openai_streaming about max_worker and o1_model_list (#2367)

4bf596c

[Dataset] add UGD_hard (#2365)

2bf3bc2

[ci] add ifbench and lcb_pro into daily testcase (#2369)

4fee422

* update * update * update * update * update

[Fix] fix problems in SciReasoner (#2372)

d90ce47

* fix * fix peer * fix

[Dataset] Add SciReasoner Summarizer (#2370)

17c8f4c

* fix LLM4Mat regex * fix LLM4Mat regex * add scireasoner summarizer * fix invalid score in summarizer * fix --------- Co-authored-by: Myhs_phz <demarcia2014@126.com>

[ci] change github runner (#2373)

5f0676e

* Update pr-run-test.yml * update * update * update * update

[Model] Add TeleChat-thinking API inference support (#2371)

ba9e13e

* Add TeleChat-thinking API inference support * fix lint --------- Co-authored-by: Myhs_phz <demarcia2014@126.com>

[Fix] Fix other eval problems in SciReasoner (#2375)

c512d07

* fix * fix

[Fix] fix sample num and peer evaluator in SciReasoner (#2378)

eaf6ef2

[Fix] add eval_prompt for ugd_hard (#2376)

2f13380

[Fix] Add handling for the Ellipsis case in bio_data_task evaluation (#2380)

ee41531

[Fix] fix RNAfold cmd and RNA sequence matching problems (#2382)

fa01328

[Update] Add meta logger in OpenICLInferTask (#2383)

736a7e8

[Fix] fix pattern match in Smolinstruct (#2384)

f4f09d2

[Fix] fix smact requirement (#2377)

ff73a4c

[Fix] fix finish_reason problem in OpenAISDKStreaming

29ca947

[Update] Add evaluation example for Intern-S1-Pro (#2394)

bb15146

[Fix] Add custom mock for sys.stdin that supports buffer attribute in LCBV6 (#2393)

a292806

* fix * fix * fix lint

[Update] Add evaluation example for SciReasoner (#2395)

480c6ca

* add example * fix

[ci] add unittest (#2390)

0d99e7c

* Fix extract_role_pred to properly strip whitespace and use explicit None checks * update * fix lint * update * update * update * update * update * update * update * update * update * Update test_base_task.py * Update test_base_task.py

[Update] add LLM-judge-based C-Eval config

714b380

[Update] add finish_reason_confirm tag in OpenAISDKStreaming

2c6fb22

[Update] update subjective config in compassbench

98e29a2

[Update] add oss download for new added datasets.

d83a68d

[Bump] Bump version to 0.5.2 (#2402)

9741792

[Dataset] add AIME2026 and HMMT Feb 2026 (#2404)

cd0c6d7

[Fix] fix path of IFEval (#2406)

7a6dcf3

pull[bot] wants to merge 141 commits into changlan:main from open-compass:main

20 participants
