* update * fix lint * fix lint * update precommit * update precommit * fix lint
* 0-shot Smolinstruct: add 0-shot evaluation and postprocess functions for Smolinstruct * fix acc postprocessor * update 0-shot acc postprocessor * rename 0-shot
* 250527 * 250527 * 250527 * 0530 * 0530 * Update srbench_gen.py * Update srbench.py * srbench fix * Update datasets_info.py * Update datasets_info.py --------- Co-authored-by: Myhs-phz <demarcia2014@126.com> Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
* add LLM as judge setting for matbench * fix reference negative sample gold value missing error * update import * comments trim * fix file and import naming * matbench fix * matbench fix * matbench fix --------- Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk> Co-authored-by: Myhs-phz <demarcia2014@126.com>
* add phybench * phybench fix * update * update --------- Co-authored-by: Myhs-phz <demarcia2014@126.com> Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
* update * update * update
* add gaokao & competition benchmark * fix lint * chem_exam fix --------- Co-authored-by: Myhs-phz <demarcia2014@126.com>
* add gaokao & competition benchmark * fix lint * chem_exam fix * update verifier prompt --------- Co-authored-by: Myhs-phz <demarcia2014@126.com>
* update earth silver benchmark * fix new issues * update * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
* healthbench * fix irrelevant files * first * fix bench * fix bench * fix bench * fix soft link * fix bench * fix bench * healthbench fix * fix bench * fix bench * fix bench * fix bench * fix bench * fix bench * fix bench * fix bench * fix bench --------- Co-authored-by: Myhs-phz <demarcia2014@126.com>
* update * update * update * update * update
* [Dataset] Add R-Bench (ICML 2025) * fixed lint * format rbench.py by isort * rbench fix * r-bench fix * update --------- Co-authored-by: leoyizhang <leoyizhang@tencent.com> Co-authored-by: Myhs-phz <demarcia2014@126.com> Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
* Fix PHYbench * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
* 0616 * 0616 * 0616 * update * update * 0616 --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
* update * update
* update * update
* update needlebench docs for chinese * update bilingual needlebench docs * update docs typo * update docs * update docs typo * [Docs] fix needlebench examples * Add NeedleBench_V2 * [Fix] Fix pre-commit * remove choice version * [Docs] Update NeedleBench Docs * [Docs] update NeedleBenchV2 Docs * [Docs] Update Default Settings for NeedleBench and ATC Configs * [Fix] Fix precommit * [Minor] fix needlebench summarizer groups * [Minor] Update NeedleBenchV2 dataset-index
* timed re.search and _executor made global * TimeOutError exception handling * added missing blank lines * isort import --------- Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
* update * update
* debug rjob runner * optimize concurrent requests by adding max_workers * update --------- Co-authored-by: xujun <xujun@pjlab.org.cn>
* debug rjob runner * optimize concurrent requests by adding max_workers * update * optimize the max_workers for OpenAISDK * optimize the max_workers for OpenAISDK * optimize the max_workers for OpenAISDK * optimize the max_workers for OpenAISDK * Update openai_api.py --------- Co-authored-by: xujun <xujun@pjlab.org.cn> Co-authored-by: nic <nic@yccc.follower> Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
(Warning) This PR may introduce performance BC for ChemExam benchmark.
* Add ProcessBench dataset and evaluation configuration * Enable ProcessBench subsets * Update ProcessBench.py * new file: opencompass/configs/datasets/ProcessBench/README.md
* fix LLM4Mat regex * fix LLM4Mat regex
* update * update * update * update * update
* fix * fix peer * fix
* fix LLM4Mat regex * fix LLM4Mat regex * add scireasoner summarizer * fix invalid score in summarizer * fix --------- Co-authored-by: Myhs_phz <demarcia2014@126.com>
* Update pr-run-test.yml * update * update * update * update
* Add TeleChat-thinking API inference support * fix lint --------- Co-authored-by: Myhs_phz <demarcia2014@126.com>
* fix * Update pr-stage-check.yml * Update pr-stage-check.yml * Initialize average_mfe and retrieved_rfam_family_count: set default values for average_mfe and retrieved_rfam_family_count * Implement metrics for handling empty predictions * Update pr-stage-check.yml --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
… LCBV6 (#2393) * fix * fix * fix lint
* add example * fix
* Fix extract_role_pred to properly strip whitespace and use explicit None checks * update * fix lint * update * update * update * update * update * update * update * update * update * Update test_base_task.py * Update test_base_task.py
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)