Add ore, pkp, scielo_br, scielo_mx, scielo_preprints-jats corpora to eval.yml #617
…eval.yml
Part of eLifePathways/ScienceBeam2.0#83.
Extends both the train and validation splits with the five additional corpora from the sciencebeam-v2-benchmarking dataset. Smoke sampling stays at 10 documents per corpus; full-run counts match the dataset row counts. No code changes are needed: predict.py and score.py already iterate over the split's corpus keys dynamically.
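The reason no code changes are needed can be illustrated with a minimal sketch. It assumes a split is parsed from eval.yml into a mapping from corpus key to a per-corpus spec. The `count` field, the `plan_samples` helper, and the row counts below are hypothetical and are not taken from the repository; only the corpus names and the smoke sample size of 10 come from this PR.

```python
# Smoke sample size per corpus, as stated in the PR description.
SMOKE_SAMPLE_SIZE = 10

# Hypothetical shape of one split after parsing eval.yml.
# The "count" values are placeholders, not the real dataset row counts.
VALIDATION_SPLIT = {
    "biorxiv": {"count": 200},
    "ore": {"count": 150},
    "pkp": {"count": 120},
    "scielo_br": {"count": 300},
    "scielo_mx": {"count": 80},
    "scielo_preprints-jats": {"count": 60},
}


def plan_samples(split, smoke):
    """Return {corpus: number of documents to process} for every corpus in the split.

    The function never names a corpus explicitly, so adding a key to the
    split is enough to include that corpus in the run.
    """
    return {
        corpus: min(SMOKE_SAMPLE_SIZE, spec["count"]) if smoke else spec["count"]
        for corpus, spec in split.items()
    }


smoke_plan = plan_samples(VALIDATION_SPLIT, smoke=True)
full_plan = plan_samples(VALIDATION_SPLIT, smoke=False)
print(smoke_plan)
print(sum(smoke_plan.values()), "documents in the smoke run")
```

Because the loop is driven by the split's keys, a smoke run over six corpora requests 10 documents from each, while a full run uses whatever count the config declares for each corpus.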
ScienceBeam Parser Evaluation

Overall (59 docs across 6 corpora)
grobid 0.9.0-crf: 60 docs | sciencebeam-parser:main-3874e53e-20260527.2141: 9 docs | sciencebeam-parser:pr-617-3afbeb64-20260527.2150: 59 docs

biorxiv (9 docs)
grobid 0.9.0-crf: 10 docs | sciencebeam-parser:main-3874e53e-20260527.2141: 9 docs | sciencebeam-parser:pr-617-3afbeb64-20260527.2150: 9 docs

ore (10 docs)
grobid 0.9.0-crf: 10 docs | sciencebeam-parser:main-3874e53e-20260527.2141: 0 docs | sciencebeam-parser:pr-617-3afbeb64-20260527.2150: 10 docs

pkp (10 docs)
grobid 0.9.0-crf: 10 docs | sciencebeam-parser:main-3874e53e-20260527.2141: 0 docs | sciencebeam-parser:pr-617-3afbeb64-20260527.2150: 10 docs

scielo_br (10 docs)
grobid 0.9.0-crf: 10 docs | sciencebeam-parser:main-3874e53e-20260527.2141: 0 docs | sciencebeam-parser:pr-617-3afbeb64-20260527.2150: 10 docs

scielo_mx (10 docs)
grobid 0.9.0-crf: 10 docs | sciencebeam-parser:main-3874e53e-20260527.2141: 0 docs | sciencebeam-parser:pr-617-3afbeb64-20260527.2150: 10 docs

scielo_preprints-jats (10 docs)
grobid 0.9.0-crf: 10 docs | sciencebeam-parser:main-3874e53e-20260527.2141: 0 docs | sciencebeam-parser:pr-617-3afbeb64-20260527.2150: 10 docs
… report
With multiple corpora, the report was growing too long to scan at a glance. Each corpus section is now wrapped in a collapsed <details> block, and an Overall section is added at the top showing the doc-count-weighted F1 across all corpora, so the headline result is immediately visible.
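As a rough illustration of the report layout described here, the sketch below computes a doc-count-weighted F1 and renders one collapsed <details> block per corpus. The function names, the input shape, and the F1 values are hypothetical; the actual report code in this PR may aggregate and format its scores differently.

```python
from html import escape


def weighted_f1(per_corpus):
    """Doc-count-weighted mean F1 for input of the form {corpus: (doc_count, f1)}."""
    total_docs = sum(count for count, _ in per_corpus.values())
    if total_docs == 0:
        return 0.0
    return sum(count * f1 for count, f1 in per_corpus.values()) / total_docs


def render_report(per_corpus):
    """Render an Overall section followed by one collapsed block per corpus."""
    total_docs = sum(count for count, _ in per_corpus.values())
    lines = [
        f"Overall ({total_docs} docs across {len(per_corpus)} corpora)",
        f"Weighted F1: {weighted_f1(per_corpus):.3f}",
        "",
    ]
    for corpus, (count, f1) in per_corpus.items():
        lines += [
            "<details>",
            f"<summary>{escape(corpus)} ({count} docs)</summary>",
            "",
            f"F1: {f1:.3f}",
            "",
            "</details>",
        ]
    return "\n".join(lines)


# Illustrative scores only; these are not results from the evaluation run.
example = {"biorxiv": (9, 0.5), "ore": (10, 1.0)}
report = render_report(example)
print(report)
```

Weighting by document count keeps a corpus with fewer scored documents from having the same influence on the headline number as a larger one.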