[EASI] merge EASI added benchmarks and models into VLMEvalKit by PeterWangyi · Pull Request #1433 · open-compass/VLMEvalKit

PeterWangyi · 2026-02-05T07:15:09Z

Feature:

* Add 20+ Spatial Intelligence benchmarks
* Add 20+ Spatial Intelligence models
* Implement an independent scoring component that supports regular expression matching and LLM-as-a-judge

To ensure reproduction accuracy, we have provided a detailed verification report in bench_verify.md, comparing our results with official baselines.
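
For readers unfamiliar with the pattern, here is a minimal sketch of how a scoring component can combine regular expression matching with an LLM-as-a-judge fallback. It is not the code from this PR: `extract_choice`, `score_response`, and the `judge` callable are hypothetical names, and the four-option A-D format is an assumption.

```python
import re
from typing import Callable, Optional


def extract_choice(response: str, options: str = "ABCD") -> Optional[str]:
    """Return the option letter if exactly one distinct standalone letter appears.

    Hypothetical helper: an ambiguous or empty response yields None so that
    the caller can defer to a judge model instead of guessing.
    """
    found = set(re.findall(rf"\b([{options}])\b", response))
    return found.pop() if len(found) == 1 else None


def score_response(
    response: str,
    answer: str,
    judge: Optional[Callable[[str], Optional[str]]] = None,
) -> bool:
    """Score one response: regex first, then the optional judge as a fallback."""
    choice = extract_choice(response)
    if choice is None and judge is not None:
        # `judge` stands in for an LLM call that maps free text to a letter.
        choice = judge(response)
    return choice == answer


if __name__ == "__main__":
    print(score_response("I think the answer is (B).", "B"))
    print(score_response("Either A or C", "C", judge=lambda r: "C"))
```

Keeping the regex path strict means the cheaper matcher only answers when it is unambiguous, and everything else goes to the judge.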

PeterWangyi · 2026-02-11T10:18:49Z

hi @mzr1996

I have followed the suggestions:

* Added deprecation information for mmsi-video
* Added a benchmark verification Markdown file
* After internal review, we still prefer to host the related TSVs under our HF space. Thank you for your understanding~

Ready for re-review. Thanks!

PeterWangyi · 2026-02-25T07:57:32Z

hi @mzr1996
I have resolved the conflict, and this PR is now ready for re-review, thanks!

mzr1996 · 2026-02-26T09:18:01Z

Looks like we need to update the requirements file.
https://github.com/open-compass/VLMEvalKit/actions/runs/22387494567/job/64801226075?pr=1433
CI failed because of missing dependencies.

PeterWangyi · 2026-02-26T09:52:00Z

> Looks like we need to update the requirements file.
> https://github.com/open-compass/VLMEvalKit/actions/runs/22387494567/job/64801226075?pr=1433
> CI failed because of missing dependencies.

Thanks for the heads-up! I have updated the requirements to include the missing dependencies. However, it seems a new issue has emerged in the latest CI run. I'm currently looking into what might be causing this.

mzr1996 · 2026-02-27T04:26:31Z

I have invited our QA team to check the CI result. Please wait for our fix.

zhulinJulia24 · 2026-03-02T04:53:29Z

@PeterWangyi Please rebase onto the main branch. The baseline and image path have been updated in the main branch. Thanks!

* [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download

* [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation

* [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch

* [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com>

* [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com>

* [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models * [Feature] Add SenseSI series models * [Feature] add use custom propmt flag to contrl prompt format. --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com>

…ce and rename spatial utils folder (open-compass#7) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Refactor] Change all EASI related bench tsv download url * [Refactor] Add caa & mra definition and add site & vsi paper link * [Refactor] declare no circular is aligned with mmsi offical mmsi * [Refactor] rename spatial utils folder to reduce confusion * [Refactor] add EASI prompt format explaination * [Refactor] add EASI prompt format explaination * [Refactor] switch to new hf url
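
The commit above mentions an "mra" definition without spelling it out. Assuming it refers to the Mean Relative Accuracy metric used for numerical answers in VSI-Bench, a minimal sketch follows. The threshold set {0.50, 0.55, ..., 0.95} and the strict inequality follow the paper's definition as I understand it; the implementation in this PR may differ, and the function name is hypothetical.

```python
def mean_relative_accuracy(pred: float, target: float) -> float:
    """Fraction of confidence thresholds at which the prediction counts as correct.

    Assumed definition: for each theta in {0.50, 0.55, ..., 0.95}, the
    prediction is correct when |pred - target| / |target| < 1 - theta.
    """
    if target == 0:
        raise ValueError("relative error is undefined for a zero target")
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    rel_err = abs(pred - target) / abs(target)
    hits = sum(rel_err < 1 - theta for theta in thresholds)
    return hits / len(thresholds)


if __name__ == "__main__":
    # An exact prediction passes every threshold; a 2x overestimate passes none.
    print(mean_relative_accuracy(3.0, 3.0))
    print(mean_relative_accuracy(6.0, 3.0))
```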

…mpatibility with latest transformers (open-compass#8) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version

* [Benchmark] Support RefCOCO (open-compass#1305) * Suppot Qwen3VL Series * Support Qwen3-VL Series * Support Qwen3-VL Series * Support Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * support refcoco * fix lint * [Benchmark] Add MindCube Bench (open-compass#1) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Benchmark] Add EASI related image spatial bench (open-compass#2) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Benchmark] Add EASI related video benchmark (open-compass#3) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Model] Add SpatialMLLM model (open-compass#4) * [Model] 
add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Model] Add Spatial VLM Models (open-compass#5) * [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Models] Add SenseSI series models (open-compass#6) * [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models * [Feature] Add SenseSI series models * [Feature] add use custom propmt flag to contrl prompt format. --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Refactor] Modify EASI tsv download url and add several paper reference and rename spatial utils folder (open-compass#7) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Refactor] Change all EASI related bench tsv download url * [Refactor] Add caa & mra definition and add site & 
vsi paper link * [Refactor] declare no circular is aligned with mmsi offical mmsi * [Refactor] rename spatial utils folder to reduce confusion * [Refactor] add EASI prompt format explaination * [Refactor] add EASI prompt format explaination * [Refactor] switch to new hf url * [Fix] SiteImage tsv download url and remove load_in_8bit param for compatibility with latest transformers (open-compass#8) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Refactor] change SenseNova-SI series models hf dir and fix vsi sitevideo dataset type (open-compass#9) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Fix] Specify the dataset type for Sitevideo Vsi. * [Refactor] change sensenova_si hf dir * [Feature] Add VsiBench Debiased subset * [Feature] Add VsiBench Debiased subset (open-compass#10) * [Feature] Add cambrian-s model * [Refactor]] Add Requirements guide * [Fix] delete refcoco due to force push * [Fix] error when text is empty * [Feature] automatically specify device * [Refactor] remove unused code and set videoreader num_thread to default=0 --------- Co-authored-by: Junming Lin <114148730+mjuicem@users.noreply.github.com> Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com>

…s#13) * [Benchmark] Support RefCOCO (open-compass#1305) * Suppot Qwen3VL Series * Support Qwen3-VL Series * Support Qwen3-VL Series * Support Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * support refcoco * fix lint * [Benchmark] Add MindCube Bench (open-compass#1) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Benchmark] Add EASI related image spatial bench (open-compass#2) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Benchmark] Add EASI related video benchmark (open-compass#3) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Model] Add SpatialMLLM model (open-compass#4) * 
[Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Model] Add Spatial VLM Models (open-compass#5) * [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Models] Add SenseSI series models (open-compass#6) * [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models * [Feature] Add SenseSI series models * [Feature] add use custom propmt flag to contrl prompt format. 
--------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Refactor] Modify EASI tsv download url and add several paper reference and rename spatial utils folder (open-compass#7) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Refactor] Change all EASI related bench tsv download url * [Refactor] Add caa & mra definition and add site & vsi paper link * [Refactor] declare no circular is aligned with mmsi offical mmsi * [Refactor] rename spatial utils folder to reduce confusion * [Refactor] add EASI prompt format explaination * [Refactor] add EASI prompt format explaination * [Refactor] switch to new hf url * [Fix] SiteImage tsv download url and remove load_in_8bit param for compatibility with latest transformers (open-compass#8) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Refactor] change SenseNova-SI series models hf dir and fix vsi sitevideo dataset type (open-compass#9) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Fix] Specify the dataset type for Sitevideo Vsi. 
* [Refactor] change sensenova_si hf dir * [Feature] Add VsiBench Debiased subset * [Feature] Add VsiBench Debiased subset (open-compass#10) * [Feature] Add cambrian-s model * [Refactor]] Add Requirements guide * [Fix] delete refcoco due to force push * [Fix] error when text is empty * [Fix] spatialmllm inference error during multi images qa * [Feature] automatically specify device * [Refactor] remove unused code and set videoreader num_thread to default=0 --------- Co-authored-by: Junming Lin <114148730+mjuicem@users.noreply.github.com> Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com>

* [Benchmark] Support RefCOCO (open-compass#1305) * Suppot Qwen3VL Series * Support Qwen3-VL Series * Support Qwen3-VL Series * Support Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * support refcoco * fix lint * [Benchmark] Add MindCube Bench (open-compass#1) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Benchmark] Add EASI related image spatial bench (open-compass#2) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Benchmark] Add EASI related video benchmark (open-compass#3) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Model] Add SpatialMLLM model (open-compass#4) * [Model] 
add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Model] Add Spatial VLM Models (open-compass#5) * [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Models] Add SenseSI series models (open-compass#6) * [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models * [Feature] Add SenseSI series models * [Feature] add use custom propmt flag to contrl prompt format. --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Refactor] Modify EASI tsv download url and add several paper reference and rename spatial utils folder (open-compass#7) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Refactor] Change all EASI related bench tsv download url * [Refactor] Add caa & mra definition and add site & 
vsi paper link * [Refactor] declare no circular is aligned with mmsi offical mmsi * [Refactor] rename spatial utils folder to reduce confusion * [Refactor] add EASI prompt format explaination * [Refactor] add EASI prompt format explaination * [Refactor] switch to new hf url * [Fix] SiteImage tsv download url and remove load_in_8bit param for compatibility with latest transformers (open-compass#8) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Refactor] change SenseNova-SI series models hf dir and fix vsi sitevideo dataset type (open-compass#9) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Fix] Specify the dataset type for Sitevideo Vsi. * [Refactor] change sensenova_si hf dir * [Feature] Add VsiBench Debiased subset * [Feature] Add VsiBench Debiased subset (open-compass#10) * [Feature] Add cambrian-s model * [Refactor]] Add Requirements guide * [Fix] delete refcoco due to force push * [Fix] error when text is empty * [Fix] spatialmllm inference error during multi images qa * [Feature] Add VST --------- Co-authored-by: Junming Lin <114148730+mjuicem@users.noreply.github.com> Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com>

* [Benchmark] Support RefCOCO (open-compass#1305) * Suppot Qwen3VL Series * Support Qwen3-VL Series * Support Qwen3-VL Series * Support Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * Support Qwen3-Omni and update Qwen3-VL Series * support refcoco * fix lint * [Benchmark] Add MindCube Bench (open-compass#1) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Benchmark] Add EASI related image spatial bench (open-compass#2) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Benchmark] Add EASI related video benchmark (open-compass#3) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Model] Add SpatialMLLM model (open-compass#4) * [Model] 
add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Model] Add Spatial VLM Models (open-compass#5) * [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Models] Add SenseSI series models (open-compass#6) * [Model] add SpatialMLLM support * [Model] add SpatialMLLM import to __init__.py * [Style] apply pre-commit check * [Feature] Add more spatial model * [Feature] support correct loading of qwen25 derivative models * [Feature] Add SenseSI series models * [Feature] add use custom propmt flag to contrl prompt format. --------- Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com> * [Refactor] Modify EASI tsv download url and add several paper reference and rename spatial utils folder (open-compass#7) * [Feature] add EASI related spatial bench utils func * [Style] format using pre-commit * [Style] format using pre-commit * [Feature] upgrade unknown image format verify * [Feature] add mindcube bench * [Fix] cache path is none error during first download * [Feature] add embspatial benchmark * [Feature] add viewspatial benchmark * [Feature] add mmsi with out circular benchmark * [Feature] enable mmsi && embspatial && viewspatial evaluation * [Feature] add prepare tsv method to VideoBaseDataset * [Feature] add vsi bench * [Feature] add Site Bench * [Feature] enable vsi && site evaluation * [Fix] Sitebench category name mismatch * [Refactor] Change all EASI related bench tsv download url * [Refactor] Add caa & mra definition and add site & 
vsi paper link * [Refactor] declare no circular is aligned with mmsi offical mmsi * [Refactor] rename spatial utils folder to reduce confusion * [Refactor] add EASI prompt format explaination * [Refactor] add EASI prompt format explaination * [Refactor] switch to new hf url * [Fix] SiteImage tsv download url and remove load_in_8bit param for compatibility with latest transformers (open-compass#8) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Refactor] change SenseNova-SI series models hf dir and fix vsi sitevideo dataset type (open-compass#9) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Fix] Specify the dataset type for Sitevideo Vsi. * [Refactor] change sensenova_si hf dir * [Feature] Add VsiBench Debiased subset * [Feature] Add VsiBench Debiased subset (open-compass#10) * [Feature] Add cambrian-s model * [Refactor]] Add Requirements guide * [Fix] delete refcoco due to force push * [Fix] error when text is empty * [Fix] spatialmllm inference error during multi images qa * [Feature] Add VST * [Feature] automatically specify device * [Feature] add bagel * [Refactor] remove unused code and set videoreader num_thread to default=0 * [Feature] Add Bagel Model * [Feature] Add Bagel Model --------- Co-authored-by: Junming Lin <114148730+mjuicem@users.noreply.github.com> Co-authored-by: oscarqjh <oscar.jh9@gmail.com> Co-authored-by: Oscar Qian <91544028+oscarqjh@users.noreply.github.com>


…ideo dataset type (open-compass#9) * [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version * [Fix] Specify the dataset type for Sitevideo Vsi. * [Refactor] change sensenova_si hf dir

* [Feature] add sparbench && spatialvizbench && starebench * [Fix] no cot && cot confit error * [Fix] no cot && cot confit error * [Feature] update tsv url && rm useless notes * [Refactor] upgrade output format * [Refactor] upgrade output format * [Feature] Add OmniSpatialBench * [Refactor] upgrade notes * [Refactor] remove useless spatial_rel_bench_folder due to rebase and version rollback issue. * [Refactor] remove useless logic * [Refactor] more elegant way to choose qwen architecture * [Refactor] more elegant way to choose qwen architecture * [Refactor] remove useless model name kwarg * [Refactor] remove vsi EASI prompt since EASI will not promote ss * [Refactor] Modify tsv url and remove useless print * [Feature] added stratified statistical accuracy function * [Fix] remove top comment * [Refactor] extract common parts in build prompt for easier understanding * [Refactor] extract common url * [Refactor] extract common url

* [Feature] add sparbench && spatialvizbench && starebench * [Fix] no cot && cot confit error * [Fix] no cot && cot confit error * [Feature] update tsv url && rm useless notes * [Refactor] upgrade output format * [Refactor] upgrade output format * [Feature] Add OmniSpatialBench * [Refactor] upgrade notes * [Refactor] remove useless spatial_rel_bench_folder due to rebase and version rollback issue. * [Refactor] remove useless logic * [Refactor] more elegant way to choose qwen architecture * [Refactor] more elegant way to choose qwen architecture * [Refactor] remove useless model name kwarg * [Refactor] remove vsi EASI prompt since EASI will not promote ss * [Refactor] Modify tsv url and remove useless print * [Feature] added stratified statistical accuracy function * [Fix] remove top comment * [Refactor] extract common parts in build prompt for easier understanding * [Refactor] extract common url * [Refactor] extract common url * [Feature] add EASI tsv md5 * [Refactor] remove useless LMUdata import
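
The "add EASI tsv md5" commit suggests downloaded TSV files are checked against a known digest. A minimal sketch of such a check with the standard library is shown here; `file_md5` and `is_intact` are hypothetical names, not functions from the PR.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected_md5: str) -> bool:
    """Return True when the file's digest matches the expected one."""
    return file_md5(path) == expected_md5.lower()
```

A caller would typically re-download the TSV when `is_intact` returns False, which guards against partially downloaded or stale files.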

* Squashed 'vlmeval/vlm/vlm3r/CUT3R/' content from commit 5124436 git-subtree-dir: vlmeval/vlm/vlm3r/CUT3R git-subtree-split: 51244364af3566d6473559f71a81b4accc75c424 * Add VLM3R * Add VLM3R * support to download cut3r ckp from official google drive * add code to build cut3r from the source * use EASI prompt for vsibench * rm CUT3R subtree * Add CUT3R code * rm unused code in CUT3R * fix import error * ignore pth and data * rm unused spatial encoder and vision encoder * fix the bug for Siglip vision encoder * download cut3r pth from HF instead of google drive

…ompass#22) * [Refactor] Refactor regex answer parsing and improve comments * [Feature] use last number instead of first to match na options * [Feature] support English number words in NA matcher * [Feature] add tools to build options from xlsx rows * [Feature] support llm matching for both mcq and vqa * [Feature] support parallel llm judge and support na llm extract * [Fix] fix coner case when options in mutliple lines * [Feature] add matching func factory * [Fix] llm extract fetching problem * [Refactor] improve func naming * [Feature] support LLM matching * [Refactor] extract common func * [Refactor] remove useless content * [Refactor] modify the content of the copilot check. * [Fix] construct type error * [Feature] determine result file name by judge model name * [Refactor] fix naming of eval mcq func * [Refactor] improve code style.
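
Two of the commits above describe matching the last number in a response rather than the first, and accepting English number words. The sketch below illustrates that idea under the assumption that "na" means numerical-answer questions. The word list and the name `extract_last_number` are hypothetical and not taken from the PR.

```python
import re
from typing import Optional

# Small illustrative vocabulary; a real matcher would likely cover more words.
_NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
_TOKEN_RE = re.compile(
    r"-?\d+(?:\.\d+)?|\b(?:" + "|".join(_NUMBER_WORDS) + r")\b",
    re.IGNORECASE,
)


def extract_last_number(text: str) -> Optional[float]:
    """Return the last numeric token (digits or number word), or None."""
    tokens = _TOKEN_RE.findall(text)
    if not tokens:
        return None
    last = tokens[-1].lower()
    if last in _NUMBER_WORDS:
        return float(_NUMBER_WORDS[last])
    return float(last)
```

Taking the last match tends to suit chain-of-thought style responses, where intermediate quantities appear before the final answer.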

…tes consistently (open-compass#26) * [Feature] Extracting common result file naming logic * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Refactor] Adopt copilot recommendations

* refactor VLM3R: remove folder `vlmeval/vlm/vlm3r/*`, and add `vlmeval/vlm/vlm3r.py` * remove unused import package * add the hyper-param in the init func instead of using a fixed value & mv the `VLM3R` into the `spatial_related_models` * add comments

* [Feature] Extracting common result file naming logic * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Feature] Add a caching mechanism for LLM evaluation * [Feature] Enable llm cache mechanism * [Fix] fix bugs according to copilot
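
The caching mechanism for LLM evaluation mentioned above can be pictured as a persistent key-value store in front of the judge call. The following is a minimal sketch under that assumption; `JudgeCache`, the JSON file layout, and the key scheme are hypothetical and not the PR's implementation.

```python
import hashlib
import json
import os
from typing import Callable, Dict


class JudgeCache:
    """Persist judge outputs in a JSON file, keyed by judge model and prompt."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._data: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    @staticmethod
    def _key(judge_model: str, prompt: str) -> str:
        # Include the judge name so different judges never share an entry.
        return hashlib.sha256(f"{judge_model}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, judge_model: str, prompt: str, call: Callable[[str], str]) -> str:
        """Return the cached verdict, calling the judge only on a cache miss."""
        key = self._key(judge_model, prompt)
        if key not in self._data:
            self._data[key] = call(prompt)
            with open(self.path, "w", encoding="utf-8") as f:
                json.dump(self._data, f)
        return self._data[key]
```

Writing the file after every miss is simple but slow for large runs; batching the writes would be a reasonable refinement.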

* [Refactor] Refactor regex answer parsing and improve comments * [Feature] use last number instead of first to match na options * [Feature] support English number words in NA matcher * [Feature] add tools to build options from xlsx rows * [Feature] support llm matching for both mcq and vqa * [Feature] support parallel llm judge and support na llm extract * [Fix] fix coner case when options in mutliple lines * [Feature] add matching func factory * [Fix] llm extract fetching problem * [Refactor] improve func naming * [Feature] support LLM matching * [Refactor] extract common func * [Refactor] remove useless content * [Refactor] modify the content of the copilot check. * [Fix] construct type error * [Feature] determine result file name by judge model name * [Feature] add ERQA bench * [Refactor] Add task category and add llm eval * [Feature] add robospatialbench * [Feature] add refspatialbench * [Feature] enable three er bench * [Feature] Add er benchs * [Feature] Extracting common result file naming logic * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Refactor] Use single quotes consistently * [Feature] Add a caching mechanism for LLM evaluation * [Feature] Enable llm cache mechanism * [Feature] backup er bench update * [Feature] get ready for ERQA bench * [Feature] use point2dparser to get point coord and get ready for refspatial * [Feature] add point paeser for er benchs * [Feature] use point2dparser to get points and get ready for robospatial * [Feature] aligned with qwen3vl prompt format * [Feature] Add a unified interface to robospatia * [Feature] fix and refactor according to copilot * [Refactor] rename ERQA to ERQABench to align with other EASI added benchs * [Fix] 3DSR result fetch issue

* [Feature] Add SPBench * [Feature] Update dataset url to hf url * [Refactor] refactor according to copilot

* [Feature] Add MMSI-Video-Bench * [Refactor] refactor according to copilot * [Refactor] refactor according to copilot * [Fix] lmudata error

* [Feature] Add MMSI-Video-Bench * [Refactor] refactor according to copilot * [Refactor] refactor according to copilot * [Feature] Init commit on vsi super bench * [Feature] Add VsiSuperCount * [Refactor] refactor according to copilot * [Refactor] remove * import and enable flake8 check * [Refactor] refactor according to copilot * [Fix] lmudata error

…ring download (open-compass#36) * [Feature] support mmsi video sub bench scores and ignore video.zip while download * [Feature] print scores use 100 points format * [Feature] update tsv hf download path * [Feature] upgrade is_nan_or_none func

* [Feature] Init commit for STI-Bench * [Feature] update stibench tsv hf download path * [Refactor] upgrade according to copilot

* [Feature] Init commit for STI-Bench * [Feature] add sensenova-si latest models * [Feature] update stibench tsv hf download path * [Refactor] upgrade according to copilot

* [Feature] Init commit for STI-Bench * [Feature] add sensenova-si latest models * [Feature] update stibench tsv hf download path * [Refactor] upgrade according to copilot * [Feature] Init commit on DSR bench * [Feature] modify download func to download video data from easi hf dir * [Refactor] refactor according to copilot * [Perf] improve save video frames efficiency

…d_eriq [Benchmark] Add ERIQ bench

PeterWangyi · 2026-03-02T09:32:07Z

@mzr1996 @zhulinJulia24
Thank you both for your help. I have rebased the code, and CI has passed completely.
Ready for re-review~

PeterWangyi and others added 24 commits March 2, 2026 06:29

[Fix] SiteImage tsv download url and remove load_in_8bit param for compatibility with latest transformers (open-compass#8)

c8b5700

* [Fix] siteimage wrong url * [Fix] transformer do not have load_in_8bit param in current version

[Feature] add sensenova-si-v11 series models (open-compass#14)

f308f5c

[Fix] Fix data path structure due to mindcube hf update (open-compass#17)

be7e3a9

[Feature] Add Vsi-Debiased subset (open-compass#12)

424ecd8

PeterWangyi and others added 18 commits March 2, 2026 07:24

[Benchmark] Add SPBench (open-compass#30)

ba05094

* [Feature] Add SPBench * [Feature] Update dataset url to hf url * [Refactor] refactor according to copilot

[Benchmark] Add MMSI-Video-Bench (open-compass#31)

d70e7a9

* [Feature] Add MMSI-Video-Bench * [Refactor] refactor according to copilot * [Refactor] refactor according to copilot * [Fix] lmudata error

[Fix] fix vsi-bench video path update issue

8e7efa4

[Refactor] Add references for all EASI-added benchmarks (open-compass#35)

8c71ef1

[Benchmark] Add STI-Bench (open-compass#37)

93e591c

* [Feature] Init commit for STI-Bench * [Feature] update stibench tsv hf download path * [Refactor] upgrade according to copilot

[Model] Add SenseNova-SI latest models (open-compass#38)

8dcbf3a

* [Feature] Init commit for STI-Bench * [Feature] add sensenova-si latest models * [Feature] update stibench tsv hf download path * [Refactor] upgrade according to copilot

[Fix] vsi bench video llm load path error (open-compass#41)

1ea7083

[Fix] restore video datasets and separate EASI lists

6689f1c

[Fix] align SiteBench with EASI

bb1b399

[Chore] align bench/model code with EASI

0b75105

[Chore] exclude image_base/image_mcq from big PR

335b322

[Chore] exclude internvl_chat and video_base from big PR

ae41bd3

Merge pull request open-compass#42 from EvolvingLMMs-Lab/dev/peter/add_eriq

eb68712

[Benchmark] Add ERIQ bench

[Refactor] note mmsi_video deprecated and point to mmsibench

4cc4efe

deps: add num2words

d62d7dd

PeterWangyi force-pushed the pr-easi-big branch from 90698e3 to d62d7dd Compare March 2, 2026 08:01

PeterWangyi added 9 commits March 2, 2026 08:06

chore: drop benchmark_verify and torchrun

d9dbd48

fix: restore image_base and image_mcq to oc main states

e7b3f46

fix: align image_mcq with oc main

f9d5b8c

fix: align model files with oc main

c83e8e6

[Refactor] rm useless vsi import

6114a3a

[Refactor] put easi code together

69914ff

[Refactor] put easi code together

5f64332

fix: align video_base with oc main

4880344

fix: use video_vsi_dataset in video config

e4f574f

mzr1996 merged commit ef83c66 into open-compass:main Mar 4, 2026
7 checks passed

mzr1996 merged 51 commits into open-compass:main from PeterWangyi:pr-easi-big

5 participants
