Merged
80 changes: 51 additions & 29 deletions README.md
Original file line number Diff line number Diff line change
@@ -7,7 +7,27 @@
</table>

<p align="center">
<strong>The Multiverse archive for multivariate time series classification.</strong>

The **Multiverse** is an expanded archive for multivariate time series classification
(MTSC), together with supporting code, metadata, and benchmark results. It consolidates
datasets from the original UEA MTSC archive, newer MTSC collections, donated
standalone datasets, and associated benchmark results into a single open repository.

The current version of the paper describes:

- 133 unique MTSC problems
- 147 released datasets when preprocessing variants are included
- a curated 66-dataset subset, **Multiverse-core (MV-core)**, for algorithm benchmarking

This repository aims to make it easier to:

- load Multiverse datasets through `aeon`
- inspect archive metadata and dataset variants
- reproduce baseline benchmark results
- compare against published and recreated results
- contribute new results, metadata, and documentation as the archive evolves

</p>

<p align="center">
@@ -24,34 +44,36 @@
<a href="docs/contributing.md">Contributing</a>
</p>

## Installation

You can install from pip, but at present the best route is to install from source,
since the archive is changing rapidly:

```bash
git clone https://github.com/aeon-toolkit/multiverse.git
cd multiverse
pip install -e .
```

This repository depends on `aeon` and uses the `aeon` dataset loading interface as
the main public API for archive access.

## Quick start


Install the release package from PyPI:

@@ -70,18 +92,18 @@ Use ``aeon`` to download data from zenodo and load into memory.

```python
from aeon.datasets import load_classification

X, y = load_classification("BasicMotions")
print(X.shape)
print(y[:10])
train_X, train_y = load_classification("BasicMotions", split="train")
test_X, test_y = load_classification("BasicMotions", split="test")
```

More information and links to code: [`docs/datasets.md`](docs/datasets.md)

### Train and test a classifier
Train and test any `aeon` classifier that can handle multivariate series:

```python
from aeon.classification.deep_learning import InceptionTimeClassifier
from multiverse.classification import TimesNet
@@ -97,9 +119,9 @@ Multiverse ported classifiers - [`multiverse/classification`](multiverse/classif
Load results directly in code:
```python
from aeon.benchmarking.results_loaders import get_estimator_results

results = get_estimator_results(estimators=["HC2"], datasets=["Chinatown", "Adiac"])
```
Or explore published results collected in this repo: [`docs/results.md`](docs/results.md)

### Run an experiment
To reproduce a benchmark run or evaluate a new classifier, start from:
11 changes: 3 additions & 8 deletions docs/datasets.md
@@ -69,14 +69,9 @@ download_archive(archive="UEA", extract_path="C:\\Temp\\")

```
Currently, `archive` should be one of "EEG", "UCR", "UEA", "Imbalanced", "TSR" or
"Unequal". See the ``aeon`` documentation for more details. There are lists of
datasets in `aeon` and a dictionary of all Zenodo keys.

```python

from aeon.datasets.tsc_datasets import multiverse_core, multiverse2026, eeg2026, tsc_zenodo
```
16 changes: 10 additions & 6 deletions docs/evaluation.md
@@ -1,7 +1,11 @@
# Experimental Protocols

There are many ways to structure experiments, and a range of metrics are used in
comparisons.

For now, we present results for the simplest protocol: all training and validation
is done on the default train split, and we evaluate once on the test set.

There are alternatives: we could perform stratified resamples or cross-validation.

This is a placeholder. Decide and document:
- default train-test split vs resampling policy
- metrics (accuracy, balanced accuracy, macro F1, etc.)
- time and memory measurement policy
- hyperparameter policy (global, per-dataset, tuned)
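As a concrete illustration of the candidate metrics listed above, a minimal
scikit-learn sketch (the label values are invented for illustration):

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Invented predictions on a small three-class problem
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]

print(accuracy_score(y_true, y_pred))            # 0.8
print(balanced_accuracy_score(y_true, y_pred))   # ~0.833
print(f1_score(y_true, y_pred, average="macro")) # ~0.822
```

Balanced accuracy averages per-class recall, so it penalises the missed class-0
case more than plain accuracy does.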
8 changes: 0 additions & 8 deletions docs/index.md

This file was deleted.

62 changes: 7 additions & 55 deletions docs/leaderboard.md
@@ -1,7 +1,13 @@
# Leaderboards

The leaderboards can be interactively generated on the WEBSITE. These are some
illustrative static leaderboards ranked on classification accuracy. We will embed
the interactive version and update these dynamically in time.
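For illustration, ranks for a static leaderboard can be computed from an accuracy
table with pandas. The classifier names below appear in this archive's results, but
the accuracies and dataset names are invented:

```python
import pandas as pd

# Invented accuracies: rows are datasets, columns are classifiers
acc = pd.DataFrame(
    {"HC2": [0.90, 0.80, 0.70], "MRHydra": [0.88, 0.82, 0.75]},
    index=["DatasetA", "DatasetB", "DatasetC"],
)

# Rank classifiers on each dataset (1 = most accurate), then average the ranks
ranks = acc.rank(axis=1, ascending=False)
print(ranks.mean().sort_values())  # MRHydra ~1.33, HC2 ~1.67
```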

## Multiverse

## Multiverse-core

@@ -12,60 +18,6 @@ The EEG archive is a collection of EEG classification problems, described in [1]
release, it contains 30 datasets. Two of these are univariate and two are not
available on Zenodo. The resulting list of 28 problems is contained in the
Multiverse:

```python
eeg = [
    "Alzheimers",
    "Blink",
    "ButtonPress",
    "EpilepticSeizures",
    "EyesOpenShut",
    "FaceDetection",
    "FeedbackButton",
    "FeetHands",
    "FingerMovements",
    "HandMovementDirection",
    "ImaginedFeetHands",
    "ImaginedOpenCloseFist",
    "InnerSpeech",
    "LongIntervalTask",
    "LowCost",
    "MatchingPennies",
    "MindReading",
    "MotorImagery",
    "OpenCloseFist",
    "PhotoStimulation",
    "PronouncedSpeech",
    "SelfRegulationSCP1",
    "SelfRegulationSCP2",
    "ShortIntervalTask",
    "SitStand",
    "Sleep",
    "SongFamiliarity",
    "VisualSpeech",
]
```



We currently have results for the train/test splits with the following classifiers.

```python
all_classifiers = [
    "Arsenal",
    "CNN",
    "CSP-SVM",
    "DrCIF",
    "HC2",
    "IT",
    "MRHydra",
    "R-KNN",
    "R-MDM",
    "STC",
    "SVM",
    "TDE",
]
```

See the paper and aeon-neuro for details of these classifiers. The overall accuracy
picture is summarised in the paper.
## UEA archive

4 changes: 0 additions & 4 deletions docs/licences.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/mtsc_registry.csv

This file was deleted.

70 changes: 65 additions & 5 deletions docs/results.md
@@ -1,8 +1,68 @@
# Classifier Results

Results used in past bake offs are available on
[timeseriesclassification.com](https://timeseriesclassification.com) and obtainable
in code with
[aeon](https://github.com/aeon-toolkit/aeon/blob/main/aeon/benchmarking/results_loaders.py).

```python
from aeon.benchmarking.results_loaders import (
    get_available_estimators,
    get_estimator_results,
)

# List the classifiers with stored results
available = get_available_estimators("Classification")

# Fetch stored results for chosen classifiers and datasets
results = get_estimator_results(estimators=["HC2"], datasets=["Chinatown", "Adiac"])
```

We currently store the Multiverse results in the results directory. At present we
only have accuracy for the default splits on subsets of the Multiverse; this is
still a work in progress. You will soon be able to explore and download these
results interactively on the [Multiverse website](COMING SOON).

The dataset lists are

```python
from aeon.datasets.tsc_datasets import multiverse_core, multiverse2026, eeg2026

print(len(multiverse_core))  # 66
print(len(multiverse2026))  # 133
print(len(eeg2026))  # 28
```

### The Full Multiverse, 2026

The full multiverse has 133 datasets. We have results for 17 classifiers on a
subset of these problems.

```python
from pathlib import Path
import pandas as pd

# Run this from the repository root
df = pd.read_csv(Path("results") / "multiverse" / "accuracy_mean.csv")
print(df.head())
```
## The Multiverse-core (MV-core)

We specify a subset of 66 datasets for evaluation. These are more balanced across
applications, exclude overly similar, too simple or zero-information datasets, and
have a good distribution of series length and size.

```python
df = pd.read_csv(Path("results") / "multiverse_core" / "accuracy_mean.csv")
print(df.shape)
```


## The EEG Classification archive, 2026

The EEG archive is a sub-project for benchmarking EEG classification algorithms.
The project is based around [aeon-neuro](https://github.com/aeon-toolkit/aeon-neuro).


```python
df = pd.read_csv(Path("results") / "eeg" / "accuracy_mean.csv")
print(df.shape)
```

Raw spreadsheets are in the results directory.
Binary file not shown.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -48,7 +48,7 @@ dependencies = [
"pandas>=2.0",
"requests>=2.31",
"tqdm>=4.66",
"aeon>=1.4",
"aeon>=1.4.0",
"tsml_eval>=0.0",
"aeon_neuro>=0.1",
]