Add "type" identifier to error rate serialization #120

Merged
thequilo merged 2 commits into main from wer_identifier on Sep 19, 2025

@thequilo
Member

Adds a "type" key to serialized error rates. This resolves ambiguous cases in which saved DI-cpWER and ORC-WER results could not be distinguished, and makes it easier to identify the error rate type from the files alone.

```
In [1]: from meeteval.wer.wer.orc import OrcErrorRate

In [2]: from meeteval.wer.wer.error_rate import ErrorRate

In [3]: er = OrcErrorRate.zero()

In [4]: serialized = er.asdict()

In [5]: serialized
Out[5]:
{'error_rate': 0,
 'errors': 0,
 'length': 0,
 'insertions': 0,
 'deletions': 0,
 'substitutions': 0,
 'reference_self_overlap': None,
 'hypothesis_self_overlap': None,
 'assignment': (),
 'type': 'orc-error-rate'}

In [6]: ErrorRate.from_dict(serialized)
Out[6]: OrcErrorRate(error_rate=0, errors=0, length=0, insertions=0, deletions=0, substitutions=0, assignment=())
```
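To illustrate why a "type" key removes the ambiguity, here is a minimal, self-contained sketch of type-tagged serialization. It is not meeteval's implementation: the registry, the `register` decorator, the stripped-down classes, and the type strings other than "orc-error-rate" are hypothetical. Two classes with identical fields are deliberately included to show that only the "type" key tells them apart once they are written to JSON.

```python
import dataclasses
import json

# Hypothetical registry mapping a serialized "type" string to its class.
_REGISTRY = {}


def register(type_name):
    """Class decorator that tags a class with its serialization type name."""
    def decorator(cls):
        cls.type_name = type_name
        _REGISTRY[type_name] = cls
        return cls
    return decorator


@register("error-rate")  # hypothetical type string
@dataclasses.dataclass(frozen=True)
class ErrorRate:
    errors: int
    length: int

    def asdict(self):
        # Plain fields plus a "type" key naming the concrete class.
        return {**dataclasses.asdict(self), "type": self.type_name}

    @staticmethod
    def from_dict(d):
        d = dict(d)  # copy so the caller's dict is not mutated
        cls = _REGISTRY[d.pop("type")]
        return cls(**d)


@dataclasses.dataclass(frozen=True)
class _AssignmentErrorRate(ErrorRate):
    assignment: tuple = ()

    def __post_init__(self):
        # JSON stores tuples as lists; convert back so equality still holds.
        object.__setattr__(self, "assignment", tuple(self.assignment))


# Identical fields: without the "type" key these two would be indistinguishable.
@register("orc-error-rate")
class OrcErrorRate(_AssignmentErrorRate):
    pass


@register("dicp-error-rate")  # hypothetical type string
class DICPErrorRate(_AssignmentErrorRate):
    pass


# Round trip through JSON text, as if written to and read back from a file.
orc = OrcErrorRate(errors=1, length=4, assignment=("A", "B"))
restored = ErrorRate.from_dict(json.loads(json.dumps(orc.asdict())))
print(type(restored).__name__, restored == orc)  # prints: OrcErrorRate True
```

Because `from_dict` dispatches on the stored string rather than on which keys happen to be present, adding a new error rate type only requires registering a new name.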

The old heuristics for detecting the error rate type are kept for backwards compatibility.
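A backwards-compatible lookup can be sketched as: trust the explicit "type" key when it is present, and fall back to guessing from the available keys only for files written before the key existed. The rules below, including the key names `missed_speaker`, `falarm_speaker`, `scored_speaker` and the type strings "cp-error-rate" and "error-rate", are hypothetical stand-ins, not meeteval's actual heuristics.

```python
# Hypothetical legacy rules, checked in order: (required keys, type name).
# More specific rules come first, since later rules match a subset of keys.
_LEGACY_RULES = [
    ({"missed_speaker", "falarm_speaker", "scored_speaker"}, "cp-error-rate"),
    ({"assignment"}, "orc-error-rate"),
]


def infer_type(d):
    """Return the type name of a serialized error rate dict."""
    if "type" in d:
        # Newer files: the explicit key is authoritative.
        return d["type"]
    # Older files: guess from the keys present. A guess like this is
    # ambiguous whenever two types serialize to the same set of keys.
    for required, type_name in _LEGACY_RULES:
        if required.issubset(d):
            return type_name
    return "error-rate"


print(infer_type({"errors": 0, "assignment": (), "type": "dicp-error-rate"}))
print(infer_type({"errors": 0, "assignment": ()}))
```

The second call shows the weakness the "type" key addresses: a legacy dict containing only `assignment` is always reported as "orc-error-rate", even if it came from a different assignment-based error rate.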

Also fixes a few minor bugs.

thequilo merged commit 31419d0 into main on Sep 19, 2025
8 checks passed
thequilo deleted the wer_identifier branch on Sep 19, 2025, 08:37