Preprocessing ERE #16

I ran the script `preprocessing/process_ere.py` and found that the number of sentences in `train.w1.oneie.json` (12977) does not match the number reported in the paper (14736). As a result, I cannot reproduce the reported F1 score on the ERE-EN dataset.
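For reference, this is roughly how the sentence count can be checked. It is only a sketch, and it assumes the `*.oneie.json` output is a JSON-lines file with one sentence object per non-empty line; `count_sentences` is a hypothetical helper, not something from the repository.

```python
import json


def count_sentences(path):
    """Count records in a JSON-lines file (assumed: one sentence per line)."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            json.loads(line)  # raises ValueError if the line is not valid JSON
            count += 1
    return count
```

Calling `count_sentences("train.w1.oneie.json")` should give the number to compare against the paper, provided the one-sentence-per-line assumption holds.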

So I looked into the script: at line 1336 it ignores all the data in the 'normal' dataset. However, when I changed the path pattern to `os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt')`, an error occurred at the line `entity.char_offsets_to_token_offsets(tokens)` for a few documents. Ignoring all of these errors, I got 18895 sentences, which still does not match the paper.
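To make the change concrete, here is a minimal sketch of what I mean by widening the pattern and ignoring the failing documents. The function names `collect_source_files` and `process_all`, and the `process_doc` callback, are hypothetical stand-ins; the real script's per-document logic is not reproduced here.

```python
import glob
import os


def collect_source_files(input_dir):
    """List every .txt file under source/cmptxt/<any subfolder>/."""
    pattern = os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt')
    return sorted(glob.glob(pattern))


def process_all(paths, process_doc):
    """Run process_doc on each path, recording failures instead of stopping.

    process_doc is a hypothetical callback standing in for the script's
    per-document conversion (where the offset error was raised).
    """
    results, failed = [], []
    for path in paths:
        try:
            results.append(process_doc(path))
        except Exception as exc:  # broad on purpose: we only want to log and skip
            failed.append((path, repr(exc)))
    return results, failed
```

Keeping the `failed` list around makes it possible to report exactly which documents were skipped, which may help explain the remaining gap in the sentence count.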
