Preprocessing ERE #16

@zhou6140919

I ran the script preprocessing/process_ere.py and found that the number of sentences in train.w1.oneie.json (12977) does not match the number reported in the paper (14736). Consequently, I cannot reproduce the F1 score on the ERE-EN dataset.
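
For reference, here is a minimal sketch of how a sentence count like this can be checked. It assumes the output file holds one JSON object per line, each representing one sentence; that layout is my assumption, and `count_sentences` is a hypothetical helper, not part of the repository.

```python
import json


def count_sentences(path):
    """Count the non-empty lines of a JSON-lines file.

    Assumption: each non-empty line is one JSON object describing one sentence.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            json.loads(line)  # raises ValueError if the line is not valid JSON
            count += 1
    return count
```

Calling `count_sentences("train.w1.oneie.json")` on the generated file would give the figure to compare against the paper.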

So I looked into the script, and at line 1336 it ignores all the data in the 'normal' dataset. However, if I change the path to os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt'), an error occurs at the line entity.char_offsets_to_token_offsets(tokens), though only for a few documents. Ignoring all of those errors, I got 18895 sentences, which still does not match the paper.
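
To make the change concrete, here is a rough sketch of what I tried. The glob pattern is the one quoted above; `list_source_files`, `process_all`, and the `process_doc` callback are hypothetical names used only for illustration, not functions from process_ere.py. The idea is to collect the failing documents instead of stopping at the first error.

```python
import glob
import os


def list_source_files(input_dir):
    """Return every .txt file exactly one directory below source/cmptxt."""
    pattern = os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt')
    return sorted(glob.glob(pattern))


def process_all(paths, process_doc):
    """Apply process_doc (hypothetical per-document function) to each path.

    Failures are recorded rather than raised, so the skipped
    documents can be listed and inspected afterwards.
    """
    results, failed = [], []
    for path in paths:
        try:
            results.append(process_doc(path))
        except Exception as exc:
            failed.append((path, repr(exc)))
    return results, failed
```

Printing `failed` afterwards shows which documents hit the offset error, which should help narrow down where the sentence counts diverge.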
