evaluation for duplicated answer choices #117

@KonstantinHebenstreit

Some datasets contain examples with either 4 or 5 answer choices. I think one of the answer choices was simply duplicated so that every example has 5 choices.
The evaluation script does not account for this case.
Since we put letters in front of the choices (A, B, C, D, E), the model can also answer with a letter. But if the correct choice appears in two places, it has two letters. This can lead to wrong evaluation scores when answers are matched by letter.

The first example is commonsense_qa, but there might be others.
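
One possible way to handle this, as a minimal sketch only: collect every letter whose choice text matches the correct answer, and count a letter answer as correct if it is any of those. The function names `accepted_letters` and `is_correct` and the sample choices are hypothetical and made up for illustration. They are not taken from the existing evaluation script or from commonsense_qa.

```python
from string import ascii_uppercase


def accepted_letters(choices, correct_answer):
    """Return the set of letters whose choice text equals the correct answer.

    If the correct choice was duplicated, more than one letter is returned.
    """
    target = correct_answer.strip().lower()
    return {
        ascii_uppercase[i]
        for i, choice in enumerate(choices)
        if choice.strip().lower() == target
    }


def is_correct(predicted_letter, choices, correct_answer):
    """Accept a letter answer if it points at any copy of the correct choice."""
    letter = predicted_letter.strip().upper()
    return letter in accepted_letters(choices, correct_answer)


# Hypothetical example: "bank" appears twice, so both A and C are valid.
choices = ["bank", "library", "bank", "mall", "park"]
print(sorted(accepted_letters(choices, "bank")))
print(is_correct("c", choices, "bank"))
print(is_correct("B", choices, "bank"))
```

With this approach the score no longer depends on which copy of the duplicated choice the model picks. An alternative would be to remove the duplicates when loading the dataset, but that changes the letters the model sees.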
