Tackling algorithmic bias: De-biasing CoNLL 2003 Dataset

Balancing CoNLL 2003


We investigated the CoNLL-2003 dataset, a standard for building algorithms that recognize named entities in text, and found that the data is highly skewed toward male names. You can download the Named Entity Recognition annotations for both the original CoNLL-2003 dataset and our augmented version of it, which is intended to reduce this bias in your models.
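
As a rough illustration of how this kind of skew could be measured, the sketch below counts the first token of each person entity in CoNLL-style text and compares it against two name lists. It is not the method used in our investigation. The name lists, the sample text, and the function names `person_first_tokens` and `gender_counts` are all hypothetical, and the code assumes whitespace-separated columns with the entity tag in the last column.

```python
from collections import Counter

# Hypothetical, tiny name lists for illustration only; a real audit would
# need a much larger and carefully sourced reference list of first names.
MALE_NAMES = {"john", "peter", "david"}
FEMALE_NAMES = {"mary", "anna", "susan"}

# Made-up sample in a CoNLL-style layout: token, POS, chunk, entity tag.
SAMPLE = """-DOCSTART- -X- O O

John NNP I-NP I-PER
Smith NNP I-NP I-PER
met VBD I-VP O
Mary NNP I-NP I-PER
. . O O

Peter NNP I-NP B-PER
spoke VBD I-VP O
"""

def person_first_tokens(conll_text):
    """Yield the lowercased first token of every person entity.

    Handles both tagging conventions: an entity starts at B-PER, or at an
    I-PER tag that does not follow another PER tag.
    """
    prev_tag = "O"
    for line in conll_text.splitlines():
        parts = line.split()
        # Blank lines separate sentences; -DOCSTART- lines separate documents.
        if not parts or parts[0] == "-DOCSTART-":
            prev_tag = "O"
            continue
        token, tag = parts[0], parts[-1]
        starts_entity = tag == "B-PER" or (
            tag == "I-PER" and not prev_tag.endswith("-PER")
        )
        if starts_entity:
            yield token.lower()
        prev_tag = tag

def gender_counts(conll_text):
    """Count person entities whose first token is in each name list."""
    counts = Counter()
    for name in person_first_tokens(conll_text):
        if name in MALE_NAMES:
            counts["male"] += 1
        elif name in FEMALE_NAMES:
            counts["female"] += 1
        else:
            counts["unknown"] += 1
    return counts

counts = gender_counts(SAMPLE)
print(counts)
```

Matching on the first token alone is a simplification: it misclassifies entities that begin with a title or a surname, and a name list cannot capture every name or how a person identifies, so the result is a coarse estimate at best.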

