Tackling algorithmic bias: De-biasing the CoNLL-2003 Dataset
Balancing CoNLL-2003
We investigated the CoNLL-2003 dataset, a standard benchmark for training and evaluating models that recognize named entities (people, organizations, locations, and miscellaneous entities) in text, and found that the data is highly skewed toward male names. Download the Named Entity Recognition annotations for the original CoNLL-2003 dataset and for our augmented version, which is designed to reduce this bias in your models.
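To make this kind of skew concrete, one rough way to measure it is to pull the person (PER) entities out of the CoNLL-format files and tally their first names against a gender lexicon. The sketch below is illustrative only, not the method behind our findings. It assumes the standard four-column CoNLL-2003 layout (token, POS tag, chunk tag, NER tag) and accepts both the original IOB1 tags and the IOB2 tags used by many redistributions. The `FIRST_NAME_GENDER` dictionary, the helper names, and the sample sentences are hypothetical placeholders.

```python
from collections import Counter


def extract_person_spans(lines):
    """Collect PER entity strings from CoNLL-2003-style lines.

    Each non-blank line is assumed to be "token POS chunk NER". Works with
    IOB1 (entities usually start with I-PER) and IOB2 (entities start with B-PER).
    """
    spans = []
    current = []  # tokens of the PER entity currently being built

    def flush():
        if current:
            spans.append(" ".join(current))
            current.clear()

    for line in lines:
        line = line.strip()
        # Blank lines end a sentence; -DOCSTART- lines mark document boundaries.
        if not line or line.startswith("-DOCSTART-"):
            flush()
            continue
        fields = line.split()
        token, tag = fields[0], fields[-1]
        if tag == "B-PER":
            flush()  # B-PER always opens a new entity
            current.append(token)
        elif tag == "I-PER":
            current.append(token)  # continues an entity, or opens one (IOB1)
        else:
            flush()  # any non-PER tag closes the open entity
    flush()
    return spans


# Hypothetical four-entry lexicon for illustration only; a real analysis would
# need a large, carefully sourced name list and would still miss many names.
FIRST_NAME_GENDER = {"john": "male", "peter": "male", "mary": "female", "anna": "female"}


def gender_counts(spans, lexicon):
    """Count entities by the lexicon label of their first token."""
    counts = Counter()
    for span in spans:
        first = span.split()[0].lower()
        counts[lexicon.get(first, "unknown")] += 1
    return counts


# Made-up sample in CoNLL-2003 format (IOB1 tags), not taken from the dataset.
sample = """-DOCSTART- -X- -X- O

John NNP B-NP I-PER
Smith NNP I-NP I-PER
met VBD B-VP O
Mary NNP B-NP I-PER
in IN B-PP O
Berlin NNP B-NP I-LOC
. . O O

Peter NNP B-NP I-PER
thanked VBD B-VP O
Lee NNP B-NP I-PER
. . O O
""".splitlines()

if __name__ == "__main__":
    people = extract_person_spans(sample)
    print(people)
    print(gender_counts(people, FIRST_NAME_GENDER))
```

Matching only the first token is a deliberately crude proxy. Surnames, initials, and names missing from the lexicon all land in the "unknown" bucket, so the size of that bucket is worth reporting next to the male and female counts.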