DEBAGREEMENT: A comment-reply dataset for (dis)agreement detection in online debates

John Pougué-Biyong*, Valentina Semenova*, Alexandre Matton, Rachel Han, Aerin Kim, Renaud Lambiotte, J. Doyne Farmer

*Equal contribution

In this paper, we introduce DEBAGREEMENT, a dataset of 42,894 comment-reply pairs from the popular discussion website Reddit, annotated with agree, neutral or disagree labels. We collect data from five forums on Reddit: r/BlackLivesMatter, r/Brexit, r/climate, r/democrats, r/Republican. For each forum, we select comment pairs such that together they form a user interaction graph. DEBAGREEMENT presents a challenge for Natural Language Processing (NLP) systems, as it contains slang, sarcasm and topic-specific jokes, which are often present in online exchanges. We evaluate the performance of state-of-the-art language models on a (dis)agreement detection task, and investigate the use of the available contextual information (graph, authorship, and temporal information). Since recent research has shown that context, such as social context or knowledge graph information, enables language models to perform better on downstream NLP tasks, DEBAGREEMENT provides novel opportunities for combining graph-based and text-based machine learning techniques to detect (dis)agreements online.

BibTeX Citation

@inproceedings{
pougue-biyong2021debagreement,
title={{DEBAGREEMENT}: A comment-reply dataset for (dis)agreement detection in online debates},
author={John Pougu{\'e}-Biyong and Valentina Semenova and Alexandre Matton and Rachel Han and Aerin Kim and Renaud Lambiotte and J. Doyne Farmer},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
year={2021},
url={https://openreview.net/forum?id=udVUN__gFO}
}