Debagreement: Reddit 50K
Open-source agreement / disagreement dataset of Reddit interactions developed in partnership with Oxford University.
Original Reddit Interactions
The original comments and subsequent interactions scraped from Reddit with no assigned labels.
Labeled Dataset: Full Agreement
Dataset where all annotators achieved full agreement and pairs have been checked by authors of the paper.
Labeled Dataset: 2/3 Agreement
Dataset where there is 2/3 inter-annotator agreement or above.
Labeled Dataset: Challenge
Dataset where consensus was not reached among annotators, or pairs were labeled as “unsure.”
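The three labeled subsets above can be read as a rule over annotator votes. The following is a minimal sketch of that rule, not the authors' actual pipeline: the function name, the subset names, and the choice to treat a majority "unsure" vote as a challenge case are all assumptions made for illustration. It also ignores the manual check by the paper's authors that the full-agreement subset requires.

```python
from collections import Counter

def assign_subset(votes):
    """Map one pair's annotator votes to the strictest subset it qualifies for.

    `votes` is a list of label strings such as "agree" or "unsure".
    Subset names are hypothetical; they are not file names from the dataset.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    # Integer comparison avoids floating-point issues around the 2/3 threshold.
    reaches_two_thirds = 3 * count >= 2 * len(votes)
    if label == "unsure" or not reaches_two_thirds:
        return "challenge"
    if count == len(votes):
        return "full_agreement"
    return "two_thirds_agreement"
```

A pair with full agreement also satisfies the 2/3 threshold, so the sketch returns only the strictest matching subset.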
42,894 annotations
4 classes*
The pushshift.io Reddit API was designed and created by the /r/datasets mod team to provide enhanced functionality and search capabilities for Reddit comments and submissions. This RESTful API supports full search over Reddit data and can also create powerful data aggregations. With this API, you can quickly find the data that you are interested in and find fascinating correlations.
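As an illustration of what a comment search against such a RESTful API might look like, here is a sketch that only builds a request URL and sends nothing over the network. The endpoint path and the parameter names (`q`, `size`, `subreddit`) are assumptions based on how Pushshift has commonly been described, and the service may no longer be publicly accessible. The query values are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against current Pushshift documentation before use.
BASE_URL = "https://api.pushshift.io/reddit/search/comment/"

def build_comment_search_url(query, subreddit=None, size=25):
    """Return a search URL for comments matching `query` (no request is made)."""
    params = {"q": query, "size": size}
    if subreddit:
        params["subreddit"] = subreddit
    return BASE_URL + "?" + urlencode(params)

# Hypothetical example values.
url = build_comment_search_url("carbon tax", subreddit="climate", size=10)
```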
Agree
The reply shows approval towards the initial statement or initial author.
Some examples of phrases that express agreement are: “I agree”, “That’s absolutely right”, “That’s so true”, “Exactly”, “That’s how I feel”, etc.
Neutral
There is a topical exchange between authors but agreement / disagreement is not expressed.
Disagree
The reply shows disapproval or dislike towards the initial statement or initial author.
Some examples of phrases that express disagreement are: “No way”, “I don’t think so”, “Not necessarily”, “That’s not always true”, etc.
Unsure
It is not possible to make a decision based on the information at hand.
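The four classes above can be represented as an enumeration. The sketch below also includes a toy matcher that looks for the example phrases listed for Agree and Disagree. It is meant only to make the definitions concrete: the dataset was labeled by human annotators, not by phrase matching, and the names `Label`, `CUE_PHRASES`, and `cue_label` are hypothetical.

```python
from enum import Enum
from typing import Optional

class Label(Enum):
    AGREE = "agree"
    NEUTRAL = "neutral"
    DISAGREE = "disagree"
    UNSURE = "unsure"

# Example phrases taken from the class descriptions above (lowercased).
CUE_PHRASES = {
    Label.AGREE: ("i agree", "that's absolutely right", "that's so true",
                  "exactly", "that's how i feel"),
    Label.DISAGREE: ("no way", "i don't think so", "not necessarily",
                     "that's not always true"),
}

def cue_label(reply: str) -> Optional[Label]:
    """Return the label whose cue phrases appear in `reply`, if exactly one does."""
    text = reply.lower().replace("’", "'")  # normalize curly apostrophes
    hits = {label for label, cues in CUE_PHRASES.items()
            if any(cue in text for cue in cues)}
    # No cue, or cues from both classes: the toy matcher declines to decide.
    return hits.pop() if len(hits) == 1 else None
```

Substring matching is deliberately naive: a reply such as "not exactly" would match the Agree cue "exactly", which is one reason real labeling needs human judgment.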

dev kit
Get Started with Oxford Dataset
Read the full Research Paper. If you use the dataset, please make sure to cite our paper.