Introducing: Scale’s Large Language Model Toolkit

March 22, 2023

Getting data to train and evaluate your large language models has never been more Rapid! Introducing… Scale’s Large Language Model Toolkit!

We’ve developed expedited workflows to help you set up projects within Rapid, Scale’s self-serve data labeling tool. Sign up today to try it out and get $20 in free credits.

After signing up, you’ll see all of our available project templates and our “LLM Training Toolkit” front and center. We’ve divided our LLM toolkit into three buckets according to different workflows: 

  1. Demonstration Data: Collect human-generated text over various domains and use cases

  2. Comparison Data: Evaluate your model outputs with human feedback

  3. Trust & Safety: Align your model to produce factually correct and safe content 

Demonstration Data 

We have two subcategories of project types under the “Demonstration Data” bucket. 

(1) InstructGPT-style presets: Each of these comes with expert-written instructions, a preset taxonomy (with an optional customization section for your specific use case), the option to upload your own prompts or have Scale create them for you, and a trained workforce with significant experience handling these types of prompts. Check out Section 3.4 of the InstructGPT paper for yourself! 😉

(2) Demonstration corrections: If you have demonstrations or model outputs that need corrections, we can also assist you. Just upload your data or hook up your model endpoint, then set up your taxonomy accordingly so that our labelers can get the corrections done for you quickly.

Comparison Data 

With our workflows, you can easily customize comparison annotation projects and quickly scale to production. You can ask a labeler to rank, rate on a Likert scale, and classify model outputs. You can either upload these model outputs or configure a model endpoint for labelers to interact with as they are making the comparisons. 
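To make the shape of this data concrete, here is a minimal sketch of what a single completed comparison record might look like. The field names, the 7-point Likert range, and the helper function are illustrative assumptions for this post, not Scale's actual schema.

```python
# Hypothetical comparison-data record: one prompt, several model outputs,
# and a labeler's ranking, Likert ratings, and classifications.
# All field names are illustrative assumptions, not Scale's real schema.

def build_comparison_record(prompt, outputs, ranking, likert_scores, labels):
    """Bundle a labeler's judgments about several model outputs."""
    # The ranking must be a permutation of output indices, best first.
    assert sorted(ranking) == list(range(len(outputs)))
    return {
        "prompt": prompt,
        "outputs": outputs,          # model completions being compared
        "ranking": ranking,          # output indices, preferred output first
        "likert": likert_scores,     # e.g. 1 (poor) to 7 (excellent), per output
        "labels": labels,            # per-output classifications
    }

record = build_comparison_record(
    prompt="Explain photosynthesis to a 10-year-old.",
    outputs=["Plants eat sunlight...", "Photosynthesis is a process..."],
    ranking=[1, 0],                  # the second output was preferred
    likert_scores=[4, 6],
    labels=[["too_informal"], ["helpful"]],
)
```

Records like this are what typically feed a reward model for RLHF-style fine-tuning downstream.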

Trust & Safety 

Aligning your models with Rapid speed has never been easier. With our expertise and specialized workforce, we can work with you to identify problematic model output. 

We support several use cases, including Grounded Generation, Red Teaming, Completion Citation, and Research-Based Correction. 

  • Grounded Generation: Ask a labeler to research or read certain sources to verify uploaded model output.

  • Red Teaming*: Provide an endpoint for labelers to chat with and elicit unsafe model outputs. 

  • Completion Citation*: Ask a labeler to attribute various phrases to sources that the information came from. 

  • Research-Based Correction*: Upload model output so labelers can find better sources, edit or rewrite completions given any linked sources, add missing citations, edit or remove existing ones, etc.

*Note: Use cases marked with * are currently not available via our self-serve platform. If you are interested, please contact sales to learn more. 

For our Grounded Generation use case, we offer two popular, self-serve verification methods.

(1) Multi-class Classification: Labelers select all error classes that pertain to the text, and their responses are aggregated via confidence-based consensus.

(2) Named Entity Recognition: This involves highlighting spans of erroneous text and indicating the relevant error types for each span.
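As a rough illustration of the multi-class aggregation idea, here is a minimal sketch of confidence-based consensus: each labeler's selected error classes are weighted by a per-labeler confidence score, and a class makes the consensus only if its weighted support crosses a threshold. The function name, weights, class names, and the 0.5 threshold are all assumptions for illustration, not Scale's actual aggregation logic.

```python
from collections import defaultdict

def consensus(responses, threshold=0.5):
    """Aggregate multi-class selections weighted by labeler confidence.

    responses: list of (selected_classes, confidence_weight) pairs.
    Returns the sorted classes whose weighted support meets the threshold.
    """
    total = sum(weight for _, weight in responses)
    support = defaultdict(float)
    for classes, weight in responses:
        for cls in classes:
            support[cls] += weight
    return sorted(cls for cls, s in support.items() if s / total >= threshold)

# Three hypothetical labelers with different confidence weights.
votes = [
    ({"factual_error", "unsupported_claim"}, 0.9),
    ({"factual_error"}, 0.7),
    ({"fluency_issue"}, 0.4),
]
print(consensus(votes))  # → ['factual_error']
```

Only "factual_error" clears the 50% weighted-support bar (1.6 of 2.0 total weight), so minority selections are filtered out without requiring unanimous agreement.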

And this is just the beginning! Our toolkit is constantly expanding, and we are excited to continuously be solving some of the world’s hardest data annotation problems. Not seeing what you’re looking for in the toolkit today or have other questions? Email us at:

The future of your industry starts here.