Coding Data Stream

Enhance model performance on coding tasks with agentic coding demonstrations for code generation, optimization, debugging, and cross-language support.

Overview

Diverse, high-quality coding tasks

Scale’s expert contributors create rich datasets for model training, demonstrating ideal assistant responses for realistic but challenging programming tasks. Explore examples from the Coding Data Stream below.

#1: Multi-turn coding assistance

In this dialog, a user works with an assistant to generate and incrementally refine code that simulates the properties of an enclosed gas system. The exchange covers code generation, visualization, and answers to the user's questions about the generated code.

View Example →

#2: Compute a modified Levenshtein distance

Write a function to compute a modified Levenshtein distance between two strings. The distance calculation should incorporate a penalty based on the absolute alphabetical difference between any swapped letters, as well as the alphabetical cost of any deletions or insertions. The cost of a deletion or insertion is determined by the alphabetical index of the deleted or inserted character.

View Example →
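
As a rough illustration of the modified Levenshtein task above, the following Python sketch uses a standard dynamic-programming edit distance with weighted costs. It is one possible reading of the prompt, not the reference solution. The assumptions are: "swapped letters" means substitutions, the alphabetical index is 1-based (a=1, ..., z=26), input is limited to ASCII letters and compared case-insensitively, and the names `alpha_index` and `modified_levenshtein` are hypothetical.

```python
def alpha_index(ch: str) -> int:
    """Return the 1-based alphabet position (a=1, ..., z=26). Assumed convention."""
    if len(ch) != 1 or not ("a" <= ch <= "z"):
        raise ValueError(f"expected a lowercase ASCII letter, got {ch!r}")
    return ord(ch) - ord("a") + 1


def modified_levenshtein(source: str, target: str) -> int:
    """Weighted edit distance between two strings of ASCII letters.

    Assumed costs (one interpretation of the task description):
      - substitution: absolute difference of the two alphabet positions
      - insertion or deletion: alphabet position of the affected character
    """
    source, target = source.lower(), target.lower()

    # prev[j] holds the cost of turning the processed prefix of source
    # into target[:j]. The first row is built from insertions only.
    prev = [0]
    for t_ch in target:
        prev.append(prev[-1] + alpha_index(t_ch))

    for s_ch in source:
        # First column: delete every source character seen so far.
        curr = [prev[0] + alpha_index(s_ch)]
        for j, t_ch in enumerate(target, start=1):
            substitute = prev[j - 1] + abs(alpha_index(s_ch) - alpha_index(t_ch))
            delete = prev[j] + alpha_index(s_ch)
            insert = curr[j - 1] + alpha_index(t_ch)
            curr.append(min(substitute, delete, insert))
        prev = curr

    return prev[-1]
```

Under these assumptions, `modified_levenshtein("cat", "cut")` returns 20, because replacing "a" (position 1) with "u" (position 21) is cheaper than deleting one and inserting the other. Keeping only two rows of the table holds memory at O(len(target)) while the running time stays O(len(source) × len(target)).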

#3: Sort an array of integers representing colors

Write a function to sort an array of color codes, where the integers 0, 1, and 2 represent the colors red, white, and blue, respectively. The function should rearrange the elements so that objects of the same color are adjacent, with the order being red, white, and blue. The function should be able to handle empty input arrays and invalid colors. If the array contains only one unique color code, the function should return the array as is.

View Example →
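
The color-sorting task above can be sketched in Python with a single-pass, three-way partition (often called the Dutch national flag approach). This is a minimal sketch under stated assumptions rather than the reference solution: the prompt does not say how invalid colors should be handled, so raising `ValueError` is an assumption, as are sorting in place and the hypothetical function name `sort_colors`.

```python
def sort_colors(colors: list[int]) -> list[int]:
    """Sort color codes 0 (red), 1 (white), 2 (blue) in place and return the list.

    Assumptions not stated in the task: invalid codes raise ValueError,
    and the input list itself is modified and returned.
    """
    for value in colors:
        if type(value) is not int or not 0 <= value <= 2:
            raise ValueError(f"invalid color code: {value!r}")

    # Empty input or a single unique color: return the array as is.
    if len(set(colors)) <= 1:
        return colors

    # Invariant: colors[:low] are 0s, colors[low:mid] are 1s,
    # colors[high + 1:] are 2s, and colors[mid:high + 1] is unexamined.
    low, mid, high = 0, 0, len(colors) - 1
    while mid <= high:
        if colors[mid] == 0:
            colors[low], colors[mid] = colors[mid], colors[low]
            low += 1
            mid += 1
        elif colors[mid] == 1:
            mid += 1
        else:
            # Do not advance mid: the swapped-in value is still unexamined.
            colors[mid], colors[high] = colors[high], colors[mid]
            high -= 1
    return colors
```

For example, `sort_colors([2, 0, 1, 2, 0, 1])` returns `[0, 0, 1, 1, 2, 2]`. The partition makes one pass and uses constant extra space. A counting pass that tallies each color and rewrites the array would be an equally reasonable design.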