Coding Data Stream

Enhance model performance on coding tasks with agentic coding demonstrations for code generation, optimization, debugging, and cross-language support.

Overview

Diverse, high-quality coding tasks

Scale’s expert contributors create rich datasets for model training, demonstrating ideal assistant responses for realistic but challenging programming tasks. Explore examples from the Coding Data Stream below.

#1: Multi-turn coding assistance

Prompt 1

I’m simulating a gas in a two-dimensional box. My program keeps track of the movement of each gas particle and returns a 3-dimensional list. The first index of the list corresponds to different times. The other two indices arrange in columns the particles’ x coordinates, y coordinates, velocities in the x direction, and velocities in the y direction, in that order. I want to verify whether my simulation is conservative. Write a Python function called `isConserv` to do that. The function must take the list that my simulation returns and compute the total energy of the system. Kinetic and potential energy will be calculated taking the masses equal to 1 and an interaction between the particles with potential U=4(1/r^12 - 1/r^6), where “r” is the distance between particles. The function will consider the simulation conservative, and hence return true, if the standard deviation of the total energy is less than 0.1. Otherwise, the function will return false.

Response 1

Here is the `isConserv` function:

```python
import math
import statistics

# Evaluates if the input dataset results from a conservative
# algorithm. Returns true if the standard deviation of the
# total energy is less than 0.1.
def isConserv(data):
    Total_Energy = [] # List to store the total energy at each time.

    # Ranges the times.
    for i in range(len(data)):
        energy = 0
       
        # Ranges the particles.
        for j in range(len(data[i])):
            # Adds the kinetic energy of the current particle.
            energy += 0.5 * (data[i][j][2] ** 2 + data[i][j][3] ** 2)
           
            # Adds the potential energy between the current and the following particles.
            for k in range(j + 1, len(data[i])):
                r = math.sqrt((data[i][j][0] - data[i][k][0]) ** 2 + (data[i][j][1] - data[i][k][1]) ** 2)
                energy += 4 * (1 / r ** 12 - 1 / r ** 6)
       
        # Adds the total energy at time i.
        Total_Energy.append(energy)

    # Returns true if the standard deviation is less than 0.1.
    return (statistics.stdev(Total_Energy) < 0.1)
```

Feel free to ask for more code.
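As a quick sanity check of the energy formula in this response, the sketch below computes the energy of a single time step. The helper name `frame_energy` is hypothetical and not part of the original response; it assumes each particle row is `[x, y, vx, vy]` and that no two particles share a position. With two particles exactly one unit apart, the potential 4(1/r^12 - 1/r^6) vanishes, so only the kinetic energy remains.

```python
import math
from itertools import combinations

def frame_energy(frame):
    """Total energy of one time step; each row is [x, y, vx, vy], masses are 1."""
    # Kinetic energy: 0.5 * |v|^2 summed over all particles.
    kinetic = sum(0.5 * (vx ** 2 + vy ** 2) for _, _, vx, vy in frame)
    potential = 0.0
    # combinations() visits each unordered pair of particles exactly once.
    for p, q in combinations(frame, 2):
        r = math.dist(p[:2], q[:2])  # requires Python 3.8+
        potential += 4 * (1 / r ** 12 - 1 / r ** 6)
    return kinetic + potential

# Two particles at distance 1: the potential term is zero.
frame = [[0.0, 0.0, 1.0, 0.0],
         [1.0, 0.0, 0.0, 2.0]]
print(frame_energy(frame))  # 2.5
```

Summing this helper over every time step should reproduce the `Total_Energy` list that `isConserv` builds.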

Prompt 2 Category - Introduce new parameters to the input

Prompt 2

I want to visualize how the total energy changes. Plot the total energy as a function of time. Add legends to the axes. The horizontal axis should show only integer times.

Response 2

This function includes the plot you described:

```python
import math
import statistics
import matplotlib.pyplot as plt

# Evaluates if the input dataset results from a conservative
# algorithm. Plots the total energy as a function of time.
# Returns true if the standard deviation of the total energy
# is less than 0.1.
def isConserv(data):
    Total_Energy = [] # List to store the total energy at each time.

    # Ranges the times.
    for i in range(len(data)):
        energy = 0
       
        # Ranges the particles.
        for j in range(len(data[i])):
            # Adds the kinetic energy of the current particle.
            energy += 0.5 * (data[i][j][2] ** 2 + data[i][j][3] ** 2)
           
            # Adds the potential energy between the current and the following particles.
            for k in range(j + 1, len(data[i])):
                r = math.sqrt((data[i][j][0] - data[i][k][0]) ** 2 + (data[i][j][1] - data[i][k][1]) ** 2)
                energy += 4 * (1 / r ** 12 - 1 / r ** 6)
       
        # Adds the total energy at time i.
        Total_Energy.append(energy)
   
    # Plots the total energy as a function of time.
    plt.plot(list(range(len(Total_Energy))), Total_Energy)
    plt.xlabel('Time')
    plt.ylabel('Total energy')
    plt.xticks(list(range(len(Total_Energy))))
    plt.show()

    # Returns true if the standard deviation is less than 0.1.
    return (statistics.stdev(Total_Energy) < 0.1)
```

You can use the same code snippet I gave you before to see the changes. Reach out if you need further assistance.

Prompt 3

Add or remove functionality

Response 3

You are right. This new version of the function accepts an argument to compare with the standard deviation of the total energy:

```python
import math
import statistics
import matplotlib.pyplot as plt

# Evaluates if the input dataset results from a conservative
# algorithm. Plots the total energy as a function of time.
# Returns true if the standard deviation of the total energy
# is less than the max_dev argument.
def isConserv(data, max_dev = 0.1):
    Total_Energy = [] # List to store the total energy at each time.

    # Ranges the times.
    for i in range(len(data)):
        energy = 0
       
        # Ranges the particles.
        for j in range(len(data[i])):
            # Adds the kinetic energy of the current particle.
            energy += 0.5 * (data[i][j][2] ** 2 + data[i][j][3] ** 2)
           
            # Adds the potential energy between the current and the following particles.
            for k in range(j + 1, len(data[i])):
                r = math.sqrt((data[i][j][0] - data[i][k][0]) ** 2 + (data[i][j][1] - data[i][k][1]) ** 2)
                energy += 4 * (1 / r ** 12 - 1 / r ** 6)
       
        # Adds the total energy at time i.
        Total_Energy.append(energy)
   
    # Plots the total energy as a function of time.
    plt.plot(list(range(len(Total_Energy))), Total_Energy)
    plt.xlabel('Time')
    plt.ylabel('Total energy')
    plt.xticks(list(range(len(Total_Energy))))
    plt.show()

    # Returns true if the standard deviation is less than max_dev.
    return (statistics.stdev(Total_Energy) < max_dev)
```

You can use this piece of code to check the new feature:

```python
data = [[[2,1,1.1,0.2],
        [3,4,2.3,1.5],
        [1,1,0.8,2.0],
        [3,1,1.4,1.1],
        [5,0,2.1,0.1]],
        [[2,1,1.0,0.3],
        [3,4,2.3,1.4],
        [1,1,1,1.9],
        [3,1,1.3,1.1],
        [5,0,2.2,0]],
        [[2,1,1.0,0.4],
        [3,4,2.2,1.3],
        [1,1,0.9,1.8],
        [3,1,1.1,1.2],
        [5,0,2.2,0.2]]]

print(isConserv(data, 1))
```

Please let me know if you have any questions.
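One edge case worth noting: every version of `isConserv` calls `statistics.stdev`, which raises `statistics.StatisticsError` when given fewer than two values, so a simulation with a single time step would fail. The sketch below shows one possible guard. The helper name `energy_spread` and the choice to report a spread of 0.0 for fewer than two samples are assumptions for illustration, not part of the original response.

```python
import statistics

def energy_spread(total_energy):
    """Sample standard deviation of the energies, guarding the short-input case."""
    if len(total_energy) < 2:
        # Assumption: with fewer than two time steps there is no measurable
        # variation, so the spread is reported as 0.0 instead of raising.
        return 0.0
    return statistics.stdev(total_energy)

print(energy_spread([3.0]))       # 0.0
print(energy_spread([1.0, 3.0]))  # about 1.414
```

A caller that prefers to treat a single time step as an error could raise a `ValueError` at the same point instead.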

Prompt 4 Category - Introducing new constraints

Prompt 4

I don’t understand why you range only the particles starting from j+1 in this part of the code:

```python
for k in range(j + 1, len(data[i])):
    r = math.sqrt((data[i][j][0] - data[i][k][0]) ** 2 + (data[i][j][1] - data[i][k][1]) ** 2)
    energy += 4 * (1 / r ** 12 - 1 / r ** 6)
```

Please, explain that to me. Then, give me a version of the code where that loop ranges all the particles, making the corrections needed so it still calculates the total energy correctly.

Response 4

The potential energy of the system comes from interactions between pairs of particles. If the loop you pointed out ranged over all the particles, each pair of particles would be counted twice. The interaction of a particle with itself is also undefined, which is why the for loop starts at j + 1.
The following version correctly calculates the total energy while ranging over all the particles in each loop. Keep in mind, however, that the number of operations more than doubles compared to the first version.

```python
import math
import statistics
import matplotlib.pyplot as plt

# Evaluates if the input dataset results from a conservative
# algorithm. Plots the total energy as a function of time.
# Returns true if the standard deviation of the total energy
# is less than the max_dev argument.
def isConserv(data, max_dev = 0.1):
    Total_Energy = [] # List to store the total energy at each time.

    # Ranges the times.
    for i in range(len(data)):
        K_energy = 0
        U_energy = 0
       
        # Ranges the particles.
        for j in range(len(data[i])):
            # Adds the kinetic energy of the current particle.
            K_energy += 0.5 * (data[i][j][2] ** 2 + data[i][j][3] ** 2)
           
            # Adds the potential energy between the current and all the other particles.
            for k in range(len(data[i])):
                # Doesn't consider the interaction of a particle with itself.
                if j == k:
                    continue
               
                r = math.sqrt((data[i][j][0] - data[i][k][0]) ** 2 + (data[i][j][1] - data[i][k][1]) ** 2)
                U_energy += 4 * (1 / r ** 12 - 1 / r ** 6)
       
        # Divides by 2 to avoid the double counting of the potential energy.
        U_energy = U_energy / 2
       
        # Adds the total energy at time i.
        Total_Energy.append(K_energy + U_energy)
   
    # Plots the total energy as a function of time.
    plt.plot(list(range(len(Total_Energy))), Total_Energy)
    plt.xlabel('Time')
    plt.ylabel('Total energy')
    plt.xticks(list(range(len(Total_Energy))))
    plt.show()

    # Returns true if the standard deviation is less than max_dev.
    return (statistics.stdev(Total_Energy) < max_dev)
```

The previous usage example code is still applicable. Don’t hesitate to ask if you have more questions.
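The equivalence between the two loop styles can be checked directly. The sketch below uses the particle positions from the first time step of the usage example; the helper name `pair_potential` is hypothetical. It sums the potential once over unordered pairs (the `j + 1` loop) and once over all ordered pairs divided by 2 (the full loop).

```python
import math
from itertools import combinations, permutations

def pair_potential(p, q):
    """Potential 4(1/r^12 - 1/r^6) between two particles given as [x, y, ...]."""
    r = math.dist(p[:2], q[:2])
    return 4 * (1 / r ** 12 - 1 / r ** 6)

# Positions and velocities from the first time step of the usage example.
frame = [[2, 1, 1.1, 0.2],
         [3, 4, 2.3, 1.5],
         [1, 1, 0.8, 2.0],
         [3, 1, 1.4, 1.1],
         [5, 0, 2.1, 0.1]]

# 10 unordered pairs, each visited once.
half_loop = sum(pair_potential(p, q) for p, q in combinations(frame, 2))
# 20 ordered pairs, so every interaction appears twice and is halved.
full_loop = sum(pair_potential(p, q) for p, q in permutations(frame, 2)) / 2

print(math.isclose(half_loop, full_loop))  # True
```

For N particles the first form evaluates N(N-1)/2 pair terms and the second N(N-1), which is where the extra cost of the full loop comes from.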

Prompt 5 Category - Asking to refactor the code for readability or performance

Prompt 5 

To have a better analysis of the plot, include in it the mean of the total energy and the mean plus and minus the maximum admissible standard deviation. Include legends to differentiate the lines. Change the vertical axis label to just “Energy”. Also, go back to the first more optimized way of calculating the total energy.

Response 5

That’s a good way to analyze the plot. Here are the changes you suggested:

```python
import math
import statistics
import matplotlib.pyplot as plt

# Evaluates if the input dataset results from a conservative
# algorithm. Plots the total energy as a function of time.
# Returns true if the standard deviation of the total energy
# is less than the max_dev argument.
def isConserv(data, max_dev = 0.1):
    Total_Energy = [] # List to store the total energy at each time.

    # Ranges the times.
    for i in range(len(data)):
        energy = 0
       
        # Ranges the particles.
        for j in range(len(data[i])):
            # Adds the kinetic energy of the current particle.
            energy += 0.5 * (data[i][j][2] ** 2 + data[i][j][3] ** 2)
           
            # Adds the potential energy between the current and the following particles.
            for k in range(j + 1, len(data[i])):
                r = math.sqrt((data[i][j][0] - data[i][k][0]) ** 2 + (data[i][j][1] - data[i][k][1]) ** 2)
                energy += 4 * (1 / r ** 12 - 1 / r ** 6)
       
        # Adds the total energy at time i.
        Total_Energy.append(energy)
   
    mean = statistics.mean(Total_Energy)
   
    # Plots the total energy as a function of time.
    plt.plot(list(range(len(Total_Energy))), Total_Energy, label = "Total energy")
    # Plots a horizontal line with the value of the mean.
    plt.plot(list(range(len(Total_Energy))), [mean for i in range(len(Total_Energy))], label = "Mean")
    # Plots a horizontal line with the value of the mean plus the maximum admissible standard deviation.
    plt.plot(list(range(len(Total_Energy))), [mean + max_dev for i in range(len(Total_Energy))], label = "Mean + maximum admissible deviation")
    # Plots a horizontal line with the value of the mean minus the maximum admissible standard deviation.
    plt.plot(list(range(len(Total_Energy))), [mean - max_dev for i in range(len(Total_Energy))], label = "Mean - maximum admissible deviation")
    plt.xlabel('Time')
    plt.ylabel('Energy')
    plt.xticks(list(range(len(Total_Energy))))
    plt.legend()
    plt.show()

    # Returns true if the standard deviation is less than max_dev.
    return (statistics.stdev(Total_Energy) < max_dev)
```

Let me know if you want to make further updates to your function.
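The band drawn in this plot can also be inspected numerically. The sketch below lists the time steps whose total energy lies outside the interval from the mean minus `max_dev` to the mean plus `max_dev`. The helper name `steps_outside_band` is hypothetical, and the energies in the usage line are made-up illustrative values, not simulation output.

```python
import statistics

def steps_outside_band(total_energy, max_dev=0.1):
    """Indices of time steps whose energy differs from the mean by more than max_dev."""
    mean = statistics.mean(total_energy)
    return [i for i, e in enumerate(total_energy) if abs(e - mean) > max_dev]

# Illustrative energies: the mean is 1.25, so only the last step is
# more than 0.5 away from it.
print(steps_outside_band([1.0, 1.0, 1.0, 2.0], max_dev=0.5))  # [3]
```

Note that this per-step check is a different criterion from the one `isConserv` returns, which compares the standard deviation of the whole series against `max_dev`.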

#2: Compute a modified Levenshtein distance

Prompt

Write a function to compute a modified Levenshtein distance between two strings. The distance calculation should incorporate a penalty based on the absolute alphabetical difference between any swapped letters, as well as the alphabetical cost of any deletions or insertions. The cost of a deletion or insertion is determined by the alphabetical index of the deleted or inserted character.

Response

```python
def levenshtein_alpha_distance(str1, str2):
    m, n = len(str1), len(str2)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Cost of a letter is its 1-based alphabetical index; other characters cost 0.
    cost_map = {char: i + 1 for i, char in enumerate(alphabet)}
    # dp[i][j] holds the distance between str1[:i] and str2[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                # Only insertions; lowercase so the cost matches the general case.
                dp[i][j] = sum(cost_map.get(ch.lower(), 0) for ch in str2[:j])
            elif j == 0:
                # Only deletions; lowercase so the cost matches the general case.
                dp[i][j] = sum(cost_map.get(ch.lower(), 0) for ch in str1[:i])
            else:
                str1_cost = cost_map.get(str1[i - 1].lower(), 0)
                str2_cost = cost_map.get(str2[j - 1].lower(), 0)
                if str1[i - 1] == str2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1]
                else:
                    substitution_cost = abs(str2_cost - str1_cost)
                    dp[i][j] = min(
                        dp[i - 1][j] + str1_cost,  # Deletion
                        dp[i][j - 1] + str2_cost,  # Insertion
                        dp[i - 1][j - 1] + substitution_cost  # Substitution
                    )
    return dp[m][n]
```
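To make the cost model concrete, here is a minimal memoized, top-down sketch of the same recurrence with a few small inputs. The names `alpha_cost` and `alpha_distance` are hypothetical and not part of the original response. The sketch assumes letters are costed case-insensitively and non-letters cost 0, and because it recurses it is only suited to short strings.

```python
import string
from functools import lru_cache

def alpha_cost(ch):
    """1-based alphabetical index of a letter (a=1 ... z=26); 0 for other characters."""
    # str.find returns -1 when the character is not a lowercase ASCII letter.
    return string.ascii_lowercase.find(ch.lower()) + 1

def alpha_distance(s, t):
    """Top-down version of the weighted edit-distance recurrence."""
    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == 0:
            return sum(alpha_cost(c) for c in t[:j])  # insert the rest of t
        if j == 0:
            return sum(alpha_cost(c) for c in s[:i])  # delete the rest of s
        if s[i - 1] == t[j - 1]:
            return solve(i - 1, j - 1)                # matching characters are free
        a, b = alpha_cost(s[i - 1]), alpha_cost(t[j - 1])
        return min(solve(i - 1, j) + a,               # deletion
                   solve(i, j - 1) + b,               # insertion
                   solve(i - 1, j - 1) + abs(a - b))  # substitution
    return solve(len(s), len(t))

print(alpha_distance("a", "b"))      # 1: substituting a -> b costs |2 - 1|
print(alpha_distance("", "abc"))     # 6: inserting a, b, c costs 1 + 2 + 3
print(alpha_distance("cat", "bat"))  # 1: substituting c -> b costs |2 - 3|
```

The first example shows why substitution wins over a delete-then-insert sequence here: deleting "a" and inserting "b" would cost 1 + 2 = 3.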

#3: Sort an array of integers representing colors

Prompt

Write a function to sort an array of color codes, where the integers 0, 1, and 2 represent the colors red, white, and blue, respectively. The function should rearrange the elements so that objects of the same color are adjacent, with the order being red, white, and blue. The function should be able to handle empty input arrays and invalid colors. If the array contains only one unique color code, the function should return the array as is.

Response

```python
def arrange_colors(color_codes):
    # Filter for known color codes
    color_codes = [color for color in color_codes if color in [0, 1, 2]]
    # Handle empty case
    if not color_codes:
        return []
    # If the array contains only one unique color code, return it as is
    if len(set(color_codes)) == 1:
        return color_codes
    low, mid, high = 0, 0, len(color_codes) - 1
    # Sort codes
    while mid <= high:
        if color_codes[mid] == 0:
            color_codes[low], color_codes[mid] = color_codes[mid], color_codes[low]
            low += 1
            mid += 1
        elif color_codes[mid] == 1:
            mid += 1
        else:
            color_codes[mid], color_codes[high] = color_codes[high], color_codes[mid]
            high -= 1
    return color_codes
```
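The response above uses a three-pointer partition. For comparison, the sketch below takes a different, counting-based approach that produces the same ordering: it tallies how many times each valid code occurs and rebuilds the list. The function name `arrange_colors_by_count` is hypothetical and this is an alternative technique, not the method used in the response.

```python
from collections import Counter

def arrange_colors_by_count(color_codes):
    """Return the valid color codes (0, 1, 2) in red, white, blue order."""
    # Invalid codes are dropped, mirroring the filter in the response.
    counts = Counter(c for c in color_codes if c in (0, 1, 2))
    # Emit each color as many times as it occurred; a missing color counts as 0.
    return [color for color in (0, 1, 2) for _ in range(counts[color])]

print(arrange_colors_by_count([2, 0, 1, 5, 0, 2]))  # [0, 0, 1, 2, 2]
print(arrange_colors_by_count([]))                  # []
```

Empty input and single-color input need no special handling here, since rebuilding from the counts already returns the expected list in both cases.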

Copyright © 2026 Scale AI, Inc. All rights reserved.