Deploy Your Model

Two Types of Hosted Endpoints

There are two types of hosted endpoints associated with each model trained on Rapid: regular endpoints and a production endpoint. Upon completion of every training run, we automatically generate a regular endpoint that you can use for model inference via our API as outlined below. Additionally, each project has a single production endpoint. This endpoint, when enabled, has permanently active workers, which enables much lower latency on model inference. You can select any one model version to be used for the production endpoint, and you will also need to actively enable the endpoint in the UI. Note that using the production endpoint incurs hourly charges, while using regular endpoints incurs a charge per inference.
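As a rough illustration of the trade-off between the two billing models, you can estimate the inference volume at which an always-on production endpoint becomes cheaper than per-inference billing. All rates below are made-up placeholders, not actual Scale pricing:

```python
def break_even_volume(hourly_rate, per_inference_rate):
    """Inferences per hour above which an always-on production endpoint
    is cheaper than paying per inference. Rates are hypothetical."""
    return hourly_rate / per_inference_rate

# With a hypothetical $2.00/hour production rate and $0.01/inference,
# the production endpoint wins once you exceed this hourly volume:
threshold = break_even_volume(2.00, 0.01)
```

If your sustained traffic is below the threshold, regular endpoints are likely the more economical choice; above it, the production endpoint also buys you lower latency.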

Try Your Model in the Browser

You can try the endpoint we generate for every model version using the Playground tab that is available for each model version you have trained. Simply upload an image from your local machine and see the inference response of the respective model version appear. The inference result is available as an image as well as in JSON format.


model playground

Using a Hosted Endpoint via API

Beyond trying the model endpoint in the browser, the standard way to use a Scale hosted endpoint is via API request. This applies to both regular and production endpoints, with the difference in response latency outlined above.

We document here two different approaches: using the Scale Launch Python client or directly using cURL. Alternatively, most programming languages offer a programmatic way to query an API (e.g., requests in Python; axios in TypeScript).

The purpose of this guide is to enable you to complete an end-to-end integration test. As you iterate on the model, you can either use regular endpoints or continuously swap the model version in the production endpoint.

Step 1: Retrieve API Key

You can access your API key through your Rapid account with this URL:
It can also be found by clicking on your user profile icon in the top right of Rapid, then selecting "API Key".
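A common pattern for keeping the key out of source code is to read it from an environment variable. This is a general Python sketch, not Rapid-specific; the variable name SCALE_API_KEY is our own choice:

```python
import os

# export SCALE_API_KEY="live_<token>" in your shell beforehand;
# the variable name is our own convention, not mandated by Scale
API_KEY = os.environ.get("SCALE_API_KEY", "")
if not API_KEY:
    print("Set SCALE_API_KEY before running inference")
```

The resulting API_KEY string can then be passed wherever the steps below expect your key.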

Step 2a: Inference Using the Scale Launch Python Client

To get started with using your model in Python, first install the client, which is publicly available on PyPI.

pip install scale-launch

Before running inference, you need to initialize the client with an API key; this should look like live_<token>.

from launch import LaunchClient
client = LaunchClient(api_key=API_KEY)

To run inference, we just need a dictionary with a single field, image_url. We set return_pickled to False since we want the response serialized rather than pickled.

The response returns a payload of the form {"results": <RESULT_PAYLOAD>, "state": <STATUS_OF_JOB>}, where results is a dictionary containing a bitmask: {"semantic_segmentation_mask": <BITMASK>}. Each value is a serialized bitmask of the image ([image_height, image_width]), where 0 represents False and 1 represents True for the given segmentation class. The semantic segmentation mask represents the entire mask of the object.

import time

import numpy as np
from skimage import io
import matplotlib.pyplot as plt

args = {"image_url": "IMAGE_STORAGE_URL"}

# visualize the original input image
original_image = io.imread(args["image_url"]).astype(np.uint8)
plt.imshow(original_image)
plt.show()

# submit an async task to the endpoint
task_id = client.async_request("endpoint_name", args=args, return_pickled=False)

# poll for the task result; this should take at most a few seconds
result = {}
while result.get("state", None) not in ["SUCCESS", "FAILURE"]:
    time.sleep(1)
    result = client.get_async_response(task_id)
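Once the task reaches SUCCESS, the returned bitmask can be turned back into a NumPy array for inspection. The exact serialization format is not specified here, so this sketch assumes the mask deserializes to a nested list of 0/1 values with shape [image_height, image_width]:

```python
import numpy as np

def decode_mask(serialized_mask):
    """Convert a deserialized bitmask (assumed: nested list of 0/1 values,
    shape [image_height, image_width]) into a boolean NumPy array."""
    return np.asarray(serialized_mask, dtype=bool)

# assumed result layout, following the payload description above:
# mask = decode_mask(result["results"]["semantic_segmentation_mask"])
# plt.imshow(mask); plt.show()
```

A boolean array in this shape can be overlaid directly on the original image, or used with np.count_nonzero to measure the masked area.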

Step 2b: Inference Using cURL

To use the endpoint, we want to create a REST request to the Scale server.

Submit Task

POST Request
JSON Payload: { "args": { "image_urls": ["insert-image-url-here", "insert-image-url-here"] }, "return_pickled": false }
	username: API Key
	password: "" (empty)
Response: task_id

The Response should return a task_id, which we then use to get the results.

Retrieve Results

GET Request
	username: API Key
	password: "" (empty)
Response: {results: <RESULT_PAYLOAD>, state: <STATUS_OF_JOB>}
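The same two requests can also be made from Python with the requests library instead of cURL. The URL constants below are placeholders you must fill in with your endpoint's actual URLs, and appending the task_id to the result URL is our assumption about the route shape; the payload follows the spec above:

```python
import requests

SUBMIT_URL = "insert-submit-endpoint-url-here"   # placeholder, not a real URL
RESULT_URL = "insert-result-endpoint-url-here"   # placeholder, not a real URL
API_KEY = "live_<token>"

def build_payload(image_urls):
    """JSON body for the task-submission POST, per the spec above."""
    return {"args": {"image_urls": image_urls}, "return_pickled": False}

def submit_task(image_urls):
    """POST the inference request; returns the task_id."""
    resp = requests.post(
        SUBMIT_URL,
        json=build_payload(image_urls),
        auth=(API_KEY, ""),  # API key as username, empty password
    )
    resp.raise_for_status()
    return resp.json()["task_id"]

def get_result(task_id):
    """GET the task result: {"results": ..., "state": ...}."""
    # appending the task_id to the URL path is an assumption
    resp = requests.get(f"{RESULT_URL}/{task_id}", auth=(API_KEY, ""))
    resp.raise_for_status()
    return resp.json()
```

As with the cURL version, poll get_result until state reaches SUCCESS or FAILURE.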

Endpoint Input and Outputs

  • images: [List of either public URL, raw image serialized into [H, W, 3], or S3 URL]
  • scale_s3_url: false (set to true if you want the endpoint to read from an internal Scale bucket)

RESULT_PAYLOAD, if the job is complete, is a serialized List[dict] where each item in the list corresponds to a single image result. Each dict contains the following items (depending on the task type):

  • masks: List[Bitmask of image size dimensions where 0 represents no mask and 1 represents mask]
  • boxes: List[[X1, Y1, X2, Y2]] where each element in the list is a box given by its two corners (x1, y1) and (x2, y2)
  • scores: scores for each prediction (either masks or boxes)
  • labels: label for each prediction (either masks or boxes)
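As a small sketch of consuming this payload, you can pair boxes with their scores and labels and keep only confident detections. The filter_predictions helper and the score threshold are our own illustration, not part of the API; we assume "boxes", "scores", and "labels" are parallel lists as described above:

```python
def filter_predictions(image_result, score_threshold=0.5):
    """Keep boxes/labels whose score meets the threshold.

    image_result is one item of the deserialized RESULT_PAYLOAD list,
    assumed to contain parallel "boxes", "scores", and "labels" lists.
    """
    kept = []
    for box, score, label in zip(
        image_result.get("boxes", []),
        image_result.get("scores", []),
        image_result.get("labels", []),
    ):
        if score >= score_threshold:
            kept.append({"box": box, "score": score, "label": label})
    return kept
```

The same pairing works for masks: zip "masks" with "scores" and "labels" instead of "boxes".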