Two Types of Hosted Endpoints
There are two types of hosted endpoints associated with each model trained on Rapid: regular endpoints and a production endpoint. Upon completion of every training run, we automatically generate a regular endpoint that you can use for model inference via our API, as outlined below. Additionally, each project has a single production endpoint. This endpoint, when enabled, has permanently active workers, which enables much lower latency on model inference. You can select any one model version to be served by the production endpoint, and you will also need to actively enable the endpoint in the UI. Note that using the production endpoint incurs hourly charges, while using regular endpoints incurs a charge per inference.
Try Your Model in the Browser
You can try the endpoint we generate for every model version using the Playground tab available for that version. Simply upload an image from your local machine and the inference response of the respective model version will appear. The inference result is available both as an image and in JSON format.

[Image: model playground]
Using a Hosted Endpoint via API
Beyond trying the model endpoint in the browser, the standard way to use a Scale hosted endpoint is via API request. This applies to both regular and production endpoints, with the difference in response latency as outlined above.
We document two different approaches here: using the Scale Launch Python client or directly using cURL. Alternatively, most programming languages offer a programmatic way to query an API (e.g. requests in Python; axios in TypeScript).
The purpose of this guide is to enable you to complete an end-to-end integration test. As you iterate on the model, you can either use regular endpoints or continuously swap the model version used by the production endpoint.
Step 1: Retrieve API Key
You can access this through your Rapid account at this URL: https://dashboard.scale.com/rapid/settings/apikey
This is also found by clicking on your user profile icon in the top right of Rapid, then selecting "API Key".
Step 2a: Inference Using the Scale Launch Python Client
To get started using your model in Python, first install the client, which is publicly available on PyPI:
!pip install scale-launch
Before we can run inference, you need to initialize the client with an API key, which should look like live_<token>:
from launch import LaunchClient

API_KEY = 'YOUR_API_KEY'  # should look like live_<token>
client = LaunchClient(api_key=API_KEY)
To run inference, we just need a dictionary with a single field, image_url, and we set return_pickled to False since we want the response serialized rather than pickled.
The response returns a payload of the form {results: <RESULT_PAYLOAD>, state: <STATUS_OF_JOB>}, where results is a dictionary with a bitmask: {'semantic_segmentation_mask': <BITMASK>}. Each value is a serialized bitmask of the image ([image_height, image_width]), where 0 represents False and 1 represents True for the given segmentation class. The semantic segmentation mask represents the entire mask of the object.
import time

import matplotlib.pyplot as plt
import numpy as np
from skimage import io

args = {"image_url": "IMAGE_STORAGE_URL"}

# visualize the original input image
original_image = io.imread(args['image_url']).astype(np.uint8)
plt.imshow(original_image)

# submit an async task to the endpoint
task_id = client.async_request("endpoint_name", args=args, return_pickled=False)

# poll for the task result; this should only take at most a few seconds
result = {}
while result.get("state") not in ["SUCCESS", "FAILURE"]:
    result = client.get_async_response(task_id)
    time.sleep(0.5)
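Once the task reports SUCCESS, you can decode and visualize the returned mask. The snippet below is a minimal sketch that continues from the code above; it assumes the serialized results field parses as JSON and that the bitmask arrives as a nested list of shape [image_height, image_width], so adjust the parsing to match your actual payload.

import json

# Assumption: the results field is a JSON string; adjust if your payload differs.
payload = json.loads(result["results"])
mask = np.array(payload["semantic_segmentation_mask"], dtype=np.uint8)

# overlay the binary mask (0 = background, 1 = segmentation class) on the image
plt.imshow(original_image)
plt.imshow(mask, alpha=0.5, cmap="Reds")
plt.show()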
Step 2b: Inference Using cURL
To use the endpoint, we send REST requests to the Scale server.
Submit Task
POST Request
URL: https://api.scale.com/v1/hosted_inference/task_async/<endpoint_name>
JSON Payload: {"args": {"image_urls": ["insert-image-url-here", "insert-image-url-here"]}, "return_pickled": false}
Auth:
username: API Key
password: '' (empty)
Response: task_id
The response returns a task_id, which we then use to retrieve the results.
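For illustration, submitting the task with curl might look roughly like this (a sketch; substitute your own API key, endpoint name, and image URLs):

curl -X POST "https://api.scale.com/v1/hosted_inference/task_async/<endpoint_name>" \
  -u "live_<token>:" \
  -H "Content-Type: application/json" \
  -d '{"args": {"image_urls": ["insert-image-url-here"]}, "return_pickled": false}'

The -u flag sets HTTP basic auth with your API key as the username and an empty password.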
Retrieve Results
GET Request
URL: https://api.scale.com/v1/hosted_inference/endpoints/<endpoint_name>/task_async/<task_id>
Auth:
username: API Key
password: '' (empty)
Response: {results: <RESULT_PAYLOAD>, state: <STATUS_OF_JOB>}
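Again as a sketch, polling for the result with curl might look like this:

curl "https://api.scale.com/v1/hosted_inference/endpoints/<endpoint_name>/task_async/<task_id>" \
  -u "live_<token>:"

Repeat the request until state is SUCCESS or FAILURE.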
Endpoint Input and Outputs
ENDPOINT_INPUT_PAYLOAD:
- images: [List of either public URLs, raw images serialized into [H, W, 3], or S3 URLs]
- scale_s3_url = False (set to True if you want it to read from an internal Scale bucket)
RESULT_PAYLOAD, if the job is complete, is a serialized List[dict] where each item in the list corresponds to a single image result. Each dict contains the following items (depending on task type):
- masks: List[bitmask of image-size dimensions, where 0 represents no mask and 1 represents mask]
- boxes: List[[X1, Y1, X2, Y2]], where each element in the list is a box defined by its two corner points (X1, Y1) and (X2, Y2)
- scores: a score for each prediction (either masks or boxes)
- labels: a label for each prediction (either masks or boxes)
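To make the shape of RESULT_PAYLOAD concrete, here is a minimal, hypothetical sketch of parsing a detection-style result in Python, assuming the payload deserializes via json.loads into the structure described above (raw_results stands in for the serialized string from the response):

import json

# Assumption: raw_results holds the serialized RESULT_PAYLOAD string.
results = json.loads(raw_results)

# each item corresponds to one input image
for image_result in results:
    boxes = image_result.get("boxes", [])
    scores = image_result.get("scores", [])
    labels = image_result.get("labels", [])
    for (x1, y1, x2, y2), score, label in zip(boxes, scores, labels):
        print(f"{label}: score={score:.3f}, box=({x1}, {y1}) to ({x2}, {y2})")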