Getting Started

Nucleus is a dataset management platform that helps ML teams build better datasets. Bring your data, labels, and model predictions together to debug your models and improve your datasets.

Install Python Client Using pip

We recommend using our Python SDK to interact with the Nucleus API.

pip install scale-nucleus
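To confirm that the install landed in the environment you intend to use, a small check like the following can help. This is only a sketch: the helper name `installed_version` is hypothetical, and it relies on the standard-library `importlib.metadata` module (Python 3.8+) rather than on anything in the Nucleus SDK itself.

```python
from importlib import metadata


def installed_version(distribution="scale-nucleus"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    # Print the version when found, otherwise a hint that the install is missing.
    print(version or "scale-nucleus is not installed in this environment")
```

If this reports that the package is missing, the `pip` you ran probably belongs to a different Python interpreter or virtual environment.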

Our Python client code is open source! You can check out our codebase here: https://github.com/scaleapi/nucleus-python-client

Get API Key

To interact with Scale (and Nucleus) APIs, you'll need to get an API key. Follow this guide to get set up.

Store this API key as an environment variable so that you don't accidentally check it into source control. We recommend adding a line like this to your shell profile:

export NUCLEUS_API_KEY=<YOUR_API_KEY>
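Once the variable is exported, your Python code can read it at runtime instead of hard-coding the key. The sketch below assumes the variable is named `NUCLEUS_API_KEY` as in the line above; the helper `load_api_key` is a hypothetical name used for illustration and is not part of the Nucleus SDK.

```python
import os


def load_api_key(var_name="NUCLEUS_API_KEY", environ=None):
    """Read the API key from the environment, failing loudly if it is unset.

    `environ` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if environ is None else environ
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell profile first."
        )
    return key
```

The returned string can then be passed to `nucleus.NucleusClient(...)`. Failing early with a clear message tends to be easier to debug than an authentication error raised later by the API.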

Granting Scale Read Access to Your Data

Follow this guide to grant Scale delegated access to your remote data. Currently, AWS S3, Google Cloud Storage, and Azure Blob Storage are supported.

Once complete, you can verify whether Nucleus has access by uploading to a test dataset using the following snippet:

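import os

import nucleus

# Read the API key from the NUCLEUS_API_KEY environment variable set earlier.
client = nucleus.NucleusClient(os.environ["NUCLEUS_API_KEY"])
dataset = client.create_dataset("TestAccess")

# Replace this placeholder with the URL of a file Scale has been granted access to.
accessible_url = "YOUR_ACCESSIBLE_FILE_URL"
dataset_item = nucleus.DatasetItem(image_location=accessible_url, reference_id="test_item_id")

# append() takes a list of DatasetItem objects, so wrap the single item in a list.
print(dataset.append([dataset_item]))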