Nucleus is a dataset management platform that helps ML teams build better datasets. Bring your data, labels, and model predictions together to debug your models and improve your datasets.
Install Python Client Using pip
We recommend using our Python SDK to interact with the Nucleus API.
pip install scale-nucleus
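To confirm the install from Python, a minimal sketch like the one below can check that the package is importable. It assumes only that the `scale-nucleus` distribution is imported under the module name `nucleus`, as in the snippets on this page; the helper name `module_available` is hypothetical and not part of the SDK.

```python
import importlib.util


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when the module cannot be found on sys.path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "nucleus" is the import name used by the scale-nucleus package.
    if module_available("nucleus"):
        print("nucleus is installed")
    else:
        print("nucleus not found; try: pip install scale-nucleus")
```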
Our Python client code is open source! You can check out our codebase here: https://github.com/scaleapi/nucleus-python-client
Get API Key
To interact with Scale (and Nucleus) APIs, you'll need to get an API key. Follow this guide to get set up.
Store this API key as an environment variable so that you don't accidentally check it into source control. We recommend adding a line like this to your shell profile:
export NUCLEUS_API_KEY=<YOUR_API_KEY>
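Once the variable is exported, the key can be read from the environment instead of being hard-coded. The following is a minimal sketch, not part of the SDK: the helper name `load_api_key` is hypothetical, and it assumes the variable is named `NUCLEUS_API_KEY` as in the export line above.

```python
import os


def load_api_key(var_name: str = "NUCLEUS_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell profile first."
        )
    return key


# Usage (requires the scale-nucleus package):
#   import nucleus
#   client = nucleus.NucleusClient(load_api_key())
```

Failing early with a clear message avoids passing an empty key to the client and getting a less obvious authentication error later.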
Granting Scale Read Access to Your Data
Follow this guide to grant Scale delegated access to your remote data. At the moment, AWS S3, Google Cloud Storage, and Azure Blob Storage are supported.
Once complete, you can verify whether Nucleus has access by uploading to a test dataset using the following snippet:
import nucleus
# Replace YOUR_SCALE_API_KEY and YOUR_ACCESSIBLE_FILE_URL with your own values.
client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
dataset = client.create_dataset("TestAccess")
accessible_url = YOUR_ACCESSIBLE_FILE_URL
dataset_item = nucleus.DatasetItem(image_location=accessible_url, reference_id='test_item_id')
# append() expects a list of DatasetItem objects.
print(dataset.append([dataset_item]))