Nucleus is a dataset management platform that helps ML teams build better datasets. Bring your data, labels, and model predictions together to debug your models and improve your datasets.
We recommend using our Python SDK to interact with the Nucleus API.
pip install scale-nucleus
Our Python client code is open source! You can check out our codebase here: https://github.com/scaleapi/nucleus-python-client
To interact with Scale (and Nucleus) APIs, you'll need to get an API key. Follow this guide to get set up.
Store this API key in an environment variable so that you don't accidentally check it into source control. We recommend setting that variable in your shell profile.
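As a minimal sketch of reading the key back in Python: the variable name `SCALE_API_KEY` and the helpers `load_api_key` and `make_client` are hypothetical, chosen only for illustration. Use whatever name you exported in your shell profile.

```python
import os

# Hypothetical variable name (an assumption, not mandated by Nucleus):
# replace it with the name you actually exported in your shell profile.
API_KEY_ENV_VAR = "SCALE_API_KEY"


def load_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Return the API key stored in the given environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Scale API key."
        )
    return api_key


def make_client():
    """Build a Nucleus client from the key found in the environment."""
    # Imported lazily so load_api_key() works even without the SDK installed.
    import nucleus

    return nucleus.NucleusClient(load_api_key())
```

Failing early with a clear message when the variable is unset tends to be easier to debug than an authentication error from a later API call.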
Follow this guide to grant Scale delegated access to your remote data. At the moment, AWS S3, Google Cloud Storage, and Azure Blob Storage are supported.
Once complete, you can verify whether Nucleus has access by uploading to a test dataset using the following snippet:
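import nucleus

# Authenticate with your Scale API key
client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)

# Create a throwaway dataset for the access check
dataset = client.create_dataset("TestAccess")

# URL of a file in the storage you granted access to
accessible_url = YOUR_ACCESSIBLE_FILE_URL
dataset_item = nucleus.DatasetItem(image_location=accessible_url, reference_id="test_item_id")

# append() takes a list of items, so wrap the single item in a list
print(dataset.append([dataset_item]))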