Uploading 2D Data

Overview

In this guide, we'll walk through the steps to upload your 2D image data to Nucleus.

  1. Create Dataset
  2. Create image DatasetItems
  3. Upload DatasetItems to Dataset
  4. Update metadata

Creating a Dataset

Get started by creating a new Dataset to hold your 2D data.

If you already have a Dataset, you can retrieve it by its dataset ID, which is always prefixed with ds_. You can list all of your dataset IDs using NucleusClient.list_datasets(), or copy the ID from the Nucleus dashboard's URL after clicking into the Dataset.

from nucleus import NucleusClient

client = NucleusClient(YOUR_API_KEY)

dataset = client.create_dataset(YOUR_DATASET_NAME)
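
If you are retrieving an existing Dataset rather than creating one, a quick sanity check on the ID can catch copy-paste mistakes before any API call. The sketch below relies only on the ds_ prefix described above. Both helper names are hypothetical (they are not part of the Nucleus SDK), and the character set allowed after the prefix and the example URL shape are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumption: the part after "ds_" is alphanumeric. Adjust if your IDs differ.
_DATASET_ID_PATTERN = re.compile(r"\bds_[A-Za-z0-9]+")


def looks_like_dataset_id(candidate: str) -> bool:
    """Return True if the whole string is 'ds_' followed by alphanumerics.

    Hypothetical helper: it checks the shape of the ID only, not whether
    the Dataset actually exists.
    """
    return _DATASET_ID_PATTERN.fullmatch(candidate.strip()) is not None


def extract_dataset_id(url: str) -> Optional[str]:
    """Return the first ds_-prefixed token found in a URL, or None."""
    match = _DATASET_ID_PATTERN.search(url)
    return match.group(0) if match else None


# The URL below is a made-up placeholder, not a real dashboard address.
dataset_id = extract_dataset_id("https://example.com/nucleus/ds_abc123?tab=items")
```

The extracted value can then be passed to client.get_dataset in place of YOUR_DATASET_ID.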

Creating Image DatasetItems

You can upload DatasetItems on the Nucleus dashboard or via API.

[Figure: Uploading local files from the dashboard (creates a new Dataset).]

When uploading items via API, you'll first need to construct DatasetItem payloads. The best way to do so is using the Python SDK's DatasetItem constructor, which takes in a few parameters:

| Property | Type | Description |
| --- | --- | --- |
| image_location | string (required) | Local path or remote URL to the image. For large uploads we require the data to be stored with AWS S3, Google Cloud Storage, or Azure Blob Storage for faster concurrent & asynchronous processing. |
| reference_id | string (required) | A user-specified identifier for the image. Typically this is an internal filename or any unique, easily identifiable moniker. |
| metadata | dict | Optional metadata pertaining to the image, e.g. time of day, weather. These attributes will be queryable in the Nucleus platform. Metadata can be updated after uploading (via reference ID). |
| upload_to_scale | boolean | Set this to False in order to use privacy mode for this item. This means the data will not be uploaded to Scale, and the image_location must be a URL that is accessible to anyone who wants to see the images in Nucleus. Note: privacy mode is only available to enterprise customers. |
| pointcloud_location | string | This parameter should not be supplied for 2D items! DatasetItems can also house lidar pointclouds; please check out our 3D guide for more info. |

from nucleus import DatasetItem

accessible_urls = [
    "http://farm1.staticflickr.com/107/309278012_7a1f67deaa_z.jpg",
    "http://farm9.staticflickr.com/8001/7679588594_4e51b76472_z.jpg",
    "http://farm6.staticflickr.com/5295/5465771966_76f9773af1_z.jpg",
    "http://farm4.staticflickr.com/3449/4002348519_8ddfa4f2fb_z.jpg",
]
reference_ids = ['107', '8001', '5295', '3449']
metadata_dicts = [
    {'indoors': True},
    {'indoors': True},
    {'indoors': False},
    {'indoors': False},
]

dataset_items = []
for url, ref_id, metadata in zip(accessible_urls, reference_ids, metadata_dicts):
    item = DatasetItem(image_location=url, reference_id=ref_id, metadata=metadata)
    dataset_items.append(item)
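
Because zip silently stops at the shortest list and reference IDs are meant to be unique, it can be worth validating the parallel lists before building any items. The following is a minimal sketch using only the standard library; check_item_inputs is a hypothetical helper name, not part of the Nucleus SDK.

```python
from collections import Counter


def check_item_inputs(urls, reference_ids, metadata_dicts):
    """Raise ValueError if the parallel lists cannot be zipped safely.

    Hypothetical pre-flight check: verifies equal lengths and that no
    reference ID appears more than once.
    """
    lengths = {len(urls), len(reference_ids), len(metadata_dicts)}
    if len(lengths) != 1:
        raise ValueError(
            "urls, reference_ids and metadata_dicts must have the same length"
        )

    duplicates = [ref for ref, count in Counter(reference_ids).items() if count > 1]
    if duplicates:
        raise ValueError(f"duplicate reference IDs: {duplicates}")


# Passes silently for well-formed input.
check_item_inputs(
    ["a.jpg", "b.jpg"],
    ["a", "b"],
    [{"indoors": True}, {"indoors": False}],
)
```

Calling this on accessible_urls, reference_ids, and metadata_dicts just before the loop surfaces mismatches locally instead of during upload.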

Uploading DatasetItems to Nucleus

We'll upload to the Dataset created earlier in this guide. You can always retrieve a Dataset by its dataset ID, which is always prefixed with ds_. You can list all of your dataset IDs using NucleusClient.list_datasets(), or copy one from the Nucleus dashboard's URL after clicking into the Dataset.

from nucleus import NucleusClient

client = NucleusClient(YOUR_API_KEY)

dataset = client.get_dataset(YOUR_DATASET_ID)

With your images and dataset ready, you can now upload to Nucleus using Dataset.append.

# after creating or retrieving a Dataset
job = dataset.append(
    items=dataset_items,
    update=True,  # more on this later!
    asynchronous=True,  # highly recommended for larger uploads
)

# async jobs will run in the background, poll using:
job.status()

# or block until job completion using:
job.sleep_until_complete()

It is highly recommended to set asynchronous=True for larger uploads! The concurrent processing will dramatically increase upload throughput.
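
Since large uploads are expected to reference images in cloud storage, one option is to check that every image_location is a remote URL before choosing asynchronous=True. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the set of URL schemes treated as remote is a guess that you should adapt to your storage provider.

```python
from urllib.parse import urlparse

# Assumption: these schemes indicate remotely hosted images.
REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


def is_remote_location(image_location: str) -> bool:
    """Return True if the location parses as a URL with a known remote scheme."""
    return urlparse(image_location).scheme.lower() in REMOTE_SCHEMES


def all_remote(image_locations) -> bool:
    """Return True only for a non-empty collection of remote locations."""
    locations = list(image_locations)
    return bool(locations) and all(is_remote_location(loc) for loc in locations)


locations = [
    "s3://my-bucket/images/0001.jpg",  # hypothetical bucket and key
    "https://example.com/images/0002.jpg",  # placeholder URL
]
use_async = all_remote(locations)
```

The resulting flag could be passed as the asynchronous argument to Dataset.append, falling back to a synchronous upload when local paths are present.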

Updating Metadata

By setting update=True in Dataset.append, your upload will overwrite metadata for any existing item with a shared reference_id.

For instance, in our example we uploaded an item with reference ID 107. We can update this item's metadata to include a new key-value pair as follows:

updated_item = DatasetItem(
    image_location="http://farm1.staticflickr.com/107/309278012_7a1f67deaa_z.jpg",
    reference_id="107",
    metadata={"indoors": True, "room": "living room"},
)

job = dataset.append(
    items=[updated_item],
    update=True,  # update on reference ID collision
    asynchronous=True,
)
job.sleep_until_complete()
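
Because an update overwrites the stored metadata, a conservative approach is to send the complete dictionary (existing keys plus the new ones) rather than only the changed key. The sketch below assumes you keep a local copy of each item's metadata; merged_metadata is a hypothetical helper, not part of the Nucleus SDK.

```python
def merged_metadata(existing: dict, changes: dict) -> dict:
    """Return a new dict with changes applied on top of existing.

    Neither input is modified; keys in changes win on conflict.
    """
    return {**existing, **changes}


current = {"indoors": True}
updated = merged_metadata(current, {"room": "living room"})
# updated now holds both the original key and the new one.
```

The merged dictionary can then be passed as the metadata argument when constructing the replacement DatasetItem.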

We highly recommend adding as much metadata as possible! Metadata is queryable in the Nucleus dashboard and can unlock many data exploration and curation workflows. See our metadata guide for more info.
