Adding Annotations and Predictions

Overview

Most of the highest-leverage workflows in Nucleus are built around visualizing and quantifying model predictions against ground truth annotations.

In this guide, we'll walk through the steps to upload your ground truth annotations and model predictions to Nucleus.

Uploading Ground Truth Annotations

Adding ground truth to your dataset in Nucleus allows you to visualize annotations, query for dataset items based on the annotations they contain, and evaluate your models by comparing their predictions to ground truth.

Nucleus expects a specific payload schema for annotations and predictions. The recommended way to construct these payloads is using our Python SDK.

Check out our API reference for specifics on each annotation/prediction type.

from nucleus import (
    BoxAnnotation,
    PolygonAnnotation,
    Segment,
    SegmentationAnnotation,
    Point3D,
    CuboidAnnotation,
    NucleusClient,
)

# box
box_gt_1 = BoxAnnotation(
    label="car",
    x=0,
    y=0,
    width=10,
    height=10,
    reference_id="image_1",
    annotation_id="image_1_car_box_1",
	  metadata={"vehicle_color": "red"}
)
box_gt_2 = BoxAnnotation(
    label="car",
    x=4,
    y=6,
    width=12,
    height=18,
    reference_id="image_1",
    annotation_id="image_1_car_box_2",
	  metadata={"vehicle_color": "blue"}
)

# polygon
polygon_gt = PolygonAnnotation(
    label="bus",
    vertices=[{"x": 100, "y": 100}, {"x": 150, "y": 200}, {"x": 200, "y": 100}],
    reference_id="image_2",
    annotation_id="image_2_bus_polygon_1",
    metadata={"vehicle_color": "yellow"}
)

# segmentation
segmentation_gt = SegmentationAnnotation(
    mask_url="s3://your-bucket-name/segmentation-masks/image_2_mask_id1.png",
    annotations=[
        Segment(label="grass", index="1"),
        Segment(label="road", index="2"),
      	Segment(label="bus", index="3", metadata={"vehicle_color": "yellow"}),
      	Segment(label="tree", index="4")
	  ],
    reference_id="image_2",
  	annotation_id="image_2_mask_1",
)

# cuboid
cuboid_gt = CuboidAnnotation(
  	label="car",
  	position=Point3D(100, 100, 10),
  	dimensions=Point3D(5, 10, 5),
  	yaw=0,
  	reference_id="pointcloud_1",
  	annotation_id="pointcloud_1_car_cuboid_1",
  	metadata={"vehicle_color": "green"}
)

client = NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")

response = dataset.annotate(
    annotations=[box_gt_1, box_gt_2, polygon_gt, segmentation_gt, cuboid_gt],
    update=True,
    asynchronous=False  # async is recommended, but sync jobs are easier to debug
)
print(response)
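If you opt for the recommended asynchronous upload instead, dataset.annotate returns a job handle you can poll, just like the prediction upload later in this guide. A minimal sketch, reusing the same annotations as above:

# sketch: the same upload run asynchronously; the returned AsyncJob
# mirrors the prediction upload example later in this guide
job = dataset.annotate(
    annotations=[box_gt_1, box_gt_2, polygon_gt, segmentation_gt, cuboid_gt],
    update=True,
    asynchronous=True
)
job.status()                # poll current status
job.sleep_until_complete()  # block until the upload completes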

Uploading Model Predictions

By uploading model predictions to Nucleus, you can compare your predictions to ground truth annotations and discover problems with your models or dataset.

You can also upload predictions for unannotated data to enable curation and querying workflows. This can, for instance, help you identify the most effective subset of unlabeled data to label next.

Within Nucleus, models work as follows:

  1. Create a Model. You can do this just once and reuse the model on multiple datasets.
  2. Upload predictions to a dataset.
  3. Trigger calculation of model metrics.

You'll then be able to debug your models against your ground truth qualitatively with queries and visualizations, or quantitatively with metrics, plots, and other insights. You can also compare multiple models that have been run on the same dataset.

The payload schema is largely shared between annotations and predictions; predictions additionally accept the optional confidence and class_pdf attributes. The example below covers bounding box predictions only, but the same pattern generalizes to the other prediction types.

from nucleus import NucleusClient, BoxPrediction

client = NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")

# create model
model = client.add_model(
  	name="My Model",
	  reference_id="My-CNN",
  	metadata={"timestamp": "121012401"}
)

# create box predictions
box_pred_1 = BoxPrediction(
    label="car",
    x=0,
    y=0,
    width=10,
    height=10,
    reference_id="image_1",
    annotation_id="image_1_car_box_1",
    confidence=0.6,
    class_pdf={"car": 0.6, "truck": 0.4},
    metadata={"vehicle_color": "red"}
)
box_pred_2 = BoxPrediction(
    label="car",
    x=4,
    y=6,
    width=12,
    height=18,
    reference_id="image_1",
    annotation_id="image_1_car_box_2",
    confidence=0.9,
    class_pdf={"car": 0.9, "truck": 0.1},
    metadata={"vehicle_color": "blue"}
)

job = dataset.upload_predictions(
    model=model,
    predictions=[box_pred_1, box_pred_2],
    update=True,
    asynchronous=True  # async is recommended, but sync jobs can be easier to debug
)
# poll current status
job.status()
# block until upload completes
job.sleep_until_complete()
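To illustrate how the pattern carries over, here is a sketch of the polygon counterpart to the boxes above. PolygonPrediction mirrors the PolygonAnnotation from the ground truth example; the confidence and class_pdf values here are purely illustrative:

from nucleus import PolygonPrediction

# polygon counterpart to the box predictions above; the optional
# confidence and class_pdf fields work the same way (values illustrative)
polygon_pred = PolygonPrediction(
    label="bus",
    vertices=[{"x": 100, "y": 100}, {"x": 150, "y": 200}, {"x": 200, "y": 100}],
    reference_id="image_2",
    annotation_id="image_2_bus_polygon_1",
    confidence=0.8,
    class_pdf={"bus": 0.8, "truck": 0.2},
    metadata={"vehicle_color": "yellow"}
)

You would then include polygon_pred in the predictions list of the same upload_predictions call.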

Calculate Model Metrics

After creating a model and uploading its predictions, you'll need to call Dataset.calculate_evaluation_metrics (shown below) to match predictions against ground truth annotations and calculate metrics such as IOU. This enables sorting by metrics, filtering down to false positives or false negatives, and the evaluation plots and metrics on the Insights page.

You can continue to add model predictions to a dataset even after metrics have been calculated. However, you'll need to retrigger the calculation for the new predictions to be matched with ground truth and reflected in sorts, false positive/negative filters, and the Insights page metrics.

📘 How Nucleus matches predictions to ground truth

During IOU calculation, predictions are greedily matched to ground truth by taking highest IOU pairs first. By default the matching algorithm is class-sensitive: it will treat a match as a true positive if and only if the labels are the same.
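To make this concrete, here is a minimal sketch of such a greedy matcher. This is an illustration of the idea, not the actual Nucleus implementation; the box_iou helper, the attribute names, and the 0.5 threshold are all assumptions:

# illustrative greedy IOU matcher -- NOT Nucleus internals
def box_iou(a, b):
    # IOU of two axis-aligned boxes with x/y (top-left), width, height
    x1, y1 = max(a.x, b.x), max(a.y, b.y)
    x2 = min(a.x + a.width, b.x + b.width)
    y2 = min(a.y + a.height, b.y + b.height)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union else 0.0

def greedy_match(predictions, ground_truths, iou_threshold=0.5):
    # score every class-compatible (prediction, ground truth) pair
    pairs = [
        (box_iou(p, g), i, j)
        for i, p in enumerate(predictions)
        for j, g in enumerate(ground_truths)
        if p.label == g.label  # class-sensitive by default
    ]
    pairs.sort(reverse=True)  # highest-IOU pairs are claimed first
    taken_p, taken_g, matches = set(), set(), []
    for iou, i, j in pairs:
        if iou >= iou_threshold and i not in taken_p and j not in taken_g:
            taken_p.add(i)
            taken_g.add(j)
            matches.append((i, j, iou))  # true positive
    return matches  # leftovers become false positives / false negatives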

If you'd like to allow matches between ground truth and predictions whose labels differ, you can specify the permitted label pairs via allowed_label_matches (shown in the example below).

from nucleus import NucleusClient

client = NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset(dataset_id="YOUR_DATASET_ID")

model = client.get_model(model_id="YOUR_MODEL_ID", dataset_id="YOUR_DATASET_ID")

"""
associate car and bus bounding boxes for IOU computation,
but otherwise force associations to have the same class (default)
"""
dataset.calculate_evaluation_metrics(model, options={
    "allowed_label_matches": [
        {
            "ground_truth_label": "car",
            "model_prediction_label": "bus"
        },
        {
            "ground_truth_label": "bus",
            "model_prediction_label": "car"
        }
    ]
})

Once your predictions' metrics have finished processing, you can check out the Objects tab or Insights page to explore, visualize, and debug your models and data!
