Sensor Fusion Scene Format Overview

What is a Sensor Fusion Scene?

A Sensor Fusion Scene (SFS) file is a container format designed to represent a 3D scene by synchronizing data from multiple sensor types like LiDAR, cameras, and radar. Its strength lies in its efficiency and synchronization. An SFS file stores data in compressed formats like MP4 or in binary typed arrays, synchronized to a single time scale. This structure makes SFS files smaller and faster to process, while also allowing Scale to leverage work across 2D and 3D pipelines through projection and the linking of objects.

Core SFS Functionality

To build an SFS file, you'll primarily work with classes from the scale_sensor_fusion_io library. The most common are PosePath, CameraSensor, and LidarSensor.

  • PosePath

    • A PosePath defines an object's movement and orientation through the scene over time. It's constructed from two main components: an index containing an array of timestamps, and data containing a corresponding array of poses. Each pose is represented as an array of seven numbers: three for the position (x, y, z) and four for the orientation as a scalar-last quaternion (qx, qy, qz, qw).

  • CameraSensor

    • This class represents a single camera in the scene. A CameraSensor requires a unique id, a PosePath to describe its position over time, and its intrinsics (focal length, principal point, etc.). For workflows where 3D annotations are not generated, the intrinsics can be placeholder values; they are only used when reprojecting cuboids into 2D space. Its most important feature is the video object, which holds the video content as a binary byte array (Uint8Array) along with an array of timestamps for each frame and the video's FPS.

  • LidarSensor

    • This class represents a single LiDAR. Like the camera, it requires a unique id and a PosePath. The point cloud data itself is organized into a list of frames. Each frame has a start timestamp and a points object containing the actual data for that capture period. The points object holds several binary arrays:

      • positions: A Float32Array of all point (x, y, z) coordinates.

      • intensities: A Uint8Array of intensity values for each point.

      • timestamps: A Uint32Array of per-point timestamps, which is valuable for "frameless" or high-frequency data.

      • colors: An optional Uint8Array of RGB values, which can be generated by projecting camera colors onto the point cloud.

SFS also has functionality for a generic PointsSensor as well as a RadarSensor. All of these data classes can be found within the scale_sensor_fusion_io library. Additional reference material can also be found at the bottom of this page.
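
As a rough illustration of how these classes fit together, the snippet below builds a PosePath and a single-frame LidarSensor from placeholder data. It is a minimal sketch; the constructor keywords mirror the examples later on this page, but you should confirm them against your installed version of scale_sensor_fusion_io.

import numpy as np
import scale_sensor_fusion_io as sfio

# A PosePath: N timestamps (microseconds) and N poses of [x, y, z, qx, qy, qz, qw]
poses = sfio.PosePath(
    index=np.array([0, 100_000], dtype=np.int64),            # timestamps in microseconds
    data=np.array([
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],                 # identity orientation at t=0
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],                 # moved 1 m along x at t=100 ms
    ]),
)

# A single LiDAR frame: per-point positions, intensities, and timestamps (placeholder data)
points = sfio.LidarSensorPoints(
    positions=np.random.rand(1000, 3).astype(np.float32),
    intensities=np.random.randint(0, 255, 1000, dtype=np.uint8),
    timestamps=np.zeros(1000, dtype=np.uint32),
)

lidar = sfio.LidarSensor(
    id="lidar_0",
    poses=poses,
    frames=[sfio.LidarSensorFrame(timestamp=0, points=points)],
)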

Sensor Fusion Scene Creation Workflow

Producing an SFS file involves a series of steps focused on data synchronization, object creation, and scene assembly. A high-level overview of these steps is below, with code samples included for each step.

Step 1: Synchronize Timestamps

This is the most critical preparation step. All sensor data must exist on a single, unified timeline. This timeline is not tied to a specific sensor, but exists across the entire span of the scene.

  1. First, gather all frame-level timestamps from every sensor.

  2. Identify the single earliest timestamp among them. This will become your scene's "zero" point, or t=0.

  3. Convert all timestamps to microseconds and subtract the "zero" timestamp from every timestamp in your dataset. The original start time is saved separately and stored as the time_offset in the final scene file.

Synchronize Timestamps

### IDENTIFY MINIMUM TIMESTAMP ###

USEC_IN_SEC = 1e6

# Collect the earliest frame timestamp (here in seconds) reported by each sensor.
# camera_timestamps maps each camera name to an array of its frame times;
# each lidar entry exposes its frame times under the "t" key.
min_camera_timestamps = []
min_lidar_timestamps = []

for camera in cameras:
    min_camera_timestamps.append(camera_timestamps[camera.name].min())

for lidar in lidars:
    min_lidar_timestamps.append(lidar["t"].min())

# The scene's "zero" point, converted to microseconds
minimum_timestamp = min(
    min_camera_timestamps + min_lidar_timestamps) * USEC_IN_SEC
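
Once the minimum is known, every timestamp in the dataset can be shifted so the scene starts at t=0. The following is a sketch, assuming the camera and lidar timestamps are held in NumPy-like arrays in seconds, as in the snippet above:

### NORMALIZE TIMESTAMPS TO MICROSECONDS ###

# Shift camera frame timestamps onto the scene timeline
for camera in cameras:
    camera_timestamps[camera.name] = (
        camera_timestamps[camera.name] * USEC_IN_SEC - minimum_timestamp
    )

# Shift lidar frame timestamps onto the same timeline
for lidar in lidars:
    lidar["t"] = lidar["t"] * USEC_IN_SEC - minimum_timestamp

# minimum_timestamp itself is kept and later stored on the Scene as time_offset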

Step 2: Ingest and Format Data

With timestamps aligned, you can load and format the raw sensor data.

  • For cameras: Convert your image sequences into MP4 video files. There are several utility functions within scale_sensor_fusion_io which can assist with this. The result should be a byte array for each camera's video stream.

  • For LiDAR: Load the point cloud data for each frame. The goal is to get the position, intensity, and per-point timestamp data into NumPy arrays or similar structures that can be easily converted to the required binary format.

  • For Poses: Load all pose data and format it into the (N, 7) array structure required by the PosePath class. Since each sensor needs its own PosePath, the PosePath should be specific to the sensor; each sensor's poses will need to be adjusted using its extrinsic calibration with respect to the frame of the ego PosePath. If pose data does not have the same timestamps as your CameraSensor or LidarSensor, the poses can be interpolated using helper functions like apply_interpolated_transform_to_points or the class method PosePath.interpolate(timestamps).

Ingest and Format Data

### GENERATE POSES ###
''' The poses need to be normalized with the minimum timestamp and should be converted to world coordinates. 
If poses are already recorded in world coordinates, this step can be skipped. '''

import json

import numpy as np
import scale_sensor_fusion_io as sfio

with open("pose.json", "r") as f:
    poses_json = json.load(f)  # array of poses for each frame

pose_values = np.array([list(pose.values()) for pose in poses_json])
pose_values[:, 0] = pose_values[:, 0] * USEC_IN_SEC - minimum_timestamp

all_poses = sfio.PosePath(
        data=np.fromiter(
            (
                (
                    pose["tx"],
                    pose["ty"],
                    pose["tz"],
                    pose["qx"],
                    pose["qy"],
                    pose["qz"],
                    pose["qw"],
                )
                for pose in poses_json
            ),
            dtype=np.dtype((float, 7)),
            count=len(poses_json),
        ),
        index=pose_values[:, 0],
    )

# converts poses to world coordinates from ego coordinates
all_poses_world = sfio.PosePath(all_poses.invert().as_matrix()[0] @ all_poses)

Step 3: Instantiate Sensor Objects

Now, use the formatted data to create your sensor objects.

  • For each camera: Create a PosePath(timestamps, pose_data) and then a CameraSensor(id, intrinsics, PosePath, video_data).

  • For the LiDAR: Create a PosePath(timestamps, pose_data). Then, for each lidar frame, package point cloud data as LidarSensorPoints(timestamps, positions, intensities) and create a LidarSensor(id, PosePath, list_of_lidar_frames).

Instantiate Sensor Objects

### GENERATE CAMERA SENSORS ###
import glob
import json
import os
from typing import List

import numpy as np
from pyquaternion import Quaternion
from scale_lidar_io import transform

def generate_camera_sensor(camera_name: str, timestamps: List[int], poses: sfio.PosePath) -> sfio.CameraSensor:
    ### CAMERA INTRINSICS ###
    # Load the camera intrinsics and distortion parameters from the json file
    with open("camera_param.json", "r") as f:
        intrinsics_json = json.load(f)
        intrinsics = sfio.CameraIntrinsics(
        fx=intrinsics_json[camera_name]["intrinsics"]["fx"],
        fy=intrinsics_json[camera_name]["intrinsics"]["fy"],
        cx=intrinsics_json[camera_name]["intrinsics"]["cx"],
        cy=intrinsics_json[camera_name]["intrinsics"]["cy"],
        width=CAMERA_WIDTH,
        height=CAMERA_HEIGHT,
        distortion=sfio.CameraDistortion.from_values(
            model=sfio.DistortionModel.BROWN_CONRADY,
            values=[
                intrinsics_json[camera_name]["intrinsics"]["s1"],
                intrinsics_json[camera_name]["intrinsics"]["s2"],
                intrinsics_json[camera_name]["intrinsics"]["s3"],
                intrinsics_json[camera_name]["intrinsics"]["s4"],
                intrinsics_json[camera_name]["intrinsics"]["k1"],
                intrinsics_json[camera_name]["intrinsics"]["k2"],
                intrinsics_json[camera_name]["intrinsics"]["k3"],
                intrinsics_json[camera_name]["intrinsics"]["k4"],
                intrinsics_json[camera_name]["intrinsics"]["k5"],
                intrinsics_json[camera_name]["intrinsics"]["k6"],
                intrinsics_json[camera_name]["intrinsics"]["p1"],
                intrinsics_json[camera_name]["intrinsics"]["p2"],
            ],
        ),
    )

    ### CAMERA POSES ###
    ''' poses_json and pose_values were already calculated earlier. These poses are in the GPS/IMU frame, 
        so we need to adjust poses to account for the camera extrinsics. '''

    # extrinsics: per-camera extrinsic calibration (rotation + translation), loaded elsewhere
    lidar_to_cam_transform = transform.Transform.from_Rt(
        R=Quaternion([
            extrinsics[camera_name]["qw"],
            extrinsics[camera_name]["qx"],
            extrinsics[camera_name]["qy"],
            extrinsics[camera_name]["qz"],
        ]),
        t=np.array([
            extrinsics[camera_name]["tx"],
            extrinsics[camera_name]["ty"],
            extrinsics[camera_name]["tz"],
        ]),
    )

    poses = poses @ lidar_to_cam_transform.matrix

    poses = poses.interpolate(timestamps)

    ### GENERATE VIDEO ###

    # use this utility function to encode the video correctly
    sfio.utils.video_helpers.generate_video(
        image_files=sorted(
            glob.glob(os.path.join(SAMPLE_CAMERA_PATH, camera_name, f"*.jpg"))
        ),
        target_file=os.path.join(SAMPLE_CAMERA_PATH, camera_name, "video.mp4"),
        fps=CAMERA_FPS,
    )

    video = sfio.CameraSensorVideo(
        timestamps=timestamps,
        content=np.fromfile(
            os.path.join(SAMPLE_CAMERA_PATH, camera_name, "video.mp4"), dtype=np.uint8
        ),
        fps=CAMERA_FPS,
    )

    return sfio.CameraSensor(
        id=camera_name,
        intrinsics=intrinsics,
        video=video,
        poses=poses,
    )
### GENERATE LIDAR SENSOR ###
import pandas as pd

def generate_lidar_sensor(
    dataframes: List[pd.DataFrame], lidar_timestamps: List[int], poses: sfio.PosePath) -> sfio.LidarSensor:
    ### CREATE LIDAR FRAMES ###

    lidar_interp_poses = poses.interpolate(lidar_timestamps)

    interp_points = [
        sfio.utils.pose_path_helpers.apply_interpolated_transform_to_points(
            lidar_df[["x", "y", "z"]].values,
            lidar_df["time (s)"].values,
            poses,
        )
        for lidar_df in dataframes
    ]

    lidar_points = [
        sfio.LidarSensorPoints(
            positions=interpted.astype(np.float32),
            # per-point timestamps should already be on the scene's microsecond timeline
            timestamps=df["time (s)"].to_numpy(dtype=np.uint32),
            intensities=df["intensity"].to_numpy(dtype=np.uint8),
        )
        for df, interpted in zip(dataframes, interp_points)
    ]

    frames = [
        sfio.LidarSensorFrame(points=points, timestamp=timestamp)
        for points, timestamp in zip(lidar_points, lidar_timestamps)
    ]

    ### LIDAR POSES ###

    for i in range(len(dataframes)):
         dataframes[i][["x", "y", "z"]] = interp_points[i]

    # this sensor is using the world frame, make sure we indicate that here
    return sfio.LidarSensor(
        id="lidar_0", poses=lidar_interp_poses, frames=frames, coordinates="world"
    )

Step 4: Assemble and Serialize the Scene

Combine all the created objects into a single root Scene object.

  1. Create a list containing all the CameraSensor and LidarSensor objects you instantiated.

  2. Instantiate the main Scene object, passing it your list of sensors and the time_offset you calculated in Step 1.

  3. Finally, use an SFS-specific encoder (like the JSONBinaryEncoder from the library) to write your Scene object to an .sfs file. This encoder correctly handles the conversion of your data into the efficient binary format.

Assemble Scene

### CREATE SENSORS ###
lidar_sensor = generate_lidar_sensor(lidar_dataframes, lidar_timestamps, all_poses_world)
camera_sensors = [
    generate_camera_sensor(camera.name, camera_timestamps[camera.name], all_poses_world)
    for camera in cameras
] 

sensors = camera_sensors + [lidar_sensor]

### ASSEMBLE SCENE ####
scene = sfio.Scene(
    sensors=sensors, time_offset=minimum_timestamp, time_unit="microseconds"  # type: ignore
)

# convert the Scene object to an sfs object
sfs_scene = sfio.model_converters.sfs.to_scene_spec_sfs(scene)

encoder = sfio.JSONBinaryEncoder()
encoder.write_file(os.path.join(SAMPLE, "example_scene.sfs"), sfs_scene) # type: ignore

Step 5: Verify the Output

After saving the file, you should verify its integrity. You can do this programmatically by using the library's parse_and_validate_scene function, which checks for structural correctness. After confirming the structural integrity of the SFS file, you can visualize the scene using the debug entrypoint in your Scale Dashboard. You must be logged in to open the Scale Dashboard.

Verify Scene

### VERIFY SCENE ###
import pprint
from dataclasses import asdict

# read_file and parse_and_validate_scene are provided by the scale_sensor_fusion_io library
pp = pprint.PrettyPrinter(depth=6)

def test0(sfs_scene_path):
    print("Test 0")
    raw_data = read_file(sfs_scene_path)

    result = parse_and_validate_scene(raw_data)

    if not result.success:
        pp.pprint(asdict(result))
    else:
        print("Scene parsed successfully")

test0(os.path.join(SAMPLE, "example_scene.sfs"))

Advanced Workflows

Prelabel Generation

Including model predictions in your task submission is one feature that customers leverage when they want Scale to assess their model’s performance while also ensuring that their data is annotated with the highest quality and level of review.

Annotations can be included in the task by creating Annotation Objects in an attached “hypothesis” SFS file. These objects can be any of the many annotation types which Scale delivers, including but not limited to:

  • 3D Cuboids

  • 2D Bounding Boxes

  • Polylines

  • Polygons

  • Lidar Point Labels (LSS)

Each annotation type requires different arguments to generate, which can be found in scale_sensor_fusion_io in the types/spec.py portion of the library or in the extended reference at the bottom of this page. For any geometric annotations which require an AnnotationPath, please make sure that they reflect the coordinate system of your scene, whether ego or world.
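
For example, a single 3D cuboid prelabel might be added to a hypothesis scene roughly as follows. This is a sketch only: the exact annotation classes and their constructor arguments should be confirmed against types/spec.py in your installed version of scale_sensor_fusion_io, the CuboidPath name used here is a placeholder, and the path values simply mirror the CuboidAnnotation spec documented at the bottom of this page.

### GENERATE HYPOTHESIS SFS (SKETCH) ###
# Assumed: a cuboid annotation class whose fields mirror the CuboidAnnotation spec below;
# path values are [dx, dy, dz, px, py, pz, pitch, roll, yaw] in the scene's coordinate system.
cuboid = sfio.CuboidAnnotation(
    id="car_001",
    label="car",
    stationary=False,
    path=sfio.CuboidPath(                       # hypothetical path type name
        timestamps=[0, 100_000],                # microseconds, on the scene timeline
        values=[
            [4.5, 2.0, 1.6, 10.0, 3.0, 0.8, 0.0, 0.0, 1.57],
            [4.5, 2.0, 1.6, 11.0, 3.0, 0.8, 0.0, 0.0, 1.57],
        ],
    ),
)

# Package the prelabels into their own scene and serialize, as in Step 4
hypothesis = sfio.Scene(annotations=[cuboid], time_offset=minimum_timestamp)
hypothesis_sfs = sfio.model_converters.sfs.to_scene_spec_sfs(hypothesis)
sfio.JSONBinaryEncoder().write_file("hypothesis.sfs", hypothesis_sfs)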

An example payload including the hypothesis is shown below:

    payload = {
        "project": project,
        "scene_format": "sensor_fusion",
        "attachments": ["s3://bucket/sample.sfs"],
        "instruction": "This is a test task",
        "hypothesis": {"annotations": {"url": "s3://bucker/hypothesis.sfs"}},
        }
Common Errors

There are a few common errors we see when creating Sensor Fusion Scenes for the first time:

  1. Timestamps not recorded in microseconds

    Timestamps must be saved in microseconds across all sensors and poses. If your timestamps are recorded in a different unit (for example seconds or nanoseconds), scale them to microseconds before building the scene (see the conversion snippet after this list).

  2. Tasks erroring on Scale’s platform

    SensorFusionScene objects greater than 1.5 gigabytes can sometimes cause timeouts when being fetched on the Scale platform. We encourage you to experiment with methods to reduce the size of your SFS scenes to below 1.5 GB by adjusting the length of your scene, voxelizing point clouds, or reducing video frame rate where appropriate. Within the scale_sensor_fusion_io library there are functions like generate_video which can leverage MP4 compression to significantly reduce the size of SFS files. Feel free to reach out to your technical team to strategize a process that works best for you.

  3. Egocentric 3D scenes

    With a proper PosePath, you should see point clouds that represent static objects as static in our SFS visualization tool. If you notice that the vehicle is static while point clouds move around you, check to make sure that you’re adjusting point clouds to compensate for the path of your vehicle. A helpful function for this is apply_interpolated_transform_to_points located in the scale_sensor_fusion_io library.

  4. Incorrect Camera Projections

    When hovering over each camera view in your debug viewer, if the camera’s Field of View (FOV) doesn’t match the expected orientation of the camera, confirm that each camera has been transformed in relation to a common pose path. If each camera has an extrinsic calibration, ensure that they’re calibrated against a common point on the vehicle.
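
For the unit issue in item 1, converting timestamps from other common units into microseconds is a one-line operation (the values below are illustrative):

### CONVERT TIMESTAMPS TO MICROSECONDS ###
import numpy as np

timestamps_sec = np.array([1684500000.10, 1684500000.20])                # seconds -> microseconds
timestamps_usec = (timestamps_sec * 1e6).astype(np.int64)

timestamps_nsec = np.array([1684500000100000000, 1684500000200000000])   # nanoseconds -> microseconds
timestamps_usec = timestamps_nsec // 1_000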

Poses

A PosePath represents the position and orientation of a sensor or an object in the scene at different timestamps. The timestamp values may not be the same timestamps used in other Sensor objects, as pose timestamps may have been interpolated.

PosePath has the following fields:

  • timestamps: An array of numbers representing the timestamps at which the sensor or object was at a specific position and orientation.

  • values: An array of arrays containing the [x, y, z, qx, qy, qz, qw]  components of the pose (position and scalar-last quaternion).

  • These values will be in either ego-centric or world coordinates, depending on the “coordinates” field of the sensor. If no “coordinates” field is provided, the values will be parsed as world coordinates.

  • If the “ego” value is used, a GPSSensor must be provided to serve as the “ego” pose.
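
For reference, a pose path for a sensor that moves one meter along x between two timestamps would look like this (values are illustrative; binary typed arrays are shown as plain lists):

    pose_path = {
        "timestamps": [0, 100000],                  # microseconds
        "values": [
            [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],    # x, y, z, qx, qy, qz, qw
            [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
        ],
    }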

Sensors

Points Sensor

A PointsSensor is a sensor that captures points in 3D space. It has the following fields:

  • id : The unique identifier of the sensor.

  • type : The string "points" indicating this is a points sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • points : An object containing the following fields:

    • positions : An array of 3D positions represented as a Float32Array.

    • colors (optional): An array of RGB colors represented as a Uint8Array.

Lidar Sensor

A LidarSensor is a sensor that captures 3D points with optional intensity, colors and per-point timestamp data. It has the following fields:

  • id : The unique identifier of the sensor.

  • type : The string "lidar" indicating this is a lidar sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • poses : A PosePath object that defines the path of the sensor.

  • coordinates (optional): A string representing the coordinate system the lidar is in, either "ego" or "world". It’s world by default.

  • frames : An array of frame objects containing the following fields:

    • timestamp : The start timestamp of the frame.

    • points : An object containing the following fields:

      • positions : A binary array of 3D positions represented as a Float32Array.

      • colors (optional): A binary array of RGB colors represented as a Uint8Array

      • intensities (optional): A binary array of intensity values represented as a Uint8Array.

      • timestamps (optional): A binary array of timestamps represented as a Uint32Array or Uint64Array . If scene.time_unit == "nanosecond" this field will be parsed as Uint64Array, otherwise it will be parsed as Uint32Array
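
An illustrative lidar sensor entry with two points in a single frame (in an actual .sfs file, positions, intensities, and timestamps are stored as binary typed arrays; plain lists are shown here for readability):

    lidar_sensor = {
        "id": "lidar_0",
        "type": "lidar",
        "coordinates": "world",
        "poses": pose_path,                          # a PosePath as described above
        "frames": [
            {
                "timestamp": 0,                      # frame start, microseconds
                "points": {
                    "positions": [1.0, 2.0, 0.5, 1.1, 2.0, 0.5],   # (x, y, z) per point
                    "intensities": [200, 180],                      # one value per point
                    "timestamps": [0, 50],                          # per-point timestamps
                },
            }
        ],
    }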

Radar Sensor

A RadarSensor is a sensor that captures 3D radar points with optional direction and length data values. It has the following fields:

  • id : The unique identifier of the sensor.

  • type : The string "radar" indicating this is a radar sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • poses : A PosePath object that defines the path of the sensor.

  • coordinates (optional): A string representing the coordinate system the radar is in, either "ego" or "world". It’s world by default.

  • frames : An array of frame objects containing the following fields:

    • timestamp : The start timestamp of the frame.

    • points : An object containing the following fields:

      • positions : A binary array of 3D positions represented as a Float32Array.

      • directions (optional): A 3D binary array of directions represented as a Float32Array.

      • lengths (optional): A binary array of length values represented as a Float32Array.

      • timestamps (optional): A binary array of timestamps represented as a Uint32Array or Uint64Array . If scene.time_unit == "nanosecond" this field will be parsed as Uint64Array, otherwise it will be parsed as Uint32Array

Camera Sensor

A CameraSensor is a sensor that captures 2D images or video. It has the following fields:

  • id: The unique identifier of the sensor.

  • type : The string "camera" indicating this is a camera sensor.

  • parent_id(optional): The unique identifier of the parent sensor if it exists.

  • poses: A PosePath object that defines the path of the camera.

  • coordinates (optional): A string representing the coordinate system the camera is in, either "ego" or "world". It’s world by default.

  • intrinsics : An object containing the intrinsic parameters of the camera:

    • fx : The focal length in the x direction.

    • fy : The focal length in the y direction.

    • cx : The x coordinate of the principal point.

    • cy : The y coordinate of the principal point.

    • width: The width of the camera image.

    • height : The height of the camera image.

    • distortion (optional): An object containing the following fields:

      • model : A string representing the distortion model used, one of "brown_conrady", "mod_equi_fish", "mod_kannala", "fisheye", "fisheye_rad_tan_prism", or "cylindrical".

      • params : An array of floats representing the distortion parameters required to apply the model.

  • video (optional): An object containing the following fields if the camera captures video:

    • timestamps : An array of timestamps for each frame indicating the start of the frame

    • content : A binary Uint8Array containing the video data encoded as mp4.

    • fps : The frames per second of the video.

  • images (optional): An array of objects containing the images:

    • timestamp : The timestamp of the image.

    • content : A binary Uint8Array containing the image encoded as jpg.
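
An illustrative camera sensor entry using the video variant (all numeric values are placeholders):

    camera_sensor = {
        "id": "camera_front",
        "type": "camera",
        "poses": pose_path,                          # a PosePath as described above
        "intrinsics": {
            "fx": 1000.0, "fy": 1000.0,              # focal lengths in pixels
            "cx": 960.0, "cy": 540.0,                # principal point
            "width": 1920, "height": 1080,
            "distortion": {
                "model": "brown_conrady",
                "params": [0.1, -0.05, 0.0, 0.0, 0.0],   # illustrative coefficients only
            },
        },
        "video": {
            "timestamps": [0, 100000, 200000],       # per-frame start times, microseconds
            "content": b"...",                       # mp4 bytes (stored as a Uint8Array)
            "fps": 10,
        },
    }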

Odometry Sensor

An OdometrySensor is a sensor that captures the movement of the vehicle (or "ego") that the sensors are attached to. It has the following fields:

  • id : The unique identifier of the sensor.

  • type : The string "odometry" indicating this is an odometry sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • poses : A PosePath object that defines the poses of the odometry.

Annotations

Cuboid Annotation

A CuboidAnnotation is an annotation that labels an object as a cuboid. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string cuboid indicating this is a cuboid annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • path : An object that defines the path of the cuboid annotation, with the following fields:

    • timestamps: The timestamps of the keyframes of the cuboid path.

    • values: An array of arrays containing the [dx, dy, dz, px, py, pz, pitch, roll, yaw]  components of the cuboid at each path timestamp

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • activations (optional): An array of objects that defines the per-sensor activations of the annotation in frameless scenes. There is one object per sensor, and each object contains the following fields:

    • sensor_id The unique identifier of the sensor for which the cuboid is activated.

    • timestamps : An array of timestamps (in microseconds) for each cuboid activation.

    • durations : An array of durations (in microseconds) for each cuboid activation.

    • cuboids (optional): An array of arrays containing the [dx, dy, dz, px, py, pz, pitch, roll, yaw] components of a computed cuboid for each activation timestamp.

  • projections (optional): An array of objects that define per-sensor projections of the annotation, with the following fields:

    • sensor_id: The unique identifier of the 2D sensor for which the cuboid is projected.

    • timestamps: An array of camera timestamps for each cuboid projection.

    • boxes: An array of arrays containing the [x, y, width, height]  components of the 2D bounding box of the object in the image. It could contain undefined if the box could not be projected or was deleted by the user.

    • confirmed (optional): An array of boolean values indicating whether a value is confirmed or not by a user.

    • cuboids (optional): An array of arrays containing the [dx, dy, dz, px, py, pz, pitch, roll, yaw]  components of the cuboid for each projection timestamp.
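
An illustrative cuboid annotation tracking one vehicle across two keyframes:

    cuboid_annotation = {
        "id": "cuboid_001",
        "type": "cuboid",
        "label": "car",
        "stationary": False,
        "path": {
            "timestamps": [0, 100000],
            "values": [
                # dx, dy, dz, px, py, pz, pitch, roll, yaw
                [4.5, 2.0, 1.6, 10.0, 3.0, 0.8, 0.0, 0.0, 1.57],
                [4.5, 2.0, 1.6, 11.0, 3.0, 0.8, 0.0, 0.0, 1.57],
            ],
        },
    }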

2D Box Annotation

The Box2DAnnotation type represents a 2D bounding box annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string box_2d indicating this is a box annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id: The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the box annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [left, top, width, height] components of the box.

2D Polyline Annotation

The Polyline2DAnnotation type represents a 2d polyline annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polyline_2d indicating this is a polyline annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id: The unique identifier of the sensor if the annotation is sensor-specific.

  • path :  An object that defines the path of the polyline annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polyline per timestamp

2D Polygon Annotation

The Polygon2DAnnotation type represents a 2D polygon annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polygon_2d indicating this is a 2D polygon annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id: The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the polygon annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polygon.

2D Point Annotation

The Point2DAnnotation type represents a 2d Point annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string point_2d indicating this is a point annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id: The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the point annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x, y], [2nd timestamp's x, y]] point coordinates

Polygon Annotation

The PolygonAnnotation type represents a polygon annotation in the scene. Polygon has the invariant that the points are on a plane. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polygon indicating this is a polygon annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id (optional): The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the polygon annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polygon.

 

Topdown Polygon Annotation

The TopdownPolygonAnnotation type represents a topdown 2D polygon annotation with elevation data. Points in a topdown polygon don’t need to lie on a plane. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polygon_topdown indicating this is a polygon annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id (optional): The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the polygon annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polygon.

Polyline Annotation

The PolylineAnnotation type represents a polyline annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polyline indicating this is a polyline annotation.

  • is_closed (optional): Whether or not this annotation is closed. If true, the first and last vertices will be connected to represent a “3D polygonal loop”

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id (optional): The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the polyline annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polyline.

Points Annotation / Keypoints

Both LidarTopdown 3D points and LidarAnnotation keypoints are represented by this annotation type. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string points indicating this is a point annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • labels (optional): An array of strings representing the label of  each point, respecting the index of each point.

  • attributes (optional): An array of AttributePath objects along with an additional point_index field .

  • sensor_id (optional): The unique identifier of the sensor if the annotation is sensor-specific.

  • paths : An array of objects that define the path of each point in the annotation, with the following fields:

    • id: Unique identifier for each point within the annotation (optional in old files).

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [x, y, z] coordinates of the points.

  • projections (optional): An array of arrays containing objects that define per-sensor projections of the annotation, with the following fields:

    • sensor_id: The unique identifier of the 2D sensor for which the points are projected.

    • timestamps: An array of camera timestamps for each point projection.

    • points: An array of arrays containing the [x, y] components of the projected points in the image. It could contain undefined if a point could not be projected or was deleted by the user.

    • confirmed (optional): An array of boolean values indicating whether a value is confirmed or not by a user.

    • positions (optional): An array of arrays containing the [x, y, z]  components of the point for each projection timestamp.

Event Annotation

The EventAnnotation type represents an event. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string event indicating this is an event annotation.

  • parent_id (optional): The id of a parent annotation.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • start: The timestamp of the start of the event.

  • duration (optional): The duration of the event

  • sensor_id: The unique identifier of the sensor if the event is sensor-specific.

Labeled Points Annotation (LSS)

The LabeledPointsAnnotation interface represents an annotation of labeled points in a lidar sensor.

  • id: A unique identifier for the annotation.

  • type: The string labeled_points indicating this is a labeled points annotation.

  • parent_id (optional): The id of a parent annotation.

  • label: a string representing the label assigned to the points in the annotation.

  • is_instance: a boolean value indicating whether the annotation represents an instance or a class.

  • labeled_points: an array of objects representing the labeled points grouped by sensor and frame.
    Each object contains:

    • sensor_id: the unique identifier for the sensor containing the labeled points.

    • sensor_frame (optional): the frame number of the sensor, if the sensor has frames.

    • point_ids: a Uint32Array containing the indices of the labeled points in the sensor frame.
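
An illustrative labeled points annotation marking ground points in two frames of a lidar sensor (point_ids are shown as plain lists; they are stored as a Uint32Array in the file):

    labeled_points_annotation = {
        "id": "lss_ground",
        "type": "labeled_points",
        "label": "ground",
        "is_instance": False,
        "labeled_points": [
            {"sensor_id": "lidar_0", "sensor_frame": 0, "point_ids": [0, 1, 2, 10, 11]},
            {"sensor_id": "lidar_0", "sensor_frame": 1, "point_ids": [3, 4, 5]},
        ],
    }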

Localization Adjustment Annotation

The LocalizationAdjustmentAnnotation represents a PosePath applied to a scene to fix localization issues or convert from ego to world coordinates.

  • id : The unique identifier of the annotation.

  • type : The string localization_adjustment indicating this is a localization adjustment annotation.

  • parent_id (optional): The id of a parent annotation.

  • poses : A PosePath object that defines the poses of the adjustment.

Camera Calibration Annotation

The CameraCalibrationAnnotation represents a correction to a camera's calibration, including updated extrinsics, intrinsics, and optionally a timestamp offset. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string camera_calibration indicating this is a camera calibration annotation.

  • sensor_id : The unique identifier of the camera

  • time_offset (optional): Timestamp offset to apply to the camera. This is used if the camera timestamps are obviously not correct, but can be corrected with minor timestamp changes.

  • parent_id (optional): The id of a parent annotation.

  • poses : A PosePath object representing the diff between the initial and calibrated camera extrinsics. These poses should be applied after the current camera extrinsics are applied.

  • intrinsics: Intrinsics for the calibrated camera

Object Annotation

The ObjectAnnotation  represents an object within a scene, serving as a way to group related annotations associated with a specific object.

  • id : The unique identifier of the annotation.

  • type : The string object indicating this is an object annotation.

  • parent_id (optional): The id of a parent annotation.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

Link Annotation

The LinkAnnotation represents a link between two annotations. This is used to represent relationships between two annotations.

  • id : The unique identifier of the annotation.

  • type : The string link indicating this is a link annotation.

  • label: A string representing the label of the object.

  • is_bidirectional: Whether this link is a bidirectional relationship

  • from_id : The id of the annotation that this links from

  • to_id: The id of the annotation this links to

  • parent_id (optional): The id of a parent annotation.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

Group Annotation

The GroupAnnotation represents a group to which multiple child annotations can belong, via parent_id.

  • id : The unique identifier of the annotation.

  • type : The string group indicating this is a group annotation.

  • label (optional): A string representing the label of the object.

  • parent_id (optional): The id of a parent annotation.

Scene (Root Scene)

A Scene is an interface that represents a sensor fusion scene. It has the following fields:

  • version: A string representing the version of the scene format. It should be 1.0

  • sensors(optional): An array of sensor objects that describe the sensors in the scene.

  • annotations(optional): An array of annotation objects that describe the annotations in the scene.

  • attributes (optional): An array of AttributePath objects that describe the attributes of the scene-level and sensor-level attributes.

  • time_offset (optional): Scene-level field storing the scene's original start time, used as the reference point after all timestamps have been made relative to that moment.

AnnotationPath

This is used as the path key in all geometric annotations. It captures the geometry per timestamp:

  • timestamps: List[int] - the timestamps of the path

  • values: List[List[float]] - An array of arrays containing [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]]
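
For example, a path holding three 2D vertices at each of two timestamps:

    path = {
        "timestamps": [0, 100000],
        "values": [
            [0.0, 0.0, 1.0, 0.5, 2.0, 1.0],   # vertices at the first timestamp: (0, 0), (1, 0.5), (2, 1)
            [0.1, 0.0, 1.1, 0.5, 2.1, 1.0],   # the same vertices at the second timestamp
        ],
    }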

Attributes

Attributes are used to add additional information to annotations. The value of an attribute can be a string, number, or an array of strings.

Attribute Value

An attribute value can be string, number or string[].

Attribute Path

An AttributePath represents the values of an attribute at different timestamps. It has the following fields:

  • name: A string representing the name of the attribute.

  • sensor_id (optional): The unique identifier of the sensor if the attribute is sensor-specific.

  • static (optional): A boolean indicating whether the attribute is static or not. Default is false.

  • timestamps : An array of timestamps for each value.

  • values : An array of AttributeValue representing the values of the attribute at each timestamp.
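
For example, an attribute tracking a vehicle's occlusion level over time:

    attribute = {
        "name": "occlusion",
        "static": False,
        "timestamps": [0, 100000],
        "values": ["0%", "25%"],
    }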
