Sensor Fusion Scene Format Overview

The sensor fusion scene format is an external format designed to represent a 3D scene containing different types of sensors and annotations, together with their associated data.

A 3D Scene consists of the following components:

  • Poses: Represents the position and orientation of a sensor or an object in the scene.

  • Sensors: Different types of sensors capturing data in the scene, including points, lidar, radar, camera, and odometry sensors.

  • Annotations: Contains information about objects in the scene, such as cuboids and attributes.

  • Scene: The root object that holds sensors and annotations.

Poses

A PosePath represents the position and orientation of a sensor or an object in the scene at different timestamps. The pose timestamps may differ from the timestamps used in other Sensor objects, because the pose timestamps might have been interpolated.

PosePath has the following fields:

  • timestamps: An array of numbers representing the timestamps at which the sensor or object was at a specific position and orientation.

  • values: An array of arrays containing the [x, y, z, qx, qy, qz, qw]  components of the pose (position and scalar-last quaternion).

  • These values are in either ego-centric or world coordinates, depending on the “coordinates” field of the sensor. If no “coordinates” field is provided, the values are parsed as world coordinates.

  • If the “ego” value is provided, a GPSSensor must also be provided to use as the “ego” pose.
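
To illustrate the [x, y, z, qx, qy, qz, qw] layout, the following Python sketch applies a single pose to a point by rotating it with the scalar-last quaternion and then translating it. The helper names (rotate, apply_pose) are hypothetical and not part of the format, and the rotate-then-translate convention is an assumption.

```python
import math


def rotate(q, v):
    """Rotate vector v by the unit quaternion q = (qx, qy, qz, qw), scalar last."""
    qx, qy, qz, qw = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (qy * vz - qz * vy)
    ty = 2.0 * (qz * vx - qx * vz)
    tz = 2.0 * (qx * vy - qy * vx)
    # v' = v + qw * t + cross(q_vec, t)
    return (
        vx + qw * tx + (qy * tz - qz * ty),
        vy + qw * ty + (qz * tx - qx * tz),
        vz + qw * tz + (qx * ty - qy * tx),
    )


def apply_pose(pose, point):
    """Assumed convention: rotate the point first, then add the translation."""
    x, y, z, qx, qy, qz, qw = pose
    rx, ry, rz = rotate((qx, qy, qz, qw), point)
    return (rx + x, ry + y, rz + z)


# A PosePath with two poses; the second is a 90 degree turn about z plus a shift.
pose_path = {
    "timestamps": [0, 100000],
    "values": [
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
        [1.0, 2.0, 0.0, 0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4)],
    ],
}

moved = apply_pose(pose_path["values"][1], (1.0, 0.0, 0.0))
```

With the second pose, the point (1, 0, 0) is turned onto the y axis and then shifted by (1, 2, 0).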

Sensors

Points Sensor

A PointsSensor is a sensor that captures points in 3D space. It has the following fields:

  • id : The unique identifier of the sensor.

  • type : The string "points" indicating this is a points sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • points : An object containing the following fields:

    • positions : An array of 3D positions represented as a Float32Array.

    • colors (optional): An array of RGB colors represented as a Uint8Array.
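
As a rough sketch of how the binary arrays of a PointsSensor might be prepared in Python, the snippet below packs positions as 32-bit floats and colors as bytes. The interleaved x, y, z and r, g, b ordering and the little-endian byte order are assumptions, and pack_positions and pack_colors are hypothetical helper names.

```python
import struct


def pack_positions(points):
    """Flatten (x, y, z) tuples into little-endian float32 bytes (assumed layout)."""
    flat = [c for p in points for c in p]
    return struct.pack("<%df" % len(flat), *flat)


def pack_colors(colors):
    """Flatten (r, g, b) tuples into one byte per channel (assumed layout)."""
    return bytes(c for rgb in colors for c in rgb)


points_sensor = {
    "id": "points-0",
    "type": "points",
    "points": {
        "positions": pack_positions([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]),
        "colors": pack_colors([(255, 0, 0), (0, 255, 0)]),
    },
}
```

Each point then occupies 12 bytes in positions and 3 bytes in colors.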

Lidar Sensor

A LidarSensor is a sensor that captures 3D points with optional intensity, colors and per-point timestamp data. It has the following fields:

  • id : The unique identifier of the sensor.

  • type : The string "lidar" indicating this is a lidar sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • poses : A PosePath object that defines the path of the sensor.

  • coordinates (optional): A string representing the coordinate system the lidar is in, either "ego" or "world". It’s world by default.

  • frames : An array of frame objects containing the following fields:

    • timestamp : The start timestamp of the frame.

    • points : An object containing the following fields:

      • positions : A binary array of 3D positions represented as a Float32Array.

      • colors (optional): A binary array of RGB colors represented as a Uint8Array.

      • intensities (optional): A binary array of intensity values represented as a Uint8Array.

      • timestamps (optional): A binary array of per-point timestamps represented as a Uint32Array or Uint64Array. If scene.time_unit == "nanosecond", this field is parsed as a Uint64Array; otherwise it is parsed as a Uint32Array.
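
The width rule for per-point timestamps can be sketched in Python as follows. The helper name pack_point_timestamps is hypothetical, the little-endian byte order is an assumption, and "microsecond" is used here only as an example of a non-nanosecond time unit.

```python
import struct


def pack_point_timestamps(timestamps, time_unit):
    """Pack timestamps as uint64 for nanosecond scenes, uint32 otherwise."""
    code = "Q" if time_unit == "nanosecond" else "I"
    return struct.pack("<%d%s" % (len(timestamps), code), *timestamps)


micro = pack_point_timestamps([0, 50, 100], "microsecond")      # 4 bytes each
nano = pack_point_timestamps([0, 50000, 100000], "nanosecond")  # 8 bytes each
```

In this sketch, a value that does not fit into 32 bits raises struct.error when the scene is not in nanoseconds, which surfaces overflow early instead of silently truncating.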

Radar Sensor

A RadarSensor is a sensor that captures 3D radar points with optional direction and length data values. It has the following fields:

  • id : The unique identifier of the sensor.

  • type : The string "radar" indicating this is a radar sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • poses : A PosePath object that defines the path of the sensor.

  • coordinates (optional): A string representing the coordinate system the radar is in, either "ego" or "world". It’s world by default.

  • frames : An array of frame objects containing the following fields:

    • timestamp : The start timestamp of the frame.

    • points : An object containing the following fields:

      • positions : A binary array of 3D positions represented as a Float32Array.

      • directions (optional): A binary array of 3D directions represented as a Float32Array.

      • lengths (optional): A binary array of length values represented as a Float32Array.

      • timestamps (optional): A binary array of per-point timestamps represented as a Uint32Array or Uint64Array. If scene.time_unit == "nanosecond", this field is parsed as a Uint64Array; otherwise it is parsed as a Uint32Array.

Camera Sensor

A CameraSensor is a sensor that captures 2D images or video. It has the following fields:

  • id: The unique identifier of the sensor.

  • type : The string "camera" indicating this is a camera sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • poses: A PosePath object that defines the path of the camera.

  • coordinates (optional): A string representing the coordinate system the camera is in, either "ego" or "world". It’s world by default.

  • intrinsics : An object containing the intrinsic parameters of the camera:

    • fx : The focal length in the x direction.

    • fy : The focal length in the y direction.

    • cx : The x coordinate of the principal point.

    • cy : The y coordinate of the principal point.

    • width: The width of the camera image.

    • height : The height of the camera image.

    • distortion (optional): An object containing the following fields:

      • model : A string representing the distortion model used, one of "brown_conrady", "mod_equi_fish", "mod_kannala", "fisheye", "fisheye_rad_tan_prism", or "cylindrical".

      • params : An array of floats representing the distortion parameters required to apply the model.

  • video (optional): An object containing the following fields if the camera captures video:

    • timestamps : An array of timestamps for each frame indicating the start of the frame.

    • content : A binary Uint8Array containing the video data encoded as mp4.

    • fps : The frames per second of the video.

  • images (optional): An array of objects containing the images:

    • timestamp : The timestamp of the image.

    • content : A binary Uint8Array containing the image encoded as jpg.
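
To show how the intrinsics fields relate to pixel coordinates, here is a minimal Python sketch of the standard pinhole projection. It ignores the optional distortion model entirely and assumes a camera frame with z pointing forward, x to the right, and y down; the function name project is hypothetical.

```python
def project(intrinsics, point_camera):
    """Project a 3D point in the camera frame to pixel coordinates (no distortion)."""
    x, y, z = point_camera
    if z <= 0:
        return None  # behind the camera or on the image plane
    u = intrinsics["fx"] * x / z + intrinsics["cx"]
    v = intrinsics["fy"] * y / z + intrinsics["cy"]
    # Keep only pixels that fall inside the image bounds.
    if 0 <= u < intrinsics["width"] and 0 <= v < intrinsics["height"]:
        return (u, v)
    return None


intrinsics = {
    "fx": 1000.0, "fy": 1000.0,
    "cx": 960.0, "cy": 540.0,
    "width": 1920, "height": 1080,
}

pixel = project(intrinsics, (0.5, -0.25, 10.0))  # (1010.0, 515.0)
```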

Odometry Sensor

An OdometrySensor is a sensor that captures the movement of the vehicle (or "ego") that the sensors are attached to. It has the following fields:

  • id : The unique identifier of the sensor.

  • type : The string "odometry" indicating this is an odometry sensor.

  • parent_id (optional): The unique identifier of the parent sensor if it exists.

  • poses : A PosePath object that defines the poses of the odometry.

Annotations

Cuboid Annotation

A CuboidAnnotation is an annotation that labels an object as a cuboid. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string cuboid indicating this is a cuboid annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • path : An object that defines the path of the cuboid annotation, with the following fields:

    • timestamps: The timestamps of the keyframes of the cuboid path.

    • values: An array of arrays containing the [dx, dy, dz, px, py, pz, pitch, roll, yaw] components of the cuboid at each path timestamp.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • activations (optional): An array of objects that defines the per-sensor activations of the annotation in frameless scenes. There is one object per sensor, and each object contains the following fields:

    • sensor_id: The unique identifier of the sensor for which the cuboid is activated.

    • timestamps : An array of timestamps (in microseconds) for each cuboid activation.

    • durations : An array of durations (in microseconds) for each cuboid activation.

    • cuboids (optional): An array of arrays containing the [dx, dy, dz, px, py, pz, pitch, roll, yaw] components of a computed cuboid for each activation timestamp.

  • projections (optional): An array of objects that define per-sensor projections of the annotation, with the following fields:

    • sensor_id: The unique identifier of the 2D sensor for which the cuboid is projected.

    • timestamps: An array of camera timestamps for each cuboid projection.

    • boxes: An array of arrays containing the [x, y, width, height] components of the 2D bounding box of the object in the image. An entry may be undefined if the box could not be projected or was deleted by the user.

    • confirmed (optional): An array of boolean values indicating whether a value is confirmed or not by a user.

    • cuboids (optional): An array of arrays containing the [dx, dy, dz, px, py, pz, pitch, roll, yaw]  components of the cuboid for each projection timestamp.
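
The nine-component cuboid layout is easy to misread, so the following Python sketch unpacks one keyframe into named fields. The Cuboid tuple and cuboid_at_keyframe helper are hypothetical names; the angle units are not stated in this document, so the sketch does not interpret them.

```python
from typing import NamedTuple


class Cuboid(NamedTuple):
    # Field order follows [dx, dy, dz, px, py, pz, pitch, roll, yaw].
    dx: float
    dy: float
    dz: float
    px: float
    py: float
    pz: float
    pitch: float
    roll: float
    yaw: float


def cuboid_at_keyframe(path, timestamp):
    """Return the cuboid stored at an exact keyframe timestamp."""
    index = path["timestamps"].index(timestamp)  # ValueError if not a keyframe
    return Cuboid(*path["values"][index])


path = {
    "timestamps": [0, 100000],
    "values": [
        [4.0, 2.0, 1.5, 10.0, 0.0, 0.75, 0.0, 0.0, 0.0],
        [4.0, 2.0, 1.5, 12.0, 0.5, 0.75, 0.0, 0.0, 0.1],
    ],
}

cuboid = cuboid_at_keyframe(path, 100000)
volume = cuboid.dx * cuboid.dy * cuboid.dz
```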

2D Box Annotation

The Box2DAnnotation type represents a 2D bounding box annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string box_2d indicating this is a box annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id: The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the box annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [left, top, width, height] components of the box.
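
For tools that expect corner coordinates, a [left, top, width, height] box can be converted as in this small Python sketch. The helper name box_to_corners is hypothetical, and an image origin at the top-left corner is assumed.

```python
def box_to_corners(box):
    """Convert [left, top, width, height] to (left, top, right, bottom)."""
    left, top, width, height = box
    return (left, top, left + width, top + height)


box_path = {"timestamps": [0, 100000], "values": [[10, 20, 30, 40], [12, 22, 30, 40]]}
corners = [box_to_corners(b) for b in box_path["values"]]
```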

2D Polyline Annotation

The Polyline2DAnnotation type represents a 2D polyline annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polyline_2d indicating this is a polyline annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id: The unique identifier of the sensor if the annotation is sensor-specific.

  • path :  An object that defines the path of the polyline annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polyline per timestamp.
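
Because each entry of values is a flat [x_0, y_0, x_1, y_1, ...] list, consumers usually need to regroup it into vertices. A minimal Python sketch, with the hypothetical helper name vertices_at:

```python
def vertices_at(path, index):
    """Regroup the flat coordinate list at one timestamp into (x, y) pairs."""
    flat = path["values"][index]
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(flat[0::2], flat[1::2]))


polyline_path = {
    "timestamps": [0, 100000],
    "values": [
        [0, 0, 10, 5, 20, 5],
        [1, 0, 11, 5, 21, 5, 30, 8],
    ],
}

first = vertices_at(polyline_path, 0)
second = vertices_at(polyline_path, 1)
```

Note that in this example the number of vertices differs between the two timestamps.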

2D Polygon Annotation

The Polygon2DAnnotation type represents a 2D polygon annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polygon_2d indicating this is a 2D polygon annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id: The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the polygon annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polygon.
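
As one example of working with the flat vertex list of a 2D polygon, the Python sketch below computes its area with the shoelace formula. The function name polygon_area is hypothetical, and the polygon is assumed to be simple (not self-intersecting).

```python
def polygon_area(flat):
    """Area of a simple polygon given as [x_0, y_0, x_1, y_1, ...] (shoelace formula)."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


polygon_path = {"timestamps": [0], "values": [[0, 0, 4, 0, 4, 3, 0, 3]]}
area = polygon_area(polygon_path["values"][0])  # 12.0
```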

2D Point Annotation

The Point2DAnnotation type represents a 2D point annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string point_2d indicating this is a point annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id: The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the point annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x, y], [2nd timestamp's x, y]] point coordinates.

Polygon Annotation

The PolygonAnnotation type represents a polygon annotation in the scene. A polygon has the invariant that its points lie on a plane. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polygon indicating this is a polygon annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id (optional): The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the polygon annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polygon.

 

Topdown Polygon Annotation

The TopdownPolygonAnnotation type represents a topdown 2D polygon annotation with elevation data. Points in a topdown polygon don’t need to lie on a plane. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polygon_topdown indicating this is a topdown polygon annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id (optional): The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the polygon annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polygon.

Polyline Annotation

The PolylineAnnotation type represents a polyline annotation in the scene. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string polyline indicating this is a polyline annotation.

  • is_closed (optional): Whether or not this annotation is closed. If true, the first and last vertices will be connected to represent a “3D polygonal loop”.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • sensor_id (optional): The unique identifier of the sensor if the annotation is sensor-specific.

  • path : An object that defines the path of the polyline annotation, with the following fields:

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]] vertices of the polyline.

Points Annotation / Keypoints

Both LidarTopdown 3D points and LidarAnnotation keypoints are represented by this annotation type. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string points indicating this is a points annotation.

  • parent_id (optional): The id of a parent annotation.

  • stationary (optional): A boolean indicating whether the object is stationary or not.

  • labels (optional): An array of strings representing the label of each point, respecting the index of each point.

  • attributes (optional): An array of AttributePath objects along with an additional point_index field.

  • sensor_id (optional): The unique identifier of the sensor if the annotation is sensor-specific.

  • paths : An array of objects, one per point, each defining the path of that point with the following fields:

    • id: Unique identifier for each point within the annotation (optional in old files).

    • timestamps: The timestamps of the path.

    • values: An array of arrays containing the [x, y, z] coordinates of the points.

  • projections (optional): An array of arrays containing objects that define per-sensor projections of the annotation, with the following fields:

    • sensor_id: The unique identifier of the 2D sensor for which the points are projected.

    • timestamps: An array of camera timestamps for each projection.

    • points: An array of arrays containing the [x, y] components of the projected 2D point in the image. An entry may be undefined if the point could not be projected or was deleted by the user.

    • confirmed (optional): An array of boolean values indicating whether a value is confirmed or not by a user.

    • positions (optional): An array of arrays containing the [x, y, z]  components of the point for each projection timestamp.

Event Annotation

The EventAnnotation type represents an event. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string event indicating this is an event annotation.

  • parent_id (optional): The id of a parent annotation.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

  • start: The timestamp of the start of the event.

  • duration (optional): The duration of the event.

  • sensor_id: The unique identifier of the sensor if the event is sensor-specific.
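
The relationship between start and duration can be sketched in Python as a simple interval test. The helper name event_contains is hypothetical, and treating a missing duration as an instantaneous event is an assumption rather than documented behavior.

```python
def event_contains(event, timestamp):
    """Return True if timestamp falls within [start, start + duration]."""
    start = event["start"]
    end = start + event.get("duration", 0)  # assumed: no duration means instantaneous
    return start <= timestamp <= end


event = {
    "id": "event-1",
    "type": "event",
    "label": "lane_change",
    "start": 1000000,
    "duration": 500000,
}

during = event_contains(event, 1200000)
after = event_contains(event, 1600000)
```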

Labeled Points Annotation (LSS)

The LabeledPointsAnnotation interface represents an annotation of labeled points in a lidar sensor.

  • id: A unique identifier for the annotation.

  • type: The string labeled_points indicating this is a labeled points annotation.

  • parent_id (optional): The id of a parent annotation.

  • label: A string representing the label assigned to the points in the annotation.

  • is_instance: A boolean value indicating whether the annotation represents an instance or a class.

  • labeled_points: An array of objects representing the labeled points grouped by sensor and frame.
    Each object contains:

    • sensor_id: The unique identifier for the sensor containing the labeled points.

    • sensor_frame (optional): The frame number of the sensor, if the sensor has frames.

    • point_ids: A Uint32Array containing the indices of the labeled points in the sensor frame.
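
One way to consume labeled points is to expand them into a per-point label list for a single sensor frame, as in this Python sketch. The helper name labels_for_frame is hypothetical, and point_ids is shown as a plain list for readability.

```python
def labels_for_frame(annotations, sensor_id, sensor_frame, num_points):
    """Build a list with one label (or None) per point of the given sensor frame."""
    labels = [None] * num_points
    for ann in annotations:
        if ann["type"] != "labeled_points":
            continue
        for group in ann["labeled_points"]:
            same_sensor = group["sensor_id"] == sensor_id
            same_frame = group.get("sensor_frame") == sensor_frame
            if same_sensor and same_frame:
                for point_id in group["point_ids"]:
                    labels[point_id] = ann["label"]
    return labels


annotations = [
    {
        "id": "a1", "type": "labeled_points", "label": "road", "is_instance": False,
        "labeled_points": [{"sensor_id": "lidar-0", "sensor_frame": 0, "point_ids": [0, 2]}],
    },
    {
        "id": "a2", "type": "labeled_points", "label": "car", "is_instance": True,
        "labeled_points": [
            {"sensor_id": "lidar-0", "sensor_frame": 0, "point_ids": [3]},
            {"sensor_id": "lidar-0", "sensor_frame": 1, "point_ids": [1]},
        ],
    },
]

labels = labels_for_frame(annotations, "lidar-0", 0, 5)
```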

Localization Adjustment Annotation

The LocalizationAdjustmentAnnotation represents a PosePath applied to a scene to fix localization issues or convert from ego to world coordinates. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string localization_adjustment indicating this is a localization adjustment annotation.

  • parent_id (optional): The id of a parent annotation.

  • poses : A PosePath object that defines the poses of the adjustment.

Camera Calibration Annotation

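The CameraCalibrationAnnotation represents a calibration correction for a camera sensor: adjusted extrinsics, calibrated intrinsics, and an optional timestamp offset. It has the following fields: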

  • id : The unique identifier of the annotation.

  • type : The string camera_calibration indicating this is a camera calibration annotation.

  • sensor_id : The unique identifier of the camera.

  • time_offset (optional): Timestamp offset to apply to the camera. This is used if the camera timestamps are obviously incorrect but can be corrected with a minor timestamp change.

  • parent_id (optional): The id of a parent annotation.

  • poses : A PosePath object representing the difference between the initial camera extrinsics and the calibrated camera extrinsics. These poses should be applied after the current camera extrinsics are applied.

  • intrinsics: Intrinsics for the calibrated camera.

Object Annotation

The ObjectAnnotation  represents an object within a scene, serving as a way to group related annotations associated with a specific object.

  • id : The unique identifier of the annotation.

  • type : The string object indicating this is an object annotation.

  • parent_id (optional): The id of a parent annotation.

  • label (optional): A string representing the label of the object.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.

Link Annotation

The LinkAnnotation represents a link between two annotations. It is used to represent relationships between two annotations.

  • id : The unique identifier of the annotation.

  • type : The string link indicating this is a link annotation.

  • label: A string representing the label of the object.

  • is_bidirectional: Whether this link is a bidirectional relationship.

  • from_id : The id of the annotation that this links from.

  • to_id: The id of the annotation this links to.

  • parent_id (optional): The id of a parent annotation.

  • attributes (optional): An array of AttributePath objects that define the attributes of the annotation and per-sensor attributes.
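
To show how from_id, to_id, and is_bidirectional combine, the Python sketch below builds an adjacency map from link annotations. The helper name build_link_graph is hypothetical and the annotation ids are made up for illustration.

```python
from collections import defaultdict


def build_link_graph(annotations):
    """Map each annotation id to the set of ids it links to."""
    graph = defaultdict(set)
    for ann in annotations:
        if ann["type"] != "link":
            continue
        graph[ann["from_id"]].add(ann["to_id"])
        if ann["is_bidirectional"]:
            # A bidirectional link is also traversable in reverse.
            graph[ann["to_id"]].add(ann["from_id"])
    return dict(graph)


links = [
    {"id": "l1", "type": "link", "label": "tows", "is_bidirectional": False,
     "from_id": "truck", "to_id": "trailer"},
    {"id": "l2", "type": "link", "label": "next_to", "is_bidirectional": True,
     "from_id": "car", "to_id": "truck"},
]

graph = build_link_graph(links)
```

In this example the trailer has no outgoing edges because the tows link is one-directional.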

Group Annotation

The GroupAnnotation represents a group to which multiple child annotations can belong, via parent_id. It has the following fields:

  • id : The unique identifier of the annotation.

  • type : The string group indicating this is a group annotation.

  • label (optional): A string representing the label of the object.

  • parent_id (optional): The id of a parent annotation.
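
Group membership is expressed only through parent_id on the children, so resolving a group's members requires a pass over all annotations. A minimal Python sketch, with the hypothetical helper name children_by_parent:

```python
def children_by_parent(annotations):
    """Map each parent id to the ids of the annotations that reference it."""
    children = {}
    for ann in annotations:
        parent = ann.get("parent_id")
        if parent is not None:
            children.setdefault(parent, []).append(ann["id"])
    return children


annotations = [
    {"id": "g1", "type": "group", "label": "convoy"},
    {"id": "c1", "type": "cuboid", "parent_id": "g1"},
    {"id": "c2", "type": "cuboid", "parent_id": "g1"},
    {"id": "c3", "type": "cuboid"},
]

members = children_by_parent(annotations)
```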

Scene (Root Scene)

A Scene is an interface that represents a sensor fusion scene. It has the following fields:

  • version: A string representing the version of the scene format. It should be 1.0.

  • sensors (optional): An array of sensor objects that describe the sensors in the scene.

  • annotations (optional): An array of annotation objects that describe the annotations in the scene.

  • attributes (optional): An array of AttributePath objects that describe the scene-level and sensor-level attributes.

  • time_offset (optional): Scene-level field used to set a reference point in time after making the scene relative to that specific moment.
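
A minimal scene can be assembled as a plain dictionary, and a simple consistency check is to confirm that every sensor_id used by an annotation refers to a sensor in the scene. The Python sketch below does this; dangling_sensor_refs is a hypothetical helper, and the format itself does not state that such a check is performed.

```python
def dangling_sensor_refs(scene):
    """Return (annotation id, sensor id) pairs whose sensor is not in the scene."""
    sensor_ids = {sensor["id"] for sensor in scene.get("sensors", [])}
    missing = []
    for ann in scene.get("annotations", []):
        sensor_id = ann.get("sensor_id")
        if sensor_id is not None and sensor_id not in sensor_ids:
            missing.append((ann["id"], sensor_id))
    return missing


scene = {
    "version": "1.0",
    "sensors": [
        {"id": "odom-0", "type": "odometry",
         "poses": {"timestamps": [0], "values": [[0, 0, 0, 0, 0, 0, 1]]}},
    ],
    "annotations": [
        {"id": "e1", "type": "event", "start": 0, "sensor_id": "odom-0"},
        {"id": "e2", "type": "event", "start": 10, "sensor_id": "camera-7"},
    ],
}

problems = dangling_sensor_refs(scene)
```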

AnnotationPath

This is used as the path key in all geometric annotations. It captures the geometry per timestamp:

  • timestamps: List[int] - the timestamps of the path

  • values: List[List[float]] - An array of arrays containing [[1st timestamp's x_0, y_0, x_1, y_1, ..., x_n, y_n], [2nd timestamp's x_0, y_0, ...]]

Attributes

Attributes are used to add additional information to annotations. The value of an attribute can be a string, number, or an array of strings.

Attribute Value

An attribute value can be string, number or string[].

Attribute Path

An AttributePath represents the values of an attribute at different timestamps. It has the following fields:

  • name: A string representing the name of the attribute.

  • sensor_id (optional): The unique identifier of the sensor if the attribute is sensor-specific.

  • static (optional): A boolean indicating whether the attribute is static or not. Default is false.

  • timestamps : An array of timestamps for each value.

  • values : An array of AttributeValue representing the values of the attribute at each timestamp.
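
Reading an attribute at an arbitrary time requires a lookup rule that this document does not spell out. The Python sketch below assumes that timestamps are sorted in ascending order, that a value holds until the next timestamp, and that a static attribute always returns its first value; attribute_value_at is a hypothetical helper.

```python
from bisect import bisect_right


def attribute_value_at(path, timestamp):
    """Return the attribute value in effect at timestamp (assumed step-wise hold)."""
    if path.get("static", False):
        return path["values"][0] if path["values"] else None
    # Index of the last keyframe at or before the requested timestamp.
    index = bisect_right(path["timestamps"], timestamp) - 1
    return path["values"][index] if index >= 0 else None


occlusion = {
    "name": "occlusion",
    "timestamps": [0, 100, 200],
    "values": ["0%", "25%", "50%"],
}
color = {"name": "color", "static": True, "timestamps": [0], "values": ["red"]}

mid_value = attribute_value_at(occlusion, 150)
static_value = attribute_value_at(color, 999)
```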
