Taxonomy Chunking with Rapid

Rapid has an optional special pipeline for image annotation and text collection to increase quality and throughput on tasks with large taxonomies.

The process is called 'taxonomy chunking', and it involves breaking down large taxonomies into multiple smaller independent subtasks which can be worked on in parallel. The results are then combined back together to create the final response. The pipeline involves two or three stages:

  1. The chunk stage, where many labelers work in parallel on annotating different, independent chunks of the taxonomy on the same attachment(s). The majority of the work (drawing and adding attributes) is done in the chunk stage. These chunks are then combined back together automatically.

  2. The combination review stage, where a single labeler reviews the task as a whole. This labeler's main job is to ensure that the task contains no large inconsistencies and to add annotations that pertain to the entire task, such as global attributes that apply to all the labels together.

  3. The final review stage, only used in image annotation for drawing links between annotations. All linking is done in this stage, since in previous stages labelers do not have access to all possible annotations that may need linking.
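The automatic combination at the end of the chunk stage can be pictured with a small sketch. This is only an illustration of the idea, not Rapid's implementation: the function name and the response shape (an "annotations" list plus a "global_attributes" mapping per chunk) are assumptions made for the example. Because chunks cover disjoint parts of the taxonomy, merging can be as simple as concatenation.

```python
def combine_chunk_results(chunk_results):
    """Merge per-chunk results into one response (hypothetical data shape)."""
    combined = {"annotations": [], "global_attributes": {}}
    for result in chunk_results:
        # Chunks are independent, so their annotations never overlap by label.
        combined["annotations"].extend(result.get("annotations", []))
        combined["global_attributes"].update(result.get("global_attributes", {}))
    return combined


# Two labelers worked on different chunks of the same image.
chunk_a = {"annotations": [{"label": "plastic_bag"}], "global_attributes": {"glare": "none"}}
chunk_b = {"annotations": [{"label": "camera"}, {"label": "vegetable"}]}
merged = combine_chunk_results([chunk_a, chunk_b])
print(len(merged["annotations"]))  # 3
```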

You can use the taxonomy chunking pipeline for any image annotation or text collection task by specifying param_chunks as a list of lists in your taxonomy. Each inner list contains the labels and global attributes to include in one chunk. Links and annotation attributes should be excluded, and each label and global attribute should appear in exactly one list.
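The rules above (every label and global attribute in exactly one list, nothing else included) are easy to check before submitting a taxonomy. The helper below is a minimal sketch of such a check. It is hypothetical and not part of Rapid; the function name, its arguments, and the message wording are all assumptions.

```python
from collections import Counter


def validate_param_chunks(param_chunks, labels, global_attributes):
    """Return a list of problems found in param_chunks (empty means valid)."""
    expected = set(labels) | set(global_attributes)
    counts = Counter(name for chunk in param_chunks for name in chunk)
    problems = []
    # Every label and global attribute must be assigned to some chunk.
    for name in sorted(expected - set(counts)):
        problems.append(f"'{name}' is missing from every chunk")
    # Links and annotation attributes (or typos) must not appear.
    for name in sorted(set(counts) - expected):
        problems.append(f"'{name}' is not a label or global attribute")
    # No name may be listed more than once.
    for name, n in sorted(counts.items()):
        if name in expected and n > 1:
            problems.append(f"'{name}' appears {n} times")
    return problems


problems = validate_param_chunks(
    [["plastic_bag", "fruit", "glare"], ["vegetable", "camera"]],
    labels=["plastic_bag", "vegetable", "fruit", "camera"],
    global_attributes=["glare"],
)
print(problems)  # []
```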

For example taxonomies, see the bottom of this page.

Quality Task Stages

As with other Rapid projects, labelers will be served quality tasks for training and performance evaluation. However, with taxonomy chunking projects, labelers will be trained and evaluated on a specific stage of the pipeline. This means when you create quality tasks, a set of child tasks will be generated for each stage automatically.

Typically, you only need to modify the parent quality task, and the changes apply to all of its child tasks. Advanced users may wish to explore the child quality tasks and make specific edits to them. To view the child tasks, go to the Quality Lab and open a set of training or evaluation tasks.


In this view, the parent scenarios are the completed parent quality tasks that have been combined. It is preferred to make edits through the parent scenarios.

The first two stage views (1 and 2) correspond with the two given lists in param_chunks. The last stage view (3) corresponds to the final review task.

From here, you can view and edit the child quality tasks. Note that stage 3 (final review) tasks appear only among review-phase evaluation tasks, not initial-phase ones, because they are review-only tasks.

Example Image Annotation Taxonomy

Consider the following taxonomy:

geometries: {
  box: {
    objects_to_annotate: [
      'plastic_bag',
      'vegetable',
      { choice: 'fruit', subchoices: ['apple', 'banana'] },
    ],
    min_width: 0,
    min_height: 0,
    examples: [],
  },
  polygon: {
    objects_to_annotate: ['camera'],
  },
},
annotation_attributes: {
  color: {
    choices: ['green', 'white', 'yellow'],
    description: 'color',
  },
  glare: {
    description: 'Is there glare on the image?',
    choices: ['sun glare', 'night headlight glare', 'none'],
    conditions: {
      is_global: true,
    },
  },
},
links: {
  'Camera Sees': [
    {
      is_bidirectional: false,
      from_allowed_labels: ['camera'],
      to_allowed_labels: ['vegetable', 'apple', 'banana'],
    },
  ],
  Contains: [
    {
      is_bidirectional: false,
      from_allowed_labels: ['plastic_bag'],
      to_allowed_labels: ['vegetable', 'apple', 'banana'],
    },
  ],
}

An example valid param_chunks could be:

param_chunks: [
  ['plastic_bag', 'fruit', 'glare'],
  ['vegetable', 'camera'],
]
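To see why this param_chunks is valid, it helps to list the names the chunks have to cover. The sketch below does that for the taxonomy above, written as a Python dict and trimmed to the relevant keys. The helper is hypothetical, and it assumes two things inferred from this example rather than stated as rules: a nested label is referenced by its top-level choice name ('fruit', not 'apple' or 'banana'), and a global attribute is one whose conditions set is_global to true.

```python
def chunkable_names(taxonomy):
    """Collect the labels and global attributes that param_chunks must cover."""
    names = []
    for geometry in taxonomy.get("geometries", {}).values():
        for obj in geometry.get("objects_to_annotate", []):
            # A nested label is represented by its top-level choice name.
            names.append(obj["choice"] if isinstance(obj, dict) else obj)
    for attr_name, attr in taxonomy.get("annotation_attributes", {}).items():
        # Only global attributes are chunked; per-annotation ones are skipped.
        if attr.get("conditions", {}).get("is_global"):
            names.append(attr_name)
    return names


taxonomy = {
    "geometries": {
        "box": {
            "objects_to_annotate": [
                "plastic_bag",
                "vegetable",
                {"choice": "fruit", "subchoices": ["apple", "banana"]},
            ],
        },
        "polygon": {"objects_to_annotate": ["camera"]},
    },
    "annotation_attributes": {
        "color": {"choices": ["green", "white", "yellow"]},
        "glare": {
            "choices": ["sun glare", "night headlight glare", "none"],
            "conditions": {"is_global": True},
        },
    },
}

param_chunks = [["plastic_bag", "fruit", "glare"], ["vegetable", "camera"]]
covered = [name for chunk in param_chunks for name in chunk]
print(sorted(covered) == sorted(chunkable_names(taxonomy)))  # True
```

Here 'color' is left out because it is an annotation attribute, and the link names ('Camera Sees', 'Contains') never appear because links are handled in the final review stage.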

Example Text Collection Taxonomy

Consider the following taxonomy:

{
  "fields": [
    {
      "type": "category",
      "field_id": "is_green",
      "title": "Does the picture include the color green?",
      "required": true,
      "choices": [
        {
          "label": "Yes",
          "value": "yes"
        },
        {
          "label": "No",
          "value": "no"
        }
      ]
    },
    {
      "type": "text",
      "field_id": "description",
      "title": "Describe this picture."
    },
    {
      "type": "field_set",
      "field_id": "vehicle",
      "title": "A vehicle in the image",
      "fields": [
        {
          "type": "text",
          "field_id": "vehicle_type",
          "title": "What type of vehicle is this?"
        },
        {
          "type": "number",
          "field_id": "num_wheels",
          "title": "How many wheels does this vehicle have?"
        }
      ]
    }
  ]
}

An example valid param_chunks could be:

param_chunks: [
  ['is_green', 'description'],
  ['vehicle']
]
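In this example the chunks list only top-level field_id values: the field_set 'vehicle' is assigned as a whole, and its nested fields are not listed separately. That reading is inferred from the example rather than stated as a rule. Under that assumption, a short sketch can pull the chunkable names out of a text collection taxonomy (shown here in trimmed form) and compare them with param_chunks.

```python
import json

# Trimmed copy of the taxonomy above; only the keys needed here are kept.
taxonomy_json = """
{
  "fields": [
    {"type": "category", "field_id": "is_green"},
    {"type": "text", "field_id": "description"},
    {
      "type": "field_set",
      "field_id": "vehicle",
      "fields": [
        {"type": "text", "field_id": "vehicle_type"},
        {"type": "number", "field_id": "num_wheels"}
      ]
    }
  ]
}
"""

fields = json.loads(taxonomy_json)["fields"]
# Only top-level fields are collected; nested fields stay with their field_set.
top_level_ids = [field["field_id"] for field in fields]

param_chunks = [["is_green", "description"], ["vehicle"]]
covered = [name for chunk in param_chunks for name in chunk]
print(sorted(covered) == sorted(top_level_ids))  # True
```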