Taxonomy Chunking with Rapid
Rapid has an optional special pipeline for image annotation and text collection to increase quality and throughput on tasks with large taxonomies. The process is called ‘taxonomy chunking’, and it involves breaking down large taxonomies into multiple smaller independent subtasks which can be worked on in parallel. The results are then combined back together to create the final response. The pipeline involves two or three stages:- The chunk stage, where many labelers work in parallel on annotating different, independent chunks of the taxonomy on the same attachment(s). The majority of the work (drawing and adding attributes) is done in the chunk stage. These chunks are then combined back together automatically.
- The combination review stage, where a single labeler reviews the task as a whole. This labeler’s main job is to ensure that there are not any large inconsistencies in the task, as well as adding annotations that pertain to the entire task. An example of this would be labeling global attributes that apply to all the labels together.
- The final review stage, only used in image annotation for drawing links between annotations. All linking is done in this stage, since in previous stages labelers do not have access to all possible annotations that may need linking.
Quality Task Stages
As with other Rapid projects, labelers will be served quality tasks for training and performance evaluation. However, with taxonomy chunking projects, labelers will be trained and evaluated on a specific stage of the pipeline. This means when you create quality tasks, a set of child tasks will be generated for each stage automatically. Typically, you can just modify the parent quality task as you need to make changes to all the child tasks. Advanced users may wish to explore the child quality tasks and make specific edits to them. To view the child tasks, go to the Quality Lab and open a set of training or evaluation tasks.