Get Data Annotated

To get your data into the hands of your annotators, you will need to create a batch.

A batch is simply a collection of tasks that you want to get labeled. It can be a subset of the data you have uploaded or your entire library; what you put in a batch and how many batches you create are up to you. Please note that your annotators will only see tasks in their queue once you have created a batch: simply having your data uploaded to the platform will not automatically start the labeling process.
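Batches are typically created in the platform UI, but if you are scripting your pipeline you can also create them through the API. Below is a minimal sketch in Python using the requests library; the endpoint path, basic-auth scheme, and field names are assumptions based on the public Scale API reference, and the project and batch names are placeholders, so verify the details against the API docs for your account.

```python
import requests

API_KEY = "live_xxxxxxxxxxxx"  # your Scale API key (placeholder)

# Create a batch inside an existing project.
# NOTE: the endpoint path and field names are assumptions -- check the
# API reference for your account before relying on them.
response = requests.post(
    "https://api.scale.com/v1/batches",
    auth=(API_KEY, ""),  # API key as the basic-auth username, empty password
    json={
        "project": "receipt_transcription",  # hypothetical project name
        "name": "receipts_2024_06_01",       # hypothetical batch name
        "calibration_batch": False,          # True would make this a calibration batch
    },
)
response.raise_for_status()
print(response.json())  # the created batch object
```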

Two Types of Batches: Calibration & Production

Calibration Batches: Use these if you want feedback on your instructions from your annotators before scaling up to larger batches. If you already have a team of labelers who are well-versed in labeling your task, feel free to skip the calibration batch.

  • Select a small set of tasks (we recommend ~20) that you'd like to get feedback on.
  • Your labelers will have a comments field where they can leave notes on the task instructions, such as areas of ambiguity or unaddressed cases.
  • With this feedback, you can go back and revise your instructions to make them clearer.
  • Repeat this process as many times as is helpful.

Production Batches: Use these when you are fully ready to get your data labeled. If you are happy with your instructions and confident that labelers will be able to label your tasks with high accuracy, you are ready for a production batch.
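If you create batches through the API instead, the distinction between the two types generally comes down to a single flag set at creation time. Continuing the hypothetical sketch above (the calibration_batch field name is likewise an assumption to verify against the API reference):

```python
# Calibration batch: small, used to collect feedback on your instructions.
calibration_payload = {
    "project": "receipt_transcription",
    "name": "receipts_calibration_01",
    "calibration_batch": True,
}

# Production batch: created once your instructions are finalized.
production_payload = {
    "project": "receipt_transcription",
    "name": "receipts_2024_06_01",
    "calibration_batch": False,
}
```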

If you just want to test out the platform, you can create either a calibration or a production batch and label it yourself.

Selecting Assets in a Batch

It is up to you how many tasks to include in each batch and how to segment your batches. For example, if you have a constant stream of data coming in, you could split your batches by day. Or, if certain tasks seem harder and you want to handle them separately, you can create a batch of just "hard tasks".

There is no limit to the number of tasks you can include in each batch. However, if you are running a calibration batch, we generally suggest including ~20 tasks that are representative of the larger dataset.

You can select the data you want included in a batch in three ways:
1) You can choose your own tasks from your uploaded assets
2) You can have Scale select your tasks for you from your uploaded assets
3) You can directly upload a set of data into a batch

Selecting Assets Yourself: Browse your uploaded assets and hand-pick the specific tasks you want to include in the batch.

Select Assets via Scale:

  • You can ask Scale to choose tasks for you in one of three ways: random selection, first uploaded, or last uploaded.
  • You can then specify the number of tasks you'd like to include in the batch.

Uploading Data Directly Into a Batch: Instead of filtering through a previously uploaded dataset for the tasks you want, you can upload them directly into a batch when you are ready.
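If you take this route through the API rather than the UI, the usual pattern is to name the target batch when you create each task, so the task lands directly in that batch rather than in your general library. A minimal sketch, again assuming the task-creation endpoint and a batch field behave as in the public API reference; the task type, URL, and payload fields shown are illustrative only.

```python
import requests

API_KEY = "live_xxxxxxxxxxxx"  # placeholder

# Create a task and attach it directly to a batch by name.
# The endpoint, task type, and field names are assumptions; real task
# payloads also need type-specific fields that are omitted here.
response = requests.post(
    "https://api.scale.com/v1/task/imageannotation",
    auth=(API_KEY, ""),
    json={
        "project": "receipt_transcription",
        "batch": "receipts_2024_06_01",  # hypothetical existing batch
        "attachment": "https://example.com/receipt_001.jpg",
        "instruction": "Draw a box around every line item on the receipt.",
    },
)
response.raise_for_status()
```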

Other Tips & Tricks

Filtering out previously included tasks: When selecting assets for a batch, you can use the option to exclude any tasks you have already sent out for production, so you don't label the same data twice.


Querying your data: If you have metadata you want to filter on when creating your batches, you can use the query search tool built into the platform. For example, you might want all tasks from the same batch upload, or one specific file you want labeled. The query functionality lets you narrow your search down to exactly those tasks.
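Since the query tool can only filter on fields that actually exist on your tasks, it helps to attach any metadata you expect to filter on (source file, upload date, difficulty, and so on) at upload time. A small sketch, assuming tasks accept a free-form metadata object as described in the public API reference; the field names below are made up for illustration.

```python
# Attach queryable metadata when uploading, so you can later filter on it
# (e.g. "all tasks from the 2024-06-01 upload" or "the task for receipt_001.jpg").
task_payload = {
    "project": "receipt_transcription",
    "attachment": "https://example.com/receipt_001.jpg",
    "metadata": {
        "source_file": "receipt_001.jpg",  # illustrative field names
        "upload_batch": "2024-06-01",
        "difficulty": "hard",
    },
}
```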
