TextCollectionAttachment

An array of TextCollectionAttachment objects to be labeled.

Video Support

The video attachment should have `content` that is a link. Supported media types are listed on the MDN Web Docs.

HTML Support in TextCollection Attachments:

When creating a task in TextCollection, customers are able to pass Markdown as the string content. Markdown also allows the use of HTML tags within the Markdown syntax.

However, to keep the TextCollection platform secure, we sanitize all HTML tags passed within the Markdown using the HTML-sanitize JavaScript package. This package removes every tag except the allowed HTML tags listed in the table below.

Any HTML tag that is not in the list of allowed tags is removed from the string during sanitization. This ensures that the content displayed to the tasker is secure and prevents the security risks that unauthorized HTML tags could introduce.

Parameter	Type	Description
type*	string	One of `pdf`, `image`, `text`, `video`, `website`, or `audio`.
content*	string	Content or link to relevant file.
forms	array	Array of `field_id` strings from `FormField`. If this value is set, only show the corresponding attachment if one of the referenced form fields is active.

HTML tags allowed:

Content sectioning	'address', 'article', 'aside', 'footer', 'header', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hgroup', 'main', 'nav', 'section'
Text content	'blockquote', 'dd', 'div', 'dl', 'dt', 'figcaption', 'figure', 'hr', 'li', 'main', 'ol', 'p', 'pre', 'ul'
Inline text semantics	'a', 'abbr', 'b', 'bdi', 'bdo', 'br', 'cite', 'code', 'data', 'dfn', 'em', 'i', 'kbd', 'mark', 'q', 'rb', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'time', 'u', 'var'
Table content	'caption', 'col', 'colgroup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr'
Additional Tags	'img', 'iframe'
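
As a rough pre-submission check, the allowlist idea above can be sketched in Python. This is not the platform's sanitizer: `ALLOWED_TAGS` is an abbreviated copy of the table, and `TagCollector` and `disallowed_tags` are hypothetical helper names. The sketch only reports which tags in a string fall outside the allowlist, so you can spot content that would likely be stripped.

```python
from html.parser import HTMLParser

# Abbreviated allowlist for illustration; the full set is in the table above.
ALLOWED_TAGS = {
    "p", "div", "span", "a", "b", "i", "em", "strong", "code", "pre",
    "ul", "ol", "li", "table", "tr", "td", "th", "img", "iframe",
}


class TagCollector(HTMLParser):
    """Records the name of every opening tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def disallowed_tags(markup):
    """Return the sorted tag names in `markup` that are not in ALLOWED_TAGS."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return sorted({tag for tag in collector.tags if tag not in ALLOWED_TAGS})


content = "<p>Read the <b>brief</b>.</p><script>alert(1)</script><form></form>"
print(disallowed_tags(content))  # ['form', 'script']
```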

UnitField

`UnitField` objects define simple components for data collection.

Conditional Fields

Sometimes a field should only be presented if specific choices are selected for other fields. In these cases, you can specify the conditions — the dependent questions and corresponding sets of choices.

The `conditions` property should have the following structure: an array of objects, each of which defines one set of conditions allowing the field to be shown. The operators AND (`{ }`), OR (`[ ]`), and NOT (`not`) are supported, so you can specify an arbitrary set of fields and choices. Each set may contain objects or arrays with the following:

Key: the `field_id` of the dependent field.
Value: an object specifying the desired choices for the dependent field.

For example conditions, see the Example at the end of this section.

Conditions currently only work with dependent fields of type CategoryField. The syntax is accepted on other field types, but it may raise errors or cause undefined behavior.

Parameters

type (string, required)

One of `text`, `boolean`, `number`, `datetime`, `category`, `select`, or `time_range`.

field_id (string, required)

A unique identifier for the field, which should not change among tasks within a project.

title (string, required)

Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project. Must not be an empty string.

description (string)

A brief description of what the response should be. This may change among tasks within a project.

hint (string)

Longer explanation of why the field exists and how it should be used. Renders as a tooltip.

required (boolean)

Determines whether or not a response for this field is required. The default is false.

min_responses_required (integer)

The minimum number of separate annotations allowed for this field. Must be larger than 0. The default is 1.

max_responses_required (integer)

The maximum number of separate annotations allowed for this field. Must be larger than or equal to min_responses_required, with an upper bound of 100. The default is 1.

conditions (array of objects)

A set of conditions that must be satisfied for this field to be shown. The default is undefined.

Additional fields (object)

See the TextField, BooleanField, NumberField, DatetimeField, and CategoryField sections.

Example

// Example of UnitField objects with conditions
[
  {
    type: "category",
    field_id: "occlusion",
    title: "Is there occlusion in the image?",
    choices: [{label: 'None', value: '0'},
              {label: 'A little', value: '1'},
              {label: 'A lot', value: '2'}],
    conditions: [{}],
  },
  {
    type: "category",
    field_id: "occlusion_detail",
    title: "What is the cause of the occlusion?",
    choices: [{label: 'Rain', value: 'rain'},
              {label: 'Shadow', value: 'shadow'}],
    conditions: [{
      occlusion: ['1', '2'], // show if 1 or 2 are selected
      // equivalently {not: [[], ['0']]}
      // equivalently [{not: []}, {not: ['0']}]
      // equivalently [['1'],['2']]
    }],
  },
  {
    type: "text",
    field_id: "a_lot_of_shadow",
    title: "Please describe why there is so much shadow.",
    conditions: [{
      // show if 2 and shadow are selected in their respective fields
      occlusion: ['2'],
      occlusion_detail: ['shadow'],
    }],
  },
]
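
To make the reading of the example concrete, here is a minimal Python sketch of how the simplest form of `conditions` could be evaluated. It is an interpretation, not the platform's evaluator: `field_is_shown` is a hypothetical name, and the sketch assumes that each object in the array is one alternative set, that every key inside a set must match (AND), and that a list of values matches if any of them is selected (OR). The `not` operator and nested array forms are deliberately left out.

```python
def field_is_shown(conditions, selections):
    """Return True if at least one condition set is fully satisfied.

    conditions: list of dicts mapping a dependent field_id to accepted values.
    selections: dict mapping field_id to the list of values the tasker selected.
    """
    if not conditions:
        # Assumption: a field without conditions is always shown.
        return True
    for condition_set in conditions:
        # Every dependent field in the set must have an accepted value selected.
        if all(
            any(value in accepted for value in selections.get(field_id, []))
            for field_id, accepted in condition_set.items()
        ):
            return True
    return False


detail_conditions = [{"occlusion": ["1", "2"]}]
shadow_conditions = [{"occlusion": ["2"], "occlusion_detail": ["shadow"]}]

print(field_is_shown(detail_conditions, {"occlusion": ["1"]}))  # True
print(field_is_shown(shadow_conditions,
                     {"occlusion": ["2"], "occlusion_detail": ["rain"]}))  # False
```

Under these assumptions an empty set such as `[{}]` is trivially satisfied, which matches the first field in the example always being shown.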

TextField

Subclass of UnitField that returns a `string` response.

Parameters

max_characters (integer)

The maximum number of characters allowed in the field.

show_word_counter (boolean)

To display a word count in the text field, include `show_word_count = true` in the text field’s object.

show_markdown_preview (boolean)

To enable a Markdown preview for the text field, include `show_markdown_preview = true` in the text field’s object.

max_tokens (integer)

The maximum number of words allowed in the text response. For example, `max_tokens = 1000` limits the response to 1000 words.

min_tokens (integer)

The minimum number of words required in the text response. For example, `min_tokens = 100` requires at least 100 words.

disable_pasting (boolean)

To disable copying and pasting in the text field, include `disable_pasting = true`.

BooleanField

Subclass of UnitField that returns a `boolean` response. Has no additional parameters.

NumberField

Subclass of UnitField that returns a `string` response based on the annotated number.

Parameters

use_slider (boolean)

Set to true to use a slider instead of a text box.

min (float)

Sets the minimum value of the slider.

max (float)

Sets the maximum value of the slider.

step (float)

Sets the step value of the slider.

prefix (string)

A string label for the lowest numerical value.

suffix (string)

A string label for the greatest numerical value.

mid_label (string)

A string label for the middle numerical value.

Example

{
  "type": "number",
  "field_id": "item_price",
  "title": "Item Price",
  "description": "Leave empty if not applicable.",
  "required": false,
  "use_slider": true,
  "min": 0,
  "max": 100
}

DatetimeField

Subclass of UnitField that returns a `DatetimeAnnotation` response.

Definition: DatetimeSpec

An enum that consists of `year`, `month`, `day`, `hour`, and `minute`.

Definition: DatetimeAnnotation

An interface that contains optional number fields including `year`, `month`, `day`, `hour`, and `minute`.

Parameters

include (array of objects, required)

An array of DatetimeSpec elements. Must contain at least one element.

Example

{
  "type": "datetime",
  "field_id": "release_date",
  "title": "Date of Product Release",
  "description": "Leave empty if not applicable.",
  "include": ["year", "month", "day"],
  "defaults": {
    "year": 2021,
    "month": 4,
    "day": 13
  }
}
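
The relationship between `include` and a `DatetimeAnnotation` can be checked with a short Python sketch. `check_datetime_annotation` is a hypothetical helper, and the rule that an annotation only carries parts listed in `include` is an assumption drawn from the definitions above, not documented behavior.

```python
DATETIME_SPEC = ("year", "month", "day", "hour", "minute")


def check_datetime_annotation(include, annotation):
    """Return a list of problems; an empty list means the pair looks consistent."""
    problems = []
    if not include:
        problems.append("include must contain at least one element")
    for spec in include:
        if spec not in DATETIME_SPEC:
            problems.append(f"{spec} is not a DatetimeSpec value")
    for key, value in annotation.items():
        if key not in include:
            # Assumption: only the parts named in include appear in a response.
            problems.append(f"{key} is not listed in include")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be a number")
    return problems


include = ["year", "month", "day"]
print(check_datetime_annotation(include, {"year": 2021, "month": 4, "day": 13}))  # []
print(check_datetime_annotation(include, {"hour": 9}))
# ['hour is not listed in include']
```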

CategoryField

Subclass of UnitField that returns an array of selected `CategoryChoiceValue` elements in its response.

`CategoryChoice` elements with subchoices are only used for navigation. The only selectable `CategoryChoice` elements are those with no subchoices.

Parameters

choices (array of objects, required)

An array of CategoryChoice elements that define the available choices.

min_choices (integer)

Minimum number of choices to select.

max_choices (integer)

Maximum number of choices to select. If this value is greater than 1, the form renders checkboxes. Otherwise, it renders radio buttons.

CategoryChoice

label (string, required)

The label of the choice field. This label may change among tasks within a project.

value (CategoryChoiceValue)

The value of the choice field. Must be a string, number, or boolean.

hint (string)

The tooltip text shown for this choice.

subchoices (array of objects)

An array of CategoryChoice elements to define the relevant subchoices.

Example

{
  "type": "category",
  "field_id": "genre",
  "title": "Select all genres that apply.",
  "choices": [
    {
      "label": "Hip-Hop/Rap",
      "value": "hip-hop-rap",
      "hint":
        "It consists of a stylized rhythmic music that commonly accompanies rapping, a rhythmic and rhyming speech that is chanted.",
      "subchoices": [
        { "label": "Dirty South", "value": "dirty-south" },
        { "label": "Industrial Hip Hop", "value": "industrial-hip-hop" },
        { "label": "Nerdcore", "value": "nerdcore" },
        { "label": "Rap", "value": "rap" }
      ]
    },
    {
      "label": "R&B/Soul",
      "value": "rb-soul",
      "subchoices": [
        { "label": "Disco", "value": "disco" },
        { "label": "Funk", "value": "funk" },
        { "label": "Motown", "value": "motown" }
      ]
    }
  ],
  "min_choices": 1,
  "max_choices": 5
}
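
Because choices with subchoices are only used for navigation, the values a tasker can actually select are the leaves of the choice tree. A small Python sketch of that rule follows; `selectable_values` is a hypothetical helper name and the sample data is a trimmed copy of the example above.

```python
def selectable_values(choices):
    """Collect the values of choices that have no subchoices, depth first."""
    values = []
    for choice in choices:
        subchoices = choice.get("subchoices") or []
        if subchoices:
            # Parent choices are navigation only, so descend instead of collecting.
            values.extend(selectable_values(subchoices))
        else:
            values.append(choice["value"])
    return values


genre_choices = [
    {
        "label": "Hip-Hop/Rap",
        "value": "hip-hop-rap",
        "subchoices": [
            {"label": "Nerdcore", "value": "nerdcore"},
            {"label": "Rap", "value": "rap"},
        ],
    },
    {
        "label": "R&B/Soul",
        "value": "rb-soul",
        "subchoices": [{"label": "Disco", "value": "disco"}],
    },
]

print(selectable_values(genre_choices))  # ['nerdcore', 'rap', 'disco']
```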

TimerangeField

Subclass of UnitField.

Parameters

default_seconds (array of integers, required)

Must have length 2, with each value in the range [0, 24 * 60 * 60].

increment_seconds (integer)

Must be between 1 and 60 * 60.

default_from_field (string)

Must be a valid field_id.

Example

{
  "type": "time_range",
  "field_id": "hours",
  "title": "Store Hours",
  "defaults_seconds": [
    28800,
    72000
  ],
  "increment_seconds": 300,
  "max_responses_required": 2, 
  "min_responses_required": 0
}
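
The seconds values in the example are offsets from midnight. The Python sketch below shows one way to derive such values from clock times and to check them against the ranges listed under Parameters. `to_seconds` and `check_time_range` are hypothetical helper names, not part of the API.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def to_seconds(clock):
    """Convert an 'HH:MM' string to seconds since midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def check_time_range(seconds_pair, increment_seconds=None):
    """Return a list of problems with a default range and optional increment."""
    problems = []
    if len(seconds_pair) != 2:
        problems.append("the range must have length 2")
    for value in seconds_pair:
        if not 0 <= value <= SECONDS_PER_DAY:
            problems.append(f"{value} is outside [0, {SECONDS_PER_DAY}]")
    if increment_seconds is not None and not 1 <= increment_seconds <= 60 * 60:
        problems.append("increment_seconds must be between 1 and 3600")
    return problems


store_hours = [to_seconds("08:00"), to_seconds("20:00")]
print(store_hours)                         # [28800, 72000]
print(check_time_range(store_hours, 300))  # []
```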

SelectField

Subclass of UnitField.

Parameters

choices (array of objects)

An array of selectable options. Not required if choices_from_field is present.

choices_from_field (string)

Must be a valid field_id.

Example

{
  "type": "select",
  "field_id": "sentiment",
  "title": "Sentiment",
  "description": "Choose a sentiment that best describes this text",
  "required": true,
  "choices_from_field": "Options"
}
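
The either/or rule for `choices` and `choices_from_field` can be checked before a task is created. The Python sketch below is illustrative only; `check_select_field` is a hypothetical helper, and it assumes you already have the set of `field_id` values defined in the task.

```python
def check_select_field(field, known_field_ids):
    """Return a list of problems with a select field definition."""
    problems = []
    source = field.get("choices_from_field")
    if source is None:
        # Without choices_from_field, the field must carry its own choices.
        if not field.get("choices"):
            problems.append("either choices or choices_from_field is required")
    elif source not in known_field_ids:
        problems.append(f"choices_from_field '{source}' is not a known field_id")
    return problems


sentiment = {
    "type": "select",
    "field_id": "sentiment",
    "title": "Sentiment",
    "choices_from_field": "Options",
}

print(check_select_field(sentiment, {"Options", "sentiment"}))  # []
print(check_select_field(sentiment, {"sentiment"}))
# ["choices_from_field 'Options' is not a known field_id"]
```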

RankingField

`RankingField` objects allow you to define a task that ranks task attachments.

Returns a `list` response with ordered options.

Parameters

title (string)

A brief description of what the response should be. This may change among tasks within a project.

hint (string)

Additional guidance shown to taskers about how to rank the items.

first_label (string)

Label for the first position in the ranking (for example, "Best").

last_label (string)

Label for the last position in the ranking (for example, "Worst").

num_items_to_rank (integer)

The number of options required to rank (can be less than the number of attachments).

required (boolean)

Determines whether or not all num_items_to_rank fields should be filled.

Example

{
  "type": "ranking_order",
  "field_id": "relevance_ranking",
  "title": "Rank titles based on their relevance to the article",
  "hint": "From the most relevant to the least one",
  "first_label": "Best",
  "last_label": "Worst",
  "num_items_to_rank": 3
}

FormField

`FormField` objects allow you to create several mini-forms associated with different attachments. These mini-forms will be populated by the object's child fields.

Returns a `dictionary` response with key-value pairs defined by its child fields.

Parameters

type (string, required)

For FormField objects, this should be set to `form`.

field_id (string, required)

A unique identifier for the field, which should not change among tasks within a project.

title (string, required)

Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project.

description (string)

A brief description of what the response should be. This may change among tasks within a project.

fields (array of objects, required)

An array of child UnitField and FieldSet objects. Any FieldSet objects here must have `inline` set to true.

📘Note
`FormField` objects can only be located on the top level of the `fields` task parameter. If one `FormField` object is used, all the other top-level objects must also be `FormField` objects.

Example

{
  "type": "form",
  "field_id": "form_query",
  "title": "Query Intention",
  "fields": [
    {
      "type": "text",
      "field_id": "query_intention",
      "title": "Query Intention",
      "hint": "Please investigate the search links."
    }
  ]
}
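
The placement rules in the note can be expressed as a quick structural check. The Python sketch below is an illustration under the rules stated above, not an official validator; `check_form_fields` is a hypothetical helper name.

```python
def check_form_fields(fields):
    """Return a list of problems with how form objects are placed in `fields`."""
    problems = []
    form_count = sum(1 for field in fields if field.get("type") == "form")
    # If any top-level object is a form, all of them must be forms.
    if 0 < form_count < len(fields):
        problems.append("top-level fields mix form and non-form objects")
    for field in fields:
        if field.get("type") != "form":
            continue
        for child in field.get("fields", []):
            # Forms are only allowed at the top level, so a child form is an error.
            if child.get("type") == "form":
                problems.append(f"form '{field.get('field_id')}' contains a nested form")
    return problems


form_only = [
    {"type": "form", "field_id": "form_query", "title": "Query Intention",
     "fields": [{"type": "text", "field_id": "query_intention", "title": "Query Intention"}]},
]
mixed = form_only + [{"type": "text", "field_id": "notes", "title": "Notes"}]

print(check_form_fields(form_only))  # []
print(check_form_fields(mixed))      # ['top-level fields mix form and non-form objects']
```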

Text Collection Response Format

The `response` object, which is part of the callback POST request and permanently stored as part of the task object, will have an `annotations` field. The `annotations` object is a dictionary in which each key is a `field_id` defined in the task parameters and each value is the respective annotation for that field.

Each annotation will be of the type defined by its field above. If `max_responses_required` is applicable and greater than 1, the annotation will be an array of the type.

📘Note
See the Callback section for more details about callbacks.

Example

{
  "response": {
    "annotations": {
      "category_name": "Soup", //TextField
      "category_items": [ //FieldSet with max_responses_required greater than one
        {
          "item_name": "Tom Yum Chicken Soup", //TextField
          "item_price": "11.79" //NumberField
        },
        {
          "item_name": "Tom Yum Beef Soup", //TextField
          "item_price": "11.79" //NumberField
        }
      ],
      "category_metadata": { //FieldSet
        "gluten_friendly": true, //BooleanField
        "labels": [ //TextField with max_responses_required greater than one
          "Free Range", 
          "All Natural"
        ] 
      }
    }
  },
  "task_id": "5774cc78b01249ab09f089dd",
  "task": {
    // populated task for convenience
  }
}
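
A callback handler typically needs to walk the `annotations` dictionary and cope with values that are either a single annotation or an array of them. The Python sketch below parses a comment-free copy of the example above; `as_list` is a hypothetical helper, and the `task` object is left empty for brevity.

```python
import json
from decimal import Decimal

# Comment-free copy of the example payload; "task" is left empty here.
callback_body = """
{
  "response": {
    "annotations": {
      "category_name": "Soup",
      "category_items": [
        {"item_name": "Tom Yum Chicken Soup", "item_price": "11.79"},
        {"item_name": "Tom Yum Beef Soup", "item_price": "11.79"}
      ],
      "category_metadata": {
        "gluten_friendly": true,
        "labels": ["Free Range", "All Natural"]
      }
    }
  },
  "task_id": "5774cc78b01249ab09f089dd",
  "task": {}
}
"""


def as_list(annotation):
    """Normalize an annotation to a list, whether or not it arrived as an array."""
    return annotation if isinstance(annotation, list) else [annotation]


payload = json.loads(callback_body)
annotations = payload["response"]["annotations"]

items = as_list(annotations["category_items"])
item_names = [item["item_name"] for item in items]
# NumberField values arrive as strings, so convert before doing arithmetic.
total_price = sum(Decimal(item["item_price"]) for item in items)

print(item_names)   # ['Tom Yum Chicken Soup', 'Tom Yum Beef Soup']
print(total_price)  # 23.58
```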

Text Collection Hypothesis

When creating a `textcollection` task, you can provide pre-labels in the `hypothesis` field, so that workers don't have to start annotating from scratch.

To add pre-labels to a task, provide them in the `hypothesis` field of the payload when creating the task. The schema of the hypothesis object must match the schema of the task response.

Verify the task response field schema for the desired task type.
Review your project taxonomy (label names, attribute conditions, annotation types, etc.).
Generate pre-labels that are formatted to match the aforementioned schema and taxonomy.
Create a task, including a `hypothesis` field that contains the pre-labels at the same top level as other task fields such as `project` and `instructions`.

The hypothesis format will largely mirror Scale’s task response format. In this particular task type, the `annotations` field is mandatory inside the hypothesis object.

The only difference between the `hypothesis` and the `response` format is that inside every field you want to pre-annotate, you'll need to add two more fields:

`type` describes the field type (category, select, text, etc.).

`field_id` describes the identification given to this field for tracking (field name).

You can find these two fields in your task taxonomy.

Note: For text fields, the response format differs from the other types. For this field type, the `response` field will be an array containing a single string instead of an array of arrays containing strings.

task_payload_with_hypothesis

{
 ...
 "batch": "regular_batch_name",
 "hypothesis": {
   "annotations": {
     "(EXAMPLE) Multiple Choice Question": {
       "type": "category",
       "field_id": "(EXAMPLE) Multiple Choice Question",
       "response": [
         [
           "B"
         ]
       ]
     }
   }
 },
 ...
}

task_taxonomy

{
   "fields": [
     {
       "type": "category",
       "field_id": "(EXAMPLE) Multiple Choice Question",
       "title": "Which option best fits this task?",
       "choices": [
         {
           "label": "A",
           "value": "A"
         },
         {
           "label": "B",
           "value": "B"
         },
         {
           "label": "C",
           "value": "C"
         }
       ],
       "min_choices": 1,
       "max_choices": 1,
       "description": "Select one of the following. "
     }
   ]
 }

task_payload_with_hypothesis_text_field

{
   ...
   "hypothesis": {
       "annotations": {
           "Product Description": {
               "type": "text",
               "field_id": "(EXAMPLE) Text Input Field",
               "response": [
                   "Dolore in dolor occaecat deserunt ex in qui non amet est."
               ]
           }
       }
   }
   ...
}
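
The examples above can be generated from a taxonomy rather than written by hand. The following Python sketch is one way to do that under the rules in this section: it copies `type` and `field_id` from the taxonomy and wraps the pre-label as `[[value]]` for category fields and `[value]` for text fields. `build_hypothesis` is a hypothetical helper, and other field types are rejected because their `response` shape is not shown here.

```python
def build_hypothesis(fields, prelabels):
    """Build a hypothesis object from taxonomy fields and {field_id: value} pre-labels."""
    field_types = {field["field_id"]: field["type"] for field in fields}
    annotations = {}
    for field_id, value in prelabels.items():
        field_type = field_types[field_id]  # KeyError if the field is not in the taxonomy
        if field_type == "text":
            response = [value]    # text: an array containing a single string
        elif field_type == "category":
            response = [[value]]  # category: an array of arrays, as in the example
        else:
            raise ValueError(f"response shape for type '{field_type}' is not covered here")
        annotations[field_id] = {
            "type": field_type,
            "field_id": field_id,
            "response": response,
        }
    return {"annotations": annotations}


taxonomy = [
    {"type": "category", "field_id": "(EXAMPLE) Multiple Choice Question"},
    {"type": "text", "field_id": "(EXAMPLE) Text Input Field"},
]
hypothesis = build_hypothesis(taxonomy, {
    "(EXAMPLE) Multiple Choice Question": "B",
    "(EXAMPLE) Text Input Field": "A short product description.",
})
print(hypothesis["annotations"]["(EXAMPLE) Multiple Choice Question"]["response"])  # [['B']]
```

The returned dictionary would then be placed under the `hypothesis` key of the task payload, alongside the other top-level task fields.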

NamedEntityRecognitionLabel

`NamedEntityRecognitionLabel` objects define the taxonomy of labels to use to annotate spans of text.

Parameters

name (string, required)

A unique identifier for this label.

display_name (string)

An alias for this label to display to taskers.

description (string)

A description of what this label should represent. Displayed to taskers to improve quality.

children (array of objects)

An array of NamedEntityRecognitionLabel objects to group underneath this label. Specifying this field causes this label itself to no longer be used for labeling text spans.

attributes (object, optional)

NamedEntityRecognitionAttribute

`NamedEntityRecognitionAttribute` objects define form fields for individual annotations.

Parameters

type (string)

Currently, only `select` is supported.

options (array of objects)

List of select option objects.

display_name (string)

Optional display name.

description (string)

Optional description.

AttributeSelectOption

`AttributeSelectOption` objects define possible values for select attributes.

Parameters

value (string)

The value that will show up in the response if this option is selected.

display_name (string)

Optional display name if different from the value.

NamedEntityRecognitionRelationshipDefinition

`NamedEntityRecognitionRelationshipDefinition` objects specify the types of relationship that can exist between two text spans.

A relationship can either be named or unnamed. A named relationship is useful if you need to distinguish between multiple types of relationship that could exist between the same two text spans. For instance, if you're annotating a description of someone's family history, you might want to distinguish a "child of" relationship from a "sibling of" relationship.

A task can only specify one type of relationship. Either all the relationships in a task must be named, or all must be unnamed.

Parameters

name (string)

A unique identifier for this type of relationship. Required for named relationships; disallowed for unnamed relationships.

display_name (string)

A description for this relationship to display to taskers. Should be able to be used to construct a short phrase describing the relationship. For example, a relationship between two text spans "A" and "B" with display_name "is parent of" would be rendered to taskers as "A is parent of B". Required for named relationships; disallowed for unnamed relationships.

is_directed (boolean)

A field indicating whether the directionality of this relationship matters. For example, a "is parent of" relationship would likely be directed, whereas a "is sibling of" relationship would likely not be directed. Optional for named relationships; disallowed for unnamed relationships.

source_label (string)

A string referencing the name field of a NamedEntityRecognitionLabel object. If set, mandates that the source text span of this field must be labeled with the corresponding NamedEntityRecognitionLabel, or one of its children. Optional for both named and unnamed relationships.

target_label (string)

A string referencing the name field of a NamedEntityRecognitionLabel object. If set, mandates that the target text span of this field must be labeled with the corresponding NamedEntityRecognitionLabel, or one of its children. Optional for both named and unnamed relationships.
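
The named/unnamed rules above lend themselves to a small consistency check. The Python sketch below is illustrative only; `check_relationship_definitions` is a hypothetical helper that treats a definition as named when it has a `name` key.

```python
def check_relationship_definitions(definitions):
    """Return a list of problems with a task's relationship definitions."""
    problems = []
    named = [d for d in definitions if "name" in d]
    # A task must use either all named or all unnamed relationships.
    if named and len(named) != len(definitions):
        problems.append("named and unnamed relationships are mixed")
    for definition in definitions:
        if "name" in definition:
            if "display_name" not in definition:
                problems.append(f"'{definition['name']}' is missing display_name")
        else:
            for key in ("display_name", "is_directed"):
                if key in definition:
                    problems.append(f"{key} is not allowed on an unnamed relationship")
    return problems


family = [
    {"name": "parent_of", "display_name": "is parent of", "is_directed": True},
    {"name": "sibling_of", "display_name": "is sibling of", "is_directed": False},
]
print(check_relationship_definitions(family))  # []
print(check_relationship_definitions(family + [{"source_label": "person"}]))
# ['named and unnamed relationships are mixed']
```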

Named Entity Recognition Callback Format

The `response` object is part of the callback POST request and is permanently stored as part of the task object.

NamedEntityRecognitionResponse

The structure of a response object for named entity recognition consists of two arrays: one for entity annotations and another for relationships between these entities.

NamedEntityRecognitionAnnotation

The format for an individual entity annotation within the named entity recognition response, detailing the unique identifier, position, and content of the recognized text span, as well as its label and any optional attributes.

NamedEntityRecognitionRelationship

In tasks with undirected relationships, the `source_ref` and `target_ref` fields are interchangeable. In tasks with links that do not have relationship names, the `name` field will be left blank.

Example

{
  "annotations": [
    {
      "id": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      "start": 10,
      "end": 17,
      "text": "Alex Wang",
      "label": "person"
    },
    {
      "id": "a76da53e-4ebd-4466-aed7-80db6fb98329",
      "start": 22,
      "end": 31,
      "text": "Transform",
      "label": "conference"
    }
  ],
  "relationships": [
    {
      "id": "ade8e9e9-ef9c-4fc7-9517-62d79a15c1cb",
      "source_ref": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      "target_ref": "a76da53e-4ebd-4466-aed7-80db6fb98329",
      "name": "speaker_at"
    }
  ]
}

NamedEntityRecognitionResponse

Field	Type	Description
annotations	object array	List of `NamedEntityRecognitionAnnotation` objects.
relationships	object array	List of `NamedEntityRecognitionRelationship` objects.

NamedEntityRecognitionAnnotation

Field	Type	Description
id	string	Unique identifier.
start	number	Start index of the text span.
end	number	End index of the text span.
text	string	Text of the text span.
label	string	References the `name` field of a label in the task params.
attributes (optional)	object	The keys of the object reference keys of the `attributes` object for the corresponding label in the task params.

NamedEntityRecognitionRelationship

Field	Type	Description
id	string	Unique identifier.
source_ref	string	References the `id` of the annotation that is the source of the directed relationship.
target_ref	string	References the `id` of the annotation that is the target of the directed relationship.
name (optional)	string	References the `name` of relationship definitions in the task params.
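
Since relationships refer to annotations by `id`, a consumer usually resolves `source_ref` and `target_ref` back to the annotated spans. A minimal Python sketch follows, using a trimmed version of the example response; `describe_relationships` is a hypothetical helper name.

```python
def describe_relationships(response):
    """Return (source text, relationship name, target text) for each relationship."""
    by_id = {annotation["id"]: annotation for annotation in response["annotations"]}
    triples = []
    for relationship in response["relationships"]:
        source = by_id[relationship["source_ref"]]
        target = by_id[relationship["target_ref"]]
        # name is optional, so fall back to an empty string for unnamed links.
        triples.append((source["text"], relationship.get("name", ""), target["text"]))
    return triples


ner_response = {
    "annotations": [
        {"id": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b", "text": "Alex Wang", "label": "person"},
        {"id": "a76da53e-4ebd-4466-aed7-80db6fb98329", "text": "Transform", "label": "conference"},
    ],
    "relationships": [
        {
            "id": "ade8e9e9-ef9c-4fc7-9517-62d79a15c1cb",
            "source_ref": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
            "target_ref": "a76da53e-4ebd-4466-aed7-80db6fb98329",
            "name": "speaker_at",
        }
    ],
}

print(describe_relationships(ner_response))
# [('Alex Wang', 'speaker_at', 'Transform')]
```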
