TextCollectionAttachment

An array of TextCollectionAttachment objects to be labeled.

Video Support

The video attachment should have `content` that is a link. Supported media types are listed on the MDN Web Docs.

HTML Support in TextCollection Attachments:

When creating a TextCollection task, you can pass Markdown as the string content. Markdown also allows HTML tags within its syntax.

To keep the TextCollection platform secure, we sanitize all HTML tags passed within the Markdown using the HTML-sanitize JavaScript package. This package removes every tag except the allowed HTML tags listed under "HTML tags allowed" below.

Any HTML tag that is not on the allowed list is removed from the string during sanitization, so the content displayed to the tasker stays secure and adheres to our standards.

| Parameter | Type | Description |
| --- | --- | --- |
| type* | string | One of pdf, image, text, video, website, or audio. |
| content* | string | Content or link to relevant file. |
| forms | array | Array of field_id strings from FormField. If this value is set, only show the corresponding attachment if one of the referenced form fields is active. |

HTML tags allowed:

Content sectioning

'address', 'article', 'aside', 'footer', 'header', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hgroup', 'main', 'nav', 'section'

Text content

'blockquote', 'dd', 'div', 'dl', 'dt', 'figcaption', 'figure', 'hr', 'li', 'main', 'ol', 'p', 'pre', 'ul'

Inline text semantics

'a', 'abbr', 'b', 'bdi', 'bdo', 'br', 'cite', 'code', 'data', 'dfn', 'em', 'i', 'kbd', 'mark', 'q', 'rb', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'time', 'u', 'var'

Table content

'caption', 'col', 'colgroup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr'

Additional Tags

'img', 'iframe'
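
Before submitting content, it can help to check locally which tags would likely be stripped. The following is a minimal sketch using Python's standard `html.parser`; it is not the platform's sanitizer, and the helper name `find_disallowed_tags` is hypothetical. It only compares tag names against the list above.

```python
from html.parser import HTMLParser

# Allow-list copied from the "HTML tags allowed" section above.
ALLOWED_TAGS = {
    "address", "article", "aside", "footer", "header", "h1", "h2", "h3", "h4",
    "h5", "h6", "hgroup", "main", "nav", "section",
    "blockquote", "dd", "div", "dl", "dt", "figcaption", "figure", "hr", "li",
    "ol", "p", "pre", "ul",
    "a", "abbr", "b", "bdi", "bdo", "br", "cite", "code", "data", "dfn", "em",
    "i", "kbd", "mark", "q", "rb", "rp", "rt", "rtc", "ruby", "s", "samp",
    "small", "span", "strong", "sub", "sup", "time", "u", "var",
    "caption", "col", "colgroup", "table", "tbody", "td", "tfoot", "th",
    "thead", "tr",
    "img", "iframe",
}


class _TagCollector(HTMLParser):
    """Records every start tag that is not on the allow-list."""

    def __init__(self):
        super().__init__()
        self.disallowed = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag not in ALLOWED_TAGS:
            self.disallowed.append(tag)


def find_disallowed_tags(content):
    """Hypothetical helper: list tags we would expect sanitization to remove."""
    collector = _TagCollector()
    collector.feed(content)
    collector.close()
    return collector.disallowed
```

Calling `find_disallowed_tags` on the attachment `content` before creating a task gives an early warning about markup that will not reach the tasker.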

UnitField

`UnitField` objects define simple components for data collection.

Conditional Fields

Sometimes a field should only be presented if specific choices are selected for other fields. In these cases, you can specify the conditions — the dependent questions and corresponding sets of choices.

The `conditions` property is an array of objects, each of which defines one set of conditions allowing the field to be shown. The operators AND (`{ }`), OR (`[ ]`), and NOT (`not`) are supported, so you can specify an arbitrary set of fields and choices. Each set may contain objects or arrays with the following:

  • Key: the `field_id` of the dependent field

  • Value: an object specifying the desired choices for the dependent field.

For example conditions, see the Example at the end of this section.

Conditions currently only work with dependent fields of type CategoryField. The syntax is accepted on other field types, but may raise errors or cause undefined behavior.
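
To make the AND/OR reading concrete, here is a minimal Python sketch of how a client might preview whether a field would be shown. It reflects one reading of the description above and is not the platform's evaluator: the function name `field_is_shown` is hypothetical, only the plain form (a `field_id` mapped to a list of accepted values) is handled, and `not` and nested arrays are deliberately left out.

```python
def field_is_shown(conditions, answers):
    """Return True if at least one condition set is fully satisfied.

    conditions: list of dicts, each mapping a dependent field_id to a list
                of accepted choice values.
    answers:    dict mapping field_id to the list of selected values.
    Assumption: keys inside one set are combined with AND, the values in a
    list with OR, and the sets in the outer array with OR.
    """
    if not conditions:  # conditions undefined or empty: always show
        return True
    return any(
        all(
            any(value in answers.get(field_id, []) for value in accepted)
            for field_id, accepted in condition_set.items()
        )
        for condition_set in conditions
    )
```

Under these assumptions, `[{"occlusion": ["2"], "occlusion_detail": ["shadow"]}]` is satisfied only when both selections are present, while an empty set `[{}]` is always satisfied.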

Parameters

type (string, required)

One of text, boolean, number, datetime, category, select, or time_range.

field_id (string, required)

A unique identifier for the field, which should not change among tasks within a project.

title (string, required)

Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project. Must not be an empty string.

description (string)

A brief description about what the response should be. This may change among tasks within a project.

hint (string)

Longer explanation of why the field exists and how it should be used. Renders as a tooltip.

required (boolean)

Determines whether or not a response for this field is required. The default is false.

min_responses_required (integer)

The minimum number of separate annotations allowed for this field. Must be larger than 0. The default is 1.

max_responses_required (integer)

The maximum number of separate annotations allowed for this field. Must be larger than or equal to min_responses_required, with an upper bound of 100. The default is 1.

conditions (array of objects)

A set of conditions which must be satisfied for this field to be shown. Default is undefined.

Additional Fields (object)

See the TextField, BooleanField, NumberField, DatetimeField, and CategoryField sections.


Example

// Example of UnitField with conditions
{
  type: "category",
  field_id: "occlusion",
  title: "Is there occlusion in the image?",
  choices: [{label: 'None', value: '0' },
            {label: 'A little', value: '1'},
            {label: 'A lot', value: '2'}],
  conditions: [{}],
},
{
  type: "category",
  field_id: "occlusion_detail",
  title: "What is the cause of the occlusion?",
  choices: [{label: 'Rain', value: 'rain'},
            {label: 'Shadow', value: 'shadow'}],
  conditions: [{
    occlusion: ['1', '2'], // show if 1 or 2 are selected
    // equivalently {not: [[], ['0']]}
    // equivalently [{not: []}, {not: ['0']}]
    // equivalently [['1'],['2']]
  }],
},
{
  type: "text",
  field_id: "a_lot_of_shadow",
  title: "Please describe why there is so much shadow.",
  conditions: [{
    // show if 2 and shadow are selected in their respective fields
    occlusion: ['2'], 
    occlusion_detail: ['shadow'],
  }],
},

TextField

Subclass of UnitField that returns a `string` response.

Parameters

max_characters (integer)

The maximum number of characters allowed in the field.

show_word_counter (boolean)

To display a word count in the text field, include `show_word_count = true` in the text field's object.

show_markdown_preview (boolean)

To enable a markdown preview for the text field, include `show_markdown_preview = true` in the text field's object.

max_tokens (integer)

The maximum number of words allowed in the text response. For example, `max_tokens = 1000` limits a response to 1000 words.

min_tokens (integer)

The minimum number of words required in the text response. For example, `min_tokens = 100` requires at least 100 words.

disable_pasting (boolean)

To disable copying and pasting into the text field, include `disable_pasting = true`.


Example

{
  "type": "text",
  "field_id": "summary",
  "title": "Summary",
  "min_responses_required": 1,
  "max_responses_required": 3,
  "max_characters": 500,
  "required": true
}
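
The character and word limits can be checked on the client before a hypothesis or test response is submitted. This is a small sketch, not the platform's validation logic: the helper name `check_text_response` is hypothetical, and it assumes a "word" is a whitespace-separated token, which may differ from how the platform counts.

```python
def check_text_response(field, text):
    """Return the names of the TextField limits that `text` violates.

    Assumption: words are counted by splitting on whitespace.
    """
    problems = []
    words = len(text.split())
    if "max_characters" in field and len(text) > field["max_characters"]:
        problems.append("max_characters")
    if "min_tokens" in field and words < field["min_tokens"]:
        problems.append("min_tokens")
    if "max_tokens" in field and words > field["max_tokens"]:
        problems.append("max_tokens")
    return problems
```

An empty list means the text fits every limit defined on the field; limits that are absent from the field object are simply skipped.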

BooleanField

Subclass of UnitField that returns a `boolean` response. Has no additional parameters.

Example

{
  "type": "boolean",
  "field_id": "availability",
  "title": "Item Availability",
  "description": "Choose true if available."
}

NumberField

Subclass of UnitField that returns a `string` response based on the annotated number.

Parameters

use_slider (boolean)

Set to true to use a slider instead of a textbox.

min (float)

Sets the minimum value of the slider.

max (float)

Sets the maximum value of the slider.

step (float)

Sets the step value of the slider.

prefix (string)

A string label for the lowest numerical value response.

suffix (string)

A string label for the greatest numerical value.

mid_label (string)

A string label for the middle numerical value.


Example

{
  "type": "number",
  "field_id": "item_price",
  "title": "Item Price",
  "description": "Leave empty if not applicable.",
  "required": false,
  "use_slider": true,
  "min": 0,
  "max": 100
}

DatetimeField

Subclass of UnitField that returns a `DatetimeAnnotation` response.

Definition: DatetimeSpec

An enum that consists of `year`, `month`, `day`, `hour`, and `minute`.

Definition: DatetimeAnnotation

An interface that contains optional number fields including `year`, `month`, `day`, `hour`, and `minute`.

Parameters

include (array of objects, required)

An array of DatetimeSpec elements. Must contain at least one element.


Example

{
  "type": "datetime",
  "field_id": "release_date",
  "title": "Date of Product Release",
  "description": "Leave empty if not applicable.",
  "include": ["year", "month", "day"],
  "defaults": {
    "year": 2021,
    "month": 4,
    "day": 13
  }
}

CategoryField

Subclass of UnitField that returns an array of selected `CategoryChoiceValue` elements in its response.

`CategoryChoice` elements with subchoices are only used for navigation. The only selectable `CategoryChoice` elements are those with no subchoices.

Parameters

choices (array of objects, required)

An array of CategoryChoice elements that define the available choices.

min_choices (integer)

Minimum number of choices to select.

max_choices (integer)

Maximum number of choices to select. If this value is greater than 1, the form renders checkboxes. Otherwise, it renders radio buttons.

CategoryChoice

label (string, required)

The label of the choice field. This may change among tasks within a project.

value (CategoryChoiceValue)

The value of the choice field. Must be a string, number, or boolean.

hint (string)

The tooltip text shown for this choice.

subchoices (array of objects)

An array of CategoryChoice elements to define the relevant subchoices.


Example

{
  "type": "category",
  "field_id": "genre",
  "title": "Select all genres that apply.",
  "choices": [
    {
      "label": "Hip-Hop/Rap",
      "value": "hip-hop-rap",
      "hint":
        "It consists of a stylized rhythmic music that commonly accompanies rapping, a rhythmic and rhyming speech that is chanted.",
      "subchoices": [
        { "label": "Dirty South", "value": "dirty-south" },
        { "label": "Industrial Hip Hop", "value": "industrial-hip-hop" },
        { "label": "Nerdcore", "value": "nerdcore" },
        { "label": "Rap", "value": "rap" }
      ]
    },
    {
      "label": "R&B/Soul",
      "value": "rb-soul",
      "subchoices": [
        { "label": "Disco", "value": "disco" },
        { "label": "Funk", "value": "funk" },
        { "label": "Motown", "value": "motown" }
      ]
    }
  ],
  "min_choices": 1,
  "max_choices": 5
}
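
Because choices with subchoices are used only for navigation, the values that can actually appear in a response are the leaves of the choice tree. A short Python sketch of collecting them follows; the helper name `selectable_values` is an assumption for illustration and is not part of the API.

```python
def selectable_values(choices):
    """Collect the values of all choices that have no subchoices."""
    values = []
    for choice in choices:
        subchoices = choice.get("subchoices")
        if subchoices:
            # Navigation-only node: descend instead of recording its value.
            values.extend(selectable_values(subchoices))
        else:
            values.append(choice["value"])
    return values
```

For the genre example above, this yields the seven subgenre values and omits `hip-hop-rap` and `rb-soul`, which only group their subchoices.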

TimerangeField

Subclass of UnitField.

Parameters

default_seconds (array of integers, required)

Must have length 2, with each value in the range [0, 24 * 60 * 60].

increment_seconds (integer)

Must be between 1 and 60 * 60.

default_from_field (string)

Must be a valid field_id.


Example

{
  "type": "time_range",
  "field_id": "hours",
  "title": "Store Hours",
  "defaults_seconds": [
    28800,
    72000
  ],
  "increment_seconds": 300,
  "max_responses_required": 2, 
  "min_responses_required": 0
}
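
The values in this example are seconds since midnight. The sketch below, in Python, checks the documented bounds and converts a value to a clock time; the function names `check_time_range` and `to_clock` are hypothetical, and the checks only restate the constraints listed under Parameters.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def check_time_range(default_seconds, increment_seconds=None):
    """Raise ValueError if the values break the documented constraints."""
    if len(default_seconds) != 2:
        raise ValueError("default_seconds must have length 2")
    if any(not 0 <= value <= SECONDS_PER_DAY for value in default_seconds):
        raise ValueError("default_seconds values must be in [0, 86400]")
    if increment_seconds is not None and not 1 <= increment_seconds <= 60 * 60:
        raise ValueError("increment_seconds must be between 1 and 3600")


def to_clock(seconds):
    """Format seconds since midnight as HH:MM."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours:02d}:{remainder // 60:02d}"
```

With the example values, `to_clock(28800)` gives `08:00` and `to_clock(72000)` gives `20:00`, with a five-minute increment (300 seconds).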

SelectField

Subclass of UnitField.

Parameters

choices (array of objects)

An array of selectable options. Not required if choices_from_field is present.

choices_from_field (string)

Must be a valid field_id.


Example

{
  "type": "select",
  "field_id": "sentiment",
  "title": "Sentiment",
  "description": "Choose a sentiment that best describes this text",
  "required": true,
  "choices_from_field": "Options"
}

RankingField

`RankingField` objects allow you to define a task that ranks task attachments.

Returns a `list` response with ordered options.

Parameters

title (string)

A brief description about what the response should be. This may change among tasks within a project.

hint (string)

Additional guidance for taskers on how to rank the options.

first_label (string)

Label shown at the top-ranked end of the ranking, for example "Best".

last_label (string)

Label shown at the bottom-ranked end of the ranking, for example "Worst".

num_items_to_rank (integer)

The number of options required to rank (can be less than the number of attachments).

required (boolean)

Determines whether or not all num_items_to_rank fields must be filled.


Example

{
  "type": "ranking_order",
  "field_id": "relevance_ranking",
  "title": "Rank titles based on their relevance to the article",
  "hint": "From the most relevant to the least one",
  "first_label": "Best",
  "last_label": "Worst",
  "num_items_to_rank": 3
}

FormField

`FormField` objects allow you to create several mini-forms associated with different attachments. These mini-forms will be populated by the object's child fields.

Returns a `dictionary` response with key-value pairs defined by its child fields.

Parameters

type (string, required)

For FormField objects, this should be set to form.

field_id (string, required)

A unique identifier for the field, which should not change among tasks within a project.

title (string, required)

Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project.

description (string)

A brief description about what the response should be. This may change among tasks within a project.

fields (array of objects, required)

An array of child UnitField and FieldSet objects. Any FieldSet objects here must have inline set to true.


📘Note

`FormField` objects can only be located on the top level of the `fields` task parameter. If one `FormField` object is used, all the other top-level objects must also be `FormField` objects.

Example

{
  "type": "form",
  "field_id": "form_query",
  "title": "Query Intention",
  "fields": [
    {
      "type": "text",
      "field_id": "query_intention",
      "title": "Query Intention",
      "hint": "Please investigate the search links."
    }
  ]
}

Text Collection Response Format

The `response` object, which is part of the callback POST request and permanently stored as part of the task object, will have an `annotations` field. The `annotations` object is a dictionary in which each key is a `field_id` defined in the task parameters and each value is the respective annotation for that field.

Each annotation will be of the type defined by its field above. If `max_responses_required` is applicable and greater than 1, the annotation will be an array of that type.

📘Note

See the Callback section for more details about callbacks.

Example

{
  "response": {
    "annotations": {
      "category_name": "Soup", //TextField
      "category_items": [ //FieldSet with max_responses_required greater than one
        {
          "item_name": "Tom Yum Chicken Soup", //TextField
          "item_price": "11.79" //NumberField
        },
        {
          "item_name": "Tom Yum Beef Soup", //TextField
          "item_price": "11.79" //NumberField
        }
      ],
      "category_metadata": { //FieldSet
        "gluten_friendly": true, //BooleanField
        "labels": [ //TextField with max_responses_required greater than one
          "Free Range", 
          "All Natural"
        ] 
      }
    }
  },
  "task_id": "5774cc78b01249ab09f089dd",
  "task": {
    // populated task for convenience
  }
}
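
As an illustration of reading this structure, the Python sketch below pulls item names and prices out of a parsed callback body shaped like the example above. The function name `item_prices` is hypothetical and the field ids are the ones from the example, not fixed API names. NumberField values arrive as strings, so they are converted with `Decimal` here.

```python
from decimal import Decimal


def item_prices(callback_body):
    """Return (item_name, price) pairs from the example's category_items."""
    annotations = callback_body["response"]["annotations"]
    return [
        (item["item_name"], Decimal(item["item_price"]))
        for item in annotations.get("category_items", [])
    ]
```

Note that the `//` comments in the example are annotations for the reader; an actual callback body is plain JSON and can be parsed with `json.loads` before being passed to a helper like this.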

Text Collection Hypothesis

When creating a `textcollection` task, you can provide pre-labels in the `hypothesis` field so that workers don't have to start annotating from scratch.

To add pre-labels to a task, provide them in the `hypothesis` field of the payload when creating the task. The schema of the hypothesis object must match the schema of the task response.

  1. Verify the task response field schema for the desired task type.

  2. Review your project taxonomy (label names, attribute conditions, annotation types, etc).

  3. Generate pre-labels that are formatted to match the aforementioned schema and taxonomy.

  4. Create a task, including a hypothesis field that contains the pre-labels at the same top-level as other task fields such as project and instructions.

The hypothesis format will largely mirror Scale's task response format. In this particular task type, the `annotations` field is mandatory inside the hypothesis object.

The only difference between the `hypothesis` and the `response` format is that inside every field you want to pre-annotate, you'll need to add two more fields:

  • `type` describes the field type (category, select, text, etc.)

  • `field_id` describes the identification given to this field for tracking (field name)

You can find these two fields in your task taxonomy.

Note: For text fields, the response format differs from the other types. For this field type, the `response` field is an array containing a single string instead of an array of arrays containing strings.

task_payload_with_hypothesis

{
 ...
 "batch": "regular_batch_name",
 "hypothesis": {
   "annotations": {
     "(EXAMPLE) Multiple Choice Question": {
       "type": "category",
       "field_id": "(EXAMPLE) Multiple Choice Question",
       "response": [
         [
           "B"
         ]
       ]
     }
   }
 },
 ...
}

task_taxonomy

{
   "fields": [
     {
       "type": "category",
       "field_id": "(EXAMPLE) Multiple Choice Question",
       "title": "Which option best fits this task?",
       "choices": [
         {
           "label": "A",
           "value": "A"
         },
         {
           "label": "B",
           "value": "B"
         },
         {
           "label": "C",
           "value": "C"
         }
       ],
       "min_choices": 1,
       "max_choices": 1,
       "description": "Select one of the following. "
     }
   ]
 }

task_payload_with_hypothesis_text_field

{
   ...
   "hypothesis": {
       "annotations": {
           "Product Description": {
               "type": "text",
               "field_id": "(EXAMPLE) Text Input Field",
               "response": [
                   "Dolore in dolor occaecat deserunt ex in qui non amet est."
               ]
           }
       }
   }
   ...
}
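
Since `type` and `field_id` come straight from the task taxonomy, the hypothesis object can be assembled from the taxonomy plus the pre-label values. The following Python sketch shows one way to do that; `build_hypothesis` is a hypothetical helper, and the caller is assumed to supply each `response` value already in the shape described above (for example `[["B"]]` for a category field).

```python
def build_hypothesis(fields, prelabels):
    """Build a hypothesis object from taxonomy fields and pre-label values.

    fields:    the `fields` array of the task taxonomy.
    prelabels: dict mapping field_id to the response value to pre-fill.
    """
    types = {field["field_id"]: field["type"] for field in fields}
    annotations = {}
    for field_id, response in prelabels.items():
        if field_id not in types:
            raise KeyError(f"unknown field_id: {field_id}")
        annotations[field_id] = {
            "type": types[field_id],
            "field_id": field_id,
            "response": response,
        }
    return {"annotations": annotations}
```

The returned dictionary would go under the `hypothesis` key at the top level of the task payload, alongside fields such as project and instructions.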

NamedEntityRecognitionLabel


`NamedEntityRecognitionLabel` objects define the taxonomy of labels to use to annotate spans of text.

Parameters

name (string, required)

A unique identifier for this label.

display_name (string)

An alias for this label to display to taskers.

description (string)

A description of what this label should represent. Displayed to taskers to improve quality.

children (array of objects)

An array of NamedEntityRecognitionLabel objects to group underneath this label. Specifying this field causes this label itself to no longer be used for labeling text spans.

attributes (object, optional)


`NamedEntityRecognitionAttribute` objects define form fields for individual annotations.

Parameters

type (string)

Only 'select' for now.

options (array of objects)

List of select option objects.

display_name (string)

Optional display name.

description (string)

Optional description.


`AttributeSelectOption` objects define possible values for select attributes.

Parameters

value (string)

The value that will show up in the response if this option is selected.

display_name (string)

Optional display name if different from the value.


NamedEntityRecognitionRelationshipDefinition


`NamedEntityRecognitionRelationshipDefinition` objects specify the types of relationship that can exist between two text spans.

A relationship can either be named or unnamed. A named relationship is useful if you need to distinguish between multiple types of relationship that could exist between the same two text spans. For instance, if you're annotating a description of someone's family history, you might want to distinguish a "child of" relationship from a "sibling of" relationship.

A task can only specify one type of relationship. Either all the relationships in a task must be named, or all must be unnamed.

Parameters

name (string)

A unique identifier for this type of relationship. Required for named relationships; disallowed for unnamed relationships.

display_name (string)

A description for this relationship to display to taskers. Should be able to be used to construct a short phrase describing the relationship. For example, a relationship between two text spans "A" and "B" with display_name "is parent of" would be rendered to taskers as "A is parent of B". Required for named relationships; disallowed for unnamed relationships.

is_directed (boolean)

A field indicating whether the directionality of this relationship matters. For example, a "is parent of" relationship would likely be directed, whereas a "is sibling of" relationship would likely not be directed. Optional for named relationships; disallowed for unnamed relationships.

source_label (string)

A string referencing the name field of a NamedEntityRecognitionLabel object. If set, mandates that the source text span of this field must be labeled with the corresponding NamedEntityRecognitionLabel, or one of its children. Optional for both named and unnamed relationships.

target_label (string)

A string referencing the name field of a NamedEntityRecognitionLabel object. If set, mandates that the target text span of this field must be labeled with the corresponding NamedEntityRecognitionLabel, or one of its children. Optional for both named and unnamed relationships.
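
The named/unnamed rules above lend themselves to a quick pre-flight check. The Python sketch below restates them as code; it is an illustration only, the function name `check_relationship_definitions` is hypothetical, and `label_names` is assumed to be the set of every label `name` in the task, including children.

```python
def check_relationship_definitions(definitions, label_names):
    """Raise ValueError if the definitions break the rules described above."""
    named = ["name" in definition for definition in definitions]
    if any(named) and not all(named):
        raise ValueError("relationships must be all named or all unnamed")
    for definition in definitions:
        if "name" in definition:
            if "display_name" not in definition:
                raise ValueError("named relationships require display_name")
        else:
            for key in ("display_name", "is_directed"):
                if key in definition:
                    raise ValueError(f"{key} is disallowed when unnamed")
        for key in ("source_label", "target_label"):
            if key in definition and definition[key] not in label_names:
                raise ValueError(f"{key} references an unknown label")
```

The function returns nothing when the definitions are consistent, so it can be called right before building the task payload.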


Named Entity Recognition Callback Format

The `response` object is part of the callback POST request and is permanently stored as part of the task object.

NamedEntityRecognitionResponse

The structure of a response object for named entity recognition consists of two arrays: one for entity annotations and another for relationships between these entities.

NamedEntityRecognitionAnnotation

The format for an individual entity annotation within the named entity recognition response, detailing the unique identifier, position, and content of the recognized text span, as well as its label and any optional attributes.

NamedEntityRecognitionRelationship

In tasks with undirected relationships, the `source_ref` and `target_ref` fields are interchangeable. In tasks with links that do not have relationship names, the `name` field will be left blank.

Example

{
  "annotations": [
    {
      "id": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      "start": 10,
      "end": 17,
      "text": "Alex Wang",
      "label": "person"
    },
    {
      "id": "a76da53e-4ebd-4466-aed7-80db6fb98329",
      "start": 22,
      "end": 31,
      "text": "Transform",
      "label": "conference"
    }
  ],
  "relationships": [
    {
      "id": "ade8e9e9-ef9c-4fc7-9517-62d79a15c1cb",
      "source_ref": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      "target_ref": "a76da53e-4ebd-4466-aed7-80db6fb98329",
      "name": "speaker_at"
    }
  ]
}
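
When consuming this response, a typical first step is to resolve each relationship's `source_ref` and `target_ref` to the annotations they point at. Here is a minimal Python sketch under that assumption; the helper name `resolve_relationships` is hypothetical, and relationships whose references cannot be found are skipped rather than raised.

```python
def resolve_relationships(response):
    """Return (source_text, name, target_text) for each resolvable relationship."""
    by_id = {annotation["id"]: annotation for annotation in response["annotations"]}
    resolved = []
    for relationship in response.get("relationships", []):
        source = by_id.get(relationship["source_ref"])
        target = by_id.get(relationship["target_ref"])
        if source is None or target is None:
            continue  # dangling reference: skip it in this sketch
        # `name` is optional and may be absent for unnamed relationships.
        resolved.append((source["text"], relationship.get("name"), target["text"]))
    return resolved
```

For the example above, this produces a single triple linking "Alex Wang" to "Transform" through `speaker_at`.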

NamedEntityRecognitionResponse

| Field | Type | Description |
| --- | --- | --- |
| annotations | object array | List of NamedEntityRecognitionAnnotation objects. |
| relationships | object array | List of NamedEntityRecognitionRelationship objects. |

NamedEntityRecognitionAnnotation

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier. |
| start | number | Start index of the text span. |
| end | number | End index of the text span. |
| text | string | Text of the text span. |
| label | string | References the name field of a label in the task params. |
| attributes (optional) | object | The keys of the object reference keys of the attributes object for the corresponding label in the task params. |

NamedEntityRecognitionRelationship

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier. |
| source_ref | string | References the id of the annotation that is the source of the directed relationship. |
| target_ref | string | References the id of the annotation that is the target of the directed relationship. |
| name (optional) | string | References the name of relationship definitions in the task params. |
