Working with Data

Data is at the center of what Scale does, and as you may expect, getting data into and out of the Scale platform is one of the first things any customer will need to do.

Whether you are working on a university hackathon project or handling classified data like our customers at the US Army and Department of Defense, we offer several ways to upload your data to our platform quickly and securely.

In almost all cases, you will need to upload data to our platform so that it can be annotated, processed, or otherwise made usable.

Example Data Annotation / Nucleus Workflow

  1. An API call is submitted to Scale with parameters to create a Task or upload a Dataset Item to Nucleus.

  2. In the API call, you reference an attachment or piece of content, typically as a URL or URI. This content can be hosted and made accessible to Scale in several ways (see below).

  3. Scale automatically makes a securely stored copy of the attachment and performs any required attachment processing on it to optimize your content for labeling or other downstream processing.

  4. For data annotation workflows, if the attachment is not accessible to Scale's servers, the task is moved to an error state and is not labeled or processed, and the customer is not charged for it.
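Because an inaccessible attachment sends the task to an error state, it can help to sanity-check attachment references on your side before submitting. The sketch below is a minimal client-side check, assuming hypothetical request field names (`attachment`, `instruction`) and a hypothetical set of accepted schemes (`https`, `s3`, `gs`); consult the API reference for the actual request format. It only confirms that a reference is well formed, not that Scale's servers can fetch it.

```python
from urllib.parse import urlparse

# Hypothetical set of accepted attachment schemes; check the API reference
# for the schemes your integration actually supports.
SUPPORTED_SCHEMES = {"https", "s3", "gs"}


def validate_attachment(ref: str) -> str:
    """Return the scheme of an attachment reference, or raise ValueError.

    Client-side sanity check only: it verifies the reference is a
    well-formed URL/URI, not that the content is reachable.
    """
    parsed = urlparse(ref)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported or missing scheme in {ref!r}")
    if not parsed.netloc:
        # For s3://bucket/key and gs://bucket/key the bucket is the netloc.
        raise ValueError(f"missing host or bucket in {ref!r}")
    return parsed.scheme


def build_task_payload(attachment: str, instruction: str) -> dict:
    """Assemble a task request body (hypothetical field names)."""
    validate_attachment(attachment)
    return {"attachment": attachment, "instruction": instruction}
```

For example, `build_task_payload("s3://my-bucket/images/001.jpg", "Draw a box around every car")` returns the payload, while a bare path such as `images/001.jpg` is rejected before any API call is made.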

Ways to Share Data with Scale

Cloud Storage

Scale has built-in support for sharing data from several major cloud storage providers.

Using our built-in cloud storage integrations is the preferred way to share large amounts of data with Scale.

Public Access

Depending on how your data is hosted, a simple, publicly accessible link to it could be sufficient.

A good way to check whether your public link is truly accessible to Scale's attachment processing servers is to open the URL in an "Incognito" (private) browser tab and confirm that you can view or download the content you intend to share.
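The Incognito test can also be approximated from a script: request the URL without cookies or credentials and see whether it succeeds. Below is a minimal sketch using only Python's standard library, assuming that a 2xx response means the content is publicly served; `is_publicly_accessible` is a hypothetical helper name. Cloud storage URIs such as `s3://` are deliberately reported as not checkable, and a success from your own network does not prove Scale's servers can reach the URL (for example, if the host is only reachable over a VPN).

```python
import urllib.parse
import urllib.request


def is_publicly_accessible(url: str, timeout: float = 10.0) -> bool:
    """Approximate the Incognito test by fetching url anonymously.

    urllib sends no cookies or auth headers unless configured to, so a
    success means the server hands the content to anonymous clients.
    """
    # Only plain web links can be checked this way; s3:// or gs:// URIs
    # need the storage provider's own tooling.
    if urllib.parse.urlparse(url).scheme not in ("http", "https"):
        return False
    try:
        # Ask for just the first byte to avoid downloading large files;
        # servers that ignore Range simply answer 200 with the full body.
        request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (ValueError, OSError):
        # ValueError: malformed URL. OSError covers URLError, HTTPError
        # (e.g. 403 or 404), and timeouts: all mean "not accessible".
        return False
```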

Scale File Storage

If you cannot use a Cloud Storage link and hosting the data publicly isn't a wise option, you can use Scale's own File Upload API. Scale Files can only be used to submit data to the platform. If your organization doesn't use one of the Cloud Storage providers above, uploading your data directly to Scale can be a convenient and secure solution.

IP Whitelisting

If you do not use one of the Cloud Storage providers above but can share your data with a static set of IP addresses, IP whitelisting adds an extra layer of security to your data.
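IP whitelisting is normally enforced in your own infrastructure, for example in a firewall rule, security group, or storage bucket policy. The sketch below shows the membership test such a rule performs, using Python's `ipaddress` module. The allowlisted ranges are placeholders taken from the reserved documentation blocks (RFC 5737), not Scale's actual addresses, which you would obtain from Scale.

```python
import ipaddress

# Placeholder allowlist from reserved documentation ranges; replace with
# the static IP addresses Scale provides to your organization.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed addresses are rejected outright
    # An IPv6 address is never "in" an IPv4 network, so it is rejected too.
    return any(address in network for network in ALLOWED_NETWORKS)
```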

Scale Rapid Upload Options

Scale Rapid enables quick labeling of small projects. To get you to labeled data as quickly as possible, it lets you upload files directly from your browser or provide a .csv file of attachment URLs (these URLs must still be accessible to Scale through one of the options above). The Uploading to Rapid instructions cover the details.
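If you take the .csv route, a file of attachment URLs can be generated with Python's `csv` module. This is only a sketch: the single-column layout and the `attachment_url` header are assumptions, so check the Uploading to Rapid instructions for the exact columns Rapid expects.

```python
import csv
import io


def attachments_csv(urls: list, header: str = "attachment_url") -> str:
    """Render attachment URLs as a one-column CSV string.

    The header name is a hypothetical placeholder; csv.writer quotes any
    URL that contains a comma or quote character.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()
```

To save the result, write the returned string to a file opened with `open("attachments.csv", "w", newline="")`.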

Scale Nucleus Privacy Mode

For Scale Nucleus, you are able to upload only metadata about your data to Scale without having to upload the data itself. In other words, the pixels don't leave your system, but you can still use our cloud-hosted application like any other web experience.

Benefits of Attachment Processing

If you are wondering why Scale needs its own copy of the data for annotation in the first place, here are a few of the capabilities that copy provides:

  1. Protects customer brand, security, and anonymity by never exposing direct content URLs.

  2. Keeps customer data from leaving the Scale platform; all links to attachments are expiring and non-public.

  3. Reduces the risk that problems with a customer's own data availability affect tasking; for example, an outage on the customer's side doesn't disrupt labeling workflows.

  4. Ensures taskers only see valid content for labeling.

  5. Ensures work done on the platform can be reviewed and billed for without question: we can always verify the work performed on submitted tasks, and its quality, regardless of what happens to the customer's original attachments.
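Expiring, non-public links like those in point 2 are commonly built by signing a URL with a secret key and an expiry time. The sketch below illustrates that general technique with HMAC-SHA256; it is not a description of how Scale implements its attachment links, and the secret, the `expires`/`sig` query parameters, and the helper functions are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side signing key; it must never be sent to clients.
SECRET = b"replace-with-a-server-side-secret"


def _signature(base_url: str, expires: int) -> str:
    """HMAC-SHA256 over the URL and its expiry timestamp."""
    message = f"{base_url}|{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Append an expiry and signature to base_url (assumed to have no query)."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(base_url, expires)})
    return f"{base_url}?{query}"


def verify_url(signed_url: str, now: Optional[float] = None) -> bool:
    """Accept the link only if its signature matches and it has not expired."""
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    base_url = parsed._replace(query="").geturl()
    current = time.time() if now is None else now
    # compare_digest avoids leaking timing information about the signature.
    valid = hmac.compare_digest(signature, _signature(base_url, expires))
    return valid and current < expires
```

A link from `sign_url("https://files.example.com/a.jpg", ttl_seconds=300)` is accepted by `verify_url` for five minutes and rejected afterwards, or as soon as any part of it is altered.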
