
How to Use Scale for Your ML Hackathon Project

Published on March 30, 2022

Project Ideas and Inspiration


Scale recently sponsored TreeHacks, Stanford University’s annual hackathon.

During this fully virtual hackathon, nearly 1,300 active hackers and 114 teams

submitted projects for judging. Out of all teams, 21 submitted projects using

Scale’s API.


The winning project for Best ML Hack was

Fine Tuned Heads,

which proposed a new workflow to assist with the implementation of computer

Vision. Rather than training a single model from scratch, the encoding and

fine-tuning aspects of a task were separated. The Fine Tuned Heads team

created a data collection pipeline that uses Flask to automatically

run scripts that grab .m3u8 URLs from YouTube, then used Scale Rapid to

automate uploading and generating batches of ground-truth labels for these

images. According to the Fine Tuned Heads team, “this approach can be used to

improve the efficiency of inference in ML tasks, and by engineering our

self-supervised model with an efficient architecture and deploying on

efficient hardware, we can reduce the carbon footprint by over 100x.”
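
To make the idea of separating encoding from fine-tuning more concrete, here is a minimal sketch in plain Python. It is not the Fine Tuned Heads team's code: the `frozen_encoder` function is a hypothetical stand-in for a pretrained encoder, the toy AND/OR tasks are invented for illustration, and the "heads" are small logistic-regression classifiers. The point it shows is that inputs are encoded once and several lightweight heads are then trained on the same features.

```python
import math

def frozen_encoder(x):
    """Hypothetical stand-in for a pretrained encoder: a fixed feature map
    that is never updated during head training."""
    return [x[0], x[1], x[0] * x[1]]

def train_head(features, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression head on frozen features with plain SGD."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for feature, label in zip(features, labels):
            z = sum(w * v for w, v in zip(weights, feature)) + bias
            prob = 1.0 / (1.0 + math.exp(-z))
            grad = prob - label  # gradient of the log loss with respect to z
            weights = [w - lr * grad * v for w, v in zip(weights, feature)]
            bias -= lr * grad
    return weights, bias

def predict(head, feature):
    """Return 1 if the head's score is positive, else 0."""
    weights, bias = head
    z = sum(w * v for w, v in zip(weights, feature)) + bias
    return 1 if z > 0 else 0

# Toy inputs and two toy labeling tasks (illustrative only).
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
and_labels = [0, 0, 0, 1]
or_labels = [0, 1, 1, 1]

# Encode once, then reuse the same features for every head.
features = [frozen_encoder(x) for x in inputs]
and_head = train_head(features, and_labels)
or_head = train_head(features, or_labels)

and_predictions = [predict(and_head, f) for f in features]
or_predictions = [predict(or_head, f) for f in features]
```

In a real project the encoder would be a pretrained vision model and the labels would come from a labeling pipeline, but the division of work stays the same: the expensive encoding step is shared, and only the small heads are trained per task.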


An additional winner of the API Challenge was

AcceCity, which provides

an ML-powered mapping platform that enables cities, urban planners,

neighborhood associations, disability activists, and more to identify key

areas for priority investment. For their project, the AcceCity team first

used the Google Maps API to send images of street views to their backend. They

looked for handicapped parking, sidewalks, disability ramps, and crosswalks

and used computer vision, custom-fitting OpenAI’s zero-shot learning model

CLIP, to automatically detect those objects in the images. They

then tested the model using labeled data from the Scale Rapid API.
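
For readers unfamiliar with zero-shot classification, the sketch below illustrates the scoring step that models like CLIP rely on: an image embedding is compared against the embeddings of several text prompts, and the closest prompt wins. This is not AcceCity's pipeline. The three-dimensional vectors are hand-made placeholders; in practice both the image and prompt embeddings would come from the model's encoders.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, prompt_embeddings):
    """Return the best-matching prompt and the similarity score of every prompt."""
    scores = {
        prompt: cosine_similarity(image_embedding, embedding)
        for prompt, embedding in prompt_embeddings.items()
    }
    best_prompt = max(scores, key=scores.get)
    return best_prompt, scores

# Hypothetical embeddings, invented for illustration only.
prompt_embeddings = {
    "a photo of a crosswalk": [0.9, 0.1, 0.0],
    "a photo of a disability ramp": [0.1, 0.9, 0.1],
    "a photo of a sidewalk": [0.0, 0.2, 0.9],
}
image_embedding = [0.2, 0.8, 0.1]

best_prompt, scores = zero_shot_classify(image_embedding, prompt_embeddings)
print(best_prompt)
```

Human-labeled data is what lets you check predictions like these: comparing the predicted prompt against the ground-truth label for each image gives an accuracy estimate for the model.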


Other interesting projects used Scale Rapid to label map data to

promote sustainable living and to classify patient medical image data.


Beyond the examples from the hackathon, you can use Scale Rapid for a variety

of ML projects, such as anime-fying images or training a sentiment analysis

model for movie reviews. Use our open datasets to draw further inspiration

for your projects.


How Can You Use Scale to Build a Successful ML Hackathon Project?


One of the most challenging problems preventing teams from creating novel ML

projects in a short timespan is the “Cold Start” problem, which occurs when an ML

model cannot draw inferences for users or items about which it has not yet

gathered sufficient information. Especially during the compressed timespan of

a hackathon, obtaining high-quality training data for models is nearly

impossible.


Scale Rapid is the fastest way to obtain production-level quality labels, with

no data minimums. Using Scale Rapid, your hackathon team can experiment

quickly by:


  1. Setting up projects in minutes and receiving initial data within hours
  2. Iterating on edge cases and instructions with real-time feedback
  3. Scaling to production-level pipelines with precise quality control


To get started, sign up for an account with your university email on our

University Program page to receive $250 in Rapid credits, or check out our

quick-start Python notebook.
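
As a rough illustration of what submitting an image annotation task programmatically might look like, the sketch below builds an HTTP request using only Python's standard library. The endpoint URL, payload field names, and basic-auth scheme are assumptions modeled on Scale's public API documentation and should be verified against the current API reference. The project name, image URL, and API key are placeholders. The request is constructed but deliberately not sent.

```python
import base64
import json
import urllib.request

# Assumed endpoint for image annotation tasks; verify against the API reference.
SCALE_TASK_URL = "https://api.scale.com/v1/task/imageannotation"

def build_task_payload(project, image_url, labels, instruction):
    """Assemble a task payload. Field names are assumptions, not confirmed here."""
    return {
        "project": project,
        "instruction": instruction,
        "attachment_type": "image",
        "attachment": image_url,
        "geometries": {"box": {"objects_to_annotate": list(labels)}},
    }

def build_request(api_key, payload):
    """Create a POST request with the API key as the basic-auth username."""
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        SCALE_TASK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

payload = build_task_payload(
    project="hackathon_demo",                       # placeholder project name
    image_url="https://example.com/street_01.jpg",  # placeholder image URL
    labels=["crosswalk", "sidewalk"],
    instruction="Draw a box around each crosswalk and sidewalk.",
)
request = build_request("YOUR_API_KEY", payload)    # placeholder key

# Calling urllib.request.urlopen(request) would submit the task; it is left out
# so the sketch runs without credentials or network access.
```

For anything beyond a quick experiment, the quick-start notebook and the official client library are the better starting points, since they track the current API.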

Rapid currently supports the following annotation types:


  • General Image Annotation
  • 2D Semantic Segmentation
  • Text Collection/Categorization
  • Document Transcription
  • Named Entity Recognition
  • Video Playback Annotation
  • Lidar Annotation

Annotation Types Supported in Scale Rapid



If you have your own unlabeled data, add it to your project using local file

or CSV upload, import it from previous projects, or upload it from the cloud

(S3, GCP, and Azure). Otherwise, check out our

open datasets.


Once you receive task responses, visualize your data in Scale’s dataset

management platform, Scale Nucleus, by navigating to the data tab under your

project and clicking “Visualize in Nucleus”. Nucleus helps you understand

the strengths and weaknesses in your dataset, analyze the long tail of your

dataset, collaboratively edit labels, and explore metrics, failure cases,

and confusion matrices linked to your training data.


Scale Nucleus <> Scale Rapid Integration



Quickly find similar images using the “Autotag” feature in Nucleus, and

further refine your set of images by providing a few positive and negative

examples. After identifying this subset of your dataset, you

can send it back for labeling through Scale Rapid.
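
The general idea of refining a search with positive and negative examples can be sketched as follows. This is not how Autotag is implemented, which is not described here. It is a simple illustrative heuristic that ranks candidate images by how much closer their embeddings are to the positive examples than to the negative ones. All embeddings and image names are invented placeholders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_by_examples(candidates, positives, negatives):
    """Sort candidate names by mean similarity to positives minus mean
    similarity to negatives, highest score first."""
    def score(vector):
        pos = sum(cosine(vector, p) for p in positives) / len(positives)
        neg = sum(cosine(vector, n) for n in negatives) / len(negatives)
        return pos - neg
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)

# Hypothetical 2-d embeddings for three unlabeled images.
candidates = {
    "img_a": [0.9, 0.1],
    "img_b": [0.1, 0.9],
    "img_c": [0.7, 0.3],
}
positives = [[1.0, 0.0]]  # examples of what you are looking for
negatives = [[0.0, 1.0]]  # examples of what you want to exclude

ranking = rank_by_examples(candidates, positives, negatives)
print(ranking)
```

The top of such a ranking is the kind of subset you would then send back for labeling.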


Scale AI’s products strengthen a team’s ML development cycle by delivering

high-quality data for any project. As you and your team continue to maintain

and improve your project after the hackathon, use Scale AI for consistent,

reliable data to create, validate, and maintain high-performing ML models in

production.

