How to Use Scale for Your ML Hackathon Project

by Bihan Jiang on March 30th, 2022

Project Ideas and Inspiration

Scale recently sponsored TreeHacks, Stanford University’s annual hackathon. At this fully virtual event, nearly 1,300 active hackers took part and 114 teams submitted projects for judging. Of those teams, 21 built their projects with Scale’s API.

The winning project for Best ML Hack was Fine Tuned Heads, which proposed a new workflow for building computer vision systems. Rather than training a single model from scratch, the team separated a task’s encoding and fine-tuning stages. The Fine Tuned Heads team built a data collection pipeline with Flask that automatically executed scripts to grab .m3u8 URLs from YouTube, then used Scale Rapid to automate uploading the collected images and generating batches of ground truth labels for them. According to the Fine Tuned Heads team, “this approach can be used to improve the efficiency of inference in ML tasks, and by engineering our self-supervised model with an efficient architecture and deploying on efficient hardware, we can reduce the carbon footprint by over 100x.”

Another winner of the API Challenge was AcceCity, an ML-powered mapping platform that helps cities, urban planners, neighborhood associations, disability activists, and others identify key areas in which to prioritize investment. The AcceCity team first used the Google Maps API to send street-view images to their backend. To detect handicapped parking, sidewalks, disability ramps, and crosswalks in those images automatically, they custom-fit CLIP, a zero-shot learning model from OpenAI. They then tested the model against labeled data from the Scale Rapid API.

Other notable projects used Scale Rapid to label map data to promote sustainable living and to classify patients’ medical images.

Beyond the hackathon, you can use Scale Rapid for a wide variety of ML projects, such as anime-fying images or training a sentiment analysis model on movie reviews. Browse our open datasets for more project inspiration.

How Can You Use Scale to Build a Successful ML Hackathon Project?

One of the biggest obstacles to building a novel ML project in a short timespan is the “cold start” problem: a model cannot draw useful inferences about users or items for which it has not yet gathered enough information. In the compressed timeline of a hackathon, obtaining enough high-quality training data to get past this stage can be nearly impossible.

Scale Rapid is the fastest way to obtain production-quality labels, with no data minimums. With Scale Rapid, your hackathon team can experiment quickly by:

  1. Setting up projects in minutes and receiving initial data within hours
  2. Iterating over edge cases and instructions by getting real-time feedback
  3. Scaling to production-level pipelines with precise quality control

To get started, sign up with your university email on our University Program Page to receive $250 in Rapid credits, or check out our quick-start Python notebook.

Rapid currently supports the following annotation types:

  • General Image Annotation
  • 2D Semantic Segmentation
  • Text Collection/Categorization
  • Document Transcription
  • Named Entity Recognition
  • Video Playback Annotation
  • LiDAR Annotation

Annotation Types Supported in Scale Rapid

If you have your own unlabeled data, add it to your project with a local file or CSV upload, import it from a previous project, or upload it from cloud storage (AWS S3, GCP, or Azure). If you don’t have data of your own, check out our open datasets.
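If your images are already hosted at public URLs, a CSV upload can be as simple as one URL per row. The sketch below builds such a file with Python’s standard `csv` module. The helper `build_upload_csv` and the header name `attachment` are hypothetical placeholders, not a documented Rapid format, so check the CSV upload dialog in your Rapid project for the header it actually expects.

```python
import csv
import io


def build_upload_csv(urls, column="attachment"):
    """Return CSV text with a header row and one image URL per row.

    `column` is a HYPOTHETICAL header name; replace it with whatever
    header your Rapid project's CSV upload expects.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    seen = set()
    for url in urls:
        url = url.strip()
        # Skip blank entries and duplicates so each image is sent for labeling once.
        if url and url not in seen:
            seen.add(url)
            writer.writerow([url])
    return buffer.getvalue()


if __name__ == "__main__":
    urls = [
        "https://example.com/street/0001.jpg",
        "https://example.com/street/0002.jpg",
        "https://example.com/street/0001.jpg",  # duplicate, dropped
    ]
    print(build_upload_csv(urls))
```

Deduplicating before upload keeps you from paying to label the same image twice, which matters when you are working from a fixed pool of credits.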

Once you receive task responses, visualize your data in Scale’s dataset management platform, Scale Nucleus, by navigating to the data tab under your project and clicking “Visualize in Nucleus”. Nucleus helps you understand the strengths and weaknesses of your dataset, analyze its long tail, collaboratively edit labels, and explore metrics, failure cases, and confusion matrices linked to your training data.
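Nucleus computes these views for you, but the idea behind a confusion matrix is simple enough to sketch. The example below is a minimal illustration, not Nucleus’s implementation: it counts how often each ground-truth label (for instance, one returned by Rapid) was predicted as each class by your model, then derives per-class recall. The class names and data are made up.

```python
from collections import Counter


def confusion_matrix(ground_truth, predictions, classes):
    """Return rows where rows[i][j] counts items whose true class is
    classes[i] and whose predicted class is classes[j]."""
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must be the same length")
    counts = Counter(zip(ground_truth, predictions))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_recall(matrix, classes):
    """Fraction of each true class predicted correctly (None if the class never occurs)."""
    recall = {}
    for i, name in enumerate(classes):
        total = sum(matrix[i])
        recall[name] = matrix[i][i] / total if total else None
    return recall


# Made-up labels: `truth` stands in for human labels, `preds` for model output.
classes = ["ramp", "crosswalk", "sidewalk"]
truth = ["ramp", "ramp", "crosswalk", "sidewalk", "sidewalk"]
preds = ["ramp", "sidewalk", "crosswalk", "sidewalk", "crosswalk"]

matrix = confusion_matrix(truth, preds, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>10}: {row}")
print(per_class_recall(matrix, classes))
```

Off-diagonal cells point to the failure cases worth inspecting: here, half of the true ramps and half of the true sidewalks were misclassified.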

Scale Nucleus <> Scale Rapid Integration

Quickly find similar images with the “Autotag” feature in Nucleus, and refine that set further by providing a few positive and negative examples. Once you have identified this subset of your dataset, you can send it back for labeling through Scale Rapid.
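The general idea of refining a search with positive and negative examples can be illustrated with a simple similarity score over image embeddings. This sketch is not how Nucleus implements Autotag. It assumes you already have an embedding vector per image (for example, from a pretrained vision model) and ranks the remaining images by their mean cosine similarity to the positives minus their mean similarity to the negatives. The function `rank_by_examples` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_examples(embeddings, positives, negatives=(), top_k=None):
    """Rank unlabeled image IDs by (mean similarity to positives) - (mean similarity to negatives).

    embeddings: dict mapping image ID -> embedding vector (list of floats)
    positives / negatives: image IDs marked as good / bad examples
    """
    if not positives:
        raise ValueError("need at least one positive example")
    labeled = set(positives) | set(negatives)
    scores = {}
    for image_id, vec in embeddings.items():
        if image_id in labeled:
            continue  # the examples themselves are not candidates
        pos = sum(cosine(vec, embeddings[p]) for p in positives) / len(positives)
        neg = (sum(cosine(vec, embeddings[n]) for n in negatives) / len(negatives)
               if negatives else 0.0)
        scores[image_id] = pos - neg
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k] if top_k is not None else ranked


# Toy 2-D embeddings; real ones would have hundreds of dimensions.
embeddings = {
    "img1": [1.0, 0.0],
    "img2": [0.9, 0.1],
    "img3": [0.0, 1.0],
    "img4": [0.7, 0.7],
    "img5": [0.1, 0.9],
}
subset = rank_by_examples(embeddings, positives=["img1"], negatives=["img3"], top_k=2)
print(subset)
```

Each extra negative example pushes look-alike distractors down the ranking; the top of the list is the subset you would send back to Rapid for labeling.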

Scale AI’s products strengthen a team’s ML development cycle by delivering high-quality data for any project. As you and your team continue to maintain and improve your project after the hackathon, use Scale AI for consistent, reliable data to create, validate, and maintain your high-performing ML models in production.