Project Ideas and Inspiration
Scale recently sponsored TreeHacks, Stanford University’s annual hackathon.
The event was fully virtual: nearly 1,300 active hackers took part, and 114
teams submitted projects for judging. Of those teams, 21 submitted projects
that used Scale’s API.
The winning project for Best ML Hack, from the Fine Tuned Heads team,
proposed a new workflow for implementing computer vision tasks. Rather than
training a single model from scratch, the team separated the encoding and
fine-tuning aspects of a task. They built a data collection pipeline in Flask
that automatically executes scripts to grab .m3u8 URLs from YouTube, then used
Scale Rapid to automate uploading the resulting images and generating batches
of ground truth labels for them. According to the Fine Tuned Heads team, “this
approach can be used to improve the efficiency of inference in ML tasks, and by
engineering our self-supervised model with an efficient architecture and
deploying on efficient hardware, we can reduce the carbon footprint by over
100x.”
Another winner of the API Challenge was
AcceCity, which provides
an ML-powered mapping platform that helps cities, urban planners,
neighborhood associations, disability activists, and others identify the key
areas in which to prioritize investment. For their project, the AcceCity team
first used the Google Maps API to send street view images to their backend.
They looked for handicapped parking, sidewalks, disability ramps, and
crosswalks, and detected those objects automatically with computer vision by
custom-fitting CLIP, a zero-shot learning model from OpenAI. They then tested
the model against labeled data from the Scale Rapid API.
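CLIP classifies zero-shot by embedding an image and a set of text prompts in a
shared space and picking the prompt most similar to the image. The sketch below
shows only that selection step, plus the kind of accuracy check you could run
against labeled data. It is a hypothetical illustration, not AcceCity's code:
the three-dimensional vectors are made-up stand-ins for the outputs of CLIP's
image and text encoders, and `ground_truth` stands in for labels returned by
Scale Rapid.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, prompt_embs):
    """Return the label whose prompt embedding is most similar to the image."""
    return max(prompt_embs, key=lambda label: cosine(image_emb, prompt_embs[label]))


# Hypothetical toy embeddings; real ones would come from CLIP's encoders
# (for example, for a prompt like "a photo of a curb ramp").
prompt_embs = {
    "curb ramp": [0.9, 0.1, 0.0],
    "crosswalk": [0.1, 0.9, 0.1],
    "sidewalk": [0.0, 0.2, 0.9],
}
image_embs = {
    "img_001": [0.8, 0.2, 0.1],
    "img_002": [0.2, 0.7, 0.3],
    "img_003": [0.1, 0.1, 0.95],
    "img_004": [0.6, 0.4, 0.0],
}
# Hypothetical ground-truth labels, standing in for Scale Rapid task responses.
ground_truth = {
    "img_001": "curb ramp",
    "img_002": "crosswalk",
    "img_003": "sidewalk",
    "img_004": "crosswalk",
}

preds = {name: zero_shot_classify(emb, prompt_embs) for name, emb in image_embs.items()}
accuracy = sum(preds[n] == ground_truth[n] for n in ground_truth) / len(ground_truth)
print(preds)
print(f"accuracy = {accuracy:.2f}")
```

Here `img_004` is misclassified, which is exactly the kind of error that a
held-out set of human labels is meant to surface.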
Other notable projects used Scale Rapid to label map data that promotes
sustainable living and to classify patient medical images.
Beyond the hackathon examples, you can use Scale Rapid for a variety of ML
projects, such as to
anime-fy images or to
train a
sentiment analysis model for movie reviews. Our open datasets can also give
you further inspiration for your projects.
How Can You Use Scale to Build a Successful ML Hackathon Project?
One of the biggest obstacles to building a novel ML project in a short
timespan is the “cold start” problem: a model cannot draw inferences about
users or items for which it has not yet gathered sufficient information. In
the compressed timeline of a hackathon, obtaining enough high-quality training
data to get past this stage can seem nearly impossible.
Scale Rapid is the fastest way to obtain production-quality labels, with no
data minimums. With Scale Rapid, your hackathon team can experiment quickly
by:
- Setting up projects in minutes and receiving initial data within hours
- Iterating on edge cases and instructions with real-time feedback
- Scaling to production-level pipelines with precise quality control
To get started, sign up for an account with your university email on our
University Program Page to
receive $250 in Rapid Credits, or check out our
Rapid currently supports the following annotation types:
- General Image Annotation
- 2D Semantic Segmentation
- Text Collection/Categorization
- Document Transcription
- Named Entity Recognition
- Video Playback Annotation
- Lidar Annotation
If you have your own unlabeled data, add it to your project by local file or
CSV upload, import it from previous projects, or upload it from cloud storage
(S3, GCP, or Azure). Otherwise, check out our
Once you receive task responses, visualize your data in Scale Nucleus, Scale’s
dataset management platform, by opening the data tab of your project and
clicking “Visualize in Nucleus”. Nucleus helps you understand the strengths
and weaknesses of your dataset, analyze its long tail, collaboratively edit
labels, and explore metrics, failure cases, and confusion matrices linked to
your training data.
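Nucleus computes these views for you; to make concrete what a confusion matrix
contains, the sketch below simply counts (ground truth, prediction) pairs, with
rows indexed by the true class and columns by the predicted class. The class
names and label lists are hypothetical toy data; in practice the ground truth
might come from Rapid task responses and the predictions from your model.

```python
from collections import Counter


def confusion_matrix(truth, preds, classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    counts = Counter(zip(truth, preds))  # pairs that never occur count as 0
    return [[counts[(t, p)] for p in classes] for t in classes]


classes = ["car", "pedestrian", "cyclist"]
# Hypothetical ground-truth labels and model predictions.
truth = ["car", "car", "pedestrian", "cyclist", "cyclist", "pedestrian"]
preds = ["car", "pedestrian", "pedestrian", "cyclist", "car", "pedestrian"]

matrix = confusion_matrix(truth, preds, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>10} {row}")
```

Off-diagonal cells show which classes get confused: in this toy data, one
`car` was predicted as `pedestrian` and one `cyclist` as `car`.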
Quickly find similar images using the “Autotag” feature in Nucleus, then
refine your set of images further by providing a few positive and negative
examples. After identifying this subset of your dataset, you can send it back
for labeling through Scale Rapid.
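This is not Nucleus's Autotag implementation. It is a sketch of the general
idea of refining a search with positive and negative examples, using a
swapped-in technique: a Rocchio-style query (the mean of the positive
embeddings minus the mean of the negative embeddings) that ranks the remaining
images by cosine similarity. The frame names and three-dimensional embeddings
are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def refine_by_examples(embeddings, positives, negatives, top_k):
    """Rank unlabeled items by similarity to (mean positive - mean negative)."""
    pos = centroid([embeddings[k] for k in positives])
    neg = centroid([embeddings[k] for k in negatives])
    query = [p - n for p, n in zip(pos, neg)]
    seen = set(positives) | set(negatives)
    candidates = [k for k in embeddings if k not in seen]
    ranked = sorted(candidates, key=lambda k: cosine(embeddings[k], query), reverse=True)
    return ranked[:top_k]


# Hypothetical image embeddings.
embeddings = {
    "frame_01": [0.9, 0.1, 0.0],
    "frame_02": [0.8, 0.3, 0.1],
    "frame_03": [0.1, 0.9, 0.2],
    "frame_04": [0.85, 0.2, 0.05],
    "frame_05": [0.2, 0.8, 0.1],
    "frame_06": [0.0, 0.1, 0.9],
}

subset = refine_by_examples(embeddings, positives=["frame_01"], negatives=["frame_03"], top_k=2)
print(subset)
```

The returned subset plays the role of the refined selection you would then
send back to Scale Rapid for labeling.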
Scale AI’s products strengthen a team’s ML development cycle by delivering
high-quality data for any project. As you or your team continue to maintain
and improve your project after the hackathon, use Scale AI for consistent,
reliable data to create, validate, and maintain high-performing ML models in
production.