Aerin is an engineering manager at Scale AI, leading a Catalog ML team. Below are Aerin's stories about what she does and why she loves working at Scale.
What brought you to Scale?
Before I joined Scale, I worked on a project at Microsoft called NL2SQL, which stands for "Natural Language to Structured Query Language". NL2SQL is a translation problem: automatically converting a user's natural-language query into SQL.
Just a few months after my team started working on it, we were beating the state-of-the-art (SOTA) performance on research benchmarks. We achieved this by evaluating different ML model architectures and improving them. In other words, we had more or less solved the NL2SQL problem in research after only a few months of work.
However, in production, we weren't meeting that bar. The model still made mistakes on users' inputs. It became clear to me that changing the model architecture would only get us so far, and that the most crucial thing we could do to improve NL2SQL's performance was to improve the training data, by synthetically generating edge cases and thereby increasing the training data's coverage.
So we started synthetically generating training data (NL-SQL pairs) by following syntax requirements, the snowflake schema, and so on. However, the data we created artificially couldn't possibly cover every scenario or edge case, because the true edge cases are the ones you can't predict or plan for, like the questions people actually type into the search query bar. Another way to increase the coverage of the training data was to label real user inputs and feed them back into the training set. Taking care of that lifecycle, also called an auto-labeling pipeline, was quite an operation. Even though I was given a lot of resources to manage it, the overhead was so large that I couldn't undertake any other engineering work.
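To give a feel for what template-based synthetic generation looks like, here is a minimal sketch. The templates, schema, and value pools below are hypothetical illustrations, not the actual pipeline we used at Microsoft:

```python
import random

# A toy sketch of template-based NL-SQL pair generation.
# TEMPLATES and SCHEMA are made-up examples for illustration.
TEMPLATES = [
    ("How many {table} are there?",
     "SELECT COUNT(*) FROM {table};"),
    ("Show all {table} where {column} is {value}",
     "SELECT * FROM {table} WHERE {column} = '{value}';"),
]

SCHEMA = {
    "orders": {"status": ["shipped", "pending"]},
    "customers": {"country": ["US", "DE"]},
}

def generate_pair(rng: random.Random) -> tuple[str, str]:
    """Fill one random template with a random table, column, and value."""
    nl_tpl, sql_tpl = rng.choice(TEMPLATES)
    table = rng.choice(list(SCHEMA))
    column = rng.choice(list(SCHEMA[table]))
    value = rng.choice(SCHEMA[table][column])
    fill = {"table": table, "column": column, "value": value}
    return nl_tpl.format(**fill), sql_tpl.format(**fill)

rng = random.Random(0)
pairs = [generate_pair(rng) for _ in range(3)]
for nl, sql in pairs:
    print(nl, "->", sql)
```

Data generated this way is syntactically valid by construction, which is exactly why it misses the messy, unpredictable queries real users type.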
So I started looking for the best third-party training data provider in the market, and that's how I met Scale, and the rest is history.
What do you enjoy most about working at Scale?
I like that I get to work on problems that are important and challenging enough to customers that they are willing to pay to have them solved. This means that the problems are real.
I also like that Scale's work is always about data, which is my favorite subject. For every business, I believe data is the most valuable and private asset. If you're developing a machine learning model, the one input that ultimately determines how well your model will perform is your data. The fact that customers trust us with their data and allow us to manage it energizes me, and I feel a great deal of commitment and responsibility. Because the data we provide them is the lifeblood of their business, this relationship simply can't work without trust. Having a close relationship with customers is also a rewarding part of my job.
Another highlight of my time here at Scale is the everyday connection with my team. When the engineering team works well, it feels like a well-oiled machine, and you will notice tangible results from that. We've done some seemingly impossible jobs together, and it feels amazing to see the whole team pulling together to complete a challenging project. When working on projects, you learn a great deal by openly debating and choosing the best technical solution with your team. If you're an engineer, I believe that seeing and interacting with other engineers is one of the best ways to learn and grow. That's also something I enjoy at Scale.
The team traveled to Cancun to celebrate the project's completion.
What makes working for a startup unique from other jobs?
I've worked at both a big tech company and a startup, and they are indeed quite different, and each one is unique in its own way.
At a startup, leading a project feels like driving a sports car. You're close to the ground and can feel the movements you make. You can turn quickly, and you can observe fast motion while you are steering. You can see the results of your team's work fairly quickly and clearly. For example, you complete a pilot project successfully, and the customer signs a large contract. There is also a good chance of crashing - for example, you fail to deliver a project and lose a customer. At a startup, you will see the results of what your team does right away.
At a big company, running a project is more like flying a spaceship (I can only imagine this). You’re farther from the ground, it's harder to make quick decisions because you are not the only stakeholder, and the course has been set long in advance. There are also very well-built manuals and processes to guide you along the way.
So far, I've enjoyed seeing the immediate, direct effects of my team's work at Scale.
What's your favorite part about being an engineering manager at Scale?
One of the things I like about being an engineering manager is that I get to work on large-scale projects with a strong, cohesive team, which results in a scalable solution and a greater overall team output. Apart from the collaborative and multiplier nature of the job, I also enjoy problem solving at Scale.
People think that a software engineer's job is to write code, but I don't think that's a complete picture of an engineer. A more accurate job description would be "a problem solver", since our main goal is to build things that help people solve problems. You won't be able to solve problems if you blindly write code, especially from the wrong angle.
I believe the most important thing to do in order to solve problems is to accurately diagnose them. So the ability to diagnose problems accurately is, in my opinion, the most important skill for an engineer to have. If your diagnosis is wrong, you will rarely be effective in solving actual problems. And you can only make the right diagnosis if you have a full, concrete understanding of the problem, the data, and the use cases.
While surgeons can rely on other professionals, like physicians and X-ray technicians, to aid with diagnosis, ML/SW engineers have to diagnose problems on their own. We have to carefully examine the available data, debug the code thoroughly, then choose the best solution and implement it. The ML/SWE profession comes with a significant amount of both power and responsibility.
If you are someone who enjoys doing things like this (proper diagnosis and implementation), you will do well in the field of engineering.
Tell us about some interesting projects you've run at Scale.
Every project at Scale is interesting and unique in its own way, as each project involves a different data modality (3D LiDAR, 2D imagery, NLP, tabular, etc.). I enjoy that I get to work on diverse problems and apply the peculiarities of each data modality to ML model development. For example, unlike 2D data, 3D data comes with additional rotation axes (yaw, pitch, roll), and the sensor has a significant effect on LiDAR data. NLP also has its own quirks, such as different types of tokenizers, vector representations, etc.
These days, my team is working on multi-modal (Language + Vision) problems. For example, given an image of a model wearing clothes, can we describe what she’s wearing in detail (e.g. short sleeve, polka-dots, long boots, etc.) in natural language? To this end, one popular research/exploratory approach is to train a multi-task (e.g., Visual Question Answering, Image Captioning, Text-to-Image Generation, Visual Grounding, etc.) model and use it for attribute prediction.
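One way to think about the attribute-prediction piece is as multi-label classification over image embeddings: each attribute is scored independently, since a garment can be short-sleeved and polka-dotted at the same time. The sketch below is an illustration of that framing only; the embeddings, weights, and attribute vocabulary are made up, and in practice the embedding would come from a trained vision or vision-language backbone, not random numbers:

```python
import numpy as np

# Toy multi-label attribute prediction over an image embedding.
# ATTRIBUTES, W, b, and the embedding are placeholders for illustration.
ATTRIBUTES = ["short sleeve", "polka-dots", "long boots"]

rng = np.random.default_rng(0)
embedding_dim = 8
W = rng.normal(size=(embedding_dim, len(ATTRIBUTES)))  # linear head
b = np.zeros(len(ATTRIBUTES))

def predict_attributes(embedding: np.ndarray, threshold: float = 0.5):
    """Score each attribute independently with a sigmoid and
    return the attribute names whose probability exceeds the threshold."""
    logits = embedding @ W + b
    probs = 1.0 / (1.0 + np.exp(-logits))
    return [attr for attr, p in zip(ATTRIBUTES, probs) if p > threshold]

image_embedding = rng.normal(size=embedding_dim)
print(predict_attributes(image_embedding))
```

The independent sigmoid per attribute, rather than a single softmax, is what lets the model assign several attributes to one detected object.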
Although I can't talk about the details of the projects I work on for clients, I do make an intentional effort to make my team's work more universally applicable, scientifically relevant, and therefore publishable. The attribute prediction work (link) that was just accepted at CVPR 2022 is a great example. One of the most common requests we get from clients is to assign visual characteristics such as color, shape, or motion to detected objects. But there aren't many researchers working on this topic publicly (though there are a lot of practitioners working on it in industry), and there aren't many benchmark datasets for attribute prediction. So we did in-depth research on this topic, presented our findings at CVPR, and received overwhelmingly positive feedback from the ML community.
The Fitzpatrick 17k project with MIT is also a good example. The Fitzpatrick scale is a six-point scale that classifies how dark a person's skin tone is. This project's motivation is quite fascinating: because medical imaging datasets don't contain much data on darker skin, ML models trained on them have trouble recognizing dermatological illnesses on darker skin. We wanted to bring attention to this problem and encourage more inclusion of images with darker skin tones in datasets. So we annotated 17k clinical images from DermaAmin and Atlas Dermatologico, two major dermatology atlases, using the Fitzpatrick scale.
We trained a few models on this new dataset and then looked at how well they performed on images with different skin tones. And yes, adding more diverse data to the training set was an effective way to reduce bias.
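The core of that evaluation is simply computing accuracy per Fitzpatrick type rather than one aggregate number. Here is a minimal sketch of that breakdown; the predictions and labels below are made-up placeholders, not results from the actual study:

```python
from collections import defaultdict

# Per-group accuracy: bucket each example by its Fitzpatrick type.
# The examples here are fabricated for illustration only.
examples = [
    # (fitzpatrick_type, true_label, predicted_label)
    (1, "psoriasis", "psoriasis"),
    (2, "eczema", "eczema"),
    (5, "psoriasis", "eczema"),
    (6, "eczema", "eczema"),
    (6, "psoriasis", "eczema"),
]

correct = defaultdict(int)
total = defaultdict(int)
for ftype, truth, pred in examples:
    total[ftype] += 1
    correct[ftype] += int(truth == pred)

accuracy = {ftype: correct[ftype] / total[ftype] for ftype in total}
for ftype in sorted(accuracy):
    print(f"Fitzpatrick type {ftype}: accuracy = {accuracy[ftype]:.2f}")
```

A single overall accuracy can look fine while hiding exactly the disparity this breakdown exposes, which is why the per-type view matters.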
Because of this work, researchers and practitioners are now a little more likely to take into account and examine biases in training datasets before developing ML models in the healthcare sector. Ultimately, we want to make sure that ML models trained on imbalanced datasets don't make healthcare inequalities worse by accident. With this study, we received an honorable mention award at the CVPR skin imaging workshop.
If you are interested in any of the topics I've mentioned above, or in working to solve intriguing ML/SWE challenges, email your resume to email@example.com!