
When Toni presses “submit” on a task, she holds her breath.
There is only one correct answer. One ground truth. The model either gets it right or it doesn’t.
Sometimes her kids are nearby when she does it.
“They’ll ask, ‘Did you stump the models?’” she says, laughing. “And we cheer when I do.”
Toni is Associate Dean for Biology Undergraduate Education and a Senior Lecturer in Biology at Brown University. She holds a PhD in biomedical engineering and began her academic journey in chemical and biomolecular engineering at Johns Hopkins University. Her career has spanned engineering, biology, and advanced research into 3D cell culture models and organoids — tiny engineered tissues designed to better predict how drugs behave in the human body.
But alongside her work in academia, Toni is also a contributor on Outlier, applying that same interdisciplinary rigor to help evaluate and improve AI systems.
Toni’s path is not unusual among contributors. Scale’s Economic Impact Report found that 84% of contributors hold advanced degrees or professional credentials in fields like medicine, engineering, and the sciences. The systems powering today’s AI are increasingly shaped by experts whose primary careers exist far beyond tech.
From Engineering to Living Systems
As an undergraduate studying engineering, Toni loved the discipline. But she found herself drawn toward biology.
“I really found that not only did I like the subject matter,” she says, “it really made me think a lot more critically than I ever realized.”
That curiosity led her to pursue research at the intersection of engineering and life sciences. During her PhD, she worked on building 3D cell culture models, small organoids grown in the lab to simulate how real tissues respond to drugs.
The goal was twofold. First, to create better in vitro models that could more accurately predict what would happen in the human body. Second, to explore whether functional tissues could eventually support regenerative medicine.
The work required pulling from multiple domains at once: chemistry, materials science, cellular biology, engineering principles. It demanded precision, synthesis, and constant evaluation of what counted as reliable evidence.
That rigorous, interdisciplinary, detail-oriented mindset is exactly what she brings to AI evaluation.
Discovering “Humanity’s Last Exam”
Toni first learned about Outlier after reading in The New York Times about “Humanity’s Last Exam,” a global effort to design questions that push AI systems to their limits.
She was intrigued by the technology and by the challenge.
“One of the things I was always curious about is how I was going to get better,” she says. “I wanted to get a better grasp of what I truly knew at that point.”
Through Outlier, she found a way to test herself alongside the models.
The tasks she gravitates toward live in biology, medicine, engineering, and chemistry, especially the interdisciplinary edge cases. Drug discovery questions that require synthesizing insights across multiple papers. Problems where a small conceptual mistake can cascade into a major error.
Unlike many scientific discussions, these evaluations are not open-ended. There is one final answer. That kind of rigor is exactly what the Economic Impact Report identifies as the emerging backbone of the AI economy: high-skill, knowledge-driven work that depends on domain expertise, not just volume. As models become more capable, the bar for evaluation rises with them.
“When you press submit to see what models you’ve actually stumped,” she says, “I would hold my breath.”
It’s not about proving the model wrong. It’s about understanding where it struggles.
Toni has observed that AI systems are increasingly strong at tightly scoped, detail-oriented questions. Where they still falter is synthesis: pulling together engineering principles, biological nuance, and chemical reasoning all at once.
That’s where her background becomes powerful.
AI as a Scientific Collaborator
For Toni, AI is not a replacement for scientific expertise; it is a collaborator.
“AI is really important in the way that it relates to science and medicine,” she explains. It can help diagnose patients more efficiently, surface relevant research faster, and support less invasive approaches to care.
But she is equally clear about the limits.
“Humankind and humanity and the human aspect will never go away.”
Accuracy matters. Context matters. Ethical judgment matters. When models are used to support decisions in medicine, research, or education, they must be grounded in correct, well-reasoned information from the start.
“It’s really important that AI models are going to give accurate information,” she says. “People are relying on it for so many different things.”
When models encounter complex, high-level reasoning tasks, work from experts like Toni provides a benchmark grounded in real domain knowledge.
Building the Future of Science Together
When Toni thinks about where she wants to spend her time and energy, she thinks about impact.
“I really want to be able to think about things that are going to have the largest impact and the broadest reach,” she says.
As an educator, she influences students. As a researcher, she contributes to scientific discovery. Through her work evaluating AI systems, she sees another layer of scale: helping shape tools that will touch laboratories, classrooms, hospitals, and industries around the world. According to Scale’s Economic Impact Report, this kind of flexible, expert-driven contribution is already shaping a growing global workforce built around AI evaluation and training. For professionals like Toni, it’s a way to extend their expertise beyond a single institution and into systems used worldwide.
She believes scientific progress depends not only on building better tools, but on telling compelling stories about why they matter.
“Yes, it’s important to be building AI,” she says. “But if somebody is truly invested in science and technology, they need to know how to make other people care and be a part of it as well.”
That collective effort is what excites her most. Scientists, engineers, educators, and experts from around the world contribute their perspectives to solve complex problems together.
“It’s going to be exciting to see how far it comes,” Toni says. “I find myself empowered, excited to tell people about it and try to get other people and their perspectives and their expertise as part of this as well.”
Behind every advanced model is reasoning shaped by people like Toni — experts who bring depth, nuance, and lived experience into the system.
And sometimes, when she presses submit, she still holds her breath.