Scale AI Partnering with the U.S. AI Safety Institute to Evaluate AI Models

February 10, 2025

Scale AI and the United States AI Safety Institute (AISI) are partnering to develop improved methods to test frontier AI models. This will include novel evaluations jointly developed by the AISI and Scale’s research arm, the Safety, Evaluation, and Alignment Lab (SEAL).

Using the methods jointly developed under this agreement, model builders of all sizes will be able to voluntarily and efficiently access reliable new forms of testing with Scale, test their models once, and, if desired, share the results with a global network of AI safety institutes. The evaluations will assess model performance in specific domains, which may include subjects such as math, reasoning, and AI coding.

Voluntary pre-deployment testing is a critical step in the AI development lifecycle because it gives model builders a clearer understanding of their models' capabilities, which they can leverage to improve performance before deployment.

AISI will use the evaluation data to better understand AI technology and to inform the creation of standards and policies. Much like college readiness tests, these evaluations will give AI developers information on their models' real-world performance.

This agreement marks a new phase in public-private collaboration to advance AI science: the U.S. government creates the benchmarks, while independent entities conduct the technical testing. This approach lowers the barriers to voluntary, independent assessment, because AI builders of any size can test with Scale and choose whether to share the results with the U.S. AISI. Efficient third-party testing also better supports governments. Without it, each government would need to build its own in-house testing infrastructure, a costly and time-consuming project that is unlikely to meet the growing demand for well-vetted AI systems.

“SEAL’s rigorous evaluations set the standard for how cutting-edge AI systems are held to account. This agreement with the U.S. AISI is a landmark step, providing model builders an efficient way to vet the technology before it reaches the real world,” said Summer Yue, Director of Research at Scale AI.

In the coming weeks, Scale will share more details for companies eager to take advantage of this testing. We look forward to working with additional members of the global AISI International Network to establish similar programs for third-party AI evaluations.

About AI Safety Institutes

As AI has become mainstream, AISIs have been established around the world to research, test, and provide guidance on advanced AI systems, helping governments harness the enormous potential of AI without duplicating work in each country. The United States and the United Kingdom established the first AISIs, and today a global network of ten nations, including Singapore, Japan, Kenya, and Canada, is collaborating to foster AI adoption, share information, and advance the science of AI.

About Scale AI’s Safety, Evaluation, and Alignment Lab

Scale is proud to have played a role in helping AISIs understand and advance AI measurement science through SEAL's rigorous evaluations and research. Since 2023, the Lab has pioneered AI evaluation techniques: measuring AI knowledge in sensitive domains such as nuclear security and cybersecurity (WMDP), developing a stress-testing framework for web-browsing AI agents (BrowserART), and demonstrating the effectiveness of multi-turn human-led assessments (MHJ). SEAL's newest reasoning benchmark, Humanity's Last Exam, co-created with the Center for AI Safety, now serves as a top-level test for the most advanced AI systems.
