Company Updates & Technology Articles
March 5, 2025
Scale is proud to have been awarded a prime contract by the Defense Innovation Unit (DIU) for Thunderforge, the DoD’s flagship program leveraging AI for military planning and wargaming. Thunderforge represents our commitment to advancing U.S. military capabilities. Following its initial deployment, Thunderforge will expand throughout combatant commands, leveraging Scale AI's agentic applications and GenAI evaluation expertise.
March 3, 2025
Scale AI, a leader in building frontier AI solutions, and Inception, a G42 company developing AI-native products for enterprises, have announced a strategic partnership aimed at accelerating global AI adoption across the public and private sectors. The partnership agreement was signed by Ashish Koshy, COO of Inception, and Trevor Thompson, Global Managing Director at Scale AI.
February 27, 2025
Scale AI, in collaboration with the Center for Strategic and International Studies (CSIS), is proud to introduce the Critical Foreign Policy Decision (CFPD) Benchmark—a pioneering effort to evaluate large language models (LLMs) on national security and foreign policy decision-making tendencies.
February 23, 2025
The Ministry of Communications and Information Technology (MCIT) and Scale AI, the leader in frontier AI solutions, are announcing a strategic, long-term partnership to drive Qatar’s digital transformation.
February 19, 2025
Contested basing environments require scalable solutions for perimeter security. Scale AI, the Defense Innovation Unit, and the U.S. Air Force are demonstrating the value of computer vision for force protection challenges globally.
February 11, 2025
Scale researchers have developed a groundbreaking method for AI safety testing called J2 (Jailbreaking to Jailbreak), in which language models are trained to systematically test their own and other models' safety measures. This hybrid approach combines human-like strategic reasoning with automated scalability, achieving success rates of over 90% in vulnerability testing, nearly matching professional human red-teaming effectiveness. While highlighting significant advances in automated security testing, these findings also reveal important challenges for the future of AI safety.
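The J2 training setup itself is not reproduced here, but the iterative red-teaming loop the announcement describes (an attacker model refining jailbreak attempts against a target, with a judge scoring success and feeding failures back as strategy hints) can be sketched roughly as follows. The `attacker`, `target`, and `judge` callables are hypothetical stand-ins for real model endpoints, not part of any published API:

```python
# Minimal sketch of an iterative attacker/target/judge red-teaming loop,
# in the spirit of J2. All three callables are hypothetical stand-ins
# for real model endpoints.
from typing import Callable, Tuple

def red_team_loop(
    attacker: Callable[[str, str], str],  # (goal, feedback) -> attack prompt
    target: Callable[[str], str],         # attack prompt -> target response
    judge: Callable[[str, str], bool],    # (goal, response) -> attack succeeded?
    goal: str,
    max_turns: int = 5,
) -> Tuple[bool, str]:
    """Iteratively refine attack prompts until the judge flags a success."""
    feedback = ""
    for _ in range(max_turns):
        prompt = attacker(goal, feedback)
        response = target(prompt)
        if judge(goal, response):
            return True, prompt
        # Feed the failed attempt back so the attacker can adapt its strategy,
        # mirroring the human-like iterative reasoning the method relies on.
        feedback = f"Previous attempt failed. Prompt: {prompt!r} Response: {response!r}"
    return False, ""
```

The key design point is the feedback channel: unlike one-shot automated attacks, each turn conditions the attacker on the transcript of its prior failure, which is what lets the loop approximate a human red-teamer's adaptive strategy.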
Scale AI leads groundbreaking research to build safer, more capable AI systems through innovative approaches in post-training optimization, agent development, and evaluation frameworks. Its comprehensive work spans from improving model performance and reliability to developing robust safety measures, all while maintaining a commitment to open collaboration and industry-wide advancement. Through the Safety, Evaluations, and Alignment Lab (SEAL) and various research initiatives, Scale AI is shaping the future of responsible AI development.
February 10, 2025
Scale’s AISI-approved AI model evaluations are setting a new standard for pre-deployment testing. By offering voluntary, efficient, and third-party validated assessments, we are empowering AI developers to create more reliable models—without the complexities that typically slow down the process.
January 24, 2025
Text2SQL systems promise to democratize access to enterprise data but often fail to handle the complexity of real-world database queries, even when they perform well on test datasets. We found that Reinforcement Learning from Human Feedback (RLHF) is a viable approach for active learning from incorrect production queries to improve Text2SQL accuracy.
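The full RLHF pipeline is beyond the scope of this note, but the data-collection step it implies (mining incorrect production queries, together with their human corrections, into preference pairs that a reward model could be trained on) might look roughly like this. The `QueryLog` fields and the correction workflow are illustrative assumptions, not Scale's actual schema:

```python
# Sketch of mining preference pairs from corrected production Text2SQL
# failures, the kind of data an RLHF reward model is trained on.
# The log schema below is an illustrative assumption.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class QueryLog:
    question: str                 # natural-language question from production
    generated_sql: str            # the model's SQL output
    corrected_sql: Optional[str]  # human fix, present only if the output was wrong

def build_preference_pairs(logs: List[QueryLog]) -> List[Tuple[str, str, str]]:
    """Turn corrected failures into (question, preferred_sql, rejected_sql) triples."""
    pairs = []
    for log in logs:
        # Only corrected failures carry a useful preference signal.
        if log.corrected_sql and log.corrected_sql != log.generated_sql:
            pairs.append((log.question, log.corrected_sql, log.generated_sql))
    return pairs
```

This is the "active learning" aspect: rather than labeling queries at random, the pipeline concentrates human effort on exactly the production queries the model got wrong.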
January 23, 2025
Scale AI and the Center for AI Safety (CAIS) are proud to publish the results of Humanity’s Last Exam, a groundbreaking new AI benchmark that was designed to test the limits of AI knowledge at the frontiers of human expertise.