Scale AI Partners with DoD’s Chief Digital and Artificial Intelligence Office (CDAO) to Test and Evaluate LLMs

February 20, 2024

Scale AI, the leading test and evaluation (T&E) partner for frontier artificial intelligence companies, is proud to share that we are partnering with the U.S. Department of Defense’s (DoD) Chief Digital and Artificial Intelligence Office (CDAO) to create a comprehensive T&E framework for the responsible use of large language models (LLMs) within the DoD.

Through this partnership, Scale will develop benchmark tests tailored to DoD use cases, integrate them into Scale’s T&E platform, and support CDAO’s T&E strategy for using LLMs. The outcomes will provide the CDAO with a framework to deploy AI safely by measuring model performance, offering real-time feedback for warfighters, and creating specialized public sector evaluation sets to test AI models for military support applications, such as organizing the findings from after action reports.

This work will enable the DoD to mature its T&E policies for generative AI by combining quantitative benchmark measurements with qualitative feedback from users. The evaluation metrics will help identify generative AI models that are ready to support military applications with accurate and relevant results using DoD terminology and knowledge bases. The rigorous T&E process aims to enhance the robustness and resilience of AI systems in classified environments, enabling the adoption of LLM technology in secure environments.

Alexandr Wang, founder and CEO of Scale AI, emphasized Scale’s commitment to protecting the integrity of future AI applications for defense and solidifying the U.S.’s global leadership in the adoption of safe, secure, and trustworthy AI. “Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework,” said Wang.

For decades, T&E has been standard in product development across industries, ensuring products meet safety requirements for market readiness, but AI safety standards have yet to be codified. Scale’s methodology, published last summer, is one of the industry’s first comprehensive technical methodologies for LLM T&E. Its adoption by the DoD reflects Scale’s commitment to understanding the opportunities and limitations of LLMs, mitigating risks, and meeting the unique needs of the military. 

Learn more about Scale’s approach to test and evaluation at https://scale.com/llm-test-evaluation

