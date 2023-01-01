Products
Test & Evaluation
Ensure The Safe, Secure Deployment of LLMs.
Test & Evaluate
What is Large Language Model Test & Evaluation?
Continuous Evaluation
Continuously evaluate and monitor the performance of your AI systems.
Red Teaming
Identify severe risks and vulnerabilities for AI systems.
AI System Certification
Certify AI applications for safety and capability.
Understand LLM Capabilities, Risks, and Vulnerabilities
Approach
Our Approach to Hybrid Test & Evaluation
Hybrid Red Teaming
Red teaming seeks to elicit undesirable model behavior as a way to assess safety and vulnerabilities. The most effective red teaming pairs automated attack techniques with human experts across a diverse threat surface area.
Hybrid Model Evaluation
Continuous model evaluation is critical for assessing model capability and helpfulness over time. A scalable hybrid approach combines LLM-based evaluation with human insight where it is most valuable.
Ecosystem
A holistic and effective approach to model test and evaluation requires participation and coordination from a broad ecosystem of institutional stakeholders.
Risks
Key Identifiable Risks of LLMs
Misinformation
LLMs producing false, misleading, or inaccurate information.
Unqualified Advice
Advice on sensitive topics (e.g., medical, legal, financial) that may result in material harm to the user.
Bias
Responses that reinforce and perpetuate stereotypes that harm specific groups.
Privacy
Disclosing personally identifiable information (PII) or leaking private data.
Cyberattacks
A malicious actor using a language model to conduct or accelerate a cyberattack.
Dangerous Substances
Assisting bad actors in acquiring or creating dangerous substances or items (e.g., bioweapons, bombs).
Experts
Expert Red Teamers
Red Team
Experienced Security & Red Teaming Professionals.
Technical
Coding, STEM, and PhD Experts Across 25+ Other Domains.
Defense
Specialized National Security Expertise.
Creatives
Native English Fluency.
Trusted
Trusted by Federal Agencies and World-Class Companies
“Automated systems should be developed with consultation from diverse communities, stakeholders, and domain experts to identify concerns, risks, and potential impacts of the system. Systems should undergo pre-deployment testing, risk identification and mitigation, and ongoing monitoring that demonstrate they are safe and effective based on their intended use, mitigation of unsafe outcomes including those beyond the intended use, and adherence to domain-specific standards.”
Blueprint for an AI Bill of Rights
Office of Science and Technology Policy, White House
“Robust red-teaming is essential for building successful products, ensuring public confidence in AI, and guarding against significant national security threats. Model safety and capability evaluations, including red teaming, are an open area of scientific inquiry, and more work remains to be done.”
Moving AI governance forward
OpenAI
