
The increasing complexity of advanced AI systems, particularly new forms like domain-specific agents and LLM assistants, is rapidly outpacing current safety testing. Traditional lab-based evaluations often fail to reflect real-world deployment environments, missing crucial vulnerabilities introduced by these sophisticated systems. In response, Scale researchers, in their new paper “A Red Teaming Roadmap Towards System-Level Safety,” propose that red teaming practices must evolve to align with real-world usage and operate directly within AI products and systems in their actual deployment contexts.
This post provides an overview of the paper's argument for expanding the scope of red teaming, how this approach aligns with our previous research, and what it means for the wider AI community.
To red team effectively, we must first define what is being tested. The authors distinguish between models (neural networks trained to perform actions), products (applications built on models), and systems (products embedded in a broader context that includes users and their environment; for instance, a university student using an educational app built on top of a model like ChatGPT). This distinction matters because each presents unique safety requirements and potential harms. Downstream developers often have use-case-specific safety specifications that model-level safety alone cannot address. Therefore, the authors argue, red teaming should prioritize safety vulnerabilities within defined product scenarios rather than focusing solely on abstract model-level harms.

This framework reveals significant gaps in current red teaming practices. Most safety efforts remain model-centric and neglect deployment context, overlooking vulnerabilities that only emerge when a model operates within a product or system. The authors also critique the excessive focus on abstract social biases when product safety with realistic threat models would be far more impactful. While preventing biased outputs matters, greater risks lie in AI systems that can be exploited for concrete harm: manipulating users, leaking sensitive data, or enabling fraud. Actual deployment risks, not theoretical concerns, should determine where red teaming effort is focused.
To guide the community toward a more holistic practice, the authors propose an approach built on three pillars: red teaming should be product-focused, grounded in realistic threat models, and system-aware.
The foundational shift, according to the paper, is moving from "universal" ethical principles to specific safety requirements tailored to each product's actual use case. There is no useful, widely shared definition of "harmful" behavior; what's dangerous in a medical advisor could be acceptable in a writing tool. Products diverge from their underlying models based on their users, business models, integrated tools, and deployment environments. A benign language model becomes potentially hazardous when given access to code interpreters, web browsers, or payment systems, much like household cleaners that are safe individually but dangerous when mixed. Red teaming must evaluate specific, actionable safety specifications unique to each product. This means probing every component users interact with, from the UI to the tools that could be exploited for unintended purposes.
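To make this concrete, a product-specific safety specification can be expressed as a small, machine-checkable policy that a red team probes against. The sketch below is purely illustrative; the field names, rules, and products are hypothetical examples, not drawn from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class SafetySpec:
    """Hypothetical product-level safety specification (illustrative only)."""
    product: str
    allowed_tools: set[str] = field(default_factory=set)
    forbidden_topics: set[str] = field(default_factory=set)

def violations(spec: SafetySpec, tool_calls: list[str], topics: list[str]) -> list[str]:
    """Return the spec violations observed in one red-team transcript."""
    issues = [f"unauthorized tool: {t}" for t in tool_calls if t not in spec.allowed_tools]
    issues += [f"forbidden topic: {t}" for t in topics if t in spec.forbidden_topics]
    return issues

# Two products built on the same model can carry very different specs:
medical = SafetySpec("medical-advisor", allowed_tools={"search"},
                     forbidden_topics={"dosage_advice"})
writing = SafetySpec("writing-assistant", allowed_tools={"search", "code_interpreter"})

transcript_tools, transcript_topics = ["code_interpreter"], ["dosage_advice"]
print(violations(medical, transcript_tools, transcript_topics))  # flags both
print(violations(writing, transcript_tools, transcript_topics))  # passes cleanly
```

The point of the sketch is the asymmetry: the identical transcript violates the medical advisor's spec twice and the writing assistant's spec not at all, which is why "universal" harm definitions fail at the product level.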
Red teaming research is not a rote exercise; it must be relevant to real-world harms and reflect what motivated attackers might realistically attempt. Though there are many different types of threat models, the paper uses four overarching example categories of increasing complexity, each of which must be addressed differently.
Red teaming entire systems means incorporating the environment, the users, and the AI product's interactions within that ecosystem. This broader view is essential for surfacing modes of harm that model- or product-level testing misses, and for implementing stronger, system-level safety practices.
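A system-level red-team check exercises the product together with its simulated environment and an adversarial user, rather than the model in isolation. The sketch below assumes a hypothetical agent with access to a payment tool and a hypothetical invariant (no charge without explicit user confirmation); all names are illustrative, not from the paper.

```python
class PaymentTool:
    """Simulated environment component with a system-level safety invariant."""
    def __init__(self):
        self.charges = []

    def charge(self, amount, confirmed):
        # Hypothetical invariant: no charge without explicit user confirmation.
        if not confirmed:
            raise PermissionError("charge attempted without user confirmation")
        self.charges.append(amount)

def run_scenario(agent_step, tool):
    """Drive one simulated adversarial turn through the product and record the outcome."""
    try:
        agent_step(tool)
        return "ok"
    except PermissionError:
        return "blocked"

# Stand-in for model behavior under a prompt-injection attack that tries to
# trigger a payment while skipping the confirmation step:
def injected_agent(tool):
    tool.charge(500, confirmed=False)

tool = PaymentTool()
print(run_scenario(injected_agent, tool))  # the guardrail blocks the charge
assert tool.charges == []                  # and no money moved
```

Note that this failure mode is invisible to model-only evaluation: it exists only because the deployed system wires a language model to a payment tool, which is exactly the kind of vulnerability system-level red teaming is meant to surface.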
By applying these three pillars together, the authors believe red teaming can become far more effective and relevant, helping ensure AI systems are developed and deployed safely and responsibly.
Red teaming is at an inflection point. To keep pace with AI's rapid growth, it must evolve from an academic exercise into a practical discipline focused on how systems can fail in the real world. This requires an industry-wide shift toward standardized, product-specific safety frameworks that model entire user contexts, not just isolated outputs. This paper offers the roadmap to make that critical transition while the risks are still manageable.