Validate Model Safety and Benchmark Against Competitors for Responsible Deployment
To validate its most advanced foundation model to date, Amazon engaged Alice for a manual red-teaming evaluation of Nova Premier, testing the model's readiness for safe and secure deployment.

"Through this hands-on evaluation, Alice strengthened Nova’s security posture and supported Amazon’s broader Responsible AI goals, ensuring the model could be deployed with greater confidence."
To help validate its most advanced model to date, Amazon partnered with Alice to red-team Nova Premier against high-risk prompts. The results positioned Nova as safer than its competitors, marking a major step toward secure enterprise deployment.
Challenge
Amazon aimed to rigorously validate the safety of its most capable foundation model, Nova Premier, ahead of public release. Given the increasing risks associated with advanced generative models, the team sought to benchmark it against real-world adversarial threats across critical responsible AI (RAI) categories.
Solution
Alice partnered with Amazon as a third-party red teamer to perform manual, blind evaluations of Nova Premier on Amazon Bedrock. Testing spanned prompts across Amazon’s eight RAI categories, including safety, fairness and bias, and privacy and security. Alice also benchmarked Nova Premier against other leading LLMs for comparison.
Impact
The collaboration demonstrated how expert-led manual red teaming complements automated testing, offering a comprehensive snapshot of model robustness.
