Validate Model Safety and Benchmark Against Competitors for Responsible Deployment
To validate its most advanced foundation model to date, Amazon engaged Alice for a manual red-teaming evaluation of Nova Premier, testing the model's readiness for safe and secure deployment.
Validating Foundation Model Safety for Responsible Deployment

"Through this hands-on evaluation, Alice strengthened Nova’s security posture and supported Amazon’s broader Responsible AI goals, ensuring the model could be deployed with greater confidence."
To help validate its most advanced model to date, Amazon partnered with Alice to red-team Nova Premier against high-risk prompts. The results positioned Nova as safer than its competitors, marking a major step toward secure enterprise deployment.
Challenge
Amazon aimed to rigorously validate the safety of its most capable foundation model, Nova Premier, ahead of public release. Given the growing risks associated with advanced generative models, the team sought to benchmark it against real-world adversarial threats across critical responsible AI (RAI) categories.
Solution
Alice partnered with Amazon as a third-party red teamer to perform manual, blind evaluations of Nova Premier on Amazon Bedrock. Testing spanned prompts across Amazon’s eight RAI categories, including safety, fairness and bias, and privacy and security. Alice also benchmarked Nova Premier against other LLMs for comparison.
Impact
The collaboration demonstrated how expert-led manual red teaming complements automated testing, offering a comprehensive snapshot of model robustness.
Trusted by security and product teams in the world's most regulated industries
Alice brings years of adversarial intelligence expertise to AI security. We give enterprise teams the coverage that generic guardrails and one-time audits can't match.
What’s New from Alice
Your LLM Has No Idea What It's Doing
Diana Kelley, CISO at Noma Security and former Cybersecurity CTO at Microsoft, joins Mo to work through the real mechanics of LLM risk: why the context window flattens the trust boundary between system instructions and user data, why that makes reliable internal guardrails essentially impossible, and why agentic AI is less a new threat category and more a stress test for the hygiene debt organizations never fully paid off.
Distilling LLMs into Efficient Transformers for Real-World AI
This technical webinar explores how we distilled the world knowledge of a large language model into a compact, high-performing transformer—balancing safety, latency, and scale. Learn how we combine LLM-based annotations and weight distillation to power real-world AI safety.
Exposing the Hidden Risks of AI Toys
AI-powered toys are entering children’s everyday lives, but new research reveals serious safety gaps. Alice's testing shows how child-like interactions can lead to inappropriate content, unsafe conversations, and risky behaviors.
