Intern for AI Security RL Gyms

About the Position

Alice builds adversarial evaluation infrastructure used by the world's leading AI labs to stress-test their most capable agents before deployment. We're hiring interns for our RL Gym environments: adversarial training grounds where AI agents face prompt injection, social engineering, and data exfiltration scenarios.

The Role: You design prompt injection scenarios in YAML, run them against frontier models, validate success rates, and submit passing tasks.

The Task

You build adversarial prompt injection tasks for Alice's RL Gym platform. Each task is a self-contained YAML scenario simulating a realistic AI agent deployment, testing whether the agent can be manipulated into violating its safety policies.

What a task includes:

An agent persona with a system prompt, security policies, and a defined set of tools.
A simulated environment (e-commerce site, messaging platform, enterprise app) with content the agent interacts with.
An adversarial payload embedded in the environment (in messages, product listings, tool responses, documents) that attempts to trick the agent into a policy violation.
A benign user request that naturally leads the agent to encounter the payload.
Deterministic evaluation criteria specifying exactly what constitutes success (agent resists) vs. failure (agent is compromised).

Requirements

What We're Looking For

Strong problem-solving skills and curiosity about AI security.
Interest in adversarial thinking and understanding how AI agents can be manipulated through prompt injection or other attack techniques.
Basic understanding of prompt injection concepts (or willingness to learn).
Comfortable writing structured content in YAML or able to learn quickly.
Familiar with using the command line (CLI); experience with Docker is a plus but not required.
Detail-oriented and able to follow technical guidelines consistently.
Good command of English.
Background in Computer Science, Cybersecurity, AI, Software Engineering, or related fields is preferred.

What We Offer

Hands-on experience working on one of the most cutting-edge AI safety and security projects.
Internship Allowance.
Mentorship from experienced AI security and red teaming professionals.
Opportunity to contribute to evaluation environments used by leading AI labs.

If you’re eager to learn, innovate, and grow in the field of data engineering, we’d love to hear from you. Apply today to be part of a team that values creativity and technical excellence!

Please note that only shortlisted candidates will be contacted.

About Alice

THE CHALLENGES ALONG THE WAY

1. Being Both Strategist and Executioner

One of the hardest parts of this role is that you’re both the visionary and the builder; the one drawing the map and paving the road.
That means switching between high-level strategy and hands-on experimentation daily, and doing it while bringing others along with you. There’s no playbook for this kind of work. You’re paving an unpaved road, one small experiment at a time.

‍

2. Balancing Security and Innovation

ActiveFence is the leading provider of security and safety solutions for online experiences, safeguarding more than 3 billion users, top foundation models, and the world’s largest enterprises and tech platforms every day.
As a trusted ally to major technology firms and Fortune 500 brands that build user-generated and GenAI products, ActiveFence empowers security, AI, and policy teams with low-latency Real-Time Guardrails and a continuous Red Teaming program that pressure-tests systems with adversarial prompts and emerging threat techniques. Powered by deep threat intelligence, unmatched harmful-content detection, and coverage of 117+ languages, ActiveFence enables organizations to deliver engaging and trustworthy experiences at global scale while operating safely and responsibly across all threat landscapes.