Intern for AI Security RL Gyms

Posted 12 Hours Ago
Be an Early Applicant
Hiring Remotely in Vietnam
Remote or Hybrid
Internship
Security • Software • Generative AI
Trust, safety, and security for the GenAI era 🛡️
The Role
Create adversarial prompt-injection tasks in YAML for RL Gym environments: define agent personas, simulated environments, adversarial payloads, benign user requests, run tasks against frontier models, validate success rates, and submit passing tasks with deterministic evaluation criteria.
Summary Generated by Built In
Description

Alice builds adversarial evaluation infrastructure used by the world's leading AI labs to stress-test their most capable agents before deployment. We're hiring interns for our RL Gym environments: adversarial training grounds where AI agents face prompt injection, social engineering, and data exfiltration scenarios.

The Role: You design prompt injection scenarios in YAML, run them against frontier models, validate success rates, and submit passing tasks.

The Task

You build adversarial prompt injection tasks for Alice's RL Gym platform. Each task is a self-contained YAML scenario simulating a realistic AI agent deployment, testing whether the agent can be manipulated into violating its safety policies.

What a task includes:

  1. An agent persona with a system prompt, security policies, and a defined set of tools.
  2. A simulated environment (e-commerce site, messaging platform, enterprise app) with content the agent interacts with.
  3. An adversarial payload embedded in the environment (in messages, product listings, tool responses, documents) that attempts to trick the agent into a policy violation.
  4. A benign user request that naturally leads the agent to encounter the payload.
  5. Deterministic evaluation criteria specifying exactly what constitutes success (agent resists) vs. failure (agent is compromised).
Requirements

What We're Looking For

  • Strong problem-solving skills and curiosity about AI security.
  • Interest in adversarial thinking and understanding how AI agents can be manipulated through prompt injection or other attack techniques.
  • Basic understanding of prompt injection concepts (or willingness to learn).
  • Comfortable writing structured content in YAML or able to learn quickly.
  • Familiar with using the command line (CLI); experience with Docker is a plus but not required.
  • Detail-oriented and able to follow technical guidelines consistently.
  • Good command of English.
  • Background in Computer Science, Cybersecurity, AI, Software Engineering, or related fields is preferred.

What We Offer

  • Hands-on experience working on one of the most cutting-edge AI safety and security projects.
  • Internship Allowance.
  • Mentorship from experienced AI security and red teaming professionals.
  • Opportunity to contribute to evaluation environments used by leading AI labs.

If you’re eager to learn, innovate, and grow in the field of data engineering, we’d love to hear from you. Apply today to be part of a team that values creativity and technical excellence!

Please note that only shortlisted candidates will be contacted.

About Alice

ActiveFence is the leading tool stack for Trust & Safety teams, worldwide. By relying on ActiveFence’s end-to-end solution, Trust & Safety teams – of all sizes – can keep users safe from the widest spectrum of online harms, unwanted content, and malicious behavior, including child safety, disinformation, fraud, hate speech, terror, nudity, and more.

Using cutting-edge AI and a team of world-class subject-matter experts to continuously collect, analyze, and contextualize data, ActiveFence ensures that in an ever-changing world, customers are always two steps ahead of bad actors. As a result, Trust & Safety teams can be proactive and provide maximum protection to users across a multitude of abuse areas in 70+ languages.

Backed by leading Silicon Valley investors such as CRV and Norwest, ActiveFence has raised $100M to date; employs 300 people worldwide, and has contributed to the online safety of billions of users across the globe.

Skills Required

  • Strong problem-solving skills and curiosity about AI security
  • Interest in adversarial thinking and prompt injection attack techniques
  • Basic understanding of prompt injection concepts or willingness to learn
  • Comfortable writing structured content in YAML or able to learn quickly
  • Familiar with using the command line (CLI)
  • Experience with Docker
  • Detail-oriented and able to follow technical guidelines consistently
  • Good command of English
  • Background in Computer Science, Cybersecurity, AI, Software Engineering, or related fields
Am I A Good Fit?
beta
Get Personalized Job Insights.
Our AI-powered fit analysis compares your resume with a job listing so you know if your skills & experience align.

The Company
HQ: Ramat Gan
413 Employees

What We Do

Alice is a trust, safety, and security company built for the AI era. We safeguard the communicative technologies people use to create, collaborate, and interact - whether with each other or with machines. In a world where AI has fundamentally changed the nature of risk, Alice provides end-to-end coverage across the entire AI lifecycle. We support frontier model labs, enterprises, and UGC platforms with a comprehensive suite of solutions: from model hardening evaluations and pre-deployment red-teaming to runtime guardrails and ongoing drift detection. Alice represents the next chapter of our growth and the natural evolution of ActiveFence, our industry-leading solution for UGC safety, as we expand our mission to secure the future of AI. Advance unafraid: alice.io

Similar Jobs

Tapestry - Coach and Kate Spade Logo Tapestry - Coach and Kate Spade

Quality Auditor, Footwear

eCommerce • Fashion • Retail • Sales • Wearables • Design
Remote or Hybrid
Haiphong, VNM
16000 Employees

Mondelēz International Logo Mondelēz International

Analytics Manager

Big Data • Food • Hardware • Machine Learning • Retail • Automation • Manufacturing
Remote or Hybrid
4 Locations
90000 Employees

Tapestry - Coach and Kate Spade Logo Tapestry - Coach and Kate Spade

Sr. Analyst, Costing

eCommerce • Fashion • Retail • Sales • Wearables • Design
Remote or Hybrid
Haiphong, VNM
16000 Employees

Tapestry - Coach and Kate Spade Logo Tapestry - Coach and Kate Spade

Quality Auditor, Footwear

eCommerce • Fashion • Retail • Sales • Wearables • Design
Remote or Hybrid
Haiphong, VNM
16000 Employees

Similar Companies Hiring

Kepler  Thumbnail
Fintech • Software
New York, New York
6 Employees
LTX Thumbnail
Conversational AI • Generative AI
Jerusalem, Israel
360 Employees
Onshore Thumbnail
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees

Sign up now Access later

Create Free Account

Please log in or sign up to report this job.

Create Free Account