Research Engineer Intern, Evaluations

San Francisco, CA, USA
In-Office
Internship
Artificial Intelligence • Software • Automation
The Role
The intern will build evaluation frameworks for AI agents, benchmark models, and develop automated assessments of data-focused tasks in AI systems.
Research Engineer Intern, Evaluations & Benchmarks
Location: San Francisco (Hybrid)

About TensorStax:

TensorStax is building fully autonomous AI systems to manage and optimize mission-critical data infrastructure. Our research integrates reinforcement learning and language models to enhance reasoning over large-scale data lakes and warehouses, detect failures in pipelines, and autonomously construct and optimize data workflows with high precision.

We are looking for a Research Engineer Intern to design evaluation frameworks and benchmarks that assess the autonomy, adaptability, and reliability of AI agents in data engineering environments. This role is ideal for candidates passionate about AI evaluations, language model benchmarking, and autonomous data systems.

What You’ll Do:

  • Develop evaluation environments to test AI agents' ability to reason, plan, and act autonomously within mission-critical data pipelines.
  • Design benchmarks to assess model capabilities in failure detection, pipeline optimization, and agentic decision-making in data workflows.
  • Implement automated assessment frameworks for language model-based agents operating over data lakes and warehouses.
  • Work with synthetic and real-world datasets to create robust testing environments for AI-driven data automation.
  • Collaborate with research engineers to refine reward shaping strategies, guiding models toward more efficient and agentic behaviors in data-intensive tasks.

What We’re Looking For:

  • Experience in language model research, with a focus on benchmarking LLMs in mission-critical domains.
  • Strong background in AI evaluation methodologies, reinforcement learning, and RLHF techniques.
  • Familiarity with benchmarking language models for structured and unstructured data tasks.
  • Proficiency in Python and experience with ML frameworks like PyTorch or JAX.
  • Hands-on experience with data lakes, warehouses, and data engineering tools (Snowflake, BigQuery, dbt, Spark, Kafka).
  • High agency: proactive, resourceful, and comfortable working in a fast-paced research environment with minimal supervision.
  • Attention to detail: ability to design rigorous, reproducible experiments and evaluations.

Bonus Points:

  • Contributions to open-source AI benchmarks (e.g., SWE-bench, BIRD, Spider).
  • Contributions to open-source agentic frameworks.
  • Experience developing custom RL environments for AI evaluation.
  • Strong understanding of ETL, ELT, and data transformation pipelines.

Benefits:

  • Competitive internship stipend.
  • 100% employer-covered health, dental, and vision insurance (for eligible interns).
  • Access to Bay Club or Equinox in San Francisco.
  • Opportunity to work at the cutting edge of AI evaluations and autonomous data engineering research.

Skills Required

  • Experience in language model research focusing on benchmarking LLMs
  • Strong background in AI evaluation methodologies and reinforcement learning
  • Proficiency in Python and experience with ML frameworks like PyTorch or JAX
  • Hands-on experience with data lakes and data engineering tools

The Company
HQ: San Francisco, CA
4 Employees

What We Do

Autonomous AI to help build and maintain data pipelines using your infrastructure.
