Software Engineer, Data Infrastructure

Posted 2 Days Ago
2 Locations
In-Office
216K-300K Annually
Senior level
Artificial Intelligence • Big Data • Machine Learning
The Data Platform for AI: High quality training and validation data for AI applications.
The Role
Design and build a novel, highly scalable data infrastructure to aggregate and stage simulation outputs and user inputs for LLM consumption. Architect massive batch pipelines, complex relational data models, and low-latency distributed processing. Lead technical direction, solve ambiguous first-principles problems, and optimize data for downstream AI decision systems supporting government customers.
Summary Generated by Built In

Scale AI is seeking a highly skilled and motivated Software Engineer to join our dynamic Public Sector Engineering team. As a part of this team, you will play a critical role in supporting Scale’s government customers by scoping and developing onsite solutions. Our scalable, high-performance platform is the foundation for these customer solutions, and your expertise will be instrumental in designing and implementing systems that can handle interactions with existing customer systems to help our products integrate into existing customer workflows.   

The Role
  • We are looking for an exceptional Senior Software Engineer to architect and build the foundational data infrastructure that will serve as the brain of a project ecosystem.
  • We are not looking for someone to stitch together off-the-shelf data frameworks. You will be responsible for designing highly novel data models and processing pipelines capable of handling massive quantities of output data from complex simulations.
  • At the core of this role is the challenge of building a foundational data ensemble: a unified architecture that seamlessly aggregates, structures, and stages diverse sources of simulation outputs and user inputs. Your systems will manage enormous batch jobs under strict low-latency requirements, ensuring that downstream AI systems and language models have the exact context they need to reason actionably over complex, multi-dimensional scenarios.
Key Responsibilities
  • Architect the Data Ensemble: Design and implement the architecture that combines various sources of injected context (deeply structured simulation data, historical game states, and dynamic user inputs) into a unified, highly queryable format optimized for LLM consumption.
  • Massive Batch Infrastructure: Build highly scalable, resilient data architectures from scratch. You will optimize for moving, transforming, and processing massive quantities of simulation output data via enormous batch jobs, maintaining the minimal latency required for rapid wargame iterations.
  • Complex Data Modeling: Design sophisticated, highly relational data models that accurately represent massive, state-based simulation environments, making them easily interpretable by machine learning models.
  • First-Principles Problem Solving: Navigate highly ambiguous product requirements to design custom, ground-up systems where existing open-source or enterprise tools simply cannot handle the structural complexity or scale.
  • Technical Leadership: Set the technical standard for the data infrastructure team, driving rigorous code quality, system performance, and architectural clarity.
What We’re Looking For
  • Experience: 5+ years of backend or data infrastructure experience, operating at a Senior, Staff, or Principal level.
  • Engineering Excellence: Deep, expert-level proficiency in systems languages (e.g., Rust, Go, C++, or highly optimized Python/Java, Spark, PySpark) and a fundamental understanding of memory management, compute limits, and distributed systems architecture.
  • High-Throughput / Low-Latency Data: Proven track record of processing massive datasets. You understand how to optimize massive batch jobs and parallel processing across distributed simulation nodes without sacrificing speed.
  • Information Retrieval & Context Surfacing: You don't need a background in AI agents, but you must be an expert in surfacing the right needle from an ocean of hay to feed decision-making engines. We highly value engineers with backgrounds in:
    • Search & RecSys: Building complex information retrieval systems or recommendation engines.
    • Gaming / MMOs: Managing complex state, data relationships, and telemetry for massive, highly populated simulations.
    • High-Frequency Trading (HFT): Processing disparate, massive streams of data for algorithmic decision-making.
  • Mission-Driven: A strong desire to build robust, foundational technology that supports national security and defense modernization.
Nice to Have
  • Security Clearance: An active Secret or TS/SCI clearance is a nice to have for this role. If you do not have an active clearance, you must be eligible and willing to obtain one.
  • Experience with LLM context optimization, vector embeddings, or agentic AI frameworks (e.g., advanced RAG architectures).
  • Deep domain experience working with wargaming data, complex systems modeling, or distributed simulation protocols.
  • Previous experience in a high-growth, 0-to-1 startup environment.

Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Directors approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for an equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend.

Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in San Francisco, New York, and Seattle is:
$216,000 - $300,000 USD

PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants.

About Us:

At Scale, our mission is to develop reliable AI systems for the world's most important decisions. Our products provide the high-quality data and full-stack technologies that power the world's leading models, and help enterprises and governments build, deploy, and oversee AI applications that deliver real impact. We work closely with industry leaders like Meta, Ernst & Young, Mayo Clinic, Time Inc., the Government of Qatar, and U.S. government agencies including the Army and Air Force. We are expanding our team to accelerate the development of AI applications.

We believe that everyone should be able to bring their whole selves to work, which is why we are proud to be an inclusive and equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability status, gender identity or Veteran status. 

We are committed to working with and providing reasonable accommodations to applicants with physical and mental disabilities. If you need assistance and/or a reasonable accommodation in the application or recruiting process due to a disability, please contact us at [email protected]. Please see the United States Department of Labor's Know Your Rights poster for additional information.

We comply with the United States Department of Labor's Pay Transparency provision.

PLEASE NOTE: We collect, retain and use personal data for our professional business purposes, including notifying you of job opportunities that may be of interest and sharing with our affiliates. We limit the personal data we collect to that which we believe is appropriate and necessary to manage applicants’ needs, provide our services, and comply with applicable laws. Any information we collect in connection with your application will be treated in accordance with our internal policies and programs designed to protect personal data. Please see our privacy policy for additional information.

Skills Required

  • 5+ years of backend or data infrastructure experience at Senior, Staff, or Principal level
  • Deep proficiency in systems languages (Rust, Go, C++, highly optimized Python or Java) and Spark
  • Fundamental understanding of memory management, compute limits, and distributed systems architecture
  • Proven track record processing massive datasets and optimizing large batch jobs and parallel processing across distributed nodes
  • Designing sophisticated, highly relational data models for state-based simulation environments
  • Expertise in information retrieval, search, or recommendation systems (RecSys) or equivalent experience (gaming/MMO or HFT backgrounds valued)
  • First-principles problem solving and designing custom systems when existing tools are insufficient
  • Technical leadership setting code quality, system performance, and architectural standards for a data infrastructure team
  • Mission-driven interest in building foundational technology for national security and defense modernization
  • Willingness/eligibility to obtain security clearance; active Secret or TS/SCI is a plus
  • Experience with LLM context optimization, vector embeddings, or advanced RAG/agentic AI frameworks
  • Deep domain experience with wargaming data, complex systems modeling, or distributed simulation protocols
  • Previous experience in high-growth, 0-to-1 startup environments

Scale AI Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Scale AI and has not been reviewed or approved by Scale AI.

  • Healthcare Strength: Healthcare coverage is described as comprehensive across medical, dental, and vision, with flexibility to choose plans that fit individual or family needs. A monthly wellness stipend further supports physical and mental wellbeing expenses.
  • Equity Value & Accessibility: Equity-based compensation is included in eligible packages, positioning ownership as a meaningful component of total rewards for many full-time roles. An employee stock purchase plan also provides an additional pathway to participate in potential upside.
  • Leave & Time Off Breadth: Paid time off is positioned as generous with a flexible policy intended to support recharging and burnout prevention. Paid holidays and paid sick days are also part of the time-off offering.

The Company
San Francisco, CA
523 Employees
Year Founded: 2016

What We Do

Scale accelerates the development of AI applications by helping machine learning teams generate high-quality ground truth data. Our advanced LiDAR, image, video and NLP annotation APIs allow machine learning teams at companies like OpenAI, Lyft, Pinterest, and Airbnb to focus on building differentiated models rather than labeling data.
