Senior AI Engineer, Inference

Reposted 20 Days Ago
Bellevue, WA
In-Office
180K-230K Annually
Senior level
Artificial Intelligence • Software
The Role
Develop and optimize scalable inference systems for RAG applications, focusing on performance, latency, and cross-functional collaboration.


Role

We’re looking for a Senior AI Engineer to join our Inference Team, where you’ll lead the design and development of our Retrieval-Augmented Generation (RAG) infrastructure. In this role, you will work closely with ML engineers, research scientists, and product teams to power both web search and API-based experiences for millions of users with fast, accurate, and context-aware responses. 

You will architect scalable systems that combine LLMs and vector retrieval, optimizing for relevance, recall, latency, and cost. This is a high-impact role focused on AI/ML inference and retrieval performance, with significant ownership of both technical decision-making and long-term architecture.

  

Responsibilities

  • Design, build, and scale a production-grade inference stack for RAG-based applications.

  • Develop efficient retrieval pipelines using OpenSearch or similar vector databases, with a focus on high recall and response relevance. 

  • Optimize performance and latency for both real-time and batch queries. 

  • Identify and address bottlenecks in the inference stack to improve response times and system efficiency. 

  • Ensure high reliability, observability, and monitoring of deployed systems. 

  • Collaborate with cross-functional teams to integrate LLMs and retrieval components into user-facing applications. 

  • Evaluate and integrate modern RAG frameworks and tools to accelerate development. 

  • Guide architectural decisions, mentor team members, and uphold engineering excellence. 

  

Qualifications

  • Master's or PhD degree in AI or a related field, or equivalent practical experience.

  • 8+ years of experience in software engineering, with a focus on AI/ML systems or distributed systems. 

  • Hands-on experience building and deploying retrieval-augmented generation (RAG) systems. 

  • Deep knowledge of OpenSearch, Elasticsearch, or similar search engines. 

  • Strong coding skills in Python and/or other backend languages (e.g., Rust, Java). 

  • Experience with vector search, embedding pipelines, and dense retrieval techniques. 

  • Proven ability to optimize inference stacks for latency, reliability, and scalability. 

  • Excellent problem-solving, analytical, and debugging skills. 

  • Strong sense of ownership, ability to work independently, and a self-starter mindset in fast-paced environments. 

  • Passion for building impactful technology aligned with our mission. 

  

Preferred Qualifications 

  • Experience with frameworks like LlamaIndex or LangChain. 

  • Familiarity with vector databases such as Pinecone, Qdrant, or FAISS. 

  • Exposure to LLM fine-tuning, semantic search, embeddings, and prompt engineering. 

  • Previous work on systems handling millions of users or queries per day. 

  • Familiarity with cloud infrastructure (AWS, GCP, or Azure) and containerization tools (Docker, Kubernetes). 


Work Environment

Location: This position is on-site. The role is based at our Bellevue, WA (or Pasadena, CA) office, and employees are expected to work on-site during regular business hours.

  

Compensation

The compensation for this position will be competitive and commensurate with experience. The estimated salary range for this role is 180,000 - 230,000 USD. 

What We Offer

  • Opportunity to work at the forefront of AI technology 

  • Collaborative and innovative work environment 

  • Competitive salary and benefits package 

  • Professional development and growth opportunities 

  • Chance to make a significant impact on the company's success 

 

Equal Employment Opportunity

  • ProRata is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All employment decisions are made based on qualifications, merit, and business needs.  

   

California Specific Notices

  • At-Will Employment: Employment at ProrataAI is at-will. This means that either the employee or the employer may terminate employment at any time, with or without cause or prior notice. 

  • Salary Disclosure: In compliance with California law, salary information is provided to ensure transparency and fairness. 

  • California Consumer Privacy Act (CCPA): ProrataAI complies with the CCPA. Personal information collected during the recruitment process will be used for employment purposes only.  

*This open position is not eligible for Company sponsorship of a visa that would require a new H-1B visa petition that is subject to the $100,000 payment requirement announced in the Presidential Proclamation titled “Restriction on Entry of Certain Nonimmigrant Workers,” dated September 19, 2025 (or any extensions or modifications of the Proclamation).

Top Skills

AWS
Azure
Docker
Elasticsearch
GCP
Java
Kubernetes
OpenSearch
Python
Rust

The Company
HQ: Pasadena, California
60 Employees
Year Founded: 2024

What We Do

ProRata.ai enables the alignment of usage and revenues between AI platforms and content owners.

