ProRata.ai

Senior AI Engineer, Inference

Sorry, this job was removed at 08:09 p.m. (CST) on Friday, Nov 14, 2025

Bellevue, WA

In-Office

180K-230K Annually

Artificial Intelligence • Software

The Role

We’re looking for a Senior AI Engineer to join our Inference Team, where you’ll lead the design and development of our Retrieval-Augmented Generation (RAG) infrastructure. In this role, you will work closely with ML engineers, research scientists, and product teams to power both web search and API-based experiences for millions of users with fast, accurate, and context-aware responses.

You will architect scalable systems that combine LLMs and vector retrieval, optimizing for relevance, recall, latency, and cost. This is a high-impact role focused on AI/ML inference and retrieval performance, with significant ownership of both technical decision-making and long-term architecture.

Responsibilities

Design, build, and scale a production-grade inference stack for RAG-based applications.
Develop efficient retrieval pipelines using OpenSearch or similar vector databases, with a focus on high recall and response relevance.
Optimize performance and latency for both real-time and batch queries.
Identify and address bottlenecks in the inference stack to improve response times and system efficiency.
Ensure high reliability, observability, and monitoring of deployed systems.
Collaborate with cross-functional teams to integrate LLMs and retrieval components into user-facing applications.
Evaluate and integrate modern RAG frameworks and tools to accelerate development.
Guide architectural decisions, mentor team members, and uphold engineering excellence.

Qualifications

Master's or PhD degree in AI or a related field, or equivalent practical experience.
8+ years of experience in software engineering, with a focus on AI/ML systems or distributed systems.
Hands-on experience building and deploying retrieval-augmented generation (RAG) systems.
Deep knowledge of OpenSearch, Elasticsearch, or similar search engines.
Strong coding skills in Python and/or other backend languages (e.g., Rust, Java).
Experience with vector search, embedding pipelines, and dense retrieval techniques.
Proven ability to optimize inference stacks for latency, reliability, and scalability.
Excellent problem-solving, analytical, and debugging skills.
Strong sense of ownership, ability to work independently, and a self-starter mindset in fast-paced environments.
Passion for building impactful technology aligned with our mission.

Preferred Qualifications

Experience with frameworks like LlamaIndex or LangChain.
Familiarity with vector databases such as Pinecone, Qdrant, or FAISS.
Exposure to LLM fine-tuning, semantic search, embeddings, and prompt engineering.
Previous work on systems handling millions of users or queries per day.
Familiarity with cloud infrastructure (AWS, GCP, or Azure) and containerization tools (Docker, Kubernetes).

Work Environment

Location: This position is on-site. The role is based at our Bellevue, WA (or Pasadena, CA) office, and employees are expected to work on-site during regular business hours.

Compensation

The compensation for this position will be competitive and commensurate with experience. The estimated salary range for this role is 180,000 - 230,000 USD.

What We Offer

Opportunity to work at the forefront of AI technology
Collaborative and innovative work environment
Competitive salary and benefits package
Professional development and growth opportunities
Chance to make a significant impact on the company's success

Equal Employment Opportunity

ProRata is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All employment decisions are made based on qualifications, merit, and business needs.

California Specific Notices

At-Will Employment: Employment at ProRata is at-will. This means that either the employee or the employer may terminate employment at any time, with or without cause or prior notice.
Salary Disclosure: In compliance with California law, salary information is provided to ensure transparency and fairness.
California Consumer Privacy Act (CCPA): ProRata complies with the CCPA. Personal information collected during the recruitment process will be used for employment purposes only.

*This open position is not eligible for Company sponsorship of a visa that would require a new H-1B visa petition that is subject to the $100,000 payment requirement announced in the Presidential Proclamation titled “Restriction on Entry of Certain Nonimmigrant Workers,” dated September 19, 2025 (or any extensions or modifications of the Proclamation).
