Infrrd

Senior AI Systems Engineer

Bangalore, Bengaluru Urban, Karnataka, IND

In-Office

60K-150K Annually

Senior level

Artificial Intelligence • Software

The Role

The Senior AI Systems Engineer builds automated systems for document extraction accuracy, develops evaluation pipelines, mentors junior engineers, and optimizes LLM processes.

Hello there! Infrrd here — it’s pronounced In-fur-d.

We’re an Enterprise AI company that uses AI and Machine Learning to help global organisations automate data extraction from complex documents: invoices, contracts, insurance claims, and more. Our customers are some of the world’s leading enterprises in mortgage, insurance, and manufacturing, and we’ve been profitable and independent since 2016.

Job Purpose:

To build the automated systems that measure, diagnose, and improve document extraction and classification accuracy at scale. This role eliminates the manual bottleneck in the accuracy improvement cycle — replacing brute-force prompt iteration with agentic evaluation pipelines, automated feedback loops, and intelligent internal tooling. The engineer in this role makes the entire team faster without proportionally increasing headcount, and enables systematic accuracy improvement as a repeatable engineering capability rather than an ad-hoc effort.

Job Duties and Responsibilities

Design and build agentic evaluation pipelines: error detection → root cause → hypothesis generation → prompt variant testing → A/B measurement → production promotion, with minimal human intervention.
Own the accuracy measurement infrastructure: automate error analysis, data quality pipelines, and batch evaluation frameworks across document types and customer configurations.
Build and evolve internal accuracy tooling from manual utilities into automated improvement platforms: classification and extraction correction loops, NTP rule generation, performance reporting.
Take prototype methodologies and productionize them into reliable, scalable systems the team can operate independently.
Build LLM-based extraction and classification pipelines using few-shot and RAG strategies for complex, real-world document types.
Design and maintain A/B testing infrastructure for prompt and model changes: no untested changes go to production.
Create live dashboards tracking extraction accuracy, NTP rates, and false positive rates across document types and customer configurations.
Optimize LLM costs while maintaining quality: prompt compression, output token minimization, model selection and migration strategies.
Write production-grade data pipelines with error handling, retries, logging, and monitoring.
Collaborate with platform engineering and applied research functions on architecture and methodology translation.
Mentor 1–2 junior engineers; build tooling and documentation they can operate independently.

Required Qualifications

BE / MTech in Computer Science, AI/ML, Computational Data Science (CDS), Computer Science & Automation (CSA), or related discipline.

Experience Range
8-10 years total; minimum 4-6 years building production LLM or AI systems; minimum 4-6 years in evaluation, quality measurement, or accuracy improvement work.

"Must-have" Skills

Production-grade Python: clean, tested, maintainable systems, not just scripts (pytest, FastAPI or Flask)
Hands-on LLM API experience (OpenAI, Anthropic, Gemini, AWS Bedrock or equivalent) with systematic, measurement-driven prompt engineering — methodology over instinct
Agentic pipeline design: multi-step reasoning, tool use, orchestration frameworks (LangChain, LlamaIndex or equivalent), automated evaluation and feedback loops
Evaluation framework design for LLM systems: precision/recall/F1, confusion matrices, A/B testing, per-class error analysis
Analytical depth sufficient to design meaningful accuracy metrics and interpret why a model fails on a specific document or field type
MongoDB or equivalent NoSQL: queries, aggregations, indexing
pandas / NumPy for data processing and batch analysis
Git, code reviews, CI/CD basics (GitHub Actions or Jenkins)
Clear written communication: able to explain model behaviour and accuracy findings to non-technical stakeholders

"Would-be-nice" Skills

Document AI: PDF parsing, layout-aware extraction, OCR, structured form extraction
RAG pipeline design and vector search (Pinecone, Weaviate, or similar)
Classification systems with large label spaces (50+ classes)
Async Python (asyncio, aiohttp) for pipeline throughput
Embedding models and semantic similarity for document matching
Prior experience working alongside a Research or Applied Science team as the engineering counterpart

Working Knowledge (Tools)
Python, FastAPI / Flask, MongoDB, Git, GitHub Actions / Jenkins, LLM APIs (OpenAI / Anthropic / Gemini or equivalent), LangChain / LlamaIndex, pandas / NumPy, pytest, Docker

General Knowledge

NLP concepts, LLM prompt engineering patterns, REST APIs, RAG pipelines, vector databases, JSON data structures

Thorough Knowledge

Agentic workflow design and orchestration, LLM evaluation metrics (F1 / Precision / Recall, per-class analysis, confusion matrices), production Python systems (error handling, retries, logging, monitoring), NoSQL aggregations, systematic A/B testing for model changes, prompt optimization methodology

Curious about the kind of engineering challenges we solve? Check out a few of our tech talks:

Pay Stubs Data Extraction

NLP-powered Table Extraction for Insurance Policy Data

Ally | Agentic AI for Mortgage

Automated Data Extraction from Engineering & Construction Drawings

By submitting your application, you agree that your personal information and resume may be collected, processed, and stored by us for recruitment purposes, including consideration for future roles.

Skills Required

BE / MTech in Computer Science, AI/ML, or related discipline
8-10 years total experience
4-6 years building production LLM or AI systems
4-6 years in evaluation, quality measurement, or accuracy improvement work
Production-grade Python competence
Hands-on LLM API experience
Agentic pipeline design experience
Evaluation framework design for LLM systems
Analytical skills for accuracy metrics design
Experience with MongoDB or equivalent NoSQL
Familiar with Git, code reviews, CI/CD
Clear written communication skills

The Company

HQ: San Jose, California

227 Employees

Year Founded: 2016

What We Do

At Infrrd, we help businesses handle messy, unstructured documents. Our AI-powered platform extracts, organizes, and automates data processing so you don’t have to. From invoices and contracts to engineering drawings and handwritten notes, we handle the complex stuff that traditional OCR and automation tools can’t. Our technology learns from real-world data, making it faster, more accurate, and more intelligent over time. But we don’t stop at data extraction. Infrrd automates entire workflows, cutting down manual effort, reducing errors, and speeding up processes for industries like insurance, mortgage, real estate, engineering, and logistics.

Tailored for Your Industry

One-size-fits-all? Not here. Infrrd’s AI is built for your industry, whether you’re in insurance, mortgage, real estate, engineering, or logistics. With over a decade of experience, we’ve fine-tuned our platform to handle industry-specific challenges with unmatched accuracy.

Innovation That Keeps You Ahead

We don’t just follow trends; we create them. Our dedicated R&D team is constantly pushing the boundaries of automation, bringing game-changing AI advancements to market first. With 11+ international patents (and counting!), we’re committed to making document processing smarter, faster, and easier.

Meet Ally | Your AI-Powered Intelligent Agent

Ally is our latest breakthrough: an AI-driven agent that takes workflow automation to the next level. Pre-trained on industry knowledge, Ally eliminates manual work, handling everything from data extraction to intuitive decision-making.