Hello there! Infrrd here — it’s pronounced In-fur-d.
We’re an Enterprise AI company that uses AI and Machine Learning to help global organisations automate data extraction from complex documents: invoices, contracts, insurance claims, and more. Our customers are some of the world’s leading enterprises in mortgage, insurance, and manufacturing, and we’ve been profitable and independent since 2016.
Job Purpose
To build the automated systems that measure, diagnose, and improve document extraction and classification accuracy at scale. This role eliminates the manual bottleneck in the accuracy improvement cycle — replacing brute-force prompt iteration with agentic evaluation pipelines, automated feedback loops, and intelligent internal tooling. The engineer in this role makes the entire team faster without proportionally increasing headcount, and enables systematic accuracy improvement as a repeatable engineering capability rather than an ad-hoc effort.
Job Duties and Responsibilities
- Design and build agentic evaluation pipelines: error detection → root cause → hypothesis generation → prompt variant testing → A/B measurement → production promotion, with minimal human intervention.
- Own the accuracy measurement infrastructure: automate error analysis, data quality pipelines, and batch evaluation frameworks across document types and customer configurations.
- Build and evolve internal accuracy tooling from manual utilities into automated improvement platforms: classification and extraction correction loops, NTP rule generation, performance reporting.
- Take prototype methodologies and productionize them into reliable, scalable systems the team can operate independently.
- Build LLM-based extraction and classification pipelines using few-shot and RAG strategies for complex, real-world document types.
- Design and maintain A/B testing infrastructure for prompt and model changes — no untested changes go to production.
- Create live dashboards tracking extraction accuracy, NTP rates, and false positive rates across document types and customer configurations.
- Optimize LLM costs while maintaining quality: prompt compression, output token minimization, model selection and migration strategies.
- Write production-grade data pipelines with error handling, retries, logging, and monitoring.
- Collaborate with platform engineering and applied research functions on architecture and methodology translation.
- Mentor 1–2 junior engineers; build tooling and documentation they can operate independently.
Required Qualifications
BE / MTech in Computer Science, AI/ML, Computational Data Science (CDS), Computer Science & Automation (CSA), or related discipline.
Experience Range
8-10 years total; minimum 4-6 years building production LLM or AI systems; minimum 4-6 years in evaluation, quality measurement, or accuracy improvement work.
"Must-have" Skills
- Production-grade Python — clean, tested, maintainable systems; not just scripts (pytest, FastAPI or Flask)
- Hands-on LLM API experience (OpenAI, Anthropic, Gemini, AWS Bedrock or equivalent) with systematic, measurement-driven prompt engineering — methodology over instinct
- Agentic pipeline design — multi-step reasoning, tool use, orchestration frameworks (LangChain, LlamaIndex or equivalent), automated evaluation and feedback loops
- Evaluation framework design for LLM systems — precision/recall/F1, confusion matrices, A/B testing, per-class error analysis
- Analytical depth sufficient to design meaningful accuracy metrics and interpret why a model fails on a specific document or field type
- MongoDB or equivalent NoSQL: queries, aggregations, indexing
- pandas / NumPy for data processing and batch analysis
- Git, code reviews, CI/CD basics (GitHub Actions or Jenkins)
- Clear written communication — able to explain model behaviour and accuracy findings to non-technical stakeholders
"Nice-to-have" Skills
- Document AI: PDF parsing, layout-aware extraction, OCR, structured form extraction
- RAG pipeline design and vector search (Pinecone, Weaviate, or similar)
- Classification systems with large label spaces (50+ classes)
- Async Python (asyncio, aiohttp) for pipeline throughput
- Embedding models and semantic similarity for document matching
- Prior experience working alongside a Research or Applied Science team as the engineering counterpart
Working Knowledge (Tools)
Python, FastAPI / Flask, MongoDB, Git, GitHub Actions / Jenkins, LLM APIs (OpenAI / Anthropic / Gemini or equivalent), LangChain / LlamaIndex, pandas / NumPy, pytest, Docker
General Knowledge
NLP concepts, LLM prompt engineering patterns, REST APIs, RAG pipelines, vector databases, JSON data structures
Thorough Knowledge
Agentic workflow design and orchestration, LLM evaluation metrics (F1 / Precision / Recall, per-class analysis, confusion matrices), production Python systems (error handling, retries, logging, monitoring), NoSQL aggregations, systematic A/B testing for model changes, prompt optimization methodology
By submitting your application, you agree that your personal information and resume may be collected, processed, and stored by us for recruitment purposes, including consideration for future roles.
Skills Required
- BE / MTech in Computer Science, AI/ML, or related discipline
- 8-10 years total experience in AI systems
- 4-6 years building production LLM or AI systems
- 4-6 years in evaluation, quality measurement, or accuracy improvement work
- Production-grade Python competence
- Hands-on LLM API experience
- Agentic pipeline design experience
- Evaluation framework design for LLM systems
- Analytical skills for accuracy metrics design
- Experience with MongoDB or equivalent NoSQL
- Familiarity with Git, code reviews, and CI/CD
- Clear written communication skills
What We Do
At Infrrd, we help businesses handle messy, unstructured documents. Our AI-powered platform extracts, organizes, and automates data processing, so you don’t have to. From invoices and contracts to engineering drawings and handwritten notes, we handle the complex stuff that traditional OCR and automation tools can’t. Our technology learns from real-world data, making it faster, more accurate, and more intelligent over time. But we don’t stop at data extraction. Infrrd automates entire workflows, cutting down manual effort, reducing errors, and speeding up processes for industries like insurance, mortgage, real estate, engineering, and logistics.
Tailored for Your Industry
One-size-fits-all? Not here. Infrrd’s AI is built for your industry, whether you’re in insurance, mortgage, real estate, engineering, or logistics. With over a decade of experience, we’ve fine-tuned our platform to handle industry-specific challenges with unmatched accuracy.
Innovation That Keeps You Ahead
We don’t just follow trends; we create them. Our dedicated R&D team is constantly pushing the boundaries of automation, bringing game-changing AI advancements to market first. With 11+ international patents (and counting!), we’re committed to making document processing smarter, faster, and easier.
Meet Ally | Your AI-Powered Intelligent Agent
Ally is our latest breakthrough: an AI-driven agent that takes workflow automation to the next level. Pre-trained on industry knowledge, Ally eliminates manual work, handling everything from data extraction to intuitive decision-making.