Data Architect - Annotation (India)

Posted 24 Days Ago
Hiring Remotely in India
Remote
5-5 Annually
Senior level
Artificial Intelligence • Food • Software
The Role
The Data Architect - Annotation will manage data workflows, ensure data quality for AI systems, perform statistical analysis, and work closely with engineering teams to enhance annotation processes.
Summary Generated by Built In

As a Data Architect - Annotation, you’ll serve as the critical bridge between the Prompt Engineering team and the Data Labeling team, ensuring that the data feeding our AI systems is clean, consistent, and production-ready. You will own the workflows that generate, organize, and maintain high-quality datasets across multiple modalities, while using LLMs, automation, and statistical analysis to detect anomalies and improve data quality at scale. 

Your work will directly influence the reliability of our VoiceAI and AI-driven products by ensuring that labeling pipelines, annotation standards, and evaluation data are robust enough to support high-stakes, real-world restaurant operations.

Essential Job Functions:

  • Data Operations & Workflow Ownership
    • Act as the transition point between Prompt Engineering and Data Labeling, translating model and product requirements into concrete data and annotation workflows. 
    • Design, implement, and maintain scalable data workflows for dataset generation, curation, and ongoing maintenance. 
    • Ensure data quality and consistency across labeling projects, with a focus on operational reliability for production AI systems. 
  • Annotation & Quality Management
    • Create, review, and maintain high-quality annotations across multiple modalities, including text, audio, conversational transcripts, and structured datasets. 
    • Identify labeling inconsistencies, data errors, and edge cases; propose and enforce corrective actions and improvements to annotation standards. 
    • Utilize platforms such as Labelbox, Label Studio, or Langfuse to manage large-scale labeling workflows and enforce consistent task execution. 
  • Automation, Tooling & LLM-Assisted QA
    • Use Python and SQL for data extraction, validation, transformation, and workflow automation across labeling pipelines. 
    • Leverage LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation of annotation outputs. 
    • Implement automated QA checks and anomaly-detection mechanisms to scale quality assurance for large datasets. 
  • Analysis, Metrics & Continuous Improvement
    • Analyze annotation performance metrics and quality trends to surface actionable insights that improve labeling workflows and overall data accuracy. 
    • Apply statistical analysis to detect data anomalies, annotation bias, and quality issues, and partner with stakeholders to mitigate them. 
    • Collaborate with ML and Operations teams to refine labeling guidelines and enhance instructions based on observed patterns and error modes. 
  • Cross-Functional Collaboration & Documentation
    • Work closely with Prompt Engineering, Data Labeling, and ML teams to ensure that data operations align with model requirements and product goals. 
    • Document data standards, annotation guidelines, and workflow best practices for use by internal teams and external labeling partners. 

Requirements:
  • Experience with data annotation and hands-on use of platforms such as Labelbox, Label Studio, or Langfuse for managing large-scale labeling workflows. 
  • Proficiency in Python and SQL for data extraction, validation, and workflow automation in a data operations or data engineering context. 
  • Hands-on experience using LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation. 
  • Demonstrated experience working with large-scale / high-volume datasets. 
  • At least one prior role where data workflow automation was explicitly part of the job scope or responsibilities. 
  • Ability to perform statistical analysis to detect data anomalies, annotation bias, and quality issues. 
  • Strong requirement-elicitation and communication skills, with a process-driven and detail-oriented mindset when working with cross-functional teams. 

Qualifications: 

  • B.S. or higher in a quantitative discipline (Data Science, Computer Science, Engineering, or related field) 
  • 5+ years of relevant experience with a B.S. degree, or 3+ years of experience with a Master's degree
  • Demonstrated proficiency in SQL for reporting and Python for automation and scripting
  • Academic or applied research experience related to NLP or LLM benchmarking datasets is a strong plus
  • Must be flexible to work during US hours (until at least 1:30 PM EST) for this role.

Top Skills

Claude
Gemini
GPT-4
Label Studio
Labelbox
Langfuse
Python
SQL

The Company
Chicago, IL
299 Employees
Year Founded: 2016

What We Do

Checkmate empowers enterprise restaurant brands with powerful ordering solutions and hands-on support. Our scalable technology enables restaurants to drive sales across channels, including custom websites, apps, kiosks, catering, third-party marketplaces, voice AI, and more. With seamless integrations, smarter analytics, and 24/7 service, Checkmate helps brands conquer their digital goals. Restaurants can launch unique ordering experiences, centrally manage menus, recapture revenue, leverage customer data, and continually adapt with new integrations. Regardless of how you want to grow, Checkmate has the tools and guidance to power, manage, and evolve your digital business.
