AI/ML Engineer - Web Data Quality - Remote

Posted 10 Days Ago
5 Locations
In-Office or Remote
Mid level
Information Technology • Software • Database
The Role
The AI/ML Engineer will implement AI-driven quality checks, automate QA processes, and utilize GenAI for data validation. Responsibilities include collaborating with teams, monitoring data quality, and communicating insights effectively.
Summary Generated by Built In
Description

About Us

At Zyte, we eat data for breakfast, and you can eat your breakfast anywhere while working for Zyte. Founded in 2010, we are a globally distributed team of over 250 Zytans working from over 28 countries on a mission to enable our customers to extract the data they need to continue to innovate and grow their businesses. We believe that all businesses deserve a smooth pathway to data.

For more than a decade, Zyte has led the way in building powerful, easy-to-use tools to collect, format, and deliver web data quickly, dependably, and at scale. The data we extract helps thousands of organizations make smarter business decisions, secure competitive advantage, and drive sustainable growth. Today, over 3,000 companies and 1 million developers rely on our tools and services to get the data they need from the web.

Data QA is an important function within Zyte. The Data QA team ensures that the quality and usability of the data scraped by our web scrapers meet and exceed the expectations of our enterprise clients.

Are you passionate about data, data quality, and data integrity?

Do you enjoy using Python and AI to analyze and manipulate data, detect data quality issues, and visualize your findings?

Are you highly customer-focused with excellent attention to detail?

Owing to business growth and the need for ever more sophisticated Data QA, we are looking for a talented Data Scientist to join our team. As a Zyte Engineer, you will work with AI-based data wrangling, data manipulation, and data visualisation techniques and apply them to verify and validate the quality of data extracted from the web.

Requirements

Roles & Responsibilities:

    • Design and implement AI-driven quality checks: build models to detect anomalies, identify schema drift, and classify data errors in real time.
    • Automate and scale QA: replace manual and rule-based validation with ML-powered solutions that continuously improve.
    • Leverage GenAI for validation: use embedding models, LLMs, and prompt-driven pipelines to perform semantic checks on scraped data.
    • Develop monitoring & alerting pipelines: quantify data quality via KPIs, dashboards, and automated reports for stakeholders.
    • Experiment & innovate: research and prototype new AI techniques for QA, e.g. using embeddings, synthetic data, and reinforcement learning to stress-test scrapers.
    • Collaborate cross-functionally: work with developers, product managers, and account teams to integrate AI-based QA into production workflows.
    • Communicate insights: present findings with clear visualizations, metrics, and evidence-based recommendations to technical and non-technical audiences.

Requirements:

    • Proficiency in Python and the PyData stack (NumPy, pandas, scikit-learn; PyTorch or TensorFlow preferred).
    • 3+ years in a data science, applied ML, or data engineering role (ideally with exposure to QA or data validation at scale).
    • Hands-on experience with GenAI tools: LLM APIs (OpenAI, Anthropic, Google), prompt engineering, cost/token optimization.
    • Strong ML fundamentals: anomaly detection, classification, clustering, embeddings, evaluation metrics.
    • Experience with big data frameworks (Spark, BigQuery, or similar).
    • Ability to work with very large datasets (millions of records or more).
    • Version control skills with Git (GitHub/Bitbucket).
    • Excellent communication skills in English, with both technical and non-technical audiences.

Desired Skills:

    • Prior experience in data quality automation or web data QA.
    • Familiarity with LangChain, MCP, Marvin, or similar orchestration frameworks.
    • Experience building QA dashboards or visualization layers.
    • Background in statistics or applied mathematics.
    • Previous remote/distributed work experience.
Benefits

As a new Zytan, you will:

Become part of a self-motivated, progressive, multi-cultural team.

Have the freedom and flexibility to work from where you do your best work.

Attend conferences and meet with team members from across the globe.

Work with cutting-edge open source technologies and tools.

Top Skills

BigQuery
Bitbucket
GenAI Tools
Git
LLM APIs
NumPy
Pandas
PyData Stack
Python
PyTorch
Scikit-Learn
Spark
TensorFlow

The Company
Cork
219 Employees
Year Founded: 2010

What We Do

At Zyte, we’re all about empowering data-driven organizations to ethically and accurately collect web data to power their business. With over 14 years of experience, and as the original authors and ongoing maintainers of Scrapy, we’ve shaped the web scraping industry from Day 1.

We help our clients…

- Collect, format, and deliver web data quickly, dependably, and at scale with easy-to-use tools,
- Spend more time gleaning insights from highly accurate, business-critical data, and
- Spend less money on the total cost of ownership of web data extraction.

Zyte API consolidates a historically fragmented web data extraction tech stack into a single tool. It automates most anti-bot and proxy management, so developers can spend more time on strategy.

Zyte API is a full-stack solution that crawls, unblocks and extracts data in minutes with the power of AI. Developers skip the hassle of creating manual parsing code and extract public data at unlimited scale.

Zyte Data is an expert web data extraction team in your pocket. Our white glove service extracts any web data your business needs, regardless of project size and complexity. This includes a dedicated team and round-the-clock support.

Zyte’s legal team is our backbone and is made up of the leading minds in web data extraction compliance. They stay on top of the ever-changing and opaque laws that loom over the industry. They evaluate compliance risks and inform customers about best practices.

Zyte is a co-founder of, and certified by, the Ethical Web Data Collection Initiative (EWDCI), which recognizes web data providers operating to the highest ethical and legal standards.

Come work for us!

We encourage a flexible and diverse work environment, so we have embraced the benefits of remote work from our very beginnings. Our team includes over 200 employees in over 30 countries, all sharing the same drive to do more with web data.

Similar Jobs

GitLab

Back-end Engineer

Cloud • Security • Software • Cybersecurity • Automation
Remote
31 Locations
98K-210K Annually

GitLab

Intermediate Backend (Ruby/Go) Engineer, GitLab Delivery: Operate

Cloud • Security • Software • Cybersecurity • Automation
In-Office or Remote
35 Locations
98K-210K

GitLab

Senior Security Engineer

Cloud • Security • Software • Cybersecurity • Automation
Remote
28 Locations

Similar Companies Hiring

Scrunch AI
Software • SEO • Marketing Tech • Information Technology • Artificial Intelligence
Salt Lake City, Utah
Sailor Health
Telehealth • Software • Social Impact • Healthtech
New York City, NY
20 Employees
Standard Template Labs
Software • Information Technology • Artificial Intelligence
New York, NY
10 Employees