Machine Learning Engineer - Web Data Quality

Hiring Remotely in Rio de Janeiro
In-Office or Remote
Mid level
Information Technology • Software • Database
The Role
You will design and implement AI systems to assess and improve the quality of web datasets, collaborating with data, product, and engineering teams.

At Zyte, we make the world’s web data accessible to everyone. Our technology powers data extraction at scale, helping businesses and researchers unlock the full potential of the web.

We’re a remote-first, multicultural team of engineers, data scientists, and innovators who believe in curiosity, collaboration, and continuous learning. If you’re passionate about building reliable AI systems and improving the quality of web data, we’d love to hear from you.

About the Role

As a Machine Learning Engineer (Web Data Quality), you’ll design and implement intelligent systems that automatically detect, measure, and improve the quality of large-scale web datasets. You’ll work at the intersection of data science, AI, and distributed systems, collaborating closely with product, engineering, and data teams to make data accuracy measurable, scalable, and actionable.


What You’ll Do
  • Develop and deploy ML models for anomaly detection, schema drift, and content validation
  • Build and improve data quality pipelines leveraging modern data and MLOps tools
  • Design and optimize embeddings and GenAI models to enhance data consistency
  • Collaborate with engineers to integrate AI systems into production workflows
  • Conduct experiments, evaluate performance, and iterate for continuous improvement
  • Stay up to date on AI/ML and GenAI research to guide innovation within Zyte
Required
  • 3+ years of experience in Machine Learning / Data Science / AI Engineering
  • Strong Python skills and experience with ML frameworks (PyTorch, TensorFlow, scikit-learn)
  • Experience with data validation, anomaly detection, or data quality systems
  • Familiarity with data pipelines (Airflow, Spark, or similar)
  • Understanding of model evaluation, metrics, and deployment best practices
  • Excellent problem-solving, communication, and collaboration skills
Preferred
  • Experience with LangChain, LlamaIndex, or GenAI model orchestration
  • Familiarity with data labeling tools and active learning approaches
  • Contributions to open-source or public ML projects
  • Experience working in a remote, cross-functional team environment

Benefits
  • 35 days of paid time off
  • Health & wellness support
  • Inclusive and supportive team environment
  • Opportunities to attend conferences and meet team members from across the globe
  • Work with cutting-edge open source technologies and tools

Top Skills

Airflow
Python
PyTorch
Scikit-Learn
Spark
TensorFlow

The Company
Cork
219 Employees
Year Founded: 2010

What We Do

At Zyte, we’re all about empowering data-driven organizations to ethically and accurately collect web data to power their business. With over 14 years of experience, and as the original authors and ongoing maintainers of Scrapy, we’ve shaped the web scraping industry from day one.

We help our clients…

- Collect, format and deliver web data quickly, dependably and at scale, using easy-to-use tools,
- Spend more time gleaning insights from highly accurate, business-critical data, and
- Spend less money on the total cost of ownership in web data extraction.

Zyte API abstracts away a historically disparate web data extraction tech stack into a single tool. Zyte API automates most anti-bot and proxy management, so developers can spend more time on strategy.

Zyte API is a full-stack solution that crawls, unblocks and extracts data in minutes with the power of AI. Developers skip the hassle of creating manual parsing code and extract public data at unlimited scale.

Zyte Data is an expert web data extraction team in your pocket. Our white-glove service extracts any web data your business needs, regardless of project size and complexity, and includes a dedicated team and round-the-clock support.

Zyte’s legal team is our backbone and is made up of the leading minds in web data extraction compliance. They stay on top of the ever-changing and opaque laws that loom over the industry. They evaluate compliance risks and inform customers about best practices.

Zyte is a co-founder of, and is certified by, the Ethical Web Data Collection Initiative (EWDCI), which recognizes web data providers that operate to the highest ethical and legal standards.

Come work for us!

We encourage a flexible and diverse work environment, and we have embraced the benefits of remote work since our earliest days. Our team includes over 200 employees in over 30 countries, all sharing the same drive: to do more with web data.
