Data Engineer - Data Foundry Engineer

Hiring Remotely in São Paulo, BRA
In-Office or Remote
Mid level
Artificial Intelligence • Machine Learning • Software
The Role
Join the Data Foundry team as a Data Engineer, building pipelines and infrastructure for AI workflows, ensuring data quality and governance while maintaining and optimizing ETL processes.
Data Science at TRACTIAN
The Data Science team at TRACTIAN focuses on extracting valuable insights from vast amounts of industrial data. Using advanced statistical methods, algorithms, and data visualization techniques, this team transforms raw data into actionable intelligence that drives decision-making across engineering, product development, and operational strategies. The team constantly works on optimizing prediction models, identifying trends, and providing data-driven solutions that directly enhance the company’s operational efficiency and the quality of its products.
What you'll do
We're looking for a Data Engineer with a strong engineering foundation and comfort with AI workflows to join our Data Foundry team. In this role, you'll be the bridge between our model training and data annotation teams, building the pipelines and infrastructure that turn raw, messy data into gold-standard datasets ready for AI consumption.

Responsibilities

  • Design and maintain robust pipelines that ingest data from a wide range of sources, including APIs, documents, websites, and raw sensor data
  • Integrate and optimize ETL/ELT processes developed by MLE colleagues, improving performance, reliability, and long-term maintainability
  • Own the full dataset lifecycle, from raw ingestion through cleaning, validation, and delivery as training-ready data
  • Define and enforce data quality standards and governance practices across the Data Foundry team
  • Build and maintain labeling pipeline infrastructure for ML applications, working closely with the annotation team
  • Participate in architectural decisions, code reviews, and technical mentorship within the team
  • Document data sources, pipeline logic, and processing decisions for reproducibility and team alignment

Requirements

  • 3+ years of experience in data engineering
  • Degree in Computer Science, Data Engineering, Computer Engineering, Information Systems, or equivalent technical background
  • Solid understanding of the ML training lifecycle and what properties make a dataset suitable for model training
  • Familiarity with layered data architecture patterns such as Medallion Architecture (Bronze/Silver/Gold) or Data Mesh
  • Proficiency in Python, with focus on data manipulation, pipeline development, and automation
  • Workflow orchestration using code-based tools such as Temporal, Airflow, Prefect, Dagster, or equivalent
  • Distributed data processing with Spark, Databricks, or similar
  • REST and gRPC API integration
  • Strong SQL skills, both for data modeling and query optimization
  • Experience with streaming systems and event-driven pipelines (Kafka, Kinesis, or equivalent)

Soft Skills

  • Comfortable jumping into ongoing codebases and optimizing work built by others, without needing to start from scratch
  • Technology-agnostic: you evaluate tools based on what the project needs, adopt new ones quickly, and don't get attached to a specific stack
  • At ease in fast-moving environments where priorities shift and the right answer isn't always obvious
  • Engineering-first mindset: you think in pipelines, own outcomes, and care about the quality of what you ship
  • Driven by curiosity and innovation, not by comfort with a known toolset

Nice to Have

  • Experience making architectural decisions and contributing to the technical growth of a team, formally or informally
  • Go, for high-performance pipeline components
  • dbt for transformation layer modeling
  • Open table formats: Delta Lake, Apache Iceberg, or Hudi
  • Data quality frameworks such as Great Expectations or Soda
  • Cloud experience, preferably Oracle Cloud Infrastructure (OCI), our current migration target. AWS, GCP, or Azure background is also valued
  • Rapid prototyping with Streamlit or similar tools. The use of LLMs and GenAI to speed up internal tooling and experimentation is actively encouraged
  • Experience with data annotation workflows or training dataset pipelines

The Company
Atlanta, Georgia
103 Employees
Year Founded: 2019

What We Do

Tractian is a machine intelligence company that offers industrial monitoring systems. It builds streamlined hardware-software solutions that give maintenance technicians and industrial decision-makers comprehensive oversight of their operations, democratizing access to sophisticated real-time monitoring and asset operations tools. Tractian's solutions are used in environments that account for a combined 5% of global industrial output. Its customer base spans various industries and includes John Deere, Procter & Gamble, Caterpillar, Goodyear, Carrier, Johnson Controls, and Bimbo, the owner of the brands Little Bites and Thomas Bagels. Tractian's customers see a 6-12x ROI, with average savings of $6,000 per monitored machine annually.

In a major milestone and a first for the industry, Tractian launched the AI-Assisted Maintenance category in the industrial sector. In this new paradigm, artificial intelligence identifies machine problems and suggests preventive actions, giving maintenance professionals invaluable insight and support. The intent of Assisted Maintenance is firmly rooted in augmenting maintenance professionals so they can provide more accurate diagnoses, with human-in-the-loop feedback. Tractian's mission is to elevate this category of workers in a highly impactful way. By combining shop floor expertise with Tractian's technology, maintainers will be able to anticipate and address issues with unprecedented accuracy and speed.
