Data Engineer - Data Scraping

Sorry, this job was removed at 08:10 p.m. (CST) on Monday, Jun 02, 2025
Hiring Remotely in São Paulo
In-Office or Remote
Artificial Intelligence • Machine Learning • Software
The Role
Data Science at TRACTIAN

The Data Science team at TRACTIAN focuses on extracting valuable insights from vast amounts of industrial data. Using advanced statistical methods, algorithms, and data visualization techniques, this team transforms raw data into actionable intelligence that drives decision-making across engineering, product development, and operational strategies. The team constantly works on optimizing prediction models, identifying trends, and providing data-driven solutions that directly enhance the company’s operational efficiency and the quality of its products.

What you'll do

We’re looking for software and data engineers to join our newly established Data Gathering and Labeling (DGL) team. In this role, you'll be critical to building Tractian's comprehensive and diverse datasets, from industrial equipment documentation to sensor data like vibration and temperature. Your work will directly power new features in our platform and enhance our competitive advantage through richer and more reliable data resources.

Responsibilities

  • Design and maintain robust data collection pipelines from a wide range of sources, including websites, documents, APIs, and raw sensor data
  • Extract and structure information from unstructured or semi-structured formats into clean, standardized schemas
  • Handle real-world data challenges like pagination, rate limits, CAPTCHAs, noise, missing values, and inconsistent formatting
  • Clean, filter, and validate raw data to ensure high quality, consistency, and usability across our systems
  • Develop small tools and utilities to support and automate data collection workflows
  • Support the creation and maintenance of labeling pipelines for ML applications
  • Collaborate with engineering and product teams to optimize data storage and access patterns
  • Document data sources, collection methodologies, and processing procedures for reproducibility
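
As a rough illustration of the pagination and rate-limit handling named above, here is a minimal sketch in Python. It is not TRACTIAN's pipeline: `collect_pages`, `fetch_page`, `min_interval`, and `max_pages` are hypothetical names, and the sketch assumes a source that signals its last page by returning an empty result.

```python
import time
from typing import Callable, Iterator, Optional


def collect_pages(
    fetch_page: Callable[[int], Optional[list]],
    min_interval: float = 1.0,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield records page by page, spacing requests at least min_interval seconds apart.

    fetch_page is a caller-supplied (hypothetical) function that returns the
    records for a 1-based page number, or an empty list / None past the last page.
    """
    for page in range(1, max_pages + 1):
        started = time.monotonic()
        records = fetch_page(page)
        if not records:
            # Assumption: an empty page means there is nothing left to collect.
            break
        yield from records
        # Simple client-side rate limit: wait out the rest of the interval.
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client, so the same logic could wrap a `requests` call or a browser automation step.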

Requirements

  • 0–2 years of experience in software development, data engineering, or related fields
  • Degree in Computer Science, Computer Engineering, Information Systems, or equivalent technical background
  • Understanding of HTML, CSS selectors, and how web pages are structured
  • Strong problem-solving skills and an eye for detail
  • Ability to work in a fast-paced environment and manage shifting priorities

Technical Skills

  • Proficiency in Python, especially for data manipulation and automation
  • Experience (academic or professional) with data extraction using tools like `requests`, `BeautifulSoup`, or similar
  • Familiarity with REST APIs and the HTTP protocol
  • Experience with data cleaning techniques such as:
      • Handling missing or inconsistent values
      • Removing duplicates and outliers
      • Standardizing formats (e.g., dates, units, text normalization)
      • Validating data against schemas or expected ranges
  • (Optional) Exposure to browser automation tools like Selenium or Playwright
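
For candidates wondering what the data cleaning techniques listed above look like in practice, the following is a small sketch using only the Python standard library. Everything in it is an assumption made for illustration: the field names (`sensor_id`, `date`, `temp_c`), the accepted date formats, the temperature range, and the choice to treat one reading per sensor per day as the deduplication key.

```python
from datetime import datetime
from typing import Optional

# Hypothetical set of date formats seen in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date in ISO format (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_readings(rows, temp_range=(-40.0, 150.0)):
    """Drop incomplete, out-of-range, and duplicate rows; standardize the rest."""
    seen = set()
    cleaned = []
    for row in rows:
        # Text normalization: trim whitespace and lowercase the identifier.
        sensor = (row.get("sensor_id") or "").strip().lower()
        date = normalize_date(row.get("date") or "")
        temp = row.get("temp_c")
        # Missing values: skip rows lacking any required field.
        if not sensor or date is None or temp is None:
            continue
        try:
            temp = float(temp)
        except (TypeError, ValueError):
            continue
        # Range validation: discard readings outside the expected range.
        if not (temp_range[0] <= temp <= temp_range[1]):
            continue
        # Deduplication: keep the first reading per (sensor, date) pair.
        key = (sensor, date)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"sensor_id": sensor, "date": date, "temp_c": temp})
    return cleaned
```

Dropping bad rows is only one option. Depending on the dataset, imputing missing values or flagging outliers for review may be the better choice.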

Nice to Have

  • Experience with web scraping libraries/frameworks like Scrapy, Playwright, or Selenium
  • Familiarity with proxy usage, headless browsers, or CAPTCHA bypass techniques
  • Understanding of database systems (SQL or NoSQL)
  • Exposure to rapid prototyping tools like Streamlit
  • Previous experience working with or around industrial equipment or maintenance systems

The Company
Atlanta, Georgia
103 Employees
Year Founded: 2019

What We Do

Tractian is a machine intelligence company that offers industrial monitoring systems. Tractian builds streamlined hardware-software solutions to give maintenance technicians and industrial decision-makers comprehensive oversight of their operations. It is democratizing access to sophisticated real-time monitoring and asset operations tools.

Tractian's solutions are used in environments that address a combined total of 5% of global industrial output. The company’s broad market reach is evidenced in its customer base from various industries, such as John Deere, Procter & Gamble, Caterpillar, Goodyear, Carrier, Johnson Controls, and Bimbo, the owner of the brands Little Bites and Thomas Bagels. Tractian's customers see a 6-12x ROI with savings of $6,000 per monitored machine annually on average.

In a major milestone and a first for the industry, Tractian launched the AI-Assisted Maintenance category in the industrial sector. In this new paradigm, artificial intelligence identifies machine problems and suggests preventive actions to be taken, giving invaluable insight and support to maintenance professionals. It is important to highlight that the intent of Assisted Maintenance is firmly rooted in augmenting maintenance professionals to provide more assertive diagnosis with human-in-the-loop feedback.

Tractian's mission is to elevate this category of workers in a highly impactful way. The Assisted Maintenance category will provide unimaginable support for maintenance professionals. By combining shop floor expertise with our technology, maintainers will be able to anticipate and address issues with unprecedented accuracy and speed.
