AI-Driven Big Data Engineer (PhD Required)

Posted 19 Days Ago
Singapore
In-Office
Mid level
Marketing Tech • Analytics
The Role
The role involves designing AI-driven big data systems, developing ML pipelines, and conducting research in distributed ML and data optimization.
Summary Generated by Built In
AI-Driven Big Data Engineer
Employment Type: Full-Time
Location: Remote, Singapore
Level: Entry to Mid Level (PhD Required)
Bridge Cutting-Edge AI Research with Petabyte-Scale Data Systems

Pixalate is an online trust and safety platform that protects businesses, consumers and children from deceptive, fraudulent and non-compliant mobile and CTV apps and websites. We're seeking a PhD-level Big Data Engineer to revolutionize how AI transforms massive-scale data operations.

Our impact is real and measurable. Our software has uncovered:

  • Gizmodo: An iCloud Feature Is Enabling a $65 Million Scam
  • Washington Post: Your kids' apps are spying on them
  • ProPublica: Porn, Piracy, Fraud: What Lurks Inside Google's Black Box Ad Empire
About the Role

Work at the intersection of big data and AI, where you'll develop intelligent, self-healing data systems processing trillions of data points daily. You'll have autonomy to pursue research in distributed ML systems and AI-enhanced data optimization, with your innovations deployed at unprecedented scale within months, not years.

This isn't traditional data engineering: you'll implement agentic AI for autonomous pipeline management, leverage LLMs for data quality assurance, and create ML-optimized architectures that redefine what's possible at petabyte scale.

Key Research Areas & Responsibilities
AI-Enhanced Data Infrastructure
  • Design intelligent pipelines with autonomous optimization and self-healing capabilities using agentic AI
  • Implement ML-driven anomaly detection for terabyte-scale datasets
Distributed Machine Learning at Scale
  • Build distributed ML pipelines
  • Develop real-time feature stores for billions of transactions
  • Optimize feature engineering with AutoML and neural architecture search
Required Qualifications
Education & Research
  • PhD in Computer Science, Data Science, or Distributed Systems (exceptional Master's with research experience considered)
  • Published research or expertise in distributed computing, ML infrastructure, or stream processing
Technical Expertise
  • Core Languages: Expert SQL (window functions, CTEs), Python (Pandas, Polars, PyArrow), Scala/Java
  • Big Data Stack: Spark 3.5+, Flink, Kafka, Ray, Dask
  • Storage & Orchestration: Delta Lake, Iceberg, Airflow, Dagster, Temporal
  • Cloud Platforms: GCP (BigQuery, Dataflow, Vertex AI), AWS (EMR, SageMaker), Azure (Databricks)
  • ML Systems: MLflow, Kubeflow, Feature Stores, Vector Databases, scikit-learn (with cross-validated hyperparameter search), H2O AutoML, auto-sklearn, GCP Vertex AI AutoML Tables
  • Neural Architecture Search: KerasTuner, AutoKeras, Ray Tune, Optuna, PyTorch Lightning + Hydra
Research Skills
  • Track record of working with 100TB+ datasets
  • Experience with lakehouse architectures, streaming ML, and graph processing at scale
  • Understanding of distributed systems theory and ML algorithm implementation
Preferred Qualifications
  • Experience applying LLMs to data engineering challenges
  • Ability to translate complex AutoML/NAS research into practical production workflows
  • Hands-on project examples of feature engineering automation or NAS experiments
  • Proven success in automating ML pipelines, from raw data to an optimized model architecture
  • Contributions to Apache projects (Spark, Flink, Kafka)
  • Knowledge of privacy-preserving techniques and data mesh architectures
What Makes This Role Unique

You'll work with one of the few truly petabyte-scale production datasets outside of major tech companies, with the freedom to experiment with cutting-edge approaches. Unlike traditional big data roles, you'll apply the latest AI research to fundamental data challenges: from using LLMs to understand data quality issues to implementing agentic systems that autonomously optimize and heal data pipelines.

Top Skills

Airflow
AutoKeras
AWS
Azure
Dagster
Dask
Delta Lake
Flink
GCP
H2O AutoML
Iceberg
Java
Kafka
KerasTuner
Kubeflow
MLflow
Python
PyTorch Lightning
Ray
Scala
Scikit-Learn
Spark
SQL
Temporal

The Company
HQ: Palo Alto, CA
51 Employees
Year Founded: 2012

What We Do

Pixalate is the market-leading fraud protection, privacy, and compliance analytics platform for Connected TV (CTV) and Mobile Advertising. We work 24/7 to guard your reputation and grow your media value. Pixalate offers the only system of coordinated solutions across display, app, video, and OTT/CTV for better detection and elimination of ad fraud. Pixalate is an MRC-accredited service for the detection and filtration of sophisticated invalid traffic (SIVT) across desktop and mobile web, mobile in-app, and OTT/CTV advertising. www.pixalate.com

Similar Jobs

Virtu Financial

Research Technology Developer

Information Technology • Financial Services
In-Office
Singapore, SGP
822 Employees

Virtu Financial

Options Quantitative Strategist

Information Technology • Financial Services
In-Office
Singapore, SGP
822 Employees
Hybrid
Singapore, SGP
289097 Employees

Similar Companies Hiring

ClickMint
Marketing Tech • Generative AI • eCommerce • AdTech
Malibu, CA
7 Employees
PRIMA
Travel • Software • Marketing Tech • Hospitality • eCommerce
US
15 Employees
Scotch
Software • Retail • Payments • Fintech • eCommerce • Artificial Intelligence • Analytics
US
25 Employees
