Data Engineer

Posted Yesterday
Toronto, ON, CAN
Hybrid
Mid level
Artificial Intelligence • Marketing Tech • Software • SEO
The Role
Design, build, and maintain Databricks/Spark ETL pipelines and Bronze-Silver-Gold (medallion) architectures; implement event-driven streaming with GCP Pub/Sub and Protocol Buffers; prepare datasets for LLM fine-tuning and embedding generation; integrate third-party sources such as Shopify and Klaviyo; implement attribution, customer lifetime value (CLV), and campaign analytics models; and ensure data quality through validation, monitoring, and idempotent, retry-safe pipelines.
Summary Generated by Built In
About Us

The search bar is becoming a conversation. Brands need to know how to get found by AI, and that's what we do. Yolando is the platform that helps marketers understand and improve how AI models discover, cite, and recommend their brand.

We've raised $8.5M from Drive Capital and MaRS Discovery District. We're 15 people building the standard for Generative Engine Optimization.

Role Overview

We are seeking a skilled Data Engineer to build the backbone of our AI platforms, Yolando and BirdseyePost. You will design and maintain sophisticated ETL pipelines using Databricks and Spark, ensuring the reliable flow of data that powers our insights and ML models. You will implement Bronze-Silver-Gold medallion architectures and build event-driven flows to process streaming data for real-time analytics. In this role, you will prepare datasets for LLM fine-tuning and drive the integration of third-party sources to enable data-driven decision-making at scale.

Key Responsibilities
  • Build and Optimize Data Pipelines: Design, build, and maintain ETL pipelines using Databricks and Spark for processing customer data, campaign analytics, and AI model inputs. Implement Bronze-Silver-Gold medallion architectures for reliable data transformation.

  • Enable Real-Time Data Processing: Build event-driven data flows using GCP Pub/Sub and Protocol Buffers. Process streaming data for real-time analytics, attribution tracking, and AI system inputs.

  • Power AI and ML Systems: Prepare and manage datasets for LLM fine-tuning, embedding generation, and recommendation systems. Build pipelines that feed vector databases (pgvector) with processed embeddings for semantic search.

  • Integrate Third-Party Data Sources: Build reliable ingestion pipelines for Klaviyo, Shopify, and other marketing platform APIs. Handle incremental loads, schema evolution, and data quality validation.

  • Drive Analytics and Attribution: Implement attribution models, customer lifetime value (CLV) calculations, and campaign performance analytics. Build data models that power dashboards and enable data-driven decision-making.

  • Ensure Data Quality and Reliability: Implement data validation, monitoring, and alerting for pipeline health. Build idempotent, retry-safe pipelines that handle failures gracefully.

What We're Looking For
  • 4+ years of data engineering experience.

  • Strong proficiency in Python and SQL for data transformation.

  • Production experience with Spark (PySpark) and distributed data processing.

  • Experience with cloud data platforms (Databricks, BigQuery, Snowflake, or similar).

  • Solid understanding of data modeling patterns (dimensional modeling, medallion architecture).

  • Experience with event streaming systems (Pub/Sub, Kafka, or similar).

  • Familiarity with GCP or other major cloud platforms.

  • Track record of building reliable, scalable pipelines in production.

Bonus if you have:
  • Experience with Databricks Asset Bundles or similar deployment frameworks.

  • Background in ML data pipelines: feature engineering, embedding generation, model serving data.

  • Familiarity with Protocol Buffers or other schema evolution tools.

  • Experience with vector databases and embedding workflows.

  • Background in marketing data: attribution, customer analytics, campaign tracking.

  • Experience with e-commerce data sources (Shopify, Klaviyo, marketing platforms).

Our Stack
  • Data Processing: Databricks, Apache Spark, PySpark, dbt

  • Event Streaming: GCP Pub/Sub, Protocol Buffers

  • Storage: BigQuery, AlloyDB (PostgreSQL), Cloud Storage

  • ML/AI Data: pgvector, embedding pipelines, LLM training data

  • Infrastructure: GCP, Terraform, Kubernetes, GitHub Actions

  • Languages: Python 3.11, SQL

Why Join Us?
  • Join an innovative, fast-growing startup building cutting-edge AI marketing solutions.

  • Make a meaningful impact by shaping the platform's user experience, design identity, and overall success.

  • Dynamic environment with opportunities for real ownership, learning, and growth.

  • Competitive salary and support for professional development.

How to Apply
  • Please send your resume, portfolio, and a brief note about why you're interested in joining us.

  • We'd love to see your work and hear your story!

  • This is a hybrid role, with 4 days per week in our downtown Toronto office.


The Company
30 Employees

What We Do

Yolando is a SaaS platform that helps marketing teams optimize their brand's visibility and representation across AI platforms. By focusing on Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO), Yolando analyzes AI-generated responses and provides actionable recommendations and content strategies to improve how brands are cited and perceived by large language models like ChatGPT, Gemini, and Claude.

Similar Jobs

CrowdStrike

Data Engineer

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Remote or Hybrid
7 Locations
10000 Employees
195K-320K Annually

Samsara

Senior Data Engineer

Artificial Intelligence • Cloud • Computer Vision • Hardware • Internet of Things • Software
Remote or Hybrid
Canada
4000 Employees
136K-160K Annually

Samsara

Senior Software Engineer

Artificial Intelligence • Cloud • Computer Vision • Hardware • Internet of Things • Software
Remote or Hybrid
Canada
4000 Employees
126K-163K Annually

BMO

Data Engineer

Financial Services
In-Office
2 Locations
51885 Employees
76K-142K Annually

Similar Companies Hiring

Golden Pet Brands
Digital Media • eCommerce • Information Technology • Marketing Tech • Pet • Retail • Social Media
El Segundo, California
178 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees
