Apify

Data Engineer

Prague, CZE

Hybrid

Mid level

Web3 • Automation

The Role

As a Data Engineer, you will manage integration between Snowflake and various operational tools, ensuring data accuracy and availability for Sales, Marketing, and Product teams. Responsibilities include building reliable data pipelines, designing CDP layers, and resolving pipeline incidents.

Summary Generated by Built In

Apify is the largest marketplace of tools for AI. Its 30,000+ Actors help people and agents get real-time web data, track competitors, generate leads, or integrate their apps. Actors are built by a global creator community that now earns more than $1M every month.

Join us to help people put the web to work. Apify can find missing children, protect consumers from fake discounts across the EU, and feed data to AI chatbots.

We're looking for a Data Engineer to own the integration layer between Snowflake and the operational tools that run Apify's go-to-market and product motion: HubSpot, Intercom, Mixpanel, and Segment. You'll make sure the right data lands in the right system at the right time, with the right shape, so Sales, Marketing, Customer Success, and Product teams can act on it.

You'll be the 9th member of the data team - joining a mix of analytics engineers, analysts, and data scientists - just as Segment is being rolled out as Apify's CDP. That rollout is yours to land end to end.

What you'll be working on:

Own the integration domain end to end - all pipelines, transformations, and Snowflake models that connect HubSpot, Intercom, Mixpanel, and Segment to the rest of the platform, in both directions.
Design event tracking and the CDP layer with the RevOps team as Segment becomes the source of truth for behavioral data flowing into product, marketing, and CRM systems.
Build reliable, observable pipelines in Keboola and dbt - with clear data contracts, schema tests, freshness guarantees, and alerting.
Model integration data in Snowflake so HubSpot, Intercom, Mixpanel, and Segment data lands in well-defined tables that downstream consumers can trust, with documentation that analysts and scientists can actually use.
Power lifecycle automations - PQA scores written back into HubSpot, behavioral campaigns in Intercom and Customer.io, product usage signals - by shipping the data they depend on.
Diagnose and resolve pipeline incidents independently - trace lineage across multiple components, find root causes, fix, and write the runbook so it doesn't bite the next person.

Tech stack

Snowflake - data warehouse
Keboola - extractors, writers, and orchestration
dbt - transformations on Snowflake (orchestrated by Keboola; this is where we're actively migrating existing transformation logic)
Tableau and Redash - BI
n8n - workflow automation
Segment - CDP, currently being rolled out end-to-end

Who we're looking for:

3+ years of data engineering experience, with meaningful time spent on integrations between a cloud warehouse and operational SaaS tools (HubSpot, Salesforce, Intercom, Zendesk, Mixpanel, Amplitude, Segment, RudderStack, or similar).
Fluent in SQL (window functions, CTEs, complex multi-source joins, query optimization) and comfortable in Python for the parts a no-code tool can't handle.
Production experience with Snowflake (or BigQuery, Databricks, Redshift), and an understanding of the cost, performance, and access-control tradeoffs of a usage-based warehouse.
Experience building end-to-end pipelines combining an orchestration or ELT platform (Keboola, Fivetran, Airflow, Dagster, Prefect, Matillion) with a transformation framework like dbt.
Hands-on experience with a CDP (Segment, RudderStack, mParticle) - tracking plans, schemas, identity resolution, downstream consumers - not just installing the snippet.
You think in data contracts - schema stability, freshness SLAs, documented field definitions - and treat the boundary between your domain and downstream consumers as a first-class interface.
Comfortable with reverse ETL (Census, Keboola, or hand-rolled), and you understand what it means to write back to a CRM that humans are also editing.
Pragmatic about tooling - happy to use n8n for the right job, and equally happy to write proper code when that's the right call.
Able to explain to non-technical stakeholders in Sales, Marketing, and Customer Success why a dashboard moved and what it means, in English, both in writing and in person.

Nice to have:

Experience with usage-based billing or product-led growth data models.
Exposure to LLM-assisted workflows in the data stack.
Prior experience at a SaaS company between 50 and 500 people.

By the end of the first month, we expect you to:

Know the data team, the RevOps and Growth stakeholders who depend on the integration layer, and the workflows that flow through HubSpot, Intercom, Mixpanel, and Segment.
Work through the existing Keboola components and dbt models to understand what's in place, what's fragile, and where the silent failures live.
Trace a typical record from each source system through to the Snowflake tables analysts use.

By the end of the first 3 months, we expect you to:

Have a complete map of the integration domain - what flows where, what's owned by whom, where the silent failures are - and a documented six-month plan for the work ahead.
Have at least one end-to-end improvement shipped with monitoring in place.
Be the go-to person on the data team for HubSpot, Intercom, Mixpanel, and Segment data questions.

By the end of the first 6 months, we expect you to:

Have Segment operating as the durable CDP for Apify, with a published tracking plan and reliable event flows into Snowflake and downstream tools.
Have core tables from HubSpot, Intercom, Mixpanel, and Segment with documented data contracts - schema, freshness SLA, ownership - and tests and alerting in place.
Have driven measurable improvements in data freshness, pipeline reliability, and incident response time, tracked publicly, and shipped at least one cross-team initiative where the data integration unlocked a business outcome (conversion lift, churn reduction, ops automation).

Why should you work at Apify?

Space, support, and autonomy for personal growth, with a direct impact on our success
Full-time position in Prague (Lucerna Palace)
Flexible working hours (perfect for both night owls 🦉 and early birds 🐥)
Nobody counts holidays as long as the work gets done 💪
Unlimited Claude for every Apifier. We don't count tokens. Just use them well 🤖
Stock options and profit sharing 💰
Free Multisport card
We welcome pets, kids, and bikes in the office
Epic team buildings and offsites 🚢 with biking, canoeing, and other adventures 🪂
Solid education and training budget, conference tickets, internal “Eat & Learn” sessions, and the possibility to work across teams
Generous hardware budget 💻
Free lunches every day when working from the office 🌮🥡
Unlimited supply of ☕ & 🍺 and snacks
Free entry to the wonderful Prague and Brno zoos 🐘
Ping-pong, chess, PS5, lightsabers, foosball league after lunch.

For more details about Apify and what it’s like to work with us, see our Careers page.

Skills Required

3+ years of data engineering experience with integrations
Fluent in SQL and comfortable in Python
Production experience with Snowflake or similar data warehouses
Experience building end-to-end data pipelines
Hands-on experience with a Customer Data Platform
Ability to explain data implications to non-technical stakeholders

The Company

HQ: Prague

97 Employees

Year Founded: 2015

What We Do

Apify is a full-stack web scraping and browser automation platform that lets you extract data from websites and automate workflows on the web. With Apify, you can turn any website into an API!