Summary of the Role:
As a Backend Engineer (Data Engineering) at Maze, you'll be the technical architect behind our data infrastructure, building production-grade data pipelines that process massive volumes of security data at scale. This is a unique opportunity to join as one of the early engineering team members of a well-funded startup building at the intersection of generative AI and cybersecurity. You'll design and implement the data lake architecture, streaming pipelines, and transformation systems that power our AI agents' ability to analyze and protect customer environments.
You'll own the data lifecycle end to end—from ingesting millions of records in minutes to building self-managed, distributed data systems on AWS. Your success will be measured by pipeline reliability, data processing performance, and your ability to scale our infrastructure to handle exponentially growing data volumes as we bring on enterprise customers. This role is perfect for a software engineer who specializes in building data systems at scale, thinks in terms of platform architecture, and thrives on solving the complex technical challenges that come with processing security data for the world's largest organizations.
Your Contributions to Our Journey:
Build Production Data Pipelines: Design, implement, and maintain scalable data pipelines that ingest gigabytes to terabytes of security data daily, processing millions of records in single-digit minutes while maintaining reliability and data quality
Architect Distributed Data Systems: Build and evolve our S3-based data lake infrastructure using Apache Iceberg, creating self-managed, distributed systems that enable rapid data transformations and efficient storage at massive scale
Own the Complete Data Lifecycle: Take end-to-end ownership from data ingestion through Kafka streams to transformation via Spark/EMR, ensuring seamless data flow from customer environments to our AI-powered analysis platform
Enable Platform Scalability: Build data infrastructure with platform thinking, creating systems that support current product needs while laying the foundation for future products and exponential data growth
Optimize for Enterprise Scale: Continuously improve data processing performance and cost efficiency as we scale from current volumes to supporting the world's largest enterprise security environments
Drive Technical Excellence: Establish data engineering best practices, participate in code reviews as a software engineer, and mentor team members on building robust, maintainable data systems
Collaborate Cross-Functionally: Work closely with infrastructure engineers, backend engineers, and product teams to ensure data systems seamlessly integrate with our AI agents and security analysis capabilities
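To make the ingest-to-storage flow above concrete, here is a deliberately simplified sketch of one step in such a pipeline: grouping a micro-batch of security events into daily partitions, in the spirit of an Iceberg partition spec like days(event_ts). This is an illustrative toy, not Maze's actual code—a production pipeline would consume from Kafka and write via Spark/EMR to Iceberg tables on S3, and the field names (event_ts, source, severity) are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

def partition_by_day(records):
    """Group a micro-batch of event records into day partitions,
    mimicking a time-based partition spec (e.g. Iceberg's days())."""
    partitions = defaultdict(list)
    for rec in records:
        # Derive the partition key from the event timestamp (UTC).
        ts = datetime.fromtimestamp(rec["event_ts"], tz=timezone.utc)
        partitions[ts.strftime("%Y-%m-%d")].append(rec)
    return dict(partitions)

if __name__ == "__main__":
    batch = [
        {"event_ts": 1700000000, "source": "edr", "severity": "high"},
        {"event_ts": 1700086400, "source": "fw", "severity": "low"},
    ]
    # Each partition would become one object/file under its day prefix.
    for day, recs in sorted(partition_by_day(batch).items()):
        print(day, json.dumps(recs))
```

In a real system the partitioning, file layout, and metadata tracking are what a table format like Iceberg handles for you; the sketch only shows the shape of the problem.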
What You Need to Be Successful:
Software Engineering Foundation: 7+ years of software engineering experience, with at least 4 years focused specifically on data engineering—you must be a strong software engineer who will pass our coding challenges, not just someone who transforms existing data
Production Data Pipeline Mastery: Proven track record building and scaling data ingestion systems that handle gigabytes to terabytes daily, with hands-on experience at companies moving massive data volumes (early Fivetran/Matillion engineers, or engineers at companies feeding Databricks/Snowflake at scale)
Core Technology Expertise: Deep, hands-on production experience with Python, Apache Kafka, and Apache Spark—these are your bread and butter technologies that you use daily and know intimately, not just tools you've experimented with
AWS Data Infrastructure: Strong expertise with AWS data services including S3, EMR, and building data lakes at scale—you intuitively design systems using AWS and understand how to architect for both performance and cost optimization
Data Lake Architecture: Proven experience with Apache Iceberg (critical—this is our core technology), data lakehouse concepts, and building distributed data systems that process massive datasets efficiently
Attention to Detail: Exceptional care and precision in data ingestion and transformation work—you understand that in security data, accuracy and reliability are non-negotiable
Scale Experience: Direct experience working at companies that deal with serious data scale—environments where Redshift and BigQuery aren't sufficient and you've had to architect custom solutions for petabyte-scale challenges
Hands-On Builder: Currently active as a developer, writing production code regularly—you're not someone who just designs systems or reviews others' work, you build and own them yourself
Nice to haves:
Experience with Temporal workflow orchestration (a strong plus, given its central place in our architecture)
Knowledge of Apache Hudi, Parquet, or ORC file formats for optimized data storage
Background with RDS, PostgreSQL optimization, or other database performance tuning
Previous experience at technical security product companies or handling security-related data
Track record of building self-service data platforms that enable other teams to operate independently
Why Join Us:
Ambitious Data Challenges: We're leveraging advanced data processing at the intersection of generative AI and cybersecurity, building systems that process security data at massive scale to enable proactive threat detection. You'll architect the data infrastructure that powers breakthrough AI capabilities for security teams worldwide.
Expert Team: We are a team of hands-on leaders with deep experience in Big Tech and Scale-ups. Our team has been part of the leadership teams behind multiple acquisitions and an IPO.
Impactful Work: Cybersecurity is a force for good—helping stop cyber attacks ultimately helps deliver better outcomes for all of us. The data systems you build will directly enable security teams to protect organizations from real threats.
Build an AI-Native Company: We're building a new company in the AI era with the opportunity to design everything from the ground up—you'll architect the data foundation using cutting-edge technologies like Apache Iceberg and build systems with platform thinking from day one.
Technical Growth: Direct partnership with experienced infrastructure and engineering teams, significant equity upside, and the opportunity to own the data engineering function as we scale from startup to handling the world's largest security datasets.
What We Do
AI Agents that investigate and resolve security vulnerabilities.