Maze (mazehq.com)

Infra/DevOps Engineer

28 Locations

Remote

Senior level

Artificial Intelligence • Security • Cybersecurity

AI meets Vulnerability Management.

The Role

As an Infra/DevOps Engineer, you'll design, implement, and maintain our infrastructure, ensuring scalability, security, and reliability while collaborating with engineers and optimizing performance.

Summary of the Role:

As Infra/DevOps Engineer at Maze, you'll be the architect of our complex, multi-account Kubernetes infrastructure, building and scaling the foundation that powers our AI-driven cybersecurity platform across isolated enterprise environments. This is a unique opportunity to join as one of the early engineering team members of a well-funded startup building at the intersection of generative AI and cybersecurity. You'll design, code, and maintain sophisticated infrastructure spanning 12-15 AWS accounts, each with dedicated Kubernetes clusters, ensuring complete data segregation for our security-conscious enterprise customers.

You'll take full ownership of our infrastructure-as-code implementation, managing multiple Kubernetes clusters using cutting-edge tools like Karpenter, Flux, and Kustomize. Your success will be measured by infrastructure reliability, deployment velocity, and your ability to build self-managed, distributed systems that scale elegantly as we grow from a startup into an enterprise-scale company. This role is perfect for a hands-on infrastructure engineer who has mastered complex Kubernetes deployments at scale, writes production-grade infrastructure code, and thrives on building simple, elegant solutions to complex distributed systems challenges.

Your Contributions to Our Journey:

Architect Multi-Cluster Kubernetes Infrastructure: Design, implement, and write infrastructure-as-code for our complex Kubernetes setup spanning multiple AWS accounts, ensuring each cluster is completely isolated for enterprise security requirements while maintaining operational efficiency
Build Self-Managed, Distributed Systems: Develop infrastructure that manages itself through GitOps workflows using Flux and Kustomize, creating distributed systems where actions in one place automatically trigger appropriate changes across the infrastructure without manual intervention
Scale Kubernetes Operations: Manage and optimize dozens of Kubernetes clusters across our multi-tenant and single-tenant environments, implementing auto-scaling solutions with Karpenter and ensuring seamless scaling as customer workloads grow exponentially
Develop Production-Grade Automation: Write robust, maintainable code to build and maintain CI/CD pipelines, custom automation tools, and deployment scripts that enable rapid feature delivery while maintaining the highest reliability standards
Ensure Enterprise Security: Implement security best practices and compliance measures that protect our highly sensitive security data, managing firewalls, encryption, IAM policies, and network segregation across our multi-account AWS architecture
Optimize Platform Performance: Build comprehensive monitoring, logging, and alerting systems that proactively identify issues, using tools like Prometheus and Grafana to ensure our infrastructure scales efficiently as we handle increasingly complex workloads
Enable Engineering Velocity: Work closely with backend and data engineering teams to build self-service infrastructure capabilities, allowing teams to provision databases, deploy services, and scale resources independently without constant infrastructure team involvement

What You Need to Be Successful:

Kubernetes Mastery at Scale: 5+ years of infrastructure/DevOps experience with deep, hands-on expertise managing complex Kubernetes deployments—you must have experience with multiple Kubernetes clusters (tens of clusters) in sophisticated setups, not just simple single-cluster environments
GitOps and Modern K8s Tooling: Proven production experience with Karpenter (for auto-scaling), Flux (for GitOps), and Kustomize (for configuration management). If you have all three, you'll be like a fish in water with our infrastructure approach
AWS Infrastructure Expertise: Deep knowledge of AWS with hands-on experience managing complex multi-account architectures, understanding how to design for isolation, security, and scalability across numerous AWS accounts with proper networking and IAM configuration
Infrastructure-as-Code Excellence: Strong coding skills with production experience using Terraform or CloudFormation, writing maintainable, well-architected infrastructure code that follows best practices and scales with organizational growth. Proficiency in Python is essential for automation, tooling, and infrastructure development
Hands-On Coding: Currently active as a developer writing production code in Python for infrastructure automation, custom tooling, and operational scripts—you're not just an architect who delegates implementation
Simplicity-Driven Architecture: Proven ability to build simple, elegant solutions to complex infrastructure problems—you instinctively know the "right way" to use tools like Helm charts and avoid over-engineering while maintaining scalability
Platform Thinking: Experience building infrastructure with a platform mindset, creating systems that support multiple products and enable team self-service rather than building one-off solutions for individual applications
AWS Managed Services Philosophy: Understanding of when to use AWS managed services (RDS, MSK, EMR) versus building custom solutions, with experience scaling startups using managed services efficiently before investing in complex self-hosted infrastructure
Distributed Systems Mindset: Deep understanding of distributed systems principles with experience building infrastructure that is decentralized rather than centralized, allowing independent operation across multiple clusters and regions

Nice to Haves:

- Experience with AWS auto-scaling across complex, multi-cluster environments
- Background in security-focused infrastructure or handling sensitive enterprise data
- Previous experience scaling infrastructure at scale-ups that grew from 20 to 100+ engineers
- Knowledge of infrastructure observability tools beyond Prometheus/Grafana (e.g., ELK Stack)
- Track record of building infrastructure that went through SOC2, ISO, or similar compliance certifications

Why Join Us:

Ambitious Infrastructure Challenges: We're using generative AI (LLMs and agents) to solve critical cybersecurity challenges, requiring sophisticated infrastructure that handles sensitive security data across isolated enterprise environments. You'll build the foundation for breakthrough AI-powered security solutions at unprecedented scale.
Expert Team: We are a team of hands-on leaders with deep experience at Big Tech companies and scale-ups. Our team members have served on the leadership teams behind multiple acquisitions and an IPO.
Impactful Work: Cybersecurity is a force for good—helping stop cyber attacks ultimately helps deliver better outcomes for all of us. The infrastructure you build will directly enable security teams to protect organizations worldwide from real threats.
Build an AI-Native Company: We're building a new company in the AI era with the opportunity to design everything from the ground up—you'll architect infrastructure using cutting-edge Kubernetes practices and establish platform standards that will scale with us from startup through hypergrowth.
Technical Leadership Growth: Direct partnership with experienced engineering leadership, significant equity upside, and the opportunity to own and shape the entire infrastructure function as we scale our platform to support the world's largest enterprises.