What You'll Accomplish
- Design and Deliver High-Impact Solutions: Design and implement systems that enhance reliability, observability, traceability, and incident management, ensuring the platform scales effectively
- Lead Strategic Initiatives: Take ownership of cross-team collaborations and drive impactful projects by providing technical leadership and guidance
- Partner Across Teams: Collaborate with engineers from AI/ML, Data, Platform, and Product teams to develop best-in-class services
- Establish Standards and Best Practices: Define and enforce production standards, processes, and tools to ensure operational excellence
- Champion Reliability Goals: Advocate for and implement SLIs, SLOs, and other reliability-focused metrics across the engineering organization
- Mentorship and Knowledge Sharing: Guide and mentor team members, fostering technical growth and helping to develop the next generation of engineering leaders
- Innovate and Inspire: Drive continuous improvement by bringing creative ideas and challenging the status quo
Your Expertise
- 7+ years of experience in Production Engineering, Backend Engineering, SRE, DevOps, or a similar role
- Proficient Problem-Solver: Strong coding ability in at least one language (e.g., Golang, Python, Java, TypeScript) with the capability to solve complex issues through code
- Track Record of Success: Demonstrated experience delivering medium to large-scale projects that drive meaningful improvements in platform reliability and scalability
- Reliability Expertise: Deep understanding of production reliability concepts, including SLIs, SLOs, and incident management
- Strong Communicator: Excellent verbal and written communication skills with the ability to influence and collaborate across technical and non-technical teams
- Fast-Paced Experience: Familiarity with working in dynamic, reliability-focused production environments (preferred)
What We Use
- Our infrastructure runs primarily on Kubernetes, hosted on AWS EKS
- Infrastructure tooling includes Istio, Datadog, Terraform, Cloudflare, and Helm
- Our backend is Java / Spring Boot microservices, built with Gradle, alongside services such as DynamoDB, Kinesis, Airflow, Postgres, PlanetScale, and Redis, hosted on AWS
- Our frontend is built with React and TypeScript, and uses tools like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright
- Our automation is driven by custom and open source machine learning models and lots of data, and is built with Python, Metaflow, Hugging Face 🤗, PyTorch, TensorFlow, and Pandas
What We Do
Attentive® is the AI marketing platform for leading brands, designed to optimize message performance through 1:1 SMS and email interactions. Infusing intelligence at every stage of the consumer’s purchasing journey, Attentive empowers businesses to achieve hyper-personalized communication with their customers at scale. Leveraging AI-powered tools, a mobile-first approach, two-way conversations, and enterprise-grade technology, Attentive drives billions in online revenue for brands around the globe. Trusted by over 8,000 leading brands such as CB2, Urban Outfitters, GUESS, Dickey’s Barbecue Pit, and Wyndham Resort, Attentive is the go-to solution for delivering powerful commerce experiences for consumers with the brands they love.
To learn more about Attentive or to request a demo, visit www.attentive.com or follow us on LinkedIn, X (formerly Twitter), or Instagram.
Why Work With Us
At Attentive, you'll connect with inspiring, high-caliber people and be encouraged to take risks, get creative, and think bigger. We're solving big problems for our customers through our innovative AI solutions, and giving employees the opportunity to thrive along the journey. The sky's the limit when it comes to what's possible.