Site Reliability Engineers (SREs) are responsible for the overall performance and reliability of ASAPP's infrastructure and products. The team owns the entire infrastructure stack. SREs design and implement the tools that automate building reliable and performant systems. We emphasize building tools over manual processes. We implement, not administer. We’re obsessed with automation, not repetition. Our job is to build reliable infrastructure and tools for our product teams so that they can solve customer problems and deliver new features instead of reinventing platforms.
What you'll do
- Work with product engineering teams on service architecture and implementation
- Deliver infrastructure configuration as code and automate everything
- Direct and implement monitoring and alerting systems to support rapid problem diagnosis
- Perform root cause analysis, then design and deliver resolutions
- Work on our Kubernetes / AWS infrastructure to support our product engineers
- Design secure and performant networking solutions in our production systems
What you'll need
- 4+ years of relevant experience bringing software to production at high scale
- Participation in on-call rotation, triaging and addressing production issues
- Obsession with automation and instrumentation
- Understanding of complex systems and failure scenarios
- Excellent communication skills
- Knowledge of AWS services, containers and container management frameworks
- Familiarity with message-bus-based systems and distributed architectures
- Proficiency in Terraform, Python and/or Go
What we'd like to see
- BS or MS degree in Computer Science, or equivalent hands-on experience
- Experience in product-oriented environments
- Scalable distributed applications experience
Benefits
- Competitive compensation with stock options
- Comprehensive medical, vision, and dental insurance
- 401k matching
- Fitness and wellness stipend
- Mobile phone reimbursement
- Mental well-being benefits
- Professional learning and development stipend
- Parental leave, including adoptive and foster parents
- 3 weeks paid time off (increases with tenure) and unlimited sick leave
Skills Required
- 10 years of experience bringing software to production at high scale
- Understanding of distributed systems architecture and failure scenarios
- Knowledge of AWS services, containers and container management frameworks
- Proficiency in Terraform, Python and/or Go
- Participation in on-call rotation, triaging and addressing production issues
What We Do
Our artificial intelligence and machine learning products deliver automation and human augmentation, allowing individuals and organizations to realize their full potential. Today, the world's largest organizations rely on ASAPP to provide amazingly efficient and effective customer experiences. Our Research & Development team is unparalleled, driving the advancement of AI, machine learning, speech recognition, robotic process automation, natural language processing and more.