At TripActions, we are constantly striving to make the most reliable and scalable systems possible to ensure that our platform is available to our travelers when they need it most. With our exponential growth, we have many exciting challenges ahead. We are expanding our Site Reliability Engineering team to tackle these challenges and provide world-class availability to our travelers.
We are looking for a passionate Senior Site Reliability Engineer to design and develop the tooling, automation and infrastructure services that power the TripActions app used by thousands of travelers on a daily basis. You will work most closely with the Core engineering team, and will have many friendly peers and cross-functional partners in the Amsterdam and Palo Alto offices to see and work with regularly.
The impact you’ll make:
Building a fast-moving, high-growth service. TripActions is revolutionizing travel and expense management, and the product is evolving quickly. You are comfortable in a startup environment, enjoy seeing the product take shape, and have strong ownership of the success of your services.
Designing, implementing and operating cloud infrastructure. You’re a fit for us if you think in terms of infrastructure as code, deployment pipelines, and building the guardrails that make going fast also safe.
Identifying reliability anti-patterns and solving them systemically. You dive deep into the data to evaluate the health of your systems, and you use it to improve visibility and reliability across the fleet of services.
Finding toil in our processes and automating it away. You’d rather automate a task entirely, or build a tool that empowers your users, than be the gatekeeper to that tool.
What we’re looking for:
- 5+ years of experience as an SRE (or Infrastructure Software Engineer, or DevOps Engineer)
- Building and operating distributed systems in AWS, and shipping code to production through CI/CD with tools such as Maven and Jenkins
- Experience with microservice architecture and related reliability patterns such as throttling, queueing, and retries
- Writing Infrastructure as Code in Terraform or CloudFormation
- Automating away manual tasks using Python, Bash and Ruby
- Building, using, and automating monitoring systems such as SignalFx, Kibana, Grafana
- Strong sense of ownership demonstrated through shipping production-quality code and infrastructure equipped with testing, monitoring and documentation
- Passion for solving problems and learning new tools and technologies
- Excellent communication skills working with stakeholders and domain experts across the company to design solutions to user problems
- Ability to thrive in a fast-paced environment
- Experience with Java-based applications and services, including JVM profiling and performance tuning
- Experience building CI/CD pipelines from scratch and scaling them up
- Database experience with RDS (MySQL), Couchbase and/or Elasticsearch