Senior Site Reliability Engineer (Remote) at Fetch Rewards
What we’re building and why we’re building it.
Fetch is a build-first technology company creating a rewards program to power the world. Over the last 5 years we’ve grown from 0 to 7M active users and taken over the rewards game in the US with our free app. The foundation has been laid. In the next 5 years we will become a global platform that completely transforms how people connect with brands.
It all comes down to two core beliefs. First, that people deserve to be rewarded when they create value. If a third party directly benefits from an action you take or data you provide, you should be rewarded for it. And not just the “you get to use our product!” cop-out. We’re talkin’ real, explicit value. Fetch points, perhaps.
Second, we also believe brands need a better and more direct connection with what matters most to them: their customers. Brands need to understand what people are doing, and have a direct line to be able to do something about it. Not just advertise, but ACT. Sounds nice, right?
That’s why we’re building the world’s rewards platform. A closed-loop, standardized rewards layer across all consumer behavior that will lead to happier shoppers and stronger brands.
Fetch Rewards is an equal employment opportunity employer.
Fetch’s next step in evolving the shopping experience will require a Senior Site Reliability Engineer.
The Site Reliability Engineering (SRE) team combines software and systems engineering to build and run distributed, fault-tolerant systems at scale. SREs ensure that Fetch’s services, both our externally visible and internally critical systems, have reliability and uptime appropriate to our users’ needs. In addition, we keep an ever-watchful eye on system capacity and performance. We’re proud to be our engineers’ engineers, and much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.
Fetch’s culture of diversity, intellectual curiosity, problem solving, and openness is key to our success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
In your tool-bag:
- Experience leading other engineers in software development or operational support.
- Experience performing incident response for a production environment.
- Experience in the design, development, and maintenance of production software.
- Experience conducting technical deep-dives into code, networking, operating systems, storage, and/or cloud provider APIs.
- Experience programming in one or more languages such as Java, Python, Go, or C/C++.
- Experience with Unix/Linux operating systems internals, networking, or cloud platforms (e.g., AWS, Azure, GCP).
- Experience with analyzing and troubleshooting systems.
- Bachelor's degree in Computer Science, related technical field, or equivalent practical experience.
- Experience participating in an on-call rotation providing 24x7x365 operational support.
- Experience designing, analyzing, and troubleshooting distributed systems.
- Experience designing and developing software oriented towards systems or infrastructure automation.
- Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.
- Lead a team of Software/Systems Engineers on projects and be directly responsible for uptime.
- Facilitate and participate in an on-call rotation and blameless postmortems.
- Partner with fast-moving product development teams to create and maintain support procedures.
- Manage end-to-end availability and performance of key services and build automation to prevent problem recurrence.
- Automate response to all non-exceptional service conditions.
- Lead by example, mentor the team, and establish credibility through quality technical execution.
- Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of Fetch’s services.