Smarsh

Sr. Site Reliability Engineer I

Sorry, this job was removed at 01:49 p.m. (CST) on Thursday, Aug 08, 2024

Portland, OR

Hybrid

115K-145K Annually

7+ Years Experience

Software

The Role

Summary

As a Sr. Site Reliability Engineer, you are instrumental in helping make our petabyte-scale, Kubernetes-centric ProArchive application resilient. This position will coordinate with multiple teams to develop a migration plan for various components and services and to implement best practices for our tech stack. A person in this position will have a passion for getting things done across various functions, including automation, CI/CD, infrastructure components, and middleware. You’ll work closely with our Dev Engineering, QA, and Platform Engineering groups to manage our current on-prem deployments and our on-prem and cloud-native infrastructure.

How will you contribute?

Help define technology choices, best practices, and processes for the team.
Develop and maintain documentation standards for the team.
Develop new tools and libraries for broader use by SaaS Operations and Engineering teams. Enable engineering teams to discover and understand problems more quickly.
Work with product architects and make suggestions for architectural changes and design platform component roadmaps.
Act as a subject matter expert (SME) for assigned components and functions. Develop skills as required to become the SME for components that need one.
Assist engineering teams in deep troubleshooting and application code review to find opportunities to improve performance and scalability.
Work closely with Engineering and peer SRE teams to design and use Smarsh coding standards and best practices.
Respond to incidents coordinated by SRE and Incident Response teams. Act as an Incident Commander during incidents.
Participate in escalation and off-hours on-call schedules.
Adopt and embrace the qualities of an SRE as defined in the team charter. Help instill them in the rest of the team.
Mentor and train junior members of the team. Design training curriculum for the team.

What will you bring?

7+ years of industry experience.
BS in CS or equivalent combination of education and experience.
Strong experience operating Kubernetes in production environments – EKS Anywhere is preferred
Experience with middleware systems (Kafka, AMQ, Redis, Memcached, etcd)
Experience managing CI/CD systems (Flux, Concourse)
Experience deploying and/or operating an observability stack (Splunk, Datadog, Grafana)
Experience with large-scale systems
Familiarity with PostgreSQL and MongoDB
Background working in a multi-platform environment (Linux, Windows)
Familiarity with programming/scripting languages (e.g., Python, Bash, PowerShell, Go)
Familiarity with Agile/Scrum/Kanban methodologies
Strong interpersonal skills with a can-do attitude and a sense of urgency suited to a high-growth, fast-paced environment
Curious mind, wanting to learn new technologies and share with others.
The ability to think outside of the box to resolve issues and create solutions