Staff Site Reliability Engineer

Posted 20 Days Ago
Sunnyvale, CA
179K-215K Annually
Senior level
Software • Cybersecurity
The Role
The Staff Site Reliability Engineer will design, deploy, and maintain cloud infrastructure solutions, implement infrastructure as code, develop CI/CD pipelines, monitor system performance, support incident response, collaborate with development teams, implement security best practices, drive automation initiatives, evaluate emerging technologies, and support junior team members.
Summary Generated by Built In

No Agency Submissions Accepted.

Location: Onsite, Sunnyvale, California (5 days a week in the office)

In this role, you will drive containerization of our entire platform, update our existing tooling and automation, and define our digital transformation within this team.

As a Staff SRE, every day you will work on major initiatives with other engineers and accomplish operations tasks.

To thrive in this role, you must have a can-do attitude with a solution-oriented approach, and an excitement to solve challenging problems.

About the team:

The Cloud Operations team at Illumio is working to deploy and manage our SaaS services by reducing human error, aggressively focusing on automation, and providing deep insight into application behavior and health! We do that by incorporating aspects of software engineering and applying them to infrastructure and operations problems to create and manage scalable and reliable distributed software systems.

About the role:

We are looking for a backend/platform or SRE engineer with a demonstrated track record of building secure, large-scale, highly available services using automation and Infrastructure as Code, who is well versed in cloud architecture (with a focus on Kubernetes), and who loves to delight the engineers they support.

This engineer will be an essential member of our Operations team, collaborating with the Platform and Data engineers to deliver the latest Illumio products.

The Cloud Platform SRE will be responsible for designing and deploying scalable, reliable, and secure cloud infrastructure. This individual must have a thorough understanding of, and experience with, AWS and/or Azure. The platform will be based on Kubernetes and built using cloud-native technologies. The Cloud Platform SRE is responsible for building, operating, and maintaining this platform. They are responsible for defining and meeting platform SLOs and for capacity utilization, cost visibility, security compliance, etc. This role is critical to the success of the Multi-cloud Platform.

Key responsibilities:

  • Drive reliability improvements back into applications
  • Build code to resolve reliability/resiliency issues
  • Mentor and educate team members to strengthen technical expertise
  • Collaborate closely with cloud architects to drive cloud solutions
  • Curate proper SLIs/SLOs to accurately measure and assess error budgets
  • Embed with the development teams to assist with cloud methodologies when developing products to ensure that the deliverable is as reliable as possible
  • Work with development teams to build and strengthen application security and compliance
  • Manage high-impact situations that involve technically challenging issues across diverse audiences, and drive to find the root cause, mitigate the impact, and identify a solution
  • Focus on observability

Who you are:

  • Bachelor's degree in Computer Science, Engineering, or related field; or equivalent work experience
  • 6+ years of relevant SRE, DevOps, Platform, or Infrastructure Engineering experience
  • 4+ years in a production support role in a fast-paced industry/organization
  • Experience deploying, tuning, and maintaining Linux-based, highly available, fault-tolerant web platforms in public cloud providers such as AWS, Azure, and GCP
  • Experience with common monitoring, log aggregation, and metrics gathering platforms (Icinga, Sensu, Splunk, Telegraf/InfluxDB, et al.)
  • Experience with configuration management and orchestration tools such as Chef, Ansible, and AWS services and APIs, or equivalent
  • Experience scripting/coding with Python, Java, Ruby, and/or Go
  • Experience with MySQL, PostgreSQL, Redis, or similar
  • Solid knowledge of Linux operating systems (Ubuntu, RHEL, OEL7) is required
  • Experience with EKS and/or AKS
  • Knowledge of or experience with incident management and on-call tooling (PagerDuty)
  • Knowledge of Database Technologies, Release Management, REST, SRE, etc.
  • Load balancer/traffic manager knowledge
  • Experience working with Kubernetes, Docker, or other virtualization & containerization technologies
  • Networking basics and troubleshooting skills
  • Good understanding of production deployment and distributed environments is required
  • Strong problem solving and operational process skills, attention to detail
  • Application support and debugging experience in a dynamic, fast-paced production environment
  • Experience with SDLC principles, architecture, and operations
  • Experience working with senior leadership both inside and outside of engineering
  • Ability to manage multiple tasks and competing priorities to deliver projects on schedule
  • Azure certifications such as Azure Administrator, Azure Developer, or AWS/GCP certifications are a plus

Who We Are

Illumio, the pioneer and market leader of Zero Trust segmentation, prevents breaches from becoming cyber disasters. Illumio protects critical applications and valuable digital assets with proven segmentation technology purpose-built for the Zero Trust security model. Illumio ransomware mitigation and segmentation solutions see risk, isolate attacks, and secure data across cloud-native apps, hybrid and multi-clouds, data centers, and endpoints, enabling the world’s leading organizations to strengthen their cyber resiliency and reduce risk. 

Illumio believes that an environment of unique backgrounds, experiences, viewpoints, and individual contributions drives our success and makes us stronger together. We are dedicated to creating and maintaining a diverse culture and emphasizing inclusion and belonging.

Top Skills

AWS
Azure
GCP
The Company
Sunnyvale, CA
552 Employees
On-site Workplace
Year Founded: 2013

What We Do

Illumio, the Zero Trust Segmentation company, prevents breaches from spreading and turning into cyber disasters. Illumio protects critical applications and valuable digital assets with proven segmentation technology purpose-built for the Zero Trust security model. Illumio ransomware mitigation and segmentation solutions see risk, isolate attacks, and secure data across cloud-native apps, hybrid and multi-clouds, data centers, and endpoints, enabling the world’s leading organizations to strengthen their cyber resiliency and reduce risk.  
