Senior Engineer, Cloud (Observability Lead)

Job Posted 12 Days Ago
Sunnyvale, CA
164K-189K
Senior level
Software • Cybersecurity
The Role
Lead observability initiatives within the engineering team to enhance operational reliability and visibility across production systems, collaborating closely with engineers on logging and metrics integration.
Summary Generated by Built In

Location: Onsite, Sunnyvale, California (5 days a week in the office)

Onwards Together!

Illumio is the leader in ransomware and breach containment, redefining how organizations contain cyberattacks and enable operational resilience. Powered by the Illumio AI Security Graph, our breach containment platform identifies and contains threats across hybrid multi-cloud environments – stopping the spread of attacks before they become disasters.
Recognized as a Leader in the Forrester Wave™ for Microsegmentation, Illumio enables Zero Trust, strengthening cyber resilience for the infrastructure, systems, and organizations that keep the world running. 

Our Team's Vision:

Our Engineering team is driven by a culture that thrives on visionary leadership, autonomy, and ownership, creating a dynamic synergy that drives us forward in the ever-evolving landscape of cybersecurity. 

When you join our team, you become part of the leader in Zero Trust Segmentation. You'll work with a cutting-edge technology stack that spans operating systems, distributed applications, and immersive UI/visualization tools.  

We're shaping the future of cybersecurity. And together, we will continue to build world-class products—led by people with different perspectives, backgrounds, and a commitment to innovation in a time when the world faces its greatest cybersecurity threats in history. 

Your Impact: 

We are seeking a Senior Engineer for our Cloud team with a strong focus on observability to join our engineering team as the Observability Lead. In this role, you will champion initiatives to enhance the reliability, visibility, and operational readiness of our production systems. You will collaborate closely with engineers to catalog services, improve logging practices, reduce log noise, and integrate additional metrics across all applications. Additionally, you will develop runbooks, build dashboards, and manage PagerDuty configurations and escalation workflows.

  • Serve as an advocate for observability practices within the engineering team, promoting operational best practices and reliability. 
  • Catalog all production services, documenting critical details for operational visibility and management. 
  • Collaborate with engineering teams to develop and implement a comprehensive observability plan, ensuring metrics are integrated into all services. 
  • Enhance logging practices where needed, reduce log noise, and ensure meaningful insights are captured. 
  • Add and refine metrics across applications to improve operational visibility and performance tracking. 
  • Develop detailed runbooks for critical alerts and incidents, facilitating efficient response processes. 
  • Build and maintain dashboards that offer insights into SLAs, performance, and business metrics for engineering and product teams. 
  • Set up and manage PagerDuty alerts, define on-call duties, and establish incident escalation paths. 
  • Continuously improve alerting, logging, and monitoring processes to enhance service reliability and reduce unnecessary noise. 

Your Toolkit:

  • Proven experience in a DevOps or observability-focused role, concentrating on production service management and operational excellence. 

  • Prior experience working with microservices in a production environment is a must. 

  • At least 5 years of experience managing large numbers of instances in public clouds such as AWS, Azure, or GCP. 

  • Strong expertise in observability practices and tools (e.g., Prometheus, Grafana, Datadog). 

  • Experience enhancing logging, reducing log noise, and integrating critical metrics into services. 

  • Proficiency in building and managing dashboards and monitoring tools. 

  • Expertise in setting up and managing PagerDuty alerts, with on-call rotation and escalation management knowledge. 

  • Strong collaboration skills to work closely with engineering teams, advocating for observability best practices. 

  • Familiarity with cloud platforms (AWS, GCP, Azure) and modern CI/CD processes. 

  • Automation scripting or coding experience (Python, Go, or similar). 

  • Knowledge of infrastructure-as-code tools (e.g., Terraform, CloudFormation). 

  • Excellent problem-solving skills and attention to detail in managing complex systems. 

Compensation:

$164,000 USD - $189,000 USD

The pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include responsibilities of the job, education, location, experience, knowledge, skills, abilities, internal equity, alignment with market data, and applicable laws. 

At Illumio we offer a wide range of benefits to our eligible team members. Our benefit programs vary by location and can include Medical, Dental, Vision Coverage; Health and Dependent Savings Accounts; Life and Disability Programs; Paid Parental Leave; Voluntary Benefit Programs; Company Sponsored Wellness Program; Wellness Reimbursement Program; Retirement Savings; Equity Opportunities; Paid time off and Paid Holidays; Employee Incentive Program.

Our Commitment: 

Illumio believes that an environment of unique backgrounds, experiences, viewpoints, and individual contributions drives our success and makes us stronger together. We are dedicated to creating and maintaining a diverse culture and emphasizing inclusion and belonging.   

All official job offers from our company are extended directly by our recruitment team and will be sent through an official DocuSign document for your review and signature. Please be aware that we do not ask for any personal information in the process of extending offers of employment, such as financial details or social security numbers. Upon acceptance of any offer, we will request such information as part of the onboarding process prior to or on your first day of employment, and only after completing a background check through an authorized third-party vendor. If you receive any communication asking for personal details outside of these processes, please contact us immediately to verify the authenticity of the request. Your security is important to us, and we are committed to a safe and transparent hiring experience. 

Top Skills

AWS
Azure
CloudFormation
Datadog
GCP
Go
Grafana
Prometheus
Python
Terraform

The Company
Sunnyvale, CA
552 Employees
On-site Workplace
Year Founded: 2013

What We Do

Illumio, the Zero Trust Segmentation company, prevents breaches from spreading and turning into cyber disasters. Illumio protects critical applications and valuable digital assets with proven segmentation technology purpose-built for the Zero Trust security model. Illumio ransomware mitigation and segmentation solutions see risk, isolate attacks, and secure data across cloud-native apps, hybrid and multi-clouds, data centers, and endpoints, enabling the world’s leading organizations to strengthen their cyber resiliency and reduce risk.  
