Lead - Site Reliability Engineer

Reposted 5 Days Ago
Bangalore, Bengaluru Urban, Karnataka
In-Office
Senior level
Industrial
The Role
The Lead Site Reliability Engineer is responsible for managing SRE practices, incident response, and automation using container orchestration and cloud services. The role also mentors the team, ensures operational performance, and supports quality service delivery.
Summary Generated by Built In

ZEISS in India

ZEISS in India is headquartered in Bengaluru and present in the fields of Industrial Quality Solutions, Research Microscopy Solutions, Medical Technology, Vision Care and Sports & Cine Optics.

ZEISS India has 3 production facilities, an R&D center, Global IT services, and about 40 Sales & Service offices in almost all Tier I and Tier II cities in India. With 2200+ employees and continued investments over 25 years in India, ZEISS’ success story in India is continuing at a rapid pace.

Further information at ZEISS India.

• 8-12 years of relevant industry experience.
• Minimum of 3 years as a Site Reliability Engineering Lead.
• Minimum of 5 years’ experience as a Site Reliability Engineer.
• Minimum of 8 years’ experience with cloud computing platforms like Azure and related services.
• In-depth knowledge of system architecture, networking, and microservice-based distributed systems.
• Expertise in designing and implementing reliable, scalable, and fault-tolerant systems using container orchestration technologies like Docker and Kubernetes.
• Proficiency in setting up and managing monitoring, alerting, and logging systems for early detection and resolution of issues in container orchestrators like Kubernetes, using tools like Prometheus, Grafana, OpenTelemetry Collector, or similar.
• Hands-on experience in incident management, including incident response, troubleshooting, and post-mortem analysis.
• Proficiency in coding/scripting languages commonly used in infrastructure automation and monitoring (such as Terraform).
• Knowledge of best practices in disaster recovery planning and execution for cloud-based systems.
• Ability to lead and mentor a team of SREs, providing guidance, support, and coaching.
• Capability to advocate for SRE best practices and principles within the organization and drive cultural changes as needed.
• Willingness to stay updated with the latest trends, tools, and technologies in the field of site reliability engineering.
• Strong communication skills to effectively collaborate with cross-functional teams, including Software Developers, Product Owners, and Cloud Platform Engineers.

degradation.
• Enable the development team to bring new software or new features (Digital Offering) to production as quickly as possible, while ensuring an acceptable level of IT operations performance and error risk in line with the agreed service level agreements (SLAs).
• Cooperate closely with Product Owners, Site Reliability Engineers, and the Cloud Platform Teams to define processes for migrating between different Cloud Platforms while ensuring reliability for business offerings.
• Work with multiple Site Reliability Engineers on operations and system administration tasks (analyzing logs, performance tuning, applying patches, testing production environments), identify opportunities, and drive the design and implementation of end-to-end observability, alerting, self-healing, and automation capabilities to improve service health, manageability, and reliability.
• Work with different stakeholders (POs, SREs, and Platform Team) to define the Incident Management Process required for responding to incidents, and drive postmortem reviews to improve service quality.
• Work closely with the Dev and SRE teams to select appropriate metrics related to observability and reliability, and to define SLIs and SLOs.
• Define and drive observability for self-developed software and the managed cloud components by collecting appropriate observability data for insights and alerting, including setting up proper alerting for critical components.
• Ensure availability and responsiveness of the application by setting up and maintaining the required documentation methods and tools. Build playbooks of troubleshooting techniques that SREs can use to effectively identify and investigate issues.
• Handle resolution of blockers, escalation to stakeholders, and provisioning of resources.
• Own availability, performance, and supportability targets for the service.
• Author functional and technical documentation and remain current on relevant technologies and

Your ZEISS Recruiting Team:

Saptarshi Chowdhury, Upasana Sinal

Top Skills

Azure
Docker
Grafana
Kubernetes
OpenTelemetry Collector
Prometheus
Terraform

The Company
HQ: Oberkochen
20,567 Employees

What We Do

ZEISS is an internationally leading technology enterprise operating in the fields of optics and optoelectronics. In the previous fiscal year, the ZEISS Group generated annual revenue totaling 10 billion euros in its four segments: Semiconductor Manufacturing Technology, Industrial Quality & Research, Medical Technology, and Consumer Markets (status: 30 September 2023).

With around 43,000 employees, ZEISS is active globally in almost 50 countries with around 30 production sites, 60 sales and service companies and 27 research and development facilities (status: 30 September 2023). Founded in 1846 in Jena, the company is headquartered in Oberkochen, Germany. The Carl Zeiss Foundation, one of the largest foundations in Germany committed to the promotion of science, is the sole owner of the holding company, Carl Zeiss AG.

Data privacy: www.zeiss.com/data-protection
Imprint: http://zeiss.com/publisher

This is ZEISS's official LinkedIn account. It follows the ZEISS Netiquette: www.zeiss.com/netiquette
