Site Reliability Engineer

560064, Yelahanka, Karnataka, IND
In-Office
Senior level
Logistics • Transportation
The Role
Lead and implement SRE practices across platforms to ensure reliability, availability, and performance. Define SLOs/SLIs/SLAs, manage OS, applications, databases, middleware, and optimize Azure cloud and Kubernetes. Implement observability with Prometheus/Grafana and APM, develop automation and AIOps initiatives, create scripts/tools to reduce manual work, and provide technical leadership and cross-functional collaboration.

About the Role:

We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) specializing in Operating Systems (OS), Applications, Databases, and Middleware to join Maersk. This role requires deep expertise in implementing SRE practices and driving engagements, with a strong focus on observability, automation, and performance, using open-source monitoring tools and problem-solving techniques. The ideal candidate will have substantial experience with VMware, Azure Cloud, data center (DC) components such as compute, storage, and network, Java/JVM, Docker, and Kubernetes, alongside good coding skills. Knowledge of automation, networking, and AIOps will be an added advantage.

Key Responsibilities:

SRE Practices Implementation:

  • Lead the establishment and implementation of SRE practices across multiple platforms within Maersk to ensure the reliability, availability, and performance of critical systems and services.

  • Define and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).

System and Application Management:

  • Oversee the administration, maintenance, and optimization of operating systems, applications, databases, and middleware.

  • Collaborate closely with development and operations teams to integrate and optimize applications for performance and reliability.

Cloud and Containerization:

  • Manage and optimize Azure Cloud infrastructure for scalability, performance, and cost-efficiency.

  • Deploy, manage, and monitor Kubernetes clusters to ensure high availability and resilience.

Observability and Monitoring:

  • Implement and manage observability solutions using open-source tools such as Prometheus and Grafana.

  • Develop and maintain robust monitoring, logging, and alerting systems to proactively identify and resolve issues.

  • Utilize Application Performance Management (APM) tools and practices to continuously monitor and enhance application performance.

AIOps and Automation:

  • Drive AIOps initiatives to streamline operations and enhance efficiency through automation and machine learning.

  • Develop automation scripts and tools to reduce manual intervention and ensure consistent deployment and management practices.

Collaboration and Leadership:

  • Lead cross-functional engagements with development, operations, and business teams to foster a culture of reliability and continuous improvement.

  • Provide technical leadership, guidance, and mentorship to team members and stakeholders.

Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.

  • Proven experience as a Site Reliability Engineer or in a similar role.

  • Deep knowledge of operating systems (Linux/Windows), application management, databases (SQL/NoSQL), and middleware.

  • Strong expertise in Hybrid Cloud and Kubernetes orchestration.

  • Hands-on experience with observability tools like Prometheus, Grafana, and APM solutions.

  • Proficiency in coding/scripting languages such as Python, Go, Shell, etc.

  • Familiarity with AIOps practices and tools.

  • Excellent problem-solving skills and a proactive approach to addressing issues.

  • Strong communication and collaboration skills.

Preferred Skills:

  • Certification in Hybrid Cloud, VMware, or Kubernetes.

  • Experience with other cloud platforms (AWS, Google Cloud).

  • Familiarity with CI/CD pipelines and DevOps practices.

  • Experience with Infrastructure as Code (IaC) tools like Terraform or Ansible.

  • Knowledge of security best practices and compliance standards.

Why Join Maersk:

  • Opportunity to work with one of the world’s largest integrated logistics companies, leveraging cutting-edge technologies in a dynamic and innovative environment.

  • Collaborative and inclusive company culture that values diversity and teamwork.

  • Professional development opportunities and continuous learning initiatives.

  • Competitive salary and comprehensive benefits package.

Maersk is committed to a diverse and inclusive workplace, and we embrace different styles of thinking. Maersk is an equal opportunities employer and welcomes applicants without regard to race, colour, gender, sex, age, religion, creed, national origin, ancestry, citizenship, marital status, sexual orientation, physical or mental disability, medical condition, pregnancy or parental leave, veteran status, gender identity, genetic information, or any other characteristic protected by applicable law. We will consider qualified applicants with criminal histories in a manner consistent with all legal requirements.

 

We are happy to support any adjustments you need during the application and hiring process. If you need special assistance or an accommodation to use our website, apply for a position, or perform a job, please contact us by emailing [email protected]

Skills Required

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience
  • Proven experience as a Site Reliability Engineer or similar role
  • Deep knowledge of operating systems (Linux, Windows)
  • Experience administering and optimizing applications, databases (SQL/NoSQL), and middleware
  • Strong expertise in Hybrid Cloud and Kubernetes orchestration
  • Hands-on experience with Azure Cloud, VMware, and data center components (compute, storage, network)
  • Experience with containerization (Docker) and Kubernetes cluster deployment/management
  • Experience with observability tools such as Prometheus and Grafana and APM solutions
  • Proficiency in coding/scripting languages (Python, Go, Shell)
  • Strong problem-solving, communication, and collaboration skills
  • Familiarity with AIOps practices and tools
  • Certification in Hybrid Cloud, VMware, or Kubernetes
  • Experience with AWS or Google Cloud
  • Familiarity with CI/CD pipelines and DevOps practices
  • Experience with Infrastructure as Code tools like Terraform or Ansible
  • Knowledge of security best practices and compliance standards

The Company
Capital Region
58,338 Employees

What We Do

A.P. Moller - Maersk is an integrated transport and logistics company; going all the way, together, for our customers and society. ALL THE WAY is our commitment to connect the world so that everyone has both the possibility and the ability to trade, grow and thrive. The company employs roughly 110,000 employees across operations in 130 countries.
