A.P. Moller - Maersk

Site Reliability Engineer

560064, Yelahanka, Karnataka, IND

In-Office

Senior level

Logistics • Transportation

The Role

Lead and implement SRE practices across platforms to ensure reliability, availability, and performance. Define SLOs, SLIs, and SLAs; manage operating systems, applications, databases, and middleware; and optimize Azure Cloud and Kubernetes. Implement observability with Prometheus/Grafana and APM, develop automation and AIOps initiatives, create scripts and tools to reduce manual work, and provide technical leadership and cross-functional collaboration.

Summary Generated by Built In

About the Role:

We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) specializing in Operating Systems (OS), Applications, Databases, and Middleware to join Maersk. This role requires deep expertise in implementing SRE practices and driving engagements, with a strong focus on observability, automation, and performance using open-source monitoring tools and problem-solving techniques. The ideal candidate will have substantial experience with VMware, Azure Cloud, data center (DC) components such as compute, storage, and network, Java/JVM, Docker, and Kubernetes, along with good coding skills. Knowledge of automation, networking, and AIOps is an added advantage.

Key Responsibilities:

SRE Practices Implementation:

Lead the establishment and implementation of SRE practices across multiple platforms within Maersk to ensure the reliability, availability, and performance of critical systems and services.
Define and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).

System and Application Management:

Oversee the administration, maintenance, and optimization of operating systems, applications, databases, and middleware.
Collaborate closely with development and operations teams to integrate and optimize applications for performance and reliability.

Cloud and Containerization:

Manage and optimize Azure Cloud infrastructure for scalability, performance, and cost-efficiency.
Deploy, manage, and monitor Kubernetes clusters to ensure high availability and resilience.

Observability and Monitoring:

Implement and manage observability solutions using open-source tools such as Prometheus and Grafana.
Develop and maintain robust monitoring, logging, and alerting systems to proactively identify and resolve issues.
Utilize Application Performance Management (APM) tools and practices to continuously monitor and enhance application performance.

AIOps and Automation:

Drive AIOps initiatives to streamline operations and enhance efficiency through automation and machine learning.
Develop automation scripts and tools to reduce manual intervention and ensure consistent deployment and management practices.

Collaboration and Leadership:

Lead cross-functional engagements with development, operations, and business teams to foster a culture of reliability and continuous improvement.
Provide technical leadership, guidance, and mentorship to team members and stakeholders.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
Proven experience as a Site Reliability Engineer or in a similar role.
Deep knowledge of operating systems (Linux/Windows), application management, databases (SQL/NoSQL), and middleware.
Strong expertise in Hybrid Cloud and Kubernetes orchestration.
Hands-on experience with observability tools like Prometheus, Grafana, and APM solutions.
Proficiency in coding/scripting languages such as Python, Go, and Shell.
Familiarity with AIOps practices and tools.
Excellent problem-solving skills and a proactive approach to addressing issues.
Strong communication and collaboration skills.

Preferred Skills:

Certification in Hybrid Cloud, VMware, or Kubernetes.
Experience with other cloud platforms (AWS, Google Cloud).
Familiarity with CI/CD pipelines and DevOps practices.
Experience with Infrastructure as Code (IaC) tools like Terraform or Ansible.
Knowledge of security best practices and compliance standards.

Why Join Maersk:

Opportunity to work with one of the world’s largest integrated logistics companies, leveraging cutting-edge technologies in a dynamic and innovative environment.
Collaborative and inclusive company culture that values diversity and teamwork.
Professional development opportunities and continuous learning initiatives.
Competitive salary and comprehensive benefits package.

Maersk is committed to a diverse and inclusive workplace, and we embrace different styles of thinking. Maersk is an equal opportunities employer and welcomes applicants without regard to race, colour, gender, sex, age, religion, creed, national origin, ancestry, citizenship, marital status, sexual orientation, physical or mental disability, medical condition, pregnancy or parental leave, veteran status, gender identity, genetic information, or any other characteristic protected by applicable law. We will consider qualified applicants with criminal histories in a manner consistent with all legal requirements.

We are happy to support your need for any adjustments during the application and hiring process. If you need special assistance or an accommodation to use our website, apply for a position, or perform a job, please contact us by emailing [email protected].

Skills Required

Bachelor's degree in Computer Science, Engineering, or equivalent experience
Proven experience as a Site Reliability Engineer or similar role
Deep knowledge of operating systems (Linux, Windows)
Experience administering and optimizing applications, databases (SQL/NoSQL), and middleware
Strong expertise in Hybrid Cloud and Kubernetes orchestration
Hands-on experience with Azure Cloud, VMware, and data center components (compute, storage, network)
Experience with containerization (Docker) and Kubernetes cluster deployment/management
Experience with observability tools such as Prometheus and Grafana and APM solutions
Proficiency in coding/scripting languages (Python, Go, Shell)
Strong problem-solving, communication, and collaboration skills
Familiarity with AIOps practices and tools
Certification in Hybrid Cloud, VMware, or Kubernetes
Experience with AWS or Google Cloud
Familiarity with CI/CD pipelines and DevOps practices
Experience with Infrastructure as Code tools like Terraform or Ansible
Knowledge of security best practices and compliance standards

The Company

Capital Region

58,338 Employees

What We Do

A.P. Moller - Maersk is an integrated transport and logistics company; going all the way, together, for our customers and society. ALL THE WAY is our commitment to connect the world so that everyone has both the possibility and the ability to trade, grow and thrive. The company employs roughly 110,000 employees across operations in 130 countries.