Manager of Monitoring Operations

Chennai, Tamil Nadu, IND
Hybrid
Expert/Leader
Cloud • Information Technology
Support mission-critical workloads, modernize IT infrastructure, and reduce total cost of ownership.
The Role
Lead and manage the enterprise monitoring operations team to ensure availability, performance, and reliability of infrastructure and applications. Oversee BMC Helix, OpenShift, Prometheus/Grafana, and Entuity monitoring; manage upgrades, capacity, alerting quality, SOPs, DR tests, incident escalations, ITIL alignment, and stakeholder reporting.
Summary Generated by Built In

Job Description: Manager – Monitoring Operations

Role Summary

The Manager – Monitoring Operations will lead and manage the enterprise monitoring operations team responsible for the availability, performance, and reliability of IT infrastructure and applications. This role will oversee day-to-day operations of the BMC Helix On-Premises Monitoring tool deployed on Red Hat OpenShift Container Platform (OCP), network and device monitoring using ParkPlace Entuity, and OS monitoring using Prometheus and Grafana, ensuring high service quality, operational excellence, and continuous improvement.

The role requires strong people management skills, deep technical expertise in systems monitoring platforms, and experience operating monitoring solutions in containerized environments.

Key Responsibilities

· Lead, mentor, and manage a team of monitoring engineers/analysts, defining goals, KPIs, shift coverage, and on-call rotations.

· Drive skill development through performance reviews, training initiatives, and continuous learning plans.

· Act as the escalation point for major monitoring incidents and outages, guiding quick workarounds to prevent monitoring gaps and loss of metrics.

· Ensure operational excellence aligned with ITIL practices (Incident, Problem, Change) and adherence to security, compliance, and operational standards.

· Manage upgrades, patches, capacity planning, and health checks across the monitoring estate to maintain high availability and performance.

· Oversee Server (Windows/Linux/AIX), Network, Database, and Synthetic URL Monitoring for the enterprise and for global clients' private clouds.

· Collaborate with Container Platform, Core Infrastructure, and Network teams on platform stability, scaling, resilience, and resource allocation.

· Optimize alert quality, reduce alert fatigue, standardize dashboards/alerting frameworks, and deliver actionable insights.

· Maintain SOPs, runbooks, and operational documentation; provide regular reports on platform health, incidents, and SLA compliance.

· Serve as the primary stakeholder contact for all monitoring services.

· Conduct annual disaster-recovery (DR) tests for the monitoring estate to validate resilience, recovery procedures, and business continuity readiness.




Required Experience & Qualifications

Experience

· 10+ years of overall IT industry experience, including 5+ years in monitoring operations in medium-to-large organizations.

· Hands-on operational expertise with at least two of the following monitoring platforms/tools:

o BMC Helix Monitoring (SaaS or On-Prem)

o Red Hat OpenShift Container Platform (OCP) or Kubernetes cluster management

o Prometheus, Exporters, OTEL Collectors, and Grafana

o ParkPlace Entuity Network and Hardware Monitoring

· Proven experience in monitoring architecture design, capacity planning, performance tuning, and integration with ITSM tools for automated ticketing workflows.

· Strong knowledge of ITIL processes and operational best practices.

Leadership & Soft Skills

· Strong people-management and leadership capabilities

· Excellent communication and stakeholder-management skills

· Ability to handle high-pressure situations and lead incident response

· Strategic mindset with a focus on operational maturity and optimization

Education & Certifications

· Bachelor’s degree in Computer Science, Information Technology, or equivalent

· Relevant certifications (preferred, not mandatory):

o Red Hat OpenShift / Kubernetes

o BMC Helix

o Foundation certifications in ITIL and/or AI

Nice-to-Have

· Exposure to hybrid or multi-cloud environments

· Experience in Automation, Scripting, APIs and AI-driven service improvements

· Application Performance Monitoring (APM) experience

Skills Required

  • 10+ years overall IT industry experience, including 5+ years in monitoring operations in medium-to-large organizations
  • Hands-on operational expertise with at least two of: BMC Helix Monitoring, Red Hat OpenShift (OCP)/Kubernetes, Prometheus/Exporters/OTEL Collectors/Grafana, ParkPlace Entuity
  • Proven experience in monitoring architecture design, capacity planning, performance tuning, and integration with ITSM tools for automated ticketing
  • Strong knowledge of ITIL processes and operational best practices (Incident, Problem, Change)
  • Experience overseeing Server (Windows/Linux/AIX), Network, Database and Synthetic URL Monitoring
  • Demonstrated people-management experience including defining KPIs, shift/on-call rotations, performance reviews, and training programs
  • Bachelor's degree in Computer Science, Information Technology, or equivalent
  • Red Hat OpenShift/Kubernetes, BMC Helix, or ITIL Foundation certifications
  • Experience with automation, scripting, APIs and AI-driven service improvements
  • Application Performance Monitoring (APM) and hybrid/multi-cloud exposure

Ensono Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Ensono and has not been reviewed or approved by Ensono.

  • Leave & Time Off Breadth: Time off provisions include unlimited PTO, paid volunteer time, and a formal sabbatical program. These offerings provide flexibility for rest, community service, and extended renewal.
  • Retirement Support: Retirement support includes a 401(k) with company match as part of the core package. This adds long‑term financial value alongside day‑one eligibility for other coverage.
  • Parental & Family Support: Family‑forming coverage and paid parental leave are explicitly included, with adoption and surrogacy reimbursement. These benefits support diverse paths to growing a family.

The Company
HQ: Downers Grove, IL
3,000 Employees
Year Founded: 2015

What We Do

Ensono helps IT leaders be the catalyst for change by harnessing the power of hybrid IT to transform their businesses. Our broad services portfolio from mainframe to cloud, powered by an intelligent governance platform, is designed to help our clients operate for today and optimize for tomorrow. We are award-winning certified experts in AWS & Azure.

Why Work With Us

Our culture is collaborative & results-driven. Curiosity, passion, honesty & reliability are values we live by. Career & professional development is encouraged through promotions, learning opportunities, Ensono University - eTalks, training academies, paid tuition and study leave, and quarterly Innovator Awards. We also hold Thinking Thursdays (no meetings 8 to 12).

Similar Jobs

TransUnion

P02 Service Desk (ASC) Engineer (India)

Big Data • Fintech • Information Technology • Business Intelligence • Financial Services • Cybersecurity • Big Data Analytics
Hybrid
Chennai, Tamil Nadu, IND
13,000 Employees

Comcast

Engineer 2, Software Development & Engineering (NBCU-ENT-MS-N010)

Digital Media • Information Technology • News + Entertainment
Hybrid
Chennai, Tamil Nadu, IND
115,000 Employees

Comcast

Engineer 1 - DevOps

Digital Media • Information Technology • News + Entertainment
Hybrid
Chennai, Tamil Nadu, IND
115,000 Employees

Comcast

Development Engineer 2

Digital Media • Information Technology • News + Entertainment
Hybrid
Chennai, Tamil Nadu, IND
115,000 Employees

Similar Companies Hiring

Amplify Platform
Fintech • Financial Services • Consulting • Cloud • Business Intelligence • Big Data Analytics
Scottsdale, AZ
62 Employees
Standard Template Labs
Artificial Intelligence • Information Technology • Software
New York, NY
25 Employees
Golden Pet Brands
Digital Media • eCommerce • Information Technology • Marketing Tech • Pet • Retail • Social Media
El Segundo, CA
178 Employees
