Job Description: Manager – Monitoring Operations
Role Summary
The Manager – Monitoring Operations will lead and manage the enterprise monitoring operations team responsible for the availability, performance, and reliability of IT infrastructure and applications. The role oversees day-to-day operations of the BMC Helix On-Premises monitoring tool deployed on Red Hat OpenShift Container Platform (OCP), network and device monitoring using ParkPlace Entuity, and OS monitoring using Prometheus and Grafana, ensuring high service quality, operational excellence, and continuous improvement.
The role requires strong people management skills, deep technical expertise in systems monitoring platforms, and experience operating monitoring solutions in containerized environments.
Key Responsibilities
· Lead, mentor, and manage a team of monitoring engineers/analysts, defining goals, KPIs, shift coverage, and on-call rotations.
· Drive skill development through performance reviews, training initiatives, and continuous learning plans.
· Act as the escalation point for major monitoring incidents and outages, guiding rapid workarounds to prevent monitoring gaps and loss of metrics.
· Ensure operational excellence aligned with ITIL practices (Incident, Problem, Change), along with adherence to security, compliance, and operational standards.
· Manage upgrades, patches, capacity planning, and health checks across the monitoring estate to maintain high availability and performance.
· Oversee Server (Windows/Linux/AIX), Network, Database, and Synthetic URL monitoring for the enterprise and for global clients' private clouds.
· Collaborate with Container Platform, Core Infrastructure, and Network teams on platform stability, scaling, resilience, and resource allocation.
· Optimize alert quality, reduce alert fatigue, standardize dashboards/alerting frameworks, and deliver actionable insights.
· Maintain SOPs, runbooks, and operational documentation; provide regular reports on platform health, incidents, and SLA compliance.
· Serve as the primary stakeholder contact for all monitoring services.
· Conduct annual disaster-recovery (DR) tests for the monitoring estate to validate resilience, recovery procedures, and business continuity readiness.
Required Experience & Qualifications
Experience
· 10+ years of overall IT industry experience, including 5+ years in monitoring operations in medium-to-large organizations.
· Hands-on operational expertise with at least two of the following monitoring platforms/tools:
o BMC Helix Monitoring (SaaS or On-Prem)
o Red Hat OpenShift Container Platform (OCP) or Kubernetes cluster management
o Prometheus, exporters, OpenTelemetry (OTel) Collectors, and Grafana
o ParkPlace Entuity Network and Hardware Monitoring
· Proven experience in monitoring architecture design, capacity planning, performance tuning, and integration with ITSM tools for automated ticketing workflows.
· Strong knowledge of ITIL processes and operational best practices.
Leadership & Soft Skills
· Strong people-management and leadership capabilities
· Excellent communication and stakeholder-management skills
· Ability to handle high-pressure situations and lead incident response
· Strategic mindset with a focus on operational maturity and optimization
Education & Certifications
· Bachelor’s degree in Computer Science, Information Technology, or equivalent
· Relevant certifications (preferred, not mandatory):
o Red Hat OpenShift / Kubernetes
o BMC Helix
o Foundation certifications in ITIL and/or AI
Nice-to-Have
· Exposure to hybrid or multi-cloud environments
· Experience in automation, scripting, APIs, and AI-driven service improvements
· Application Performance Monitoring (APM) experience
Skills Required
- 10+ years overall IT industry experience, including 5+ years in monitoring operations in medium-to-large organizations
- Hands-on operational expertise with at least two of: BMC Helix Monitoring, Red Hat OpenShift (OCP)/Kubernetes, Prometheus/exporters/OpenTelemetry (OTel) Collectors/Grafana, ParkPlace Entuity
- Proven experience in monitoring architecture design, capacity planning, performance tuning, and integration with ITSM tools for automated ticketing
- Strong knowledge of ITIL processes and operational best practices (Incident, Problem, Change)
- Experience overseeing Server (Windows/Linux/AIX), Network, Database, and Synthetic URL monitoring
- Demonstrated people-management experience including defining KPIs, shift/on-call rotations, performance reviews, and training programs
- Bachelor's degree in Computer Science, Information Technology, or equivalent
- Red Hat OpenShift/Kubernetes, BMC Helix, or ITIL Foundation certifications (preferred, not mandatory)
- Experience with automation, scripting, APIs and AI-driven service improvements
- Application Performance Monitoring (APM) and hybrid/multi-cloud exposure
Ensono Compensation & Benefits Highlights
The following summary of recurring compensation and benefits themes was drawn from responses generated by popular LLMs to common candidate questions about Ensono; it has not been reviewed or approved by Ensono.
- Leave & Time Off Breadth: Time off provisions include unlimited PTO, paid volunteer time, and a formal sabbatical program. These offerings provide flexibility for rest, community service, and extended renewal.
- Retirement Support: Retirement support includes a 401(k) with company match as part of the core package. This adds long-term financial value alongside day-one eligibility for other coverage.
- Parental & Family Support: Family-forming coverage and paid parental leave are explicitly included, with adoption and surrogacy reimbursement. These benefits support diverse paths to growing a family.
Ensono Insights
What We Do
Ensono helps IT leaders be the catalyst for change by harnessing the power of hybrid IT to transform their businesses. Our broad services portfolio, from mainframe to cloud and powered by an intelligent governance platform, is designed to help our clients operate for today and optimize for tomorrow. We are award-winning certified experts in AWS and Azure.
Why Work With Us
Our culture is collaborative and results-driven. Curiosity, passion, honesty, and reliability are values we live by. Career and professional development is encouraged through promotions, learning opportunities, Ensono University (eTalks), training academies, paid tuition and study leave, and quarterly Innovator Awards. Thinking Thursdays keep mornings free of meetings (no meetings from 8 to 12).