Sr. Manager, Site Reliability & Innovation, IT

Posted 6 Days Ago
Hiring Remotely in Hong Kong
Remote
Senior level
Fintech • Payments • Financial Services
The Role
Lead SRE responsibilities for monitoring, Kubernetes platform reliability, observability pipelines, alerting and dashboards. Troubleshoot RHEL systems, manage Prometheus/VictoriaMetrics and Logstash/Elasticsearch pipelines, collaborate across teams, participate in on-call rotations, and drive adoption of modern monitoring and SRE practices.
Summary Generated by Built In

Position Description
We are seeking a Senior Site Reliability Engineer who will be responsible for both build and shared services operations, including monitoring, site reliability engineering (SRE), and ensuring the stability, scalability, and performance of critical systems.
The ideal candidate is a strong technical problem-solver, capable of delivering end-to-end monitoring and reliability solutions and of diagnosing complex issues during critical incidents.

Key Areas of Responsibility

  • Own monitoring, Kubernetes platform reliability, and SRE operations to ensure highly reliable, available, and performant systems

  • Build, enhance, and maintain monitoring solutions using ITRS Geneos, Prometheus, VictoriaMetrics, Elasticsearch, and Grafana

  • Develop, optimize, and maintain alerting rules, dashboards, and observability pipelines

  • Troubleshoot Linux servers (RHEL 7/8/9), including upgrades, configurations, patching, and maintenance, while determining appropriate monitoring requirements for system changes

  • Analyze logs, investigate issues, and perform fault finding to identify performance exceptions

  • Collaborate with engineering, application, and infrastructure teams to improve system resilience, stability, security, efficiency, and scalability.

  • Operate, maintain, and optimize Kubernetes environments, including cluster health, workload reliability, capacity planning, and platform observability

  • Continuously research and adopt modern monitoring and SRE tools and practices.

Requirements

  • Bachelor’s degree or higher in Computer Science or Engineering

  • Around 8-10 years of experience within IT, preferably in site reliability engineering, production support, platform engineering, or investment banking environments

  • Strong experience configuring and maintaining monitoring and observability platforms, including ITRS Geneos, Prometheus, VictoriaMetrics, Elasticsearch, Grafana, and Kibana

  • Experience with automation (e.g., Bash, Python, Ansible, CI/CD tools) is a must

  • Hands-on experience building and implementing Prometheus pipelines, including exporters, scraping configurations, relabelling, metric routing, and integrations with long-term storage (e.g., VictoriaMetrics)

  • Experience building and maintaining Logstash pipelines, including ingestion, parsing, filtering, enrichment, and routing of logs into Elasticsearch

  • Ability to design, build, and maintain Grafana and Kibana dashboards for metrics, logs, and performance analytics across distributed systems

  • Understanding of metrics, logging, alerting, dashboards, and observability pipelines

  • Strong Linux administration skills (RHEL 7/8/9), including troubleshooting, upgrades, configuration, patching, and performance optimization.

  • Good understanding of SRE principles, high availability, scalability, incident management, and Disaster Recovery / Business Continuity Planning (DR/BCP) activities

  • Experience managing GPU-enabled infrastructure for AI or machine learning platforms is preferred.

  • Strong hands-on experience with Kubernetes, including cluster operations, workload orchestration, troubleshooting, scaling, and production support

  • Understanding of networking fundamentals, performance tuning, and troubleshooting distributed systems

  • Willingness to participate in on-call rotations, including after-hours and weekend support

  • Self-motivated and adaptable, able to prioritize, learn continuously, and manage multiple responsibilities effectively

  • Excellent English; Chinese language skills are an advantage

Skills Required

  • Bachelor's degree in Computer Science or Engineering
  • 8-10 years of IT experience in site reliability, production support, platform engineering, or investment banking environments
  • Experience configuring and maintaining ITRS Geneos, Prometheus, VictoriaMetrics, Elasticsearch, Grafana, and Kibana
  • Hands-on experience building and implementing Prometheus pipelines, exporters, scraping configs, relabeling, metric routing, and long-term storage integrations
  • Experience building and maintaining Logstash pipelines for ingestion, parsing, filtering, enrichment and routing into Elasticsearch
  • Strong Linux administration skills (RHEL 7/8/9): troubleshooting, upgrades, configuration, patching, and performance optimization
  • Strong hands-on experience with Kubernetes cluster operations, workload orchestration, scaling, and production support
  • Experience with automation using Bash, Python, Ansible and CI/CD tools
  • Ability to design, build, and maintain Grafana and Kibana dashboards for metrics, logs, and performance analytics
  • Understanding of metrics, logging, alerting, dashboards, observability pipelines, SRE principles, HA, scalability, incident management, and DR/BCP
  • Participation in on-call rotations including after-hours and weekend support
  • Experience managing GPU-enabled infrastructure for AI/ML platforms (preferred)
  • Chinese language ability (an advantage)

CLSA Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about CLSA and has not been reviewed or approved by CLSA.

  • Pay Growth & Progression: Base pay for junior bankers was increased significantly in 2021 to stay competitive in a hot market. This indicates willingness to adjust compensation when talent risks rise.
  • Healthcare Strength: Permanent staff are automatically enrolled in healthcare aligned to local markets, with added travel health and security support via International SOS. This points to solid core medical coverage with global-travel assistance.
  • Retirement Support: Permanent staff are automatically enrolled in pension plans aligned to local markets. A group retirement plan is administered regionally, signaling formalized retirement benefits infrastructure.

The Company
London
2,160 Employees
Year Founded: 1986

What We Do

CITIC CLSA is a wholly-owned subsidiary of CITIC Securities and its overseas business platform. Established in Hong Kong in 1986, CITIC CLSA is Asia’s leading capital markets and investment group, committed to driving the growth strategies of global institutional investors, corporations, governments and high-net-worth individuals. CITIC CLSA’s award-winning research, extensive Asia network, direct links to China and highly experienced financial professionals set CITIC CLSA apart from global investment banks and regional players. Over three decades, CITIC CLSA has built an extensive Asia network with deep local knowledge and connections. Globally, we operate from 13 countries across Asia, Australia, Europe and the Americas. For further information, please visit clsa.com
