Infrastructure Reliability Engineering, Senior Manager

London, Greater London, England
In-Office
Expert/Leader
Fintech • Financial Services
The Role
Lead Infrastructure Reliability Engineering, ensuring resilience and operational readiness while defining standards and managing a high-performing team. Drive transformation and risk reduction for trading-critical platforms.

Shift Pattern:

Standard 40 Hour Week (United Kingdom)

Scheduled Weekly Hours:

40

Corporate Grade:

C - Vice President

Reporting Line:

(UK Division) Information Technology

Location:

UK-London

Worker Type:

Permanent

About the London Metal Exchange and LME Clear: 

 

The London Metal Exchange is the world centre for industrial metals trading. Most of the world’s non-ferrous futures business is conducted on the LME’s three trading platforms. In 2024 this totalled $18 trillion, 178 million lots and 4 billion tonnes, with market open interest reaching a high of 1.8 million lots. All trades are cleared and settled by LME Clear.

 

Participants can transfer or take on price risk against aluminium, copper, nickel, tin, zinc, lead, molybdenum, cobalt, lithium, steel scrap, rebar and hot-rolled coil as well as alumina, aluminium premiums and alloys. 

 
The LME and LME Clear are HKEX Group companies. 

 

www.lme.com  

 

Overall Purpose of Role: 

 

This role is accountable for establishing and then maturing a best-of-breed Infrastructure Reliability Engineering (IRE) function, embedding reliability engineering as a core discipline across the technology lifecycle, from design through live operation, in support of trading-critical and regulatorily significant services.

 

The role provides senior leadership across Infrastructure Reliability Engineering and is accountable for the resilience, availability, and operational readiness of the LME Group technology estate. It leads the design and delivery of complex infrastructure transformation, platform modernisation, and re-architecture initiatives, ensuring secure, compliant, and highly reliable services that support trading-critical operations and regulatory obligations.

Responsibilities: 

 

Establish, mature, and continuously evolve the Infrastructure Reliability Engineering function, defining the IRE operating model, engagement patterns, and service boundaries across infrastructure, architecture, operations, security, and application teams. 

 

Set, maintain, and enforce consistent reliability engineering standards, patterns, and tooling across the infrastructure estate, balancing resilience, regulatory assurance, and operational efficiency. 

 

Act as senior Infrastructure Reliability Engineering SME across major programmes end-to-end (discovery, dependency mapping, design, planning, build, cutover, fallback), with direct accountability for service stability and risk reduction for trading-critical platforms.

 

Drive a proactive reliability and failure engineering culture, including structured risk identification, resilience testing, failover validation, and scenario-based exercises for trading-critical and systemically important services.

 

Act as the accountable owner for Infrastructure Operational Readiness, ensuring platforms and services do not transition into live operation without meeting mandated readiness, observability, recoverability, and supportability criteria. 

 

Define and embed a consistent reliability measurement framework across infrastructure platforms, including service level indicators, objectives, and leading indicators of operational risk, enabling data-driven prioritisation and informed investment decisions.

 

Build, lead, and develop a high-performing Infrastructure Reliability Engineering team, defining clear role expectations, capability standards, and development pathways.

Foster a culture of engineering excellence, shared ownership, and continuous improvement, ensuring operational knowledge and resilience capability are institutionalised and not dependent on individuals. 

 

Act as a senior authority on infrastructure resilience and operational risk, influencing strategic decisions, architectural direction, and investment priorities to ensure reliability is designed in, not retrofitted. 

 

Own measurable infrastructure reliability outcomes, including availability, resilience, recovery performance, and operational risk reduction, with regular executive-level reporting against agreed targets.

 

Own and enforce reliability governance, including stage gates, design authorities, risk and issue management, CAB/change control, and auditable documentation aligned to ITSM, IBS, and regulatory expectations.  

 

Lead platform modernisation and resilience engineering initiatives, including containerisation and cloud-adjacent platforms (e.g. Kubernetes, OpenShift), working closely with Architecture, InfoSec, and application teams to embed reliability, security, and observability by design.

 

Define and drive the LME Infrastructure Reliability posture, including fault tolerance, redundancy, capacity planning, disaster recovery, and failover strategies across on-prem and hybrid environments.

 

Lead senior-level technical discovery and design workshops to shape scope, delivery approach, and resourcing for reliability-critical initiatives, ensuring alignment with IOE priorities and business outcomes.

 

Establish and assure Operational Readiness (ORR) standards: runbooks, monitoring and alerting, SLIs/SLOs, performance and capacity baselines, service transition, and operational handover.  

Ensure infrastructure platforms meet security and compliance requirements (e.g. CIS, ISO 27001, NIST), covering identity and access management, encryption, auditability, and regulatory evidence.  

 

Engage at senior stakeholder level across Technology and the business, providing clear communication on delivery status, operational risk, dependencies, cost forecasts, and resource demand. 

 

Academic and Professional Qualifications Required: 

 

Bachelor’s degree in Computer Science, Engineering, Information Technology, or a closely related discipline. 

 

Demonstrable track record of continuous professional development in infrastructure, solutions engineering, or technology transformation. 

  

Required Knowledge and Level of Experience: 

 

10+ years of experience leading large-scale Infrastructure or Reliability Engineering functions, with demonstrable accountability for the availability, resilience, and operational performance of mission-critical systems.

 

Proven experience establishing, scaling, or materially maturing an Infrastructure Reliability, Platform Reliability, or equivalent function within a complex, regulated, or high-availability environment.

 

Significant experience operating in regulated or high-assurance environments (e.g. financial services, exchanges, clearing, or equivalent).

 

Experience influencing senior leadership and steering complex transformation initiatives across multiple technology domains. 

 

Significant experience leading or assuring large-scale, enterprise Linux estates (e.g. RHEL-based), including responsibility for reliability, resilience, and operational risk in regulated or high-availability environments.

Skill Set and Core Competencies Required for Role:

 

Deep expertise in infrastructure reliability engineering, resilience patterns, and operational risk management 

 

Strong governance, assurance, and regulatory mindset 

 

Excellent stakeholder engagement and senior communication skills 

 

Ability to lead multidisciplinary technical teams through complex change 

 

Data-driven approach to reliability, performance, and continuous improvement

 

Reliability engineering, resilience patterns, and operational risk management. 

Governance, assurance, and regulatory mindset. 

 

Data-driven analysis and decision-making.

 

Senior stakeholder influence and technical authority. 

 

Team leadership and capability development. 

 

Technical Skills – Infrastructure Reliability Engineering

  • Enterprise Linux / RHEL mastery 

  • Linux reliability, performance, and capacity engineering 

  • Automation, standardised builds, configuration management 

  • Observability, diagnostics, and root-cause analysis

  • Linux host reliability for container / OpenShift platforms 

  • Linux security, hardening, and compliance 

  • Linux-level failure engineering and resilience patterns

  • Senior Linux technical authority 

Personal Qualities: 

 

High integrity, ownership, and accountability in all aspects of work. 

 

Structured, pragmatic, and calm under pressure; able to manage competing priorities and deliver in high-stakes environments.

 

Collaborative and inclusive, building strong cross-functional relationships and fostering a culture of open communication. 

 

Curious and improvement-oriented, always seeking to challenge the status quo and drive innovation with data-driven insights. 

 

Adaptable and resilient, able to navigate ambiguity and lead teams through complex change. 

 

Commitment to diversity, equity, and inclusion, respecting and valuing the unique contributions of all colleagues. 

 

Comfortable holding the line on operational risk and readiness in high-pressure, time-sensitive delivery environments.

The LME is committed to creating a diverse environment and is proud to be an equal opportunity employer. In recruiting for our teams, we welcome the unique contributions that you can bring in terms of education, ethnicity, race, sex, gender identity, expression and reassignment, nation of origin, age, languages spoken, colour, religion, disability, sexual orientation and beliefs. In doing so, we want every LME employee to feel our commitment to showing respect for all and encouraging open collaboration and communication.  

Top Skills

Kubernetes
Linux
OpenShift
RHEL

The Company
Hong Kong, Hong Kong
1,723 Employees
Year Founded: 2000

What We Do

HKEX Group is a global exchange group, operating dynamic and integrated financial markets in Asia and Europe.

From our home in the financial hub of Hong Kong and an additional base in London, we provide world-class facilities for trading and clearing securities and derivatives in Equities, Commodities, Fixed Income and Currency.

Uniquely positioned at the intersection of Chinese and international capital flows, Hong Kong has long been Connecting China with the World. With the accelerated opening-up of China’s capital markets, HKEX continues to be at the forefront of this historic transition, which we believe will Shape the Global Market Landscape.
