System Reliability Engineer (Data Centre)

Posted Yesterday
Singapore, SGP
In-Office
Junior
Information Technology • Software • Cybersecurity • Defense
The Role
Ensure the reliability, availability, and performance of data centre IT operations. Manage day-to-day monitoring, the incident lifecycle, capacity planning, documentation, observability and network monitoring, and remote management tools, and define SRE metrics (SLOs, SLIs, error budgets). Collaborate with facilities teams on power, cooling, and physical infrastructure.
You will be part of a dynamic team responsible for ensuring the reliability, availability, and performance of our data centre's IT operations. As a System Reliability Engineer (Data Centre), you will oversee the day-to-day IT operations within the data centre, working closely with various teams to ensure seamless IT service delivery. While knowledge of data centre power and cooling infrastructure is beneficial, the primary focus of this role is on IT operations. You will collaborate with Data Centre Facilities teams on matters related to power, cooling, and physical infrastructure as needed. You must have a good understanding of cloud infrastructure technologies, architecture, and site reliability engineering (SRE) principles. 

Responsibilities

  • Oversee and manage IT operations within the data centre, including day-to-day monitoring, incident management, and problem management
  • Lead the end-to-end incident management lifecycle, which encompasses immediate troubleshooting, root cause identification, and implementation of a resolution to restore services, followed by a comprehensive post-incident analysis
  • Develop and maintain documentation on IT infrastructure, operations, and procedures within the data centre
  • Perform capacity planning to ensure IT infrastructure is scalable for future demands
  • Collaborate and coordinate with Data Centre Facilities teams on matters related to power, cooling, and physical infrastructure
  • Design and implement a robust observability platform, alongside network monitoring tools, for performance monitoring and real-time alerting on IT devices and networks
  • Implement and manage remote management tools for out-of-band access and control of IT devices and servers
  • Define, implement, and track SRE metrics, including SLOs, SLIs, and error budgets, to improve data centre IT reliability

Requirements (Minimum Qualifications)

  • Background in Computer Science, Computer or Electrical Engineering, Information Technology or a related field
  • Good technical knowledge of IT infrastructure, including servers, storage, networking, and cloud technologies
  • Proficient in IT management software and tools
  • 2 years of working experience in IT operations is preferred
  • Fresh graduates are welcome to apply

As CSIT is an agency under the Ministry of Defence (Singapore), only Singapore Citizens will be considered.

The Company
631 Employees
Year Founded: 2003

What We Do

The Centre for Strategic Infocomm Technologies (CSIT) is a technical agency in the Ministry of Defence that harnesses cutting-edge digital technologies to meet Singapore's security needs. It develops capabilities to support missions such as cyber defence, counter terrorism, and counter hostile information operations, with a technical focus on cybersecurity, data analytics, software engineering, and cloud infrastructure and services.
