Principal Site Reliability Engineer

Posted 19 Hours Ago
Park, Galway
Expert/Leader
Fintech
The Role
The Principal Site Reliability Engineer will lead reliability and observability strategies, manage incidents, and coach teams on building scalable systems. Responsibilities include deploying highly distributed systems, utilizing automation, and ensuring services are available and resilient through monitoring and incident management.
Summary Generated by Built In

Job Description:

Our Site Reliability Engineering group within Enterprise Infrastructure combines Operations Excellence with the Development Experience to deliver services at high scale and high availability, with resilience, by using automation and Infrastructure as Code. We build reliability into our ecosystem by applying best practices in Resiliency Engineering, Automation, Observability & Chaos Testing.

The team comes from diverse technical backgrounds, and the responsibilities provide the opportunity for a variety of challenges. Ideal candidates will have a background in either software engineering or systems engineering, with a desire to learn the other, or previous experience as an SRE. We are looking for a systems-thinking SRE Principal Engineer who has helped teams scale through production insights, operational automation, developer guidance, real-time metrics, automation, automation, automation...!

The Expertise We’re Looking For

  • Bachelor’s degree or higher in a technology-related field (e.g. Engineering, Computer Science) required; master’s degree a plus.
  • 8+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale.
  • Hands-on experience with Public Cloud, preferably AWS or Azure.
  • Hands-on experience with container orchestration on EKS, AKS, or Rancher Kubernetes Service.
  • Experience operating and implementing distributed & highly concurrent service-based architectures, including microservices, containerized services, and/or serverless architectures.
  • Thought leadership and an ability to handle production incidents.

The Skills You Bring

  • Hands-on Kubernetes skills and knowledge.
  • Programming/development track record with a compiled/OOP-geared language such as C# or Java, plus experience with a scripting/interpreted language such as JavaScript/TypeScript or Python.
  • Proven experience in maintaining the scalability and resiliency of complex environments.
  • Demonstrated ability to use modern monitoring tools (Datadog, Prometheus, Splunk, …).
  • Experienced in instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale.
  • Understand, implement, and be accountable for the Production Services/SRE capabilities across Digital Security. This includes direct knowledge of the capabilities, their usage and value, and their gaps and challenges.
  • Provide technical and operational leadership and act as an escalation point of contact during major incidents or issues that are not resolved in the expected timeframes. Take hands-on responsibility for actively leading Production bridges during major incidents, working across the team and the Enterprise Infrastructure organization.
  • Be responsible for the execution and quality control of the Fidelity Brokerage Business Unit-specific Post-Mortem reviews for the team, including deep technical RCA and Observability & Automation reviews, and act as the connection across Enterprise Infrastructure domains in the region.

The Value You Deliver

  • Help define and execute a comprehensive reliability and observability strategy, ensuring that Fidelity’s systems are always available when our customers need them.
  • Bring together technical, procedural, and financial data to reduce toil and increase efficiency.
  • Execute plans for technical standardization and process refinement within the engineering organization, especially for Site Reliability Engineers.
  • Troubleshoot stack-wide engineering issues related to hardware, software, network, applications and cloud service providers.
  • Coach peer SREs and development teams on how to build highly available systems.


Category:Information Technology

Top Skills

AWS
Azure
C#
Java
JavaScript
Kubernetes
Python
TypeScript
The Company
HQ: Boston, MA
58,848 Employees
On-site Workplace
Year Founded: 1946

What We Do

At Fidelity, our goal is to make financial expertise broadly accessible and effective in helping people live the lives they want. We do this by focusing on a diverse set of customers, from 23 million people investing their life savings, to 20,000 businesses managing their employee benefits, to 10,000 advisors needing innovative technology to invest their clients’ money. We offer investment management, retirement planning, portfolio guidance, brokerage, and many other financial products.

Privately held for nearly 70 years, we’ve always believed that by providing investors with access to information and expertise, we can help them achieve better results. That’s been our approach: innovative yet personal, compassionate yet responsible, grounded by a tireless work ethic. It is the heart of the Fidelity way.
