Senior Site Reliability Engineer

Posted 18 Days Ago
Fairfax, VA, USA
In-Office
Senior level
Artificial Intelligence • Cloud • Information Technology • Security • Software
The Role
Responsibilities include defining, implementing, and growing the SRE practice, ensuring reliability and performance of production environments, and collaborating with cross-functional teams.
Job Summary & Responsibilities

ECS is seeking a Senior Site Reliability Engineer to work in our Fairfax, VA office.


ECS is seeking talented professionals to join our successful and growing team in building the next-generation Continuous Diagnostics and Mitigation (CDM) cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency’s (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems through better awareness and visibility into their security posture and cyber threats. ECS is responsible for designing, building, deploying, operating, and maintaining a complete ‘Data Services’ solution, which includes the collection, normalization, visualization, and sharing of cyber data from more than 100 Federal agencies. The CDM Data Services product is an integrated suite of multiple Commercial Off-the-Shelf (COTS) products, software configuration packages, and custom code that work together as a single solution tailored to meet Department of Homeland Security (DHS) requirements.

 

We are seeking professionals who thrive in a dynamic, fast-paced, and highly collaborative environment where problem-solving, critical thinking, and a holistic approach to serving the mission are key. Our program operates within the Scaled Agile Framework (SAFe). An aptitude and enthusiasm for continuous learning, improvement, and cybersecurity are a must!

 

Role & Responsibilities: 

ECS is seeking a talented Senior Site Reliability Engineer (SRE) to play a key role in defining, implementing, and growing our SRE practice to ensure the reliability, availability, and performance of our critical production environments. 

The Senior SRE will contribute to a culture of continuous improvement by identifying areas for enhancement and driving initiatives to improve system reliability, scalability, and efficiency.

The successful candidate will have demonstrated hands-on experience designing, implementing, and maintaining solutions to ensure that systems, including infrastructure and applications, are resilient, highly available, and performant.  The Senior SRE will also play a critical role in defining and measuring the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our solution. 

The Senior SRE will be responsible for setting up comprehensive logging, monitoring, and alerting solutions using the Elastic stack and other tools as necessary to ensure the continuous performance of services. Additionally, they will respond to incidents, perform root cause analyses, and implement solutions to prevent recurrence. The Senior SRE will work in close collaboration with other SRE team members, developers, testers, infrastructure engineers, DevOps engineers, and other stakeholders to integrate reliability and observability into the software development lifecycle.

Preferred Qualifications
  • Must be a US citizen with the ability to obtain Public Trust Suitability.  
  • 6+ years of experience as a Site Reliability Engineer (SRE) or equivalent 
  • 6+ years of demonstrated experience designing, implementing, and maintaining observability solutions to include logging, monitoring, and alerting 
  • 6+ years of hands-on experience with SRE tools (e.g., Elastic, Prometheus, Grafana, Splunk, etc.) 
  • 3+ years defining and measuring SLOs and SLIs  
  • 3+ years of relevant experience using cloud platforms (AWS GovCloud preferred) 
  • 3+ years of hands-on programming or scripting (e.g., Python, Bash, etc.) 
  • Strong knowledge of microservices, containerization, and orchestration tools (Docker, Kubernetes) 
  • Proven ability to collaborate with cross-functional teams (development, testing, and product) to integrate reliability and observability into the software development lifecycle 
  • Strong problem-solving and analytical skills 
  • Proactive, detail-oriented approach to identifying inefficiencies and implementing improvements
  • Proficient in developing synthetic monitoring scripts using TypeScript

Top Skills

AWS GovCloud
Bash
Docker
Elastic
Grafana
Kubernetes
Prometheus
Python
Splunk
TypeScript

The Company
HQ: Fairfax, VA
2,129 Employees
Year Founded: 1993

What We Do

ECS, a segment of ASGN (NYSE: ASGN), delivers advanced solutions and services in cloud, cybersecurity, artificial intelligence (AI), machine learning (ML), application and IT modernization, and science and engineering. The company solves critical, complex challenges for customers across the U.S. public sector, defense, intelligence and commercial industries. ECS maintains partnerships with leading cloud, cybersecurity, and AI/ML providers and holds specialized certifications in their technologies. Headquartered in Fairfax, Virginia, ECS has more than 3,400 employees throughout the U.S. and has been recognized as a Top Workplace by The Washington Post for the last five years.
