Site Reliability Engineer

Posted Yesterday
Rockville, MD, USA
Hybrid
112K-150K Annually
Mid level
Artificial Intelligence • Cloud • Software • Cybersecurity
The Role
Operate and tune AWS environments to meet SLAs, build observability and alerts, automate infrastructure with IaC and CI/CD, define SLIs/SLOs, support security/compliance within a FISMA Moderate boundary, design resilience and DR plans, and own incident response and post-mortems.
Summary Generated by Built In
We are Skyward.
 
That is, a love for people, for improvement, for human advancement through information technology. We are a people-centered business with a desire to serve others. We are diverse and unified; creative and collaborative; a collection of complementary, not competing talents. And though on the surface we remain relaxed, beneath, a torrent of energy links us to our civic tech mission.
 
We stand by our values, and we won’t compromise on any of them.
 
Integrity: We’re conscientious, intentional, and empathetic. Our words and actions align. That’s our character. Please don’t ask us to play another part; we’re poor actors.
Compassionate: If we may borrow a quote from Theodore Roosevelt: “No one cares how much you know until they know how much you care.” Because our team is thoughtful and supportive, caring deeply for each other, our clients, and our work, this comes naturally. 
Inquisitive: We remain students by failing openly and turning lessons into solutions.
Unconventional: For us, life isn’t what happens outside of work. Work happens inside of life, and our culture erases the line that often divides the two.
Authentic: Made possible only because we embody the values listed above. We’re relaxed and fun yet intensely curious and driven. Team members are placed with thought, care, and precision to ensure that Trust, Truth, and Transparency continue to represent our brand.
 
Because of that, we continue Onward, Upward, and Skyward.

We need an SRE.
Do you have a real feel for how distributed systems behave, and a knack for tracking down the network, infrastructure, or pipeline issue everyone else gave up on? Are you comfortable in the cloud, fluent in CI/CD, and the type who believes an alert should mean something and a dashboard should tell a story? If you love keeping complex systems healthy, fast, and quietly reliable, then apply. Like, now.

Come join us if you're motivated to learn from others, to learn from mistakes, to be part of a future-looking and growth-oriented team.

Let's go Skyward together.

What you'll do:

  • Join the team supporting the Centers for Medicare & Medicaid Services (CMS) as it merges and modernizes its enterprise knowledge and data systems into a single, AI-driven platform, reducing manual effort, improving data accuracy, and enhancing transparency for stakeholders.
  • Keep the systems up and the users happy. Operate and tune AWS environments to meet infrastructure and application availability SLAs, even during transition and change.
  • Build observability that actually informs. Implement continuous monitoring, alerting, and dashboards using tools like AWS CloudWatch, New Relic, and Splunk, and establish performance baselines so you can spot degradation before users do.
  • Automate the toil. Write infrastructure-as-code (Terraform, Ansible) and support CI/CD pipelines (Jenkins) and containerized workloads (Docker) for repeatable, reliable deployments.
  • Define and track the numbers that matter. Set and monitor SLIs and SLOs, and produce performance, load/stress, and bottleneck reports that drive smarter decisions.
  • Optimize for performance, security, and cost. Use tools like AWS Trusted Advisor to find and act on improvement opportunities.
  • Support security and compliance modernization. Partner with the Security & Compliance SME to review vulnerability and security scans, feed continuous monitoring, and help advance the move toward a Continuous ATO (cATO) within a FISMA Moderate boundary (RMF, ARS, IS2P2).
  • Strengthen resilience. Help design and maintain disaster recovery and continuity of operations (COOP) plans so the systems hold up against outages, incidents, and the unexpected.
  • Own incidents end to end. Drive response, run blameless post-mortems, and implement the preventative fixes that keep the same thing from happening twice.

What we’d like you to have:

  • A bachelor’s degree in computer science, engineering, or a related field (or equivalent hands-on experience).
  • 3–5 years of experience in site reliability, systems, or cloud engineering, with meaningful time spent in AWS environments.
  • Solid working knowledge of core AWS services, architecture, and best practices.
  • Hands-on experience with infrastructure-as-code tools (Terraform, Ansible, or CloudFormation).
  • A good understanding of CI/CD pipelines and automation tools (Jenkins, GitLab CI, or similar).
  • Comfort scripting and automating in Python.
  • Familiarity with monitoring and observability tooling (CloudWatch, New Relic, Splunk, or comparable).
  • Strong problem-solving instincts and the composure to work calmly under pressure.
  • Clear communication skills, with the ability to make complex technical concepts understandable.

What would blow us away:

  • You’ve previously worked with CMS.
  • You have experience working in AI, NLP, or LLM-driven environments.
  • You have all the AWS certifications and the real-world scars that come with them.

Even if you don’t meet 100% of the qualifications, we encourage you to apply. At Skyward, we’re focused on hiring individuals with the right skills and passion to grow, not just checking off every box.

And now the important part. What we offer you:

  • Medical, dental, and vision insurance (fully paid for employees)
  • 15 days of paid leave
  • 7 days of sick leave
  • 2 days of bereavement leave
  • 11 paid Federal holidays
  • Up to 40 hours for jury duty
  • 401(k) with 4% employer contribution (and no vesting period)
  • Up to 4 weeks of paid paternity and maternity leave
  • Company provided laptop
  • $5,000 per year for professional development
  • $600 per year for technical supplies and equipment
  • $2,000 referral bonus
  • Life and disability insurance
  • HSA and FSA
  • Legal Shield and ID Shield Voluntary Benefits
  • Opportunity to work in a collaborative, motivated team focused on modernizing government services with cutting-edge technology and innovative solutions. Who says government work can't be exciting?

At Skyward, we support flexible working hours and remote opportunities to help maintain a healthy work-life balance for all employees.
 
Offers of employment with Skyward are contingent upon acceptable results of a background investigation.
 
Applicants must have the ability to obtain and maintain a Public Trust security clearance due to the nature of our work as a government contractor.

Skills Required

  • Bachelor's degree in computer science, engineering, or related field (or equivalent hands-on experience)
  • 3–5 years of experience in site reliability, systems, or cloud engineering with meaningful AWS experience
  • Working knowledge of core AWS services, architecture, and best practices
  • Hands-on experience with infrastructure-as-code (Terraform, Ansible, or CloudFormation)
  • Experience with CI/CD pipelines and automation tools (Jenkins, GitLab CI, or similar)
  • Scripting and automation experience in Python
  • Familiarity with monitoring and observability tooling (CloudWatch, New Relic, Splunk, or comparable)
  • Experience with containerized workloads (Docker)
  • Ability to obtain and maintain a Public Trust security clearance (background check)
  • Strong problem-solving skills and ability to work calmly under pressure
  • Clear communication skills to explain complex technical concepts
  • Prior experience with CMS
  • Experience with AI, NLP, or LLM-driven environments
  • AWS certifications

The Company
63 Employees

What We Do

Skyward IT Solutions, LLC is a technology services provider specializing in AI-driven government services and digital modernization for federal and public sector operations. Founded in 2013, the company delivers secure, scalable, and user-centered solutions including custom AI agents, cloud engineering, and agile software development, focusing on improving service delivery and operational efficiency for agencies such as CMS and the SBA.
