Senior Site Reliability Engineer

Posted Yesterday
Hiring Remotely in Canada
Remote
110K-160K Annually
Senior level
Software
The Role
Operate and maintain production AWS/EKS Kubernetes clusters; design and ship infrastructure-as-code with Terraform; manage Helm charts and ArgoCD GitOps for multi-region SaaS; maintain observability (Grafana, alerting, logs); improve CI/CD pipelines; remediate container and infrastructure CVEs; support compliance (FedRAMP/SOC2/NIST); create runbooks and lead incident response and post-incident reviews.
Summary Generated by Built In
Who We Are; What We Do; Where We’re Going
 
Magnet Forensics is a global leader in the development of digital investigative software that acquires, analyzes, and shares evidence from computers, smartphones, tablets, and IoT-related devices. We are continually innovating so our customers can deploy advanced and effective tools to protect their companies, communities, and countries.
 
Serving thousands of customers globally, our solutions are playing a crucial role in modernizing digital investigations, helping investigators fight crime, protect assets, and guard national security.
 
With employees based around the world, Magnet Forensics has been expanding our global presence. As a part of Magnet Forensics, you can expect to make a difference in the world, no matter what role you play. You’ll be supported through learning and development, not to mention an incredible team with unbelievable talent and integrity.
 
If you think you would be the right person to join our team working towards this goal, we would love to hear from you! 

Role Overview 
We're seeking a Senior Site Reliability Engineer to join our SaaS-Ops team within Shared Services Engineering. The team owns reliability and operational excellence for our highly available SaaS platform, a production Kubernetes environment serving law enforcement and government customers globally. 
 
This role requires deep AWS expertise, infrastructure-as-code discipline, and CI/CD best practices. You'll work closely with Application, Platform, and Security teams to drive secure-by-design architectures and improve automation and reliability across our cloud environments. You'll ship infrastructure as code, respond to production incidents with discipline, and drive platform modernization through deliberate roadmap execution.
 
As part of the SaaS-Ops team, you’ll work in a high-performing environment where members take ownership of outcomes and operate with a strong sense of trust and autonomy. You’ll identify challenges, contribute to solutions, raise concerns proactively, support improvements, and navigate situations requiring timely decision-making. If you’re looking for your next challenge where infrastructure quality directly impacts real‑world outcomes, this role could be a great fit!
 
 
Note: This role includes participation in an on-call rotation.

What You’ll Do

  • Own and operate production Kubernetes clusters (Amazon EKS) including upgrades, scaling, security hardening, and cluster lifecycle management;
  • Design, implement, and maintain infrastructure-as-code using Terraform; contribute to shared module libraries and enforce IaC standards across the team;

  • Manage and evolve Helm chart definitions and ArgoCD GitOps workflows for multi-region SaaS deployments;

  • Operate and maintain observability infrastructure including Grafana, alerts, dashboards, and log pipelines; act to eliminate noise and surface signal;

  • Contribute to pipeline reliability: identify flaky stages, reduce build times, and improve developer experience across CI/CD pipelines;

  • Remediate security vulnerabilities (CVEs) in container images and infrastructure components; participate in compliance work including FedRAMP support activities;

  • Develop and maintain runbooks, change management procedures, and operational documentation;

  • Ensure alignment with internal policies and frameworks such as ISO 27001, SOC 2, and NIST;

  • Contribute to AI-assisted tooling and automation (e.g., Claude-based Terraform agents, automated triage tools) as part of the team's operational efficiency roadmap;

  • Participate in on-call incident response rotation; lead or support incident command during active production incidents including root cause analysis and post-incident review.

What We’re Looking For

  • 5+ years of industry experience with a trajectory that demonstrates growing depth in cloud infrastructure and SRE practices;

  • Managed production Kubernetes environments at scale: not just deployed workloads, but owned cluster health, upgrades, and failure modes;

  • Responded to production incidents in high-stakes environments where downtime has real consequences;

  • Written and maintained Terraform at the module level, not just as a consumer: understands state, dependencies, and the operational burden of drift;

  • Operated in an environment that uses GitOps: has a good understanding of Helm chart organization, ArgoCD app-of-apps patterns, or equivalent;

  • Balanced reactive operational work with proactive roadmap delivery; knows how to protect time for improvements while keeping production stable;

  • Worked with observability as a first-class discipline: built meaningful dashboards, eliminated alert fatigue, and used metrics to make operational decisions;

  • Contributed to security hardening in a regulated or compliance-adjacent environment: FedRAMP, SOC 2, or similar frameworks are a strong asset.

Compensation & Benefits 
The compensation range is for the primary location for which the job is posted. Please note that actual compensation may vary depending on location and job-related factors such as qualifications, experience, knowledge, and skills. If you are applying for this role outside of the primary location and you are selected for an interview, the Talent Acquisition Partner can share more information with you. If the compensation structure for the role includes an incentive component (e.g., most Sales roles), the range below represents total target compensation (TTC) (base salary + variable).
 
$110,000 - $160,000 CAD (CDN) a year 
 
Position Type: Current Vacancy 

Magnet is proud to offer benefits such as: 
 
- Generous time off policies 
- Competitive compensation 
- Volunteer opportunities  
- Reward and recognition programs   
- Employee committees & resource groups  
- Healthcare and retirement benefits 
 
Indicators of Success
 
We’re looking for someone who checks off most, but not all, of the boxes listed in “skills and experiences”.  It’s more important to us to find candidates who can display indicators of success through skills they have developed and experiences they have been a part of, than to find folks who have “been there, done that”.  We want to be part of your development journey, and we’ll learn as much from you as you learn from us. 
 
How We Work
 
At Magnet Forensics, we take a hybrid-flexible approach to support your productivity and work-life balance. If you’re within a comfortable travel distance to one of our offices, you’ll occasionally join us in person. How often you’ll come in depends on your department and team needs, typically ranging from weekly to monthly. These in-person moments help us build stronger connections, spark new ideas, and celebrate our successes together. Most days, you can choose what works best for you, while staying in tune with your team’s goals.
 
We’re excited to welcome you to our team and look forward to achieving great things together - both in the office and wherever you work best!
 
The Most Important Thing
 
We’re looking for candidates who can provide examples of how they have demonstrated Magnet CODE in their previous experiences:
 
CARE – We care about each other and our mission to make a difference in the world.
OWN – We are accountable for our results – while never forgetting to act with integrity, empathy, and respect.
DEDICATE – We put our heart and soul into meeting the needs of our customers and helping them serve the people they protect.
EVOLVE – We are constantly innovating and exploring new ways to work together to make an impact with our work.
 
Here at Magnet Forensics, we are committed to continuous learning and are focused on building a diverse and inclusive workforce. This commitment will be reflected in our hiring processes and embedded in our values and how we treat one another. If you’re interested in this role but do not meet all of the qualifications listed above, we encourage you to apply anyway.
 
Magnet Forensics is an Equal Opportunity Employer and considers applicants for employment without regard to race, colour, religion, sex, orientation, national origin, age, disability, genetics or any other basis forbidden under federal, provincial, or local law. We are committed to providing an inclusive, accessible recruitment process and work environment. Accommodation is available to all applicants upon request throughout the hiring process. Please contact [email protected] should you require any accommodations.
 
All offers of employment at Magnet are contingent upon satisfactory completion of a background check. All background checks will be conducted in accordance with all applicable laws. Magnet will consider each position’s job duties, among other factors, in determining what constitutes satisfactory completion of the background check. Refusal to consent to a background check may be grounds for revoking an offer of employment.
 
US Applicants: Magnet Forensics participates in E-Verify and will provide the federal government with your Form I-9 information to confirm that you are authorized to work in the U.S.
 
Magnet Forensics handles and uses personal data of job applicants in line with its Recruitment Privacy Policy.

Skills Required

  • 5+ years of industry experience in cloud infrastructure and SRE practices
  • Deep AWS expertise and experience operating production AWS environments
  • Managed production Kubernetes environments at scale (Amazon EKS) including upgrades and cluster lifecycle management
  • Written and maintained Terraform at the module level (IaC discipline)
  • Experience with Helm chart management and GitOps workflows (ArgoCD, app-of-apps patterns)
  • Experience operating and maintaining observability infrastructure (Grafana, alerts, dashboards, log pipelines)
  • Experience improving and troubleshooting CI/CD pipelines and developer workflows
  • Experience responding to production incidents and participating in on-call rotations, including incident command and RCA
  • Experience remediating security vulnerabilities in container images and infrastructure components (CVEs)
  • Experience in regulated or compliance-adjacent environments (FedRAMP, SOC2, ISO 27001, NIST)
  • Familiarity with AI-assisted tooling for automation (e.g., Claude-based Terraform agents)

The Company
Herndon, VA
456 Employees
Year Founded: 2009

What We Do

Magnet Forensics is a global leader in the development of digital investigation software that acquires, analyzes, and shares evidence from computers, smartphones, tablets, and IoT-related devices. Magnet Forensics has been helping law enforcement fight crime, protect assets, and guard national security since 2009. Magnet Forensics has become a trusted partner for thousands of the world’s top law enforcement, government, military, and corporate organizations in over 92 countries. Court-admissible evidence recovered by Magnet Forensics tools has been used to support a wide variety of investigations including cybercrimes, child exploitation, terrorism, human resource disputes, fraud, and intellectual property theft. For more information, please visit https://www.magnetforensics.com
