Site Reliability Engineer

Posted 7 Days Ago
Toronto, ON
In-Office
70K-120K Annually
Mid level
Business Intelligence • Consulting
Reframing the future
The Role
The Site Reliability Engineer ensures operational excellence, reliability, and uptime of production systems through monitoring, incident response, and automation. Responsibilities include defining SLAs, leading incident management, implementing observability, and collaborating on system designs.
Summary Generated by Built In
Who we are!
  • Speer Technologies is a dynamic technology hub based in Toronto, partnered with some of the largest technology incubators in the Greater Toronto Area. We are a team of passionate innovators and open-minded thinkers, dedicated to building groundbreaking technologies. Our products are on the path to receiving FDA and ADA approvals or provisional patents, with partnerships spanning Italy, Germany, California, and France.
  • As a startup, we thrive on creativity, collaboration, and the drive to push boundaries. Our fast-paced environment offers exposure to a variety of programming languages, software, and work environments, ensuring a rich learning experience. We provide ample opportunities for personal and professional growth, all while fostering an inclusive and barrier-free workplace.
  • Speer is an equal opportunity employer and is committed to providing an inclusive and barrier-free recruitment process. We will accommodate the needs of applicants under the Ontario Human Rights Code and the Accessibility for Ontarians with Disabilities Act (AODA) throughout all stages of the recruitment and selection process.
  • Please advise Speer of any accommodations you may require to ensure your equal participation in the recruitment and selection process. Information received relating to accommodation measures will be addressed confidentially.
Why Speer Technologies?
  • Growth Opportunities: We offer the chance to grow with the company and take on new responsibilities as we expand.
  • Dynamic Environment: Our fast-paced startup environment ensures no two days are the same.
  • Innovation: Be part of a team that's pushing the boundaries of technology and making a real impact.
  • Inclusive Workplace: We are committed to creating an inclusive environment where all employees can thrive.
Role Summary

The Site Reliability Engineer (SRE) is responsible for ensuring the availability, reliability, and operational excellence of production systems. This role bridges infrastructure engineering and operations by applying software engineering principles to infrastructure, monitoring, incident response, and continuous improvement. The SRE ensures systems meet defined uptime, performance, and resiliency targets as they scale.

Key Responsibilities

Reliability & Availability

  • Define and enforce SLAs, SLOs, and error budgets for infrastructure and applications.
  • Design and maintain monitoring, alerting, and health checks across network, hardware, and application layers.
  • Identify reliability risks and implement preventative controls.

Incident Management

  • Lead or participate in incident response and on-call rotations.
  • Reduce MTTR (Mean Time to Recovery) through automation, runbooks, and alert tuning.
  • Conduct post-incident reviews and drive corrective actions.

Observability & Monitoring

  • Implement centralized logging, metrics, and tracing.
  • Monitor system health, including connectivity, latency, hardware status, and application availability.
  • Ensure alerts are actionable and aligned to business impact.

Automation & Tooling

  • Automate repetitive operational tasks.
  • Build self-healing mechanisms where possible.
  • Improve deployment and rollback reliability.

Collaboration & Architecture

  • Partner with Infrastructure Architects to validate resiliency and failover designs.
  • Support Systems Implementation Engineers during go-lives and major changes.
  • Provide operational feedback to improve future designs.

Documentation & Operational Readiness

  • Create and maintain runbooks, escalation paths, and recovery procedures.
  • Ensure operational readiness before systems enter production.
  • Continuously improve reliability through data-driven analysis.
Required Skills & Experience
  • Experience supporting production infrastructure with uptime or SLA requirements.
  • Strong understanding of networking concepts (latency, packet loss, redundancy).
  • Experience with monitoring and observability tools.
  • Familiarity with incident management and on-call practices.
  • Ability to automate operational workflows.
  • Strong troubleshooting and root-cause analysis skills.
  • Clear written and verbal communication.
Nice to Have
  • Experience supporting multi-site or distributed systems.
  • Exposure to IoT, access control, cameras, or physical infrastructure.
  • Experience in regulated or compliance-heavy environments.
  • Background in systems engineering or infrastructure architecture.
  • Fluency in French is an asset. 
Job Details
  • Job Type: Full-Time
  • Pay: $70,000–$120,000 a year
  • Flexible language requirement: French not required
  • Schedule: 8-hour shift, Monday to Friday, with overtime
Benefits
  • Dental care
  • Paid time off
  • Vision care
  • Wellness program

Top Skills

Automation
Monitoring Tools
Observability Tools

The Company
HQ: Toronto, ON
51 Employees
Year Founded: 2019

What We Do

Speer helps companies, start-ups, and institutions build tools with proprietary tech, while assisting with end-to-end product development. We're also building apps and tools aimed at improving our day-to-day lives.

Everyone at Speer is recognized for their unique talents, beyond technical skills. We have artists, designers, engineers, and developers, but more importantly, people who strive to impact the world with their work.

We want to be your partners in innovation, helping to bring your vision to life, which is why we never use the word "clients." Only collaborators!

Similar Jobs

iManage

Site Reliability Engineer

Artificial Intelligence • Cloud • Information Technology • Legal Tech • Productivity • Software
Hybrid
Toronto, ON, CAN
1100 Employees

Boson AI

Site Reliability Engineer

Artificial Intelligence • Machine Learning
In-Office
Toronto, ON, CAN
21 Employees
150K-250K Annually

Verto Health

Software Engineer

Healthtech • Software
In-Office
Toronto, ON, CAN
68 Employees

Funded.club

Site Reliability Engineer

HR Tech • Professional Services
In-Office
Toronto, ON, CAN
25 Employees

Similar Companies Hiring

Northslope Technologies
Software • Information Technology • Generative AI • Consulting • Artificial Intelligence • Analytics
Denver, CO
88 Employees
Compa
Software • Other • HR Tech • Business Intelligence • Artificial Intelligence
Irvine, CA
70 Employees
Amplify Platform
Fintech • Financial Services • Consulting • Cloud • Business Intelligence • Big Data Analytics
Scottsdale, AZ
62 Employees