Speer

Site Reliability Engineer

Posted 7 Days Ago

Toronto, ON

In-Office

70K-120K Annually

Mid level

Business Intelligence • Consulting

Reframing the future

The Role

The Site Reliability Engineer ensures operational excellence, reliability, and uptime of production systems through monitoring, incident response, and automation. Responsibilities include defining SLAs, leading incident management, implementing observability, and collaborating on system designs.

Summary Generated by Built In

Who we are!

Speer Technologies is a dynamic technology hub based in Toronto, partnered with some of the largest technology incubators in the Greater Toronto Area. We are a team of passionate innovators and open-minded thinkers, dedicated to building groundbreaking technologies. Our products are on the path to receiving FDA and ADA approvals or provisional patents, with partnerships spanning Italy, Germany, California, and France.
As a startup, we thrive on creativity, collaboration, and the drive to push boundaries. Our fast-paced environment offers exposure to a variety of programming languages, software, and work environments, ensuring a rich learning experience. We provide ample opportunities for personal and professional growth, all while fostering an inclusive and barrier-free workplace.
Speer is an equal opportunity employer and is committed to providing an inclusive and barrier-free recruitment process. We will accommodate the needs of applicants under the Ontario Human Rights Code and the Accessibility for Ontarians with Disabilities Act (AODA) throughout all stages of the recruitment and selection process.
Please advise Speer of any accommodations you may require to ensure your equal participation in the recruitment and selection process. Information received relating to accommodation measures will be addressed confidentially.

Why Speer Technologies?

Growth Opportunities: We offer the chance to grow with the company and take on new responsibilities as we expand.
Dynamic Environment: Our fast-paced startup environment ensures no two days are the same.
Innovation: Be part of a team that's pushing the boundaries of technology and making a real impact.
Inclusive Workplace: We are committed to creating an inclusive environment where all employees can thrive.

Role Summary

The Site Reliability Engineer (SRE) is responsible for ensuring the availability, reliability, and operational excellence of production systems. This role bridges infrastructure engineering and operations by applying software engineering principles to infrastructure, monitoring, incident response, and continuous improvement. The SRE ensures systems meet defined uptime, performance, and resiliency targets as they scale.

Key Responsibilities

Reliability & Availability

Define and enforce SLAs, SLOs, and error budgets for infrastructure and applications.
Design and maintain monitoring, alerting, and health checks across network, hardware, and application layers.
Identify reliability risks and implement preventative controls.

Incident Management

Lead or participate in incident response and on-call rotations.
Reduce MTTR (Mean Time to Recovery) through automation, runbooks, and alert tuning.
Conduct post-incident reviews and drive corrective actions.

Observability & Monitoring

Implement centralized logging, metrics, and tracing.
Monitor system health including connectivity, latency, hardware status, and application availability.
Ensure alerts are actionable and aligned to business impact.

Automation & Tooling

Automate repetitive operational tasks.
Build self-healing mechanisms where possible.
Improve deployment and rollback reliability.

Collaboration & Architecture

Partner with Infrastructure Architects to validate resiliency and failover designs.
Support Systems Implementation Engineers during go-lives and major changes.
Provide operational feedback to improve future designs.

Documentation & Operational Readiness

Create and maintain runbooks, escalation paths, and recovery procedures.
Ensure operational readiness before systems enter production.
Continuously improve reliability through data-driven analysis.

Required Skills & Experience

Experience supporting production infrastructure with uptime or SLA requirements.
Strong understanding of networking concepts (latency, packet loss, redundancy).
Experience with monitoring and observability tools.
Familiarity with incident management and on-call practices.
Ability to automate operational workflows.
Strong troubleshooting and root-cause analysis skills.
Clear written and verbal communication.

Nice to Have

Experience supporting multi-site or distributed systems.
Exposure to IoT, access control, cameras, or physical infrastructure.
Experience in regulated or compliance-heavy environments.
Background in systems engineering or infrastructure architecture.
Fluency in French is an asset.

Job Details

Job Type: Full-Time
Pay: $70,000–$120,000 a year
Flexible language requirement: French not required
Schedule: 8-hour shifts, Monday to Friday, with overtime

Benefits

Dental care
Paid time off
Vision care
Wellness program

Top Skills

Automation

Monitoring Tools

Observability Tools

The Company

HQ: Toronto, ON

51 Employees

Year Founded: 2019

What We Do

Speer helps companies, startups, and institutions build tools with proprietary tech, while assisting with end-to-end product development. We're also building apps and tools aimed at improving our day-to-day lives.

Everyone at Speer is recognized for their unique talents, beyond technical skills. We have artists, designers, engineers, and developers but, more importantly, people who strive to impact the world with their work.

We want to be your partners in innovation, helping to bring your vision to life, which is why we never use the word "clients". Only collaborators!