Senior II Site Reliability Engineer

Posted Yesterday
2 Locations
In-Office or Remote
120K-217K Annually
Senior level
Cloud • Security • Software • Cybersecurity
Akamai powers and protects life online by solving the toughest challenges and turning the impossible into the possible.
The Role
Lead reliability for global load-balancing infrastructure: build observability pipelines, define SLOs/SLIs, lead incident response, automate deployment safety, review designs, and develop tooling (Python/Go) and IaC to reduce operational toil.
Summary Generated by Built In

Do you want to own the reliability of cloud load balancing infrastructure that serves thousands of customers at global scale?

Are you a senior technical leader who can drive solutions across distributed teams while mentoring the engineers around you?

Join our Cloud Networking SRE Team!

The Cloud Networking SRE team is part of Akamai's Infrastructure Engineering & Operations organization. We design, deploy, and manage the reliability of Akamai's core cloud networking products. These products are foundational to the Akamai Cloud Compute platform, serving customer workloads across dozens of global regions.

Partner with the best

In this role, you'll own the operational reliability of two generations of load balancing infrastructure. You'll build and maintain observability frameworks, lead incident response for complex multiservice failures, and drive safe deployment practices across global rollouts. You'll work closely with cross-functional teams to shape how the NB/NLB stack evolves operationally.

As a Senior II Site Reliability Engineer, you will be responsible for:

  • Owning the SRE infrastructure lifecycle from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
  • Designing and implementing SLO/SLI frameworks that reflect customer experience for load balancing services and driving action when error budgets are at risk
  • Building and maintaining observability pipelines from load-balancing components and system-level sources to dashboards that enable rapid incident triage
  • Leading technical incident response for complex NB/NLB failures, acting as the technical commander and driving root cause analysis and preventive follow-through
  • Developing and automating safe deployment workflows for phased releases, including bake-period monitoring, feature flag management, and validation across global datacenter rollouts
  • Reviewing design documents and product-requirement documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
  • Building automation and tooling using Python or Go that reduces operational toil and improves team-wide operational capability

Do what you love

To be successful in this role you will:

  • Have 8+ years of experience in SRE, infrastructure engineering, or platform engineering, working with large-scale distributed systems
  • Demonstrate deep expertise with Linux networking fundamentals and diagnosing at the packet level using tcpdump, netstat, and similar tools
  • Have hands-on experience with L4/L7 load balancing technologies covering configuration, health checking, high availability, and failure modes at scale
  • Show a track record of defining SLO/SLI frameworks, building observability platforms from scratch, and running incident management processes at scale
  • Demonstrate expertise in Kubernetes and containerization at scale including workload scheduling, networking, resource management, and operating stateful or network-intensive workloads in a cluster environment
  • Build automation and tooling using Python or Go, with infrastructure-as-code experience (SaltStack, Ansible, or Terraform) and deployment safety instincts

About us

At Akamai, we make life better for billions of people, trillions of times a day.
Whether you're streaming live events, scrolling social media, watching your favorite series, or managing your savings, we're the engine behind the scenes. We provide the world's most distributed platform from Cloud to Edge to help the giants of the digital world work faster and stay more secure, making the internet a better experience for everyone.
Our focus is simple:
Cloud and Edge: Running apps closer to users for instant performance.
Security: Neutralizing threats before they ever reach your data.
Content Delivery: Scaling the world's biggest moments without a glitch.
AI: Enabling our customers to build, secure, and scale AI apps on the world's most distributed cloud platform.
At Akamai, we don't just support the internet; we power and protect it, because behind every great digital experience is a massive hidden challenge. And we're the ones who solve it. When millions of people hit play or pay, Akamai ensures it just works.

Benefits at Akamai: We support your health, well-being, finances, and life beyond work. See our benefits.

FlexBase adapts to your job's needs

Akamai's FlexBase program is yet another way we show our commitment to providing employees with an exceptional workplace experience. It's not about telling employees where to work; it's about supporting employees to do their best work.
We trust our incredible employees to work in ways that suit them best: at home, in an office, or a combination of both.

Connect with us on social and see what life at Akamai is like!

Compensation

Akamai is committed to fair and equitable compensation practices. The base salary for this position ranges from 120,400 to 216,600 CAD/year; a candidate’s salary is determined by various factors including, but not limited to, relevant work experience, skills, and certifications. The compensation package may also include incentive compensation opportunities in the form of an annual bonus or incentives, equity awards, and an Employee Stock Purchase Plan (ESPP). Akamai provides industry-leading benefits including healthcare, RRSP, company holidays, vacation (in the form of PTO), sick time, and family-friendly benefits, including an employee assistance program with a focus on mental and financial wellness. Eligibility requirements apply.

Skills Required

  • 8+ years of experience in SRE, infrastructure engineering, or platform engineering with large-scale distributed systems.
  • Deep expertise with Linux networking fundamentals and packet-level diagnostics using tcpdump, netstat, and similar tools.
  • Hands-on experience with L4/L7 load balancing technologies including configuration, health checking, high availability, and failure modes at scale.
  • Track record of defining SLO/SLI frameworks, building observability platforms from scratch, and running incident management processes at scale.
  • Expertise in Kubernetes and containerization at scale, including workload scheduling, networking, resource management, and operating stateful or network-intensive workloads.
  • Experience building automation and tooling using Python or Go, and infrastructure-as-code experience (SaltStack, Ansible, or Terraform).

Akamai Technologies Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Akamai Technologies and has not been reviewed or approved by Akamai Technologies.

  • Leave & Time Off Breadth: Unlimited PTO in the U.S., wellness days, paid volunteering time, and the FlexBase model contribute to broad time‑off flexibility. These practices are described as supporting strong work‑life balance.
  • Parental & Family Support: Generous paid parental leaves, paid family care leave, subsidized backup childcare, and inclusive family‑building benefits via Carrot form a comprehensive family support suite. Coverage spans fertility support through caregiving resources.
  • Retirement Support: A 401(k) with a substantial company match and immediate vesting, plus Roth and Mega Backdoor Roth options, underpin long‑term savings. These features provide reliable long‑term savings opportunities.

The Company
HQ: Cambridge, MA
10,285 Employees
Year Founded: 1998

What We Do

At Akamai, we make life better for billions of people, billions of times a day. Every moment, billions of people, all over the world, are using the internet to shop, play games, look after finances, learn remotely, share videos, connect across the world, and so much more. These life-shaping digital experiences wouldn’t be possible without Akamai. We power and protect life online. It’s an extraordinary mission, and our global teams achieve it by solving the toughest challenges and turning the impossible into the possible. With the world’s most distributed compute platform, from cloud to edge, we make it easy for businesses to develop and run applications, while we keep experiences closer to users and threats farther away. That’s why innovative companies worldwide choose Akamai to build, deliver, and secure their digital experiences. We are devoted, determined problem-solvers who share a passion for technology, and we’re always pushing ground-breaking ideas and driving innovation. Do you want to power and protect life online by solving the toughest challenges with us? Be part of an amazing team!

Why Work With Us

Our people are devoted, determined problem-solvers who share a passion for technology. We find solutions through persistence, and push ground-breaking ideas forward with urgency and courage. This creates an environment where we harness creative energy and drive innovation. We are inclusive and diverse, and we love to collaborate across the world.
