Senior Site Reliability Engineer (Cloud and Networking) - Remote

Akamai Technologies

Reposted 21 Days Ago

2 Locations

In-Office or Remote

Senior level

Cloud • Security • Software • Cybersecurity

Akamai powers and protects life online by solving the toughest challenges and turning the impossible into the possible.

The Role

As a Senior Site Reliability Engineer, you'll ensure the reliability of cloud load balancers, lead incident responses, mentor engineers, and build automation tools.

Summary Generated by Built In

Do you want to own the reliability of cloud load balancing infrastructure that serves thousands of customers at global scale?

Are you a senior technical leader who can drive solutions across distributed teams while mentoring the engineers around you?

Join our Cloud Networking SRE Team

The Cloud Networking SRE team (CNETSRE) is part of Akamai's Infrastructure Engineering & Operations (IE&O) organization. We design, deploy, and manage the reliability of Akamai's core cloud networking products — including NodeBalancer, our production L4/L7 load balancer, and NLB (Network Load Balancer), our next-generation high-throughput L4 load balancing platform. These products are foundational to the Akamai Cloud Compute platform, serving customer workloads across dozens of global regions.

Partner with the best

As a Senior Site Reliability Engineer on the NodeBalancer and NLB stack, you'll own the operational reliability of two generations of load balancing infrastructure: the production NodeBalancer fleet and the next-generation NLB platform with its distributed forwarding architecture. You'll build and maintain observability frameworks, lead incident response for complex multi-service failures, drive safe deployment practices across phased global rollouts, and mentor SRE II engineers on the team. The load balancing platform is actively evolving — future iterations are expected to move toward container-orchestrated deployments, so your Kubernetes expertise will be directly relevant as this stack grows. You'll work closely with NodeBalancer Engineering, the Product Delivery Team, and peer SRE functions to shape how the NB/NLB stack evolves operationally.

As a Senior Site Reliability Engineer, you will be responsible for:

Owning the SRE lifecycle for NodeBalancer and Network Load Balancer — from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
Designing and implementing SLO/SLI frameworks that reflect true customer experience for L4 and L7 load balancing services, and driving action when error budgets are at risk
Building and maintaining observability pipelines for NB/NLB infrastructure, including Prometheus metrics from load balancing components and system-level sources, and Grafana dashboards that enable rapid incident triage
Leading technical incident response for complex NB/NLB failures — BGP/VIP issues, failover failures, data plane degradations, and configuration problems — acting as the technical commander and driving root cause analysis and preventive follow-through
Developing and automating safe deployment workflows for phased NB/NLB releases, including bake period monitoring, feature flag management, and GO/NO-GO validation across global datacenter rollouts
Reviewing design documents and product requirements documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
Building automation and tooling using Python or Go that reduces operational toil and improves team-wide operational capability
Mentoring SRE II engineers on the NB team, providing hands-on technical guidance, code/config reviews, and raising the bar for the team's SRE practice
Participating in a scheduled, daytime-only on-call rotation for NB/NLB production systems, leading technical incident response and driving resolution of complex failures in customer-facing load balancing infrastructure

Do what you love

To be successful in this role you will:

Have extensive experience in SRE, platform engineering, or infrastructure engineering, working with large-scale distributed systems
Demonstrate deep expertise with Linux networking fundamentals — routing, BGP, nftables/iptables, ARP, VXLAN — and comfort diagnosing at the packet level using tcpdump, netstat, and similar tools
Have hands-on experience with L4/L7 load balancing technologies — including proxy-based or kernel-level load balancers — covering configuration, health checking, high availability, and failure modes at scale
Show a track record of defining SLO/SLI frameworks, building observability platforms from scratch, and running incident management processes at scale
Demonstrate expertise in Kubernetes and containerization at scale — including workload scheduling, networking (CNI, Services, ingress), resource management, and operating stateful or network-intensive workloads in a cluster environment
Build automation and tooling using Python or Go, with infrastructure-as-code experience (SaltStack, Ansible, or Terraform) and strong deployment safety instincts
Demonstrate 4+ years in SRE or infrastructure engineering, with at least 2 years at cloud scale

Learn what makes Akamai a great place to work

Connect with us on social and see what life at Akamai is like!

We power and protect life online, by solving the toughest challenges, together.

At Akamai, we're curious, innovative, collaborative and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.

Working for you

At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit options are designed to meet your individual needs for today and in the future. We provide benefits surrounding all aspects of your life:

Your health
Your finances
Your family
Your time at work
Your time pursuing other endeavors

Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.

About us

Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Join us

Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!

Skills Required

Extensive experience in SRE, platform engineering, or infrastructure engineering with large-scale distributed systems
Deep expertise in Linux networking fundamentals
Hands-on experience with L4/L7 load balancing technologies
Experience defining SLO/SLI frameworks and managing incidents
Expertise in Kubernetes and containerization at scale
Experience building automation and tooling using Python or Go
4+ years in SRE or infrastructure engineering, including 2 years at cloud scale

Akamai Technologies Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Akamai Technologies and has not been reviewed or approved by Akamai Technologies.

Leave & Time Off Breadth — Unlimited PTO in the U.S., wellness days, paid volunteering time, and the FlexBase model contribute to broad time‑off flexibility. These practices are described as supporting strong work‑life balance.
Parental & Family Support — Generous paid parental leaves, paid family care leave, subsidized backup childcare, and inclusive family‑building benefits via Carrot form a comprehensive family support suite. Coverage spans fertility support through caregiving resources.
Retirement Support — A 401(k) with a substantial company match and immediate vesting, plus Roth and Mega Backdoor Roth options, underpin long‑term savings. These features provide reliable long‑term savings opportunities.

The Company

HQ: Cambridge, MA

10,285 Employees

Year Founded: 1998

What We Do

At Akamai, we make life better for billions of people, billions of times a day. Every moment, billions of people all over the world are using the internet to shop, play games, look after finances, learn remotely, share videos, connect across the world, and so much more. These life-shaping digital experiences wouldn’t be possible without Akamai. We power and protect life online. It’s an extraordinary mission, and our global teams achieve it by solving the toughest challenges and turning the impossible into the possible. With the world’s most distributed compute platform, from cloud to edge, we make it easy for businesses to develop and run applications, while we keep experiences closer to users and threats farther away. That’s why innovative companies worldwide choose Akamai to build, deliver, and secure their digital experiences. Thanks to the world’s most distributed platform for cloud computing, security, and content delivery, Akamai keeps applications and experiences closer and threats farther away. As devoted, determined problem-solvers who share a passion for technology, we’re always pushing ground-breaking ideas and driving innovation. Do you want to power and protect life online by solving the toughest challenges with us? Be part of an amazing team!

Why Work With Us

Our people are devoted, determined problem-solvers who share a passion for technology. We find solutions through persistence, and push ground-breaking ideas forward with urgency and courage. This creates an environment where we harness creative energy and drive innovation. We are inclusive and diverse, and we love to collaborate across the world.