Do you enjoy solving complex reliability challenges for cutting-edge technology?
Do you have a passion for automation and building systems that scale?
Join the Akamai Inference Cloud Team
The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design, implement, deploy and operate AI platforms that enable customers to run inference models and developers to create AI applications with unmatched performance, compliance, and economics.
Partner with the best
As a Senior SRE, you will own reliability workstreams for Akamai's serverless inference platform, build automation and tooling, and contribute to architecture and operational decisions. You will have opportunities to take ownership of critical reliability problems end-to-end, partner with product engineering teams, and develop expertise in GPU infrastructure, Kubernetes at scale, and AI inference workloads.
As a Site Reliability Engineer, you will be responsible for:
- Building and maintaining observability for AI workloads, including telemetry, dashboards, alerts, SLO/SLI tracking, and driving improvements when targets are missed
- Writing automation and tooling to reduce operational toil, improve deployment safety, and accelerate incident response
- Integrating AI workloads into Akamai's existing incident management processes, building runbooks, participating in on-call rotations, and conducting blameless post-mortems
- Building and maintaining CI/CD integrations, deployment safety checks, and rollback automation
- Collaborating with product engineering teams to improve reliability, contribute to architecture decisions, and ensure operational readiness for product releases
- Contributing to capacity planning, autoscaling configuration, and workload scheduling for AI compute infrastructure
Do what you love
To be successful in this role, you will:
- Demonstrate expertise in SRE, infrastructure, or platform engineering, with extensive operational experience managing large-scale distributed systems.
- Demonstrate expertise in Kubernetes and large-scale containerization systems.
- Define SLOs and work with observability tools like Prometheus, Grafana, and distributed tracing to enhance system monitoring.
- Demonstrate proficiency in Python or Go for automation, along with experience in CI/CD pipelines, deployment safety, and infrastructure-as-code tools such as Terraform.
- Bring interest in or experience with AI/ML infrastructure, model serving, or GPU workloads.
- Resolve issues independently while maintaining accountability throughout the process.
- Demonstrate accountability for reliability, develop automation and monitoring, and collaborate effectively with an engineering team unfamiliar with SRE practices.
Work in a way that works for you
FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.
Learn what makes Akamai a great place to work
We power and protect life online, by solving the toughest challenges, together.
At Akamai, we're curious, innovative, collaborative and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.
Working for you
At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit options are designed to meet your individual needs for today and in the future. We provide benefits surrounding all aspects of your life:
- Your health
- Your finances
- Your family
- Your time at work
- Your time pursuing other endeavors
Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.
About us
Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.
Join us
Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!
#LI-Remote
Skills Required
- Expertise in SRE or platform engineering
- Experience managing large-scale distributed systems
- Expertise in Kubernetes and containerization
- Proficiency in Python or Go for automation
- Experience with observability tools like Prometheus and Grafana
Akamai Technologies Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Akamai Technologies and has not been reviewed or approved by Akamai Technologies.
- Leave & Time Off Breadth: Unlimited PTO in the U.S., wellness days, paid volunteering time, and the FlexBase model contribute to broad time-off flexibility. These practices are described as supporting strong work-life balance.
- Parental & Family Support: Generous paid parental leaves, paid family care leave, subsidized backup childcare, and inclusive family-building benefits via Carrot form a comprehensive family support suite. Coverage spans fertility support through caregiving resources.
- Retirement Support: A 401(k) with a substantial company match and immediate vesting, plus Roth and Mega Backdoor Roth options, underpin long-term savings. These features provide reliable long-term savings opportunities.
Akamai Technologies Insights
What We Do
At Akamai, we make life better for billions of people, billions of times a day. Every moment, billions of people, all over the world, are using the internet to shop, play games, look after finances, learn remotely, share videos, connect across the world, and so much more. These life-shaping digital experiences wouldn’t be possible without Akamai. We power and protect life online. It’s an extraordinary mission, and our global teams achieve it by solving the toughest challenges and turning the impossible into the possible. With the world’s most distributed compute platform, from cloud to edge, we make it easy for businesses to develop and run applications, while we keep experiences closer to users and threats farther away. That’s why innovative companies worldwide choose Akamai to build, deliver, and secure their digital experiences. Thanks to the world’s most distributed platform for cloud computing, security, and content delivery, Akamai keeps applications and experiences closer to users and threats farther away. Devoted, determined problem-solvers who share a passion for technology, we’re always pushing ground-breaking ideas and driving innovation. Do you want to power and protect life online by solving the toughest challenges with us? Be part of an amazing team!
Why Work With Us
Our people are devoted, determined problem-solvers who share a passion for technology. We find solutions through persistence, and push ground-breaking ideas forward with urgency and courage. This creates an environment where we harness creative energy and drive innovation. We are inclusive and diverse, and we love to collaborate across the world.