Senior Principal Site Reliability Engineer

Reposted 24 Days Ago
2 Locations
In-Office or Remote
Senior level
Cloud • Security • Software • Cybersecurity
Akamai powers and protects life online by solving the toughest challenges and turning the impossible into the possible.
The Role
As a Senior Principal Site Reliability Engineer, you will define reliability architecture for AI products, mentor engineers, and influence technical decisions while developing automation for production-grade environments.
Summary Generated by Built In

Do you want to shape the future of AI infrastructure?

Are you ready to define the reliability architecture for AI products, from GPU compute to globally distributed inference, ensuring performance and reliability at scale?

Join the Akamai AI Team

Akamai's Cloud Technology Group offers AI infrastructure globally. The GPU compute platform provides dedicated resources, from single GPUs to full clusters. These resources support training, simulation, inference, and various other workloads. Site Reliability Engineering is integrated early to guarantee production-grade reliability and performance.

Partner with the best

As Senior Principal SRE for AI, you will set the technical direction for building, operating, and scaling AI services. You will write code, design systems, and solve complex reliability issues, while also mentoring team members, defining technical standards, and promoting engineering best practices. Success depends on earning influence with product engineering teams through exceptional technical expertise.

As a Senior Principal Site Reliability Engineer, you will be responsible for:

  • Defining the reliability architecture for Akamai's AI compute and platform services, including SLO frameworks, fault tolerance patterns, and capacity planning models
  • Building automation and tooling hands-on to reduce operational toil and scale the SRE team's impact
  • Designing observability strategy by leveraging Akamai's existing platform to build the telemetry, dashboards, alerts, and GPU-specific monitoring needed for AI workloads
  • Architecting deployment safety practices including progressive rollouts, canary analysis, rollback automation, and change safety processes
  • Influencing product engineering architecture and design decisions, embedding reliability into the development lifecycle at the system level
  • Mentoring and elevating other SREs through design reviews, code reviews, and hands-on problem-solving, setting the technical bar for the team

Do what you love

To be successful in this role you will:

  • Have extensive experience in SRE, platform engineering, and/or infrastructure engineering, with demonstrated impact at a principal or staff level
  • Demonstrate extensive Kubernetes expertise, including autoscaling, resource scheduling, and container orchestration for compute-intensive workloads
  • Demonstrate expertise in programming with Python and/or Go, coupled with experience creating production-grade automation, tooling, and platform services
  • Influence cross-team technical decisions, mentor engineers, elevate technical standards, and collaborate effectively with product engineering teams
  • Have experience in AI/ML infrastructure, model deployment, or GPU workloads
  • Design reliability into innovative platforms at the system level while building influence with product engineering teams through technical expertise

Work in a way that works for you

FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.
Learn what makes Akamai a great place to work

Connect with us on social and see what life at Akamai is like!

We power and protect life online, by solving the toughest challenges, together.

At Akamai, we're curious, innovative, collaborative and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.

Working for you

At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. We provide benefits surrounding all aspects of your life:

  • Your health
  • Your finances
  • Your family
  • Your time at work
  • Your time pursuing other endeavors

Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.

About us

Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Join us

Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!

Skills Required

  • Extensive experience in SRE, platform engineering, and/or infrastructure engineering
  • Extensive Kubernetes expertise
  • Programming expertise in Python or Go
  • Experience in AI/ML infrastructure, model deployment, or GPU workloads

Akamai Technologies Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Akamai Technologies and has not been reviewed or approved by Akamai Technologies.

  • Leave & Time Off Breadth: Unlimited PTO in the U.S., wellness days, paid volunteering time, and the FlexBase model contribute to broad time‑off flexibility. These practices are described as supporting strong work‑life balance.
  • Parental & Family Support: Generous paid parental leaves, paid family care leave, subsidized backup childcare, and inclusive family‑building benefits via Carrot form a comprehensive family support suite. Coverage spans fertility support through caregiving resources.
  • Retirement Support: A 401(k) with a substantial company match and immediate vesting, plus Roth and Mega Backdoor Roth options, underpin long‑term savings. These features provide reliable long‑term savings opportunities.

The Company
HQ: Cambridge, MA
10,285 Employees
Year Founded: 1998

What We Do

At Akamai, we make life better for billions of people, billions of times a day. Every moment, billions of people, all over the world, are using the internet to shop, play games, look after finances, learn remotely, share videos, connect across the world, and so much more. These life-shaping digital experiences wouldn’t be possible without Akamai. We power and protect life online. It’s an extraordinary mission, and our global teams achieve it by solving the toughest challenges and turning the impossible into the possible. With the world’s most distributed compute platform, from cloud to edge, we make it easy for businesses to develop and run applications, while we keep experiences closer to users and threats farther away. That’s why innovative companies worldwide choose Akamai to build, deliver, and secure their digital experiences. Thanks to the world’s most distributed platform for cloud computing, security, and content delivery, Akamai keeps applications and experiences closer and threats farther away. Devoted, determined problem-solvers who share a passion for technology, we’re always pushing ground-breaking ideas and driving innovation. Do you want to power and protect life online by solving the toughest challenges with us? Be part of an amazing team!

Why Work With Us

Our people are devoted, determined problem-solvers who share a passion for technology. We find solutions through persistence, and push ground-breaking ideas forward with urgency and courage. This creates an environment where we harness creative energy and drive innovation. We are inclusive and diverse, and we love to collaborate across the world.
