Senior SRE/DevOps (Platform Tribe)

2 Locations
Remote
Senior level
Gaming • Software
The Role
The Senior SRE/DevOps Engineer will ensure system reliability in high-load environments, manage infrastructure using Kubernetes and Terraform, respond to incidents, and collaborate with teams to optimize performance and scalability.
Summary Generated by Built In

About the Role

We’re looking for a Senior SRE / DevOps Engineer to join our Platform Tribe, a lean, senior team where ownership is high and expectations are even higher. This is a deeply hands-on role at the core of a high-traffic system, where you’ll be directly responsible for maintaining reliability, performance, and stability in a fast-paced environment.

You’ll be working on real-time production challenges, handling incidents, managing alerts, and being part of a critical on-call rotation. This role requires resilience, strong decision-making under pressure, and a proactive mindset to continuously improve systems operating at scale.

If you thrive in high-load environments, enjoy solving complex production issues, and want to have a direct impact on systems used by millions, this is the place for you.

Key Responsibilities

  • Own system reliability by actively monitoring platform health, managing alerts, and responding to incidents in real time

  • Participate in 24/7 on-call rotations, taking full ownership of production stability in a high-traffic (5–7k RPS) environment

  • Investigate incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence

  • Build and continuously improve monitoring, alerting, and observability across the Kubernetes (EKS) ecosystem

  • Deploy, manage, and optimise infrastructure using Terraform, Helm, and GitOps tools (Flux/ArgoCD)

  • Drive automation and proactively improve system resilience, reducing manual intervention and recurring issues

  • Maintain and evolve CI/CD pipelines and infrastructure-as-code practices

  • Collaborate closely with engineering teams to support deployments and minimise user impact in a live environment

  • Introduce and integrate new tools and technologies to enhance scalability, reliability, and performance

  • Handle environment-specific requests and ensure smooth day-to-day platform operations under constant load

Requirements

  • Strong hands-on experience with Kubernetes (deployment, scaling, troubleshooting) in high-load environments

  • Experience with GitOps tools such as FluxCD or ArgoCD

  • Proven experience in incident response, root cause analysis, and postmortems in production systems

  • Solid experience with AWS, Terraform, Docker, and CI/CD pipelines

  • Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, and logging stacks like ELK or CloudWatch

  • Strong understanding of networking concepts and protocols

  • Proficiency in at least one scripting language (e.g. Python, Go, Node.js)

  • Experience working with version control systems (Git)

  • Familiarity with incident management tools like PagerDuty, Opsgenie, or similar

  • Ability to operate effectively in a fast-paced, high-pressure environment with strong ownership and accountability

  • Proactive, resilient mindset with a focus on continuous improvement and system stability

What We Offer

  • Competitive Salary

  • Quarterly Bonuses

  • Unlimited Paid Time Off

  • Unlimited Paid Sick Leave

  • Remote & Flexible Working

  • Private Medical Insurance

  • Financial Support for Life Events

  • Professional Development Budget

  • International Exposure

  • Regular Company Events

*Benefits may vary depending on location and contractual agreement

Recruitment Process

1. HR Interview (30-45 min)

2. Technical Interview (90 min)

4. Final Interview with C-level (60 min)

Skills Required

  • Strong hands-on experience with Kubernetes in high-load environments
  • Experience with GitOps tools such as FluxCD or ArgoCD
  • Proven experience in incident response and root cause analysis
  • Experience with AWS, Terraform, Docker, and CI/CD pipelines
  • Experience with monitoring and observability tools
  • Proficiency in at least one scripting language (Python, Go, Node.js)
  • Experience working with version control systems (Git)
  • Familiarity with incident management tools like PagerDuty
  • Ability to operate effectively in a fast-paced, high-pressure environment
  • Proactive mindset with continuous improvement focus

The Company
Year Founded: 2012

What We Do

Playson is a distinguished supplier of iGaming technology, designing and developing premium online casino games. They operate in 26 regulated markets, partnering with over 200 operators worldwide to deliver engaging game mechanics and lasting entertainment.
