Senior Site Reliability Engineer (AI-Native)

Posted 2 Days Ago
Saint Julian's, MLT
In-Office
Senior level
Digital Media • Marketing Tech • Software
The Role
Lead reliability, performance, and security of web platforms while enhancing automation and AI-assisted operations. Own critical production environments and mentor teams.
Summary Generated by Built In

Paradise Media is a fast-growing performance marketing company behind some of the most successful affiliate and iGaming brands in the world. We run a global network of high-authority sites across casino, sports, and entertainment built on data, experimentation, and top-tier SEO. 


We’re a private company with strong capital reserves and no outside investors, making us a stable, independent, and fast-moving place to grow your career. You’ll work directly with the CEO and leadership team, have a real voice in strategy, and see your ideas go live fast. 


We’re scaling quickly to become one of the largest privately-owned companies in iGaming: a team where smart, driven people can have a massive impact and build something enduring.



About the role

We are seeking a Senior AI-Native Site Reliability Engineer to lead the reliability, performance, security, automation, and operational maturity of a growing portfolio of high-performing web platforms and digital products.


This role is ideal for a pragmatic senior reliability engineer who can operate and improve production systems, automate repetitive work, use AI safely to accelerate operations, understand performance and security deeply, and communicate clearly during incidents.


You will combine senior-level SRE, DevOps, infrastructure, security, and platform engineering expertise with a modern AI-first approach to operations. You will be expected not only to maintain systems, but to improve how they are designed, monitored, deployed, secured, and operated.

The successful candidate will be comfortable owning critical production environments across varied technology stacks, leading incident response, improving platform resilience, mentoring others, reducing operational toil through automation, and using AI tools responsibly to accelerate analysis, documentation, monitoring, debugging, remediation, and continuous improvement.


Roles & Responsibilities: 

Reliability & Operational Ownership

  • Own uptime, performance, scalability, and resilience of production web platforms and supporting infrastructure.
  • Define and improve SLIs, SLOs, error budgets, HA, fault tolerance, DR, and graceful degradation.
  • Lead capacity planning, identify single points of failure, and act as senior technical owner during high-severity incidents.

Performance Engineering & Scalability

  • Lead optimization across application, infrastructure, database, caching, CDN, and edge layers (Redis, Varnish, Cloudflare or similar).
  • Establish benchmarks, regression checks, and dashboards; reduce technical bloat across code, dependencies, assets, and infrastructure.
  • Align performance work with SEO, product, and commercial impact.

AI-Native Operations & Automation

  • Lead safe, practical AI-assisted workflows for log analysis, incident investigation, runbook creation, monitoring, security triage, and postmortems.
  • Automate repetitive ops via scripts, IaC, and AI-assisted tooling; build anomaly detection, alert triage, and operational reporting workflows.
  • Create reusable prompts, playbooks, and templates; define guardrails for data sensitivity, access control, human approval, and auditability.

Monitoring, Observability & Incident Management

  • Own monitoring/alerting across apps, infra, databases, caches, queues, CDNs, cloud services, and critical user journeys.
  • Design actionable dashboards and alerts that reduce noise and improve MTTD/MTTR.
  • Lead incident response, RCA, postmortems, and preventive actions; mentor on troubleshooting and calm communication under pressure.

Security, Resilience & Platform Hardening

  • Own production security posture: WAF, SSL, vulnerability management, malware/bot mitigation, threat detection, and remediation.
  • Harden servers, databases, cloud, containers, CI/CD, secrets, and production access; manage secure dependency and patching processes.
  • Maintain backup, recovery, and DR practices; contribute to security incident response, containment, and prevention.

Infrastructure, Cloud & Platform Engineering

  • Design and operate hosting/runtime environments across varied stacks (web/app servers, databases, caches, queues, containers, cloud).
  • Automate backups, updates, deployments, provisioning, and health checks using Ansible, Terraform, Docker, Kubernetes, Jenkins, GitHub Actions, or similar.
  • Support AWS, GCP, Azure, or modern managed hosting; set infrastructure standards balancing reliability, security, performance, and cost.

DevOps, Release Engineering & Developer Enablement

  • Lead CI/CD design, staging environments, rollback strategies, progressive delivery, and deployment observability.
  • Partner with developers to embed reliability, performance, and security into the SDLC; build tooling and runbooks for safer shipping.

Documentation, Collaboration & Technical Leadership

  • Maintain runbooks, troubleshooting guides, architecture notes, and operational playbooks (AI-assisted where useful, technically validated).
  • Act as senior technical partner to engineering, product, SEO, and business stakeholders; mentor engineers and shape ops standards.

Requirements:

Preferred Experience

  • 6+ years in SRE, DevOps, Infrastructure, Platform, or Security Engineering.
  • Operating high-traffic web platforms, SaaS, SEO/content-heavy, affiliate, publishing, media, or e-commerce environments.
  • Cloudflare, edge caching, WAF, CDN optimization, bot mitigation; AI-assisted ops or agentic engineering workflows.
  • Leading high-severity incident response; defining SLOs, postmortems, runbooks; FinOps / cloud cost optimization.
  • Certifications in AWS, GCP, Azure, Linux, Kubernetes, or security are a plus.


Required

  • Senior-level experience in SRE, DevOps, infrastructure, platform engineering, or production operations.
  • Proficiency in Python, Bash, PHP, JavaScript/TypeScript, Go, or similar; strong Linux server administration.
  • Experience with web/app servers, databases, caches, queues, CDNs, cloud (AWS/GCP/Azure), and production traffic flows.
  • Strong Git, CI/CD, deployment automation, rollback, and release management; solid DNS, SSL, networking, and load balancing fundamentals.
  • Proven ability to troubleshoot complex production issues using logs, metrics, traces, and profiling—and to own systems without close supervision.

AI-Native Skills

  • Practical use of AI for debugging, documentation, scripting, analysis, and workflow automation, with strong judgment on validation.
  • Ability to design safe, human-in-the-loop AI workflows and reusable prompts/playbooks; sound judgment on privacy, access, and data sensitivity.

Performance & Observability

  • Hands-on with Datadog, New Relic, Grafana, Prometheus, Cloudflare Analytics, OpenTelemetry, Lighthouse, WebPageTest, or similar.
  • Strong grasp of caching, DB tuning, asset optimization, frontend and backend performance, edge delivery, and SLOs/SLIs.

Security & Resilience

  • Production security practices: access control, WAF, vulnerability management, secrets, patching, incident response.
  • Backup strategy, recovery testing, DR planning; bot mitigation, dependency risk, malware detection, threat monitoring.

Automation & DevOps

  • Ansible, Terraform, Jenkins, Docker, Kubernetes, GitHub Actions; IaC, containerization, orchestration, configuration management.

Communication & Leadership

  • Calm incident leadership; clear technical communication to technical and non-technical stakeholders; mentoring and knowledge sharing.


Success in This Role Looks Like

  • Platforms are faster, more reliable, and more secure; monitoring is actionable and incidents are managed calmly with meaningful follow-up.
  • Manual work shrinks through automation and AI-assisted workflows; developers ship more safely; risks are caught before they impact the business.

Our Benefits:

We offer a competitive salary and the opportunity to work with a talented and passionate team in a fast-paced, dynamic environment.


The Company
HQ: San Juan
146 Employees
Year Founded: 2019

What We Do

Some companies merely exist in the digital space. We create impactful digital marketing campaigns that empower consumers to make informed choices on the products that shape their lives. At the same time, we help developing companies reach their target audiences in a cost-effective manner, allowing them to grow as quickly as possible. By staying on top of emerging trends in the industry, we ensure that your content always has the maximum impact, for both your business and your customers.
