AMCS Group

Site Reliability Engineering Technical Lead

Posted 9 Days Ago

Dublin, IRL

In-Office

Senior level

Artificial Intelligence • Cloud • Mobile

The Role

Lead SRE/DevOps efforts to ensure reliability, scalability, and security across multi-cloud environments. Define SLIs/SLOs, lead incident response and postmortems, evolve observability (Prometheus/Grafana/OpenTelemetry), drive automation to reduce toil, optimize cost and performance, apply AI/LLM for ops, and provide architectural oversight and mentoring.

Summary Generated by Built In

Sustainability that means business

Who we are:

Sustainability software specialist AMCS is headquartered in Ireland, with offices in Europe, the USA, and Australasia. With over 1,300 highly skilled employees across 22 countries, we specialize in delivering technology solutions to facilitate a carbon-neutral future.

What we do:

Our innovative SaaS solutions increase efficiency and boost sustainability in resource-intensive industries. Over 5,000 customers across 23 countries already benefit from our Performance Sustainability software, ensuring we deliver practical solutions for improved profitability and environmental resilience across the globe.

Our people

AMCS offers team members more than just a job: an opportunity to map out a career with a company that is growing, evolving and setting out new ways of working that have a positive impact on the world around us. AMCS was established in Ireland and holds onto those local roots and ‘start-up’ mentality with a culture of connection. This connection to our work, our customers, our colleagues and our community creates a working environment that fosters openness, collaboration and creativity.

Job Description:

We are seeking a highly skilled and motivated DevOps/SRE Tech Lead to join our dynamic engineering team. The ideal candidate will have a deep understanding of cloud technologies, a strong technical background and a passion for driving operational excellence. As a Tech Lead, you will mentor and guide our DevOps engineers and participate in architectural and key decision-making forums on our infrastructure and application development processes, keeping the focus on the reliability of our systems and a positive customer experience. You will collaborate with cross-functional teams to ensure the reliability, scalability, and security of our systems and infrastructure.

Key Responsibilities:

Build SLIs, SLOs, and SLAs: Partner with development and business teams to define indicators and objectives that reflect real customer experience.
Incident Response: Lead through complex incidents and continuously improve how quickly we detect, diagnose, and resolve issues — sharpening alerting, tooling, and on-call practices to shorten MTTD and MTTR over time.
Evolve Monitoring and Observability Stack: Continuously improve the observability stack (Prometheus, Grafana, Mimir, Loki, Tempo, OpenTelemetry) with a customer-centric lens, making our operations more effective.
Drive RCAs and Postmortems: Run blameless root cause analyses and postmortems that turn incidents into durable improvements, closing the loop between development and operations.
High Availability & Performance: Ensure platform availability and responsiveness meet customer expectations. Identify and remove performance bottlenecks before they impact customers.
AI for Operations: Apply AI/LLM capabilities to incident triage, log/trace analysis, runbook execution, and anomaly detection to shorten MTTR and reduce on-call load.
Optimization for Cost: Right-size workloads, eliminate waste, and design for cost-efficient scaling across our cloud platforms (Azure, AWS, GCP) and container infrastructure (Docker, Kubernetes).
Toil Reduction: Build automated processes to reduce toil within SRE, such as remediation for known failure modes so the platform heals itself where possible, escalating to humans only when judgement is genuinely required.
Architectural Oversight: Participate in architectural design and decision-making processes, ensuring that design choices align with organizational goals and best practices.

What Success Looks Like:

High-Signal Alerting: Alerts are accurate and actionable — when something fires, it matters, and the team trusts it. Noise is actively driven down rather than tolerated.
Fewer Production Incidents: The number and severity of customer-impacting incidents trend down over time, as recurring failure modes are addressed at the root rather than worked around.
Tight Product–SRE Feedback Loop: Continuous, two-way feedback between product engineering and SRE — reliability concerns shape what gets built, and operational learnings flow back into product decisions.
Reduced Toil: Engineers spend less time on repetitive operational work and more time on improvements that compound — measured by what gets automated, eliminated, or self-healed away.

Qualifications:

Education: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Experience: 5+ years of experience in DevOps, Site Reliability Engineering (SRE), or related fields, with at least 2 years in a leadership or mentoring role.
Cloud Technologies: Deep understanding of cloud providers (Azure, AWS, GCP) and hands-on experience with cloud architecture.
Architectural Design: Proven experience in providing architectural oversight, with a strong ability to make informed decisions that drive system performance and scalability.
Containerization: Proven experience with container orchestration platforms, particularly Kubernetes.
Scripting: Proficiency in scripting languages such as PowerShell, Python or Bash.
Monitoring and Logging: Familiarity with monitoring and logging tools such as Prometheus and the Grafana stack (Grafana, Mimir, Loki, Tempo).
Automation Tools: Experience with automation tools such as Ansible, Terraform, or Chef.
Soft Skills: Strong leadership qualities, excellent communication skills, and a collaborative mindset.

Preferred Qualifications:

Experience with CI/CD pipelines and relevant tools (Azure DevOps, Jenkins, GitLab CI, CircleCI, etc.).
Kubernetes certification (CKA, CKAD) and/or cloud certifications (Azure, AWS, GCP) are highly desirable.
Knowledge of security best practices and compliance standards in cloud environments.
Familiarity with Agile methodologies and project management tools.

Skills Required

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
5+ years experience in DevOps, Site Reliability Engineering, or related fields.
At least 2 years in a leadership or mentoring role.
Hands-on experience with cloud providers: Azure, AWS, GCP.
Proven experience with containerization and orchestration (Docker, Kubernetes).
Proficiency in scripting languages such as PowerShell, Python, or Bash.
Familiarity with monitoring and observability stack: Prometheus, Grafana, Mimir, Loki, Tempo, OpenTelemetry.
Experience with automation and IaC tools such as Ansible, Terraform, or Chef.
Proven experience providing architectural oversight and cloud architecture design.
Experience defining/implementing SLIs, SLOs, SLAs, incident response, RCA/postmortems, and reducing operational toil.
Experience applying AI/LLM capabilities to operations (incident triage, log/trace analysis, runbook automation).
Experience with CI/CD tools (Azure DevOps, Jenkins, GitLab CI, CircleCI).
Kubernetes certification (CKA, CKAD) and/or cloud certifications (Azure, AWS, GCP).
Knowledge of cloud security best practices and compliance standards.
Familiarity with Agile methodologies and project management tools.

The Company

Limerick, County Limerick

828 Employees

Year Founded: 2004

What We Do

AMCS is a global leader in integrated software and vehicle technology for the environmental, waste, recycling and resource industries. We help our customers reduce their operating costs, increase asset utilization, optimize margins and improve customer service. Our enterprise software and SaaS solutions deliver digital innovation to the emerging circular economy around the world. We are AMCS: digital ways to a cleaner world.