Join our mission to grow and transform the subscription economy by simplifying subscription service delivery.
Infiterra enables IT distributors, Managed Service Providers (MSPs), and telcos to succeed in the subscription economy. Our subscription commerce platform automates and unifies subscription workflows, from quote to bill, driving operational efficiency, billing accuracy, and scalable growth.
Recognized as a global leader in subscription commerce, Infiterra combines innovation, performance excellence, and trusted expertise to help partners transform and grow.
About the Role
As our platform continues to scale globally, reliability becomes even more critical. We are evolving toward a more Azure-native architecture, expanding our Kubernetes (AKS) footprint, and strengthening our operational maturity. 2026 is a foundational year for elevating uptime standards, observability, and incident management discipline.
To support this evolution, we are establishing a dedicated Site Reliability Engineering team within our Platform Infrastructure team to bring focused ownership of uptime, resilience, and production excellence.
We foster a collaborative, growth-oriented culture where you can contribute meaningfully while being part of a dynamic, forward-thinking team.
As our Site Reliability Engineer, you will play a key role in maintaining and improving the reliability and stability of our Azure-based SaaS platform, ensuring our services run smoothly. You’ll work hands-on with AKS clusters and infrastructure, strengthen monitoring and observability, and build the processes that prevent incidents before they happen.
This role is not a pipeline-only DevOps, helpdesk, or networking position; it’s a chance to make a tangible impact on production systems and play a central role in keeping our platform running at its best.
We’re looking for someone with experience in live production environments, who has handled real incidents, and who understands the accountability and ownership required to maintain uptime and reliability.
What You Will Be Doing
Reliability & Operations
Maintain and continuously improve production uptime, supporting our ≥99.9% target for 2026.
Monitor systems proactively and respond effectively to production incidents.
Drive improvements in MTTR (Mean Time to Resolution).
Perform structured root cause analysis and contribute to long-term preventive actions.
Participate in an evolving on-call model as we mature toward structured production support.
Cloud & Infrastructure
Manage and optimize Azure infrastructure across compute, networking, and identity components.
Work hands-on with AKS clusters as part of our growing Kubernetes adoption.
Maintain networking components including load balancers and private endpoints.
Contribute to improving platform resilience and scalability as demand grows.
Observability & Automation
Design and improve observability practices, including metrics, logs, and alerting standards across production systems.
Contribute to and improve Infrastructure as Code practices (Terraform or similar), ensuring consistent and repeatable deployments.
Reduce manual operational effort through scripting and automation.
Collaboration
Work closely with DevOps to ensure smooth CI/CD integration and reliable production deployments.
Support Security initiatives related to infrastructure hardening.
Partner with DevOps on deployment reliability and configuration changes impacting production.
Experience & Background
3+ years of working experience in cloud or production infrastructure roles (SRE, Cloud Engineer, or similar), ideally in a SaaS company with cloud infrastructure (Azure preferred)
Hands-on experience with Windows and Linux server systems
Proven experience managing monitoring/observability tools in live environments
Experience handling production incidents: troubleshooting, responding, and improving processes afterward
Exposure to Kubernetes (AKS ideally) and Infrastructure as Code (Terraform or similar)
Technical Skills
Strong knowledge of Azure services, including compute, networking, and IAM
Familiarity with relational databases in production environments (basic troubleshooting level)
Kubernetes basics and networking fundamentals
Familiarity with monitoring platforms: Azure Monitor, Application Insights, Prometheus, Datadog, etc.
CI/CD familiarity (experience with DevOps pipelines is a plus)
Comfortable scripting in PowerShell, Bash, Python, or similar
Soft Skills & Approach
Strong communication & collaboration: Work effectively with DevOps, Security, and other stakeholders to ensure reliable operations.
Calm and effective under pressure: Maintain composure during incidents while coordinating resolution.
Problem-solving & troubleshooting: Analyze complex application and infrastructure issues and resolve them efficiently.
Adaptability & learning mindset: Quickly learn new tools, technologies, and processes as the platform evolves.
Proactive ownership: Identify areas for improvement, propose actionable solutions, and follow through to completion.
What We Offer
Work-from-anywhere scheme (travel and work)
Flexible working hours
Health and life insurance program
Learning & development budget
A passionate, international, and supportive team
If you feel you’re a great fit, please apply!
We’d love to hear from you!
All applications will be treated with confidentiality.
Please note that due to the high volume of CVs received, only candidates who are a good fit will be contacted for an interview.
As part of our commitment to diversity in the workforce, Infiterra is dedicated to Equal Employment Opportunity, ensuring that all individuals are treated with respect and consideration without regard to race, color, national origin, ethnicity, gender, disability, sexual orientation, gender identity, or religion.
What We Do
Infiterra helps IT Distributors and MSPs transform and grow with a uniquely adaptable platform built for subscription commerce.
By connecting quote-to-bill processes and enabling highly configurable subscription workflows, Infiterra empowers businesses to automate, simplify, and scale—acting as an extension of their teams.








