Site Reliability Engineer

Posted Yesterday
Cape Town, Western Cape, ZAF
In-Office
Mid level
Fintech • Insurance • Financial Services
The Role
Maintain and improve Azure-based infrastructure and applications to ensure reliability, scalability, and security. Monitor systems, respond to incidents, perform root cause analysis (RCA), build observability and automation, collaborate with developers and SecOps, participate in the on-call rotation, mentor junior engineers, and act as the technical escalation point and Change Advisory Board (CAB) approver.
Summary Generated by Built In

WHAT WE DO

We're Lula. We build innovative fintech products to help SMEs make cash flow. From instant access to funding to all-in-one business banking accounts and cutting-edge financial analysis tools, we're on it!

Our purpose is to help SMEs manage their business better, faster, simpler, Lula, so they can spend more time doing what they love.

Speaking of love, we’re looking for Lulas who love to make a difference to join our team and change the game.

OVERALL PURPOSE

We are seeking an experienced Site Reliability Engineer to join our team. The ideal candidate should have a deep understanding of Microsoft Azure, cloud computing, and distributed systems. As a Site Reliability Engineer, you will be responsible for monitoring, maintaining and improving our Azure-based infrastructure and applications, ensuring their reliability, scalability, and security. You will also act as the technical escalation point for Junior and Intermediate Site Reliability Engineers and represent the Site Reliability team as an approver on the Change Advisory Board (CAB). In addition, you’ll play a key role in guiding reliability practices, mentoring the team, and improving how we operate.

Responsibilities will include: 

  • Monitor system health, alerts, and application behaviour, responding to incidents and contributing to root cause analysis and remediation
  • Triage and resolve service requests related to cloud infrastructure and applications in a timely manner
  • Build and continuously improve monitoring and alerting using Azure-native tooling (Azure Monitor, Log Analytics, KQL) to provide meaningful visibility into system performance
  • Analyse performance, reliability, and usage metrics to identify optimisation opportunities and potential risks
  • Partner with our internal Developers and DevOps teams to build, monitor and manage highly available, reliable, scalable and resilient architectures with high levels of visibility on Azure
  • Partner with Microsoft on complex remediation and improvement work in our Azure environment, as required
  • Identify gaps in logging, metrics, and tracing, and collaborate with developers to improve overall system visibility
  • Partner with our internal SecOps team to ensure the security of the Azure infrastructure and applications by implementing and enforcing security policies and best practices
  • Develop and maintain automation scripts and tools to streamline deployment and management of Azure services
  • Continuously research and evaluate new Azure features and services to optimise our infrastructure and improve our application development workflows
  • Participate in the on-call rotation to provide 24/7 support for critical systems
  • Act as a monitoring resource for all changes and releases that take place during your on-call rotation, as required

THE COMPETENCIES WE’RE AFTER

  • Strong written and verbal communication skills
  • Ability to communicate complex technical concepts to non-technical stakeholders
  • Ability to work independently and as part of a team
  • A proactive, collaborative and detail-oriented approach to issues
  • A quick and hungry learner
  • Highly credible and trustworthy with an open and honest approach
  • Strong planning skills and ability to prioritise
  • Adaptable and flexible, with resilience in the face of change and ambiguity
  • Able to switch between proactive and reactive support in real time
  • Ability to mentor and grow others

THE SKILLS AND EXPERIENCE WE’RE LOOKING FOR

  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
  • 3-5 years of experience in a Site Reliability Engineering, DevOps, or Software Engineering role
  • Strong understanding of Azure services such as Web Applications, Functions and Application Gateways
  • Experience working with cloud platforms (Azure preferred)
  • Experience with observability tooling
  • Experience monitoring, logging and troubleshooting in Azure using Application Insights, Azure Monitor, Log Analytics, Logic Apps and SQL database query performance measures
  • Experience with automation tools such as PowerShell, Azure CLI and ARM templates
  • Strong troubleshooting and problem-solving skills
  • Excellent communication and collaboration skills to work with cross-functional teams
  • Familiarity with CI/CD pipelines and modern DevOps practices (e.g. GitHub Actions, Azure DevOps)
Nice to Have:
  • Experience with OpenTelemetry or vendor-neutral observability approaches
  • Experience with microservices or modular monolith architectures
  • Exposure to performance or load testing practices
  • Experience with tools such as Grafana and Prometheus or similar platforms

Skills Required

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 3-5 years of experience in Site Reliability Engineering, DevOps, or Software Engineering
  • Strong understanding of Azure services such as Web Applications, Functions and Application Gateways
  • Experience working with cloud platforms (Azure preferred)
  • Experience with observability tooling
  • Monitoring, logging and troubleshooting in Azure using Application Insights, Azure Monitor, Log Analytics, Logic Apps and SQL query performance measures
  • Experience with automation tools such as PowerShell, Azure CLI and ARM templates
  • Familiarity with CI/CD pipelines and modern DevOps practices (e.g., GitHub Actions, Azure DevOps)
  • Strong troubleshooting and problem-solving skills
  • Excellent communication and collaboration skills to work with cross-functional teams
  • Participate in on-call rotation to provide 24/7 support for critical systems
  • Act as the technical escalation point for Junior/Intermediate SREs and represent the Site Reliability team in CAB as an approver
  • Ability to mentor and grow others
  • Experience with OpenTelemetry or vendor-neutral observability approaches
  • Experience with microservices or modular monolith architectures
  • Exposure to performance or load testing practices
  • Experience with Grafana and Prometheus or similar platforms

The Company
Cape Town
199 Employees
Year Founded: 2014

What We Do

Lula is an innovative, human-focused fintech company on a mission to help small businesses make cash flow. We want to make it fast, simple, Lula for people to run their business through a digital business bank account. We’re making business banking fast, easy & human.

Similar Jobs

MoonPay

Senior Site Reliability Engineer

Blockchain • Fintech • Payments • Cryptocurrency • Web3
In-Office or Remote
5 Locations
244 Employees

Robin AI

Site Reliability Engineer

Artificial Intelligence • Legal Tech • Software
In-Office or Remote
Cape Town, Western Cape, ZAF
184 Employees

Similar Companies Hiring

Hanover Park
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
42 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees
