The Role
The Site Reliability Engineer will ensure system reliability, optimize performance, develop observability, and collaborate with teams to enhance service scalability.
About HighLevel:
HighLevel is an AI-powered, all-in-one white-label sales & marketing platform that empowers agencies, entrepreneurs, and businesses to elevate their digital presence and drive growth. We are proud to support a global and growing community of over 2 million businesses, comprising agencies, consultants, and businesses of all sizes and industries. HighLevel gives users all the tools needed to capture and nurture new leads and turn them into repeat customers. As of mid-2025, HighLevel processes over 4 billion API hits and handles more than 2.5 billion message events every day. Our platform manages over 470 terabytes of data distributed across five databases, operates with a network of over 250 microservices, and supports over 1 million hostnames.
Our People
With over 1,500 team members across 15+ countries, we operate in a global, remote-first environment. We are building more than software; we are building a global community rooted in creativity, collaboration, and impact. We take pride in cultivating a culture where innovation thrives, ideas are celebrated, and people come first, no matter where they call home.
Our Impact
As of mid-2025, our platform powers over 1.5 billion messages, helps generate over 200 million leads, and facilitates over 20 million conversations for the more than 2 million businesses we serve each month. Behind those numbers are real people growing their companies, connecting with customers, and making their mark, and we get to help make that happen.
About the Role:
We are looking for a Site Reliability Engineer (SRE) to join our team and help ensure the availability, performance, and scalability of our critical systems. You will work closely with development and operations teams to automate processes, enhance system reliability, and improve observability.
Responsibilities:
- Develop and improve observability using monitoring, logging, tracing, and alerting tools (Prometheus, Grafana, ELK, OpenTelemetry, etc.).
- Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA to prevent future issues.
- Collaborate with developers to enhance application reliability, scalability, and performance.
- Drive cost optimisation efforts in cloud environments.
- Work with multiple data stores: MongoDB, Redis, Elasticsearch (ES), queue-based systems, etc.
Requirements:
- Experience: 4+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
- Cloud Expertise: Hands-on experience with GCP and AWS.
- Infrastructure as Code (IaC): Terraform, Helm, or equivalent tools.
- Containerization & Orchestration: Docker, Kubernetes (GKE).
- Observability: Experience with Prometheus, Grafana, ELK, OpenTelemetry, or similar monitoring/logging tools.
- Programming/Scripting: Proficiency in Python, Bash, or Shell scripting. Basic understanding of API parsing and JSON manipulation.
- CI/CD Pipelines: Hands-on experience with Jenkins, GitHub Actions, ArgoCD, or similar tools.
- Incident Management: Experience with on-call rotations, SLOs, SLIs, SLAs, escalation policies, and incident resolution.
- Databases: Experience monitoring MongoDB, Redis, Elasticsearch (ES), queue-based systems, etc.
EEO Statement:
The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government record-keeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.
Top Skills
ArgoCD
AWS
Bash
Docker
ELK
GCP
GitHub Actions
Grafana
Helm
Jenkins
Kubernetes
OpenTelemetry
Prometheus
Python
Shell Scripting
Terraform
The Company
What We Do
https://www.gohighlevel.com/quick-links
One white-labeled marketing app to rule them all. HighLevel is everything your business needs to succeed!
Capture leads using our landing pages, surveys, forms, calendars, inbound phone system & more!
Automatically message leads via voicemail, forced calls, SMS, emails, FB Messenger & more!
Use our built-in tools to collect payments, schedule appointments, and track analytics.
