EdgeUno

Senior Site Reliability Engineer

Uberlândia, Minas Gerais, BRA

Hybrid

Senior level

Information Technology

The Role

As an SRE & AI Automation Engineer, you will enhance EdgeUno's cloud infrastructure by developing automation workflows, monitoring solutions, and AI-powered operational tools, while collaborating with various teams to establish reliability and observability practices.

Summary Generated by Built In

About EdgeUno

EdgeUno is a US-based technology infrastructure company headquartered in Miami, with a strong operational presence across Latin America, including Colombia, Brazil, Mexico, Argentina, Peru, and Ecuador. We enable digital businesses to scale with high performance and reliability by providing connectivity, IP Transit, private networks, data centers, bare metal, and cloud solutions to ISPs, hyperscalers, content providers, and global technology companies.

Through our own infrastructure platform and strategic interconnection with major global hubs, we deliver low latency, security, and operational resilience across the Americas and beyond. Our Cloud Engineering team is actively expanding EdgeUno's cloud product portfolio across our LATAM infrastructure footprint, with a strong focus on Kubernetes-based distributed cloud platforms, automation, observability, and AI-driven operational efficiency.

Role Overview

We are looking for a Senior Site Reliability Engineer to join our Cloud Engineering team, helping build the reliability, observability, and automation foundations that support EdgeUno's growing cloud infrastructure across Latin America.

This role combines Site Reliability Engineering with AI-powered operational automation. The ideal candidate is comfortable operating production infrastructure while also building intelligent automation workflows, internal AI agents, and operational tooling that improve efficiency, reduce toil, and accelerate infrastructure delivery.

You will work closely with Cloud Engineering leadership and cross-functional teams to help scale modern infrastructure platforms, observability systems, Kubernetes operations, and AI-assisted workflows across EdgeUno's distributed environment.

Core Responsibilities

Site Reliability Engineering

Define and implement SLOs, SLIs, and reliability practices across cloud services
Build and maintain observability environments using Prometheus, Grafana, Alertmanager, Loki, and related tooling
Reduce operational toil through automation and infrastructure engineering initiatives
Support incident management processes, post-mortems, runbooks, and operational workflows
Collaborate on Kubernetes operations, cluster lifecycle management, and infrastructure scalability
Implement GitOps workflows using tools such as ArgoCD, Flux, and Infrastructure-as-Code frameworks

AI & Automation Engineering

Design and develop AI-powered operational tools and internal assistants
Build automation workflows integrating cloud APIs, ticketing systems, Slack, dashboards, and operational platforms
Integrate LLMs and AI services into internal workflows using APIs and RAG architectures
Develop AI-driven reporting, incident summarization, and operational intelligence solutions
Evaluate and prototype agentic AI frameworks and automation platforms

Platform & Infrastructure Automation

Develop Infrastructure-as-Code environments using Terraform, Ansible, and related technologies
Build CI/CD pipelines and infrastructure validation workflows
Automate provisioning, upgrades, monitoring, and infrastructure operations across distributed environments
Improve deployment reliability and operational visibility across cloud services

Cross-Team Collaboration

Help establish SRE best practices across engineering teams
Collaborate with infrastructure, support, operations, and leadership teams to identify automation opportunities
Maintain clear technical documentation for systems, workflows, and operational processes
Support tooling evaluation and technical decision-making related to cloud infrastructure and AI operations

Requirements

English B2+
5+ years of experience in SRE, DevOps, Platform Engineering, or related infrastructure roles
Strong experience with observability and monitoring stacks such as Prometheus, Grafana, Alertmanager, Loki, or equivalent
Hands-on experience building or integrating AI/LLM-powered applications, tools, or workflows
Strong proficiency in Python and/or TypeScript
Experience operating Kubernetes environments in production
Experience with Infrastructure-as-Code and automation tooling such as Terraform, Ansible, ArgoCD, or similar
Strong understanding of SLOs, SLIs, reliability engineering, and operational best practices

Strong Differentiators

Experience with workflow automation platforms such as n8n
Experience building RAG pipelines and working with vector databases such as Qdrant, Pinecone, or Weaviate
Familiarity with AI agent frameworks such as LangChain, LangGraph, CrewAI, AutoGen, or similar
Experience with K3s, K0s, Kamaji, Cluster API, or multi-cluster Kubernetes environments
Experience with Proxmox, Ceph, MinIO, Cilium, eBPF, or distributed infrastructure environments
Background working for cloud providers, infrastructure companies, or telecommunications environments
Understanding of networking fundamentals such as BGP, and experience in connectivity environments
GitHub or portfolio demonstrating infrastructure automation, AI tooling, or SRE-related projects

What We Offer

Opportunity to work on strategic cloud and AI infrastructure initiatives across Latin America
Direct exposure to modern cloud-native, Kubernetes, and AI-driven operational environments
Close collaboration with Cloud Engineering leadership and product strategy initiatives
Multinational and multicultural team environment across LATAM

Portfolio Requirement

Applicants must include a portfolio, GitHub, GitLab, or other practical examples demonstrating relevant technical work.
We are looking for evidence of real systems, automation workflows, AI tooling, infrastructure projects, operational artifacts, or engineering initiatives built and maintained in practice.

The Company

HQ: Bogota

151 Employees

Year Founded: 2019

What We Do

EdgeUno: Your Key to LATAM!

Welcome to EdgeUno, where we're pioneering the next era of digital empowerment and connectivity. As a dynamic leader in the tech landscape, we're committed to redefining how you experience the internet in Latin America and beyond.

🚀 Unleashing Innovation: With our innovative mindset, we're reshaping the connectivity landscape. We're not just providers; we're enablers of seamless experiences that empower businesses and individuals.
💡 Cutting-Edge Data Centers: Our network of 50+ modern data centers sets the stage for unparalleled performance. These hubs of innovation serve as the backbone of high-speed, low-latency connectivity.
🌐 Global Connectivity: We're not bound by borders; our best-connected network spans continents, ensuring your content reaches audiences worldwide, effortlessly.
🎮 Gaming & Streaming Excellence: Whether you're a passionate gamer or content creator, EdgeUno is your partner in delivering smooth, immersive experiences that captivate and engage.
🔗 Simplified Cloud Deployment: Seamlessly deploy your cloud applications and services, thanks to our advanced infrastructure that makes complexity disappear.

Join us in this journey as we revolutionize digital experiences, spark innovation, and pave the way for a connected world that knows no boundaries. Your online adventure begins with EdgeUno. 🌎🌟

#EdgeUno #Connectivity #Innovation #DigitalEmpowerment #FutureTech #as7195