Site Reliability Engineer (SRE) - GCP

Sorry, this job was removed at 04:14 p.m. (CST) on Tuesday, Feb 17, 2026
Hiring Remotely in Brazil
Remote
Information Technology • Software
The Role

We are seeking a Site Reliability Engineer (SRE) with deep expertise in monitoring, observability, and reliability engineering to support systems running across on-premises infrastructure and Google Cloud Platform (GCP).

This role is primarily responsible for designing, operating, and improving monitoring, alerting, and observability platforms, with a strong focus on Grafana and Kubernetes environments.

As a secondary responsibility, this role provides backup coverage for the Application Support team during periods of resource constraints or major incidents, offering L2/L3 technical support when required.

Responsibilities

Monitoring & Observability (Core Focus)

  • Own and operate the monitoring and observability stack across on-prem and GCP environments
  • Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
  • Define, tune, and maintain alerts to ensure a high signal-to-noise ratio
  • Establish observability standards and best practices across teams
  • Improve visibility into system health, performance, and reliability

Site Reliability Engineering

  • Apply SRE principles to improve availability, performance, and resilience
  • Define and track SLIs, SLOs, and error budgets
  • Participate in on-call rotations and SEV incident response
  • Lead or contribute to incident investigations and root cause analysis (RCA)
  • Drive preventative actions to reduce repeat incidents

Kubernetes & Platform Reliability

  • Support and monitor Kubernetes environments (GKE and on-prem clusters)
  • Monitor cluster health, capacity, and resource utilization
  • Troubleshoot platform-level issues impacting application reliability
  • Collaborate with Platform and Engineering teams on reliability improvements

Secondary Responsibilities (Backup Application Support)

These responsibilities are activated as needed and are not part of day-to-day operations.

  • Provide L2/L3 application support coverage during:
    • Support team resource shortages
    • High-severity incidents (SEVs)
    • Peak support periods or escalations
  • Triage and troubleshoot application issues using existing runbooks and dashboards
  • Collaborate with Application Support and Engineering teams during incidents
  • Ensure all actions, findings, and resolutions are documented in ServiceNow (SNOW)

Requirements
  • Strong experience as a Site Reliability Engineer or Reliability Engineer
  • Deep hands-on expertise with Grafana (dashboards, alerting, troubleshooting)
  • Solid experience with monitoring and observability systems
  • Production experience operating Kubernetes environments
  • Experience supporting systems in GCP and on-prem environments
  • Strong Linux systems and troubleshooting skills
  • Fluent English (written and spoken)
  • Ability to work in the PST time zone
  • Ability to participate in an on-call rotation that includes coverage for one weekend day. Time worked during the weekend is compensated with one day off during the week, in accordance with the established work schedule.

Technology Stack:

  • Observability: Grafana, Prometheus, logging platforms
  • Containers: Kubernetes (GKE and on-prem)
  • Cloud: Google Cloud Platform (GCP)
  • Operations: Linux, networking, infrastructure monitoring
  • Incident Tools: PagerDuty, ServiceNow, Slack (or equivalents)

Nice to have:

  • Experience supporting application teams during SEV incidents
  • Knowledge of capacity planning and performance tuning
  • Scripting skills (Python, Bash, etc.)
  • Experience with hybrid infrastructure environments

Benefits

At Devsu, we believe in creating an environment where you can thrive both personally and professionally. By joining our team, you’ll enjoy:

  • A stable, long-term contract with opportunities for career growth
  • Private health insurance
  • A remote-friendly culture that promotes work-life balance
  • Continuous training, mentorship, and learning programs to keep you at the forefront of the industry
  • Free access to AI training resources and state-of-the-art AI tools to elevate your daily work
  • A flexible Paid Time Off (PTO) policy, as well as paid holidays
  • Challenging, world-class software projects for clients in the US and LatAm
  • Collaboration with some of the most talented software engineers in Latin America and the US, in a diverse work environment

Join Devsu and discover a workplace that values your growth, supports your well-being, and empowers you to make a global impact.


The Company
HQ: Orlando, FL
223 Employees
Year Founded: 2010

What We Do

Devsu is a trusted technology partner that provides world-class software delivery and staff augmentation services to startups, scale-ups, and enterprise companies.
With over a decade in the industry, our team of seasoned professionals has the knowledge and expertise to help you build, scale, and launch your next digital product.
We take pride in our customer-centric approach and our commitment to delivering high-quality solutions that meet and exceed our clients' expectations. Our nearshore model allows us to offer cost-effective and flexible services tailored to your needs without sacrificing quality.
At Devsu, we believe in the power of technology to transform businesses and improve people’s lives. That’s why we invest heavily in our people, processes, and IP to provide you with the best talent and cutting-edge solutions to help you achieve your goals and stay ahead of the curve. Whether you need to develop a new product from scratch, augment your existing team, or optimize your software development processes, Devsu has the expertise and team to make it happen.
