The Role
The Site Reliability Engineer will ensure service stability, implement monitoring systems, manage cloud infrastructure, automate workflows, and respond to incidents.
Summary Generated by Built In
Site Reliability Engineer (SRE)
Overview
We're looking for a passionate and hands-on Site Reliability Engineer (SRE) to join our team. This role is critical for ensuring the stability, performance, and scalability of our production services. You'll be the bridge between development and operations, with a strong focus on using code to manage infrastructure and eliminate toil.
Key Responsibilities
- Monitoring and Alerting: Design, implement, and maintain robust monitoring and alerting systems (e.g., GCP Monitoring, Prometheus, Grafana, Traces, Logs) to provide visibility into application performance and infrastructure health.
- Infrastructure Management: Build, provision, and maintain our core infrastructure, with a strong emphasis on Cloud environments and Kubernetes clusters.
- Automation and Tooling: Write and maintain scripts and automation workflows (e.g., Python, Bash, or TypeScript with Pulumi) to streamline deployment, scaling, and operational tasks, embracing the philosophy of "automating everything."
- Incident Response: Provide hands-on, real-time incident response and participate in an on-call rotation to quickly mitigate service disruptions and restore functionality.
- Production Debugging: Debug and troubleshoot complex production problems in depth across the entire stack, from network issues to application code defects.
- Process Improvement: Conduct blameless post-mortems for major incidents, implementing long-term solutions to prevent recurrence and continuously improve service reliability.
Qualifications
- Proven experience as an SRE or DevOps Engineer, or in a similar role.
- Expertise in managing and scaling Kubernetes in a production environment.
- Strong proficiency in a scripting or programming language (e.g., Python, Go, Bash).
- Deep understanding of monitoring, logging, and alerting best practices.
- Solid experience with at least one major Cloud provider (AWS, GCP, or Azure).
- Experience with Infrastructure as Code (IaC) tools like Terraform or Pulumi is a plus.
- A proactive, data-driven approach to reliability and a passion for managing complex systems at scale.
The base pay range for this role is $50,000 – $60,000 per year.
Skills Required
- Proven experience as an SRE or DevOps Engineer
- Expertise in managing and scaling Kubernetes in production
- Strong proficiency in Python or similar scripting languages
- Deep understanding of monitoring and alerting best practices
- Experience with major Cloud providers (AWS, GCP, or Azure)
The Company
What We Do
RC Talent Solutions is a premier technology recruiting partner specializing in personalized tech staffing for IT roles, helping startups, enterprise teams, and global brands build world-class tech.







