Senior Site Reliability Engineer - GP

Hiring Remotely in Colombia
Remote
3-5 Years Experience
Software
The Role
As a Senior Site Reliability Engineer, you will lead efforts in blackbox and whitebox monitoring, implement synthetic tests, improve platform reliability, and collaborate on incident management. Required skills include Golang, Prometheus, Dynatrace, Kubernetes, OpenShift, and CI/CD expertise.

Senior Site Reliability Engineer


As a Senior Site Reliability Engineer on the SRE Cloud Space team, you will develop and maintain the team's observability solutions. The role focuses on enhancing blackbox and whitebox monitoring, implementing synthetic tests, and improving platform reliability across both on-premise and GCP environments, using tools such as Prometheus, Dynatrace, Kubernetes, and OpenShift.


Responsibilities


*Blackbox Monitoring and Health Mesh Development: Lead efforts in blackbox monitoring, including the development and enhancement of the Health Mesh product. Implement and manage synthetic tests that monitor critical platform services, providing early detection of incidents. Utilize Prometheus for blackbox monitoring and develop simple Go APIs to support these activities.

*Whitebox Monitoring with SLO Approach: Implement whitebox monitoring strategies with a focus on Service Level Objectives (SLOs) for core Google Cloud Platform (GCP) services and applications on OpenShift. Ensure that both platform operators and customers have clear visibility into the system's performance and health.

*Anomaly Detection: Develop and refine anomaly detection mechanisms using the same metrics applied in whitebox monitoring. Leverage tools such as Prometheus and Dynatrace to identify and address potential issues before they escalate, contributing to overall platform stability.

*Separation of Platform Incidents from User Errors: Create tools and processes that help operators distinguish between platform-level incidents and individual user errors. This includes enhancing the observability of API gateways and other critical infrastructure components.

*Support for On-Premise and GCP Environments: Maintain and improve observability tools that support both on-premise and cloud environments, ensuring seamless operation across different infrastructure setups. Provide application support for on-premise applications and use technologies such as OpenShift to manage on-premise environments.

*Collaboration and Incident Management: Collaborate with various teams to ensure effective incident management and response. Focus on distinguishing platform incidents from issues owned by individual application teams, providing clear communication and resolution strategies.


Technical Requirements


*Education: Bachelor's degree in Computer Science, Engineering, or equivalent experience.

*DevOps & SRE Experience: 3+ years of experience in DevOps and Site Reliability Engineering, with a focus on automation, infrastructure as code, and continuous integration/continuous deployment (CI/CD) practices.

*Programming Experience: 3+ years of experience in programming, with a strong focus on Golang development.

*Monitoring Tools Expertise: 3+ years of experience with APM and monitoring tools such as Dynatrace, Prometheus, ELK, Splunk, or similar.

*Cloud and On-Premise Proficiency: Proficiency in Google Cloud Platform (GCP) and experience with on-premise environments, particularly with application deployment and management on OpenShift.

*Container Orchestration: Experience with container orchestration technologies like Kubernetes (K8s) and OpenShift.

*CI/CD Expertise: Experience with CI/CD deployment pipelines, ensuring automated and reliable deployment processes.

*System Architecture: Demonstrable experience in designing and deploying scalable and resilient systems, with an understanding of cloud-native principles.

*System Monitoring and Anomaly Detection: Extensive experience in implementing both blackbox and whitebox monitoring solutions, with a focus on SLOs and anomaly detection.


Bonus Skills


*Linux Background: Knowledge of both Debian and Ubuntu environments.

*Familiarity with Additional Tools: Experience with Jenkins, Terraform, Datadog, K6, or similar technologies.

*Web Technologies: Understanding of web protocols and technologies such as HTTP, TLS, REST, Nginx, and API gateways.

Top Skills

Go
The Company
HQ: Broomfield, CO
725 Employees
On-site Workplace
Year Founded: 2002

What We Do

Gorilla Logic is a nearshore software development partner, providing Agile development teams to Fortune 500 and emerging companies. We bring expertise in the delivery of full-stack web, mobile, and enterprise applications. Our Colorado headquarters and nearshore development hubs in Costa Rica, Colombia, and Mexico come together to build high-performance, integrated Agile teams with top-quality engineering talent. Our highly collaborative developers work with your existing processes and work schedules to deliver results on your most critical products.

We pride ourselves on providing a true boutique consulting and software development experience that is personalized and customized to your unique business needs. Our deep talent pool of onshore and nearshore development resources is made up of experts in Java, Ruby, Python, .NET, Angular, React, iOS, and Android development.

Interested in a career with Gorilla Logic? We are always looking for smart, talented developers and engineers. Check here for our current job openings, and to submit your resume: https://gorillalogic.com/careers/
