Site Reliability Engineer

India, Sardarpur, Dhar, Madhya Pradesh
Mid level
Software
The Role
The Site Reliability Engineer at Catchpoint is responsible for building and automating infrastructure deployment, monitoring system health, and ensuring high reliability of the monitoring platform. Responsibilities include managing service lifecycles, troubleshooting incidents, and leveraging cloud services for deployment.

Who monitors the monitoring system? A Site Reliability Engineer at Catchpoint is responsible for supporting the systems that run Catchpoint’s global monitoring platform. In this role, you will work directly with operations and development teams to build and automate infrastructure deployment at scale using infrastructure as code (IaC), then monitor that infrastructure to ensure Catchpoint provides a scalable and highly reliable system for our customers.

What will success look like in this position? 

The role requires an operational mindset and a love of solving problems on a global scale with solutions that ensure high reliability and availability. You’ll explore and make sense of system telemetry, logs, and passive monitoring data, and use our own synthetic monitors to build automation that controls, rolls out, and maintains our platform.

Responsibilities 

  • Define and refine the whole service lifecycle, from inception and design through deployment and operation to retirement.
  • Assess services once they are live by measuring and monitoring availability, latency and overall system health. Establish performance baselines, define actions and automations based on data correlated from multiple sources.
  • Design, build, and maintain logging and telemetry systems that are used to manage all services.
  • Design, code, test, and deliver software to automate manual operational work.
  • Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents.
  • Identify application patterns and analytics in support of better service level objectives.
  • Deploy and maintain systems that run on multiple cloud providers (AWS, GCP, Azure, Alibaba, Tencent, Oracle, IBM) and physical systems around the world.
  • Be part of an on-call rotation to support production systems.

Required Skills & Qualifications

  • Strong experience with, or knowledge of, administering application servers, web servers, and databases.
  • Familiarity with infrastructure automation, configuration management, and CI/CD tools (preferably Terraform).
  • Experience with multiple cloud platforms (AWS, GCP, Azure).
  • Good networking knowledge and experience with internet architecture (BGP, peering, DNS).
  • 2+ years of incident resolution experience in a large-scale operations environment.
  • Hands-on experience with cloud deployment, monitoring, and ops analysis tools such as Prometheus, Elasticsearch, Grafana, Kibana, Splunk, Terraform, Jenkins, etc.
  • 3+ years of programming experience with Python, Bash, PowerShell, C, etc.
  • Virtualization experience required.
  • BS degree in Computer Science or related technical field involving coding or equivalent practical experience.
  • Appreciation of the value of diversity of opinions.

Overview

Catchpoint is the Internet Resilience Company™. The top online retailers, Global2000, CDNs, cloud service providers, and xSPs in the world rely on Catchpoint to increase their resilience by catching any issues in the Internet stack before they impact their business. The Catchpoint platform offers synthetics, RUM, performance optimization, high fidelity data, and flexible visualizations with advanced analytics. It leverages thousands of global vantage points (including inside wireless networks, BGP, backbone, last mile, endpoint, enterprise, ISPs, and more) to provide unparalleled observability into anything that impacts your customers, workforce, networks, website performance, applications, and APIs.

Catchpoint is an equal opportunity employer that strictly prohibits discrimination and harassment of any kind. We celebrate diversity and are committed to creating an inclusive and engaging environment for all employees. We welcome applications from all candidates and look forward to receiving yours!

#LI-REMOTE

Top Skills

Bash
C
PowerShell
Python
The Company
HQ: New York, NY
295 Employees
On-site Workplace
