Site Reliability Engineer

Sorry, this job was removed at 08:10 p.m. (CST) on Wednesday, Jun 11, 2025
Yerevan
In-Office
Cloud • Software
The Role


As a Site Reliability Engineer, you will work as an integral member of product teams, helping to build, deploy, and monitor cloud services reliably. You will contribute to complex software development projects to maintain essential, revenue-critical services. You will also develop code and build frameworks to monitor services deployed in production, driving reliability and performance at large scale, and you will be responsible for the reliability, availability, and performance of our Elasticsearch infrastructure. We're seeking a talented Site Reliability Engineer who can work with minimal supervision, define test procedures, and collaborate effectively with Developers, Designers, Customer Support, and Engineering Leadership.
Key Responsibilities

  • Build systems and infrastructure to monitor complex, large-scale distributed systems.
  • Identify stability/performance issues and collaborate with developers to triage critical issues in production systems.
  • Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
  • Devise ways to actively monitor system throughput, capacity, and reliability.
  • Debug complex systems and evolve a running environment without causing downtime.
  • Engage in service capacity planning and demand forecasting, as well as software performance analysis and system tuning.
  • Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.
  • Monitor and troubleshoot Elasticsearch performance issues and outages.

Who You Are
  • Bachelor’s degree in Computer Science or equivalent work experience as a System Administrator with programming skills.
  • Fundamental knowledge of technologies across a broad range of disciplines, including virtualization, storage, networking, server, and security.
  • Understanding of systems and application design, including the operational trade-offs of various designs.
  • Experience with monitoring and logging solutions such as Prometheus, Grafana, and ELK stack.
  • Proficiency in scripting languages such as Python.
  • Experience with infrastructure-as-code tools such as Terraform or CloudFormation.
  • Strong understanding of Linux system administration and networking concepts.
  • Excellent troubleshooting and problem-solving skills.
  • Ability to work independently and collaboratively in a fast-paced environment.
  • Strong communication and interpersonal skills.
  • Demonstrable knowledge of Unix, TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.
  • Experience in analyzing logs and troubleshooting large-scale distributed systems.
  • Excellent organizational and time management skills.

Nice to Have
  • Experience instrumenting and monitoring production systems using tools such as ELK stack, Zabbix, Nagios, StatsD/Graphite, and APM.
  • Experience with AWS infrastructure (including EC2, S3, VPC, Security Groups, RDS) and related services.
  • A working understanding of Docker, Vagrant, and configuration management tools like Ansible, Chef, or Puppet.
  • Experience with one or more general-purpose programming/scripting languages, including but not limited to Python, Bash, Perl, or Go.

Benefits include:
  • Medical Insurance
  • Flexible PTO
  • Flex Friday
  • Hybrid Work Option Available
  • Tuition Reimbursement
  • And more!


 


The Company
HQ: Denver, CO
215 Employees
Year Founded: 2006

What We Do

Axcient backup and disaster recovery solutions allow MSPs to Protect Everything.
With a single, easy-to-use platform, Axcient helps you keep your clients secure and build a healthy business. New x360 Recover Direct-to-Cloud frees you from the complexity and expense of appliances. Take the Axcient Challenge and see how you can get full BDR for all your clients' use cases for up to 50% less than what you pay today for backup alone.
