Site Reliability Engineer (SRE) - Engineering Productivity

Nashua, NH
In-Office
Senior level
Cloud • Security • Software • Analytics
The Role
The SRE will ensure system reliability and scalability, managing infrastructure, monitoring alerts, debugging issues, and creating automated solutions for development productivity.
Summary Generated by Built In
Company Description

Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus, and routing environments. What sets us apart is our relentless pursuit of innovation. We leverage the latest advancements in cloud computing, artificial intelligence, and software-defined networking to give our clients a competitive edge in an increasingly interconnected world. Our solutions are designed not only to meet the current demands of the digital landscape but also to anticipate and adapt to future challenges.

At Arista, we value the diversity of thought and perspectives that each employee brings to the table. We believe that fostering an inclusive environment, where individuals from various backgrounds and experiences feel welcome, is essential for driving creativity and innovation.

Our commitment to excellence has earned us several prestigious awards, such as Best Engineering Team, Best Company for Diversity, Compensation, and Work-Life Balance. At Arista, we take pride in our track record of success and strive to maintain the highest standards of quality and performance in everything we do.

Job Description

Who You’ll Work With

Arista Networks is looking for world-class Site Reliability Engineers passionate about driving systems reliability and scalability to provide the best possible development experience for our 2,000+ person engineering team. You will be part of a fast-paced, high-caliber team building the internal systems and infrastructure used to build the routing and switching products that drive the industry's largest data center networks.

Arista’s Software Engineering team runs at a scale rarely found elsewhere: TBs of source control, 60 GB work trees with thousands of developer branches in flight at any given time, over 400K daily build/test jobs, and over 150 homegrown and cloud-native services running on a 60-node Kubernetes cluster. Operating these systems takes vigilance, responsiveness to alerts, and a steady stream of updates and bug fixes, both to keep things running smoothly and efficiently and to improve our ability to monitor, understand, and visualize them. The SRE role covers all aspects of our software development infrastructure and may include monitoring, responding to, and enhancing alerts; unifying and standardizing our alerts; fine-tuning code for scalability and performance; debugging problems; and adding new features. You will own your projects from definition through deployment and customer interactions, and you will be responsible for the quality of everything you deliver.

Working in Engineering Productivity (EngProd), you will collaborate with other engineers to design, build, scale, and operate the systems that the rest of Arista’s development teams use. The EngProd team uses industry-standard systems such as Ansible, Jenkins, Kubernetes, Grafana, Gerrit, MySQL, Elasticsearch, Google Cloud, and Redis, as well as internal systems we’ve built from the ground up to automate CI/CD, testing, analysis, and visualization.

What You’ll Do

  • Keep production status green at all times.
  • Proactively monitor, respond to, and enhance alerts.
  • Build automated responses to the most common alerts, or work with the rest of the EngProd team to build them.
  • Create and maintain incident response runbooks in collaboration with the service development teams.
  • Debug and resolve issues affecting the developer experience and infrastructure stability.
  • Develop patterns that support system reliability and socialize them within the EngProd team.
  • Review and contribute to specifications and implementations written by other team members.
  • Work with Arista’s software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure, and provide fixes for those problems.
  • Provide support for our tools and infrastructure to Arista’s development teams.

Qualifications

  • BS in Computer Science or Engineering plus at least 5 years’ experience, MS in Computer Science or Engineering plus at least 3 years’ experience, or equivalent work experience.
  • Knowledge of one or more of Go, Python, JavaScript, or shell scripting.
  • Knowledge of Linux (or UNIX).
  • Experience operating and managing software systems at scale.
  • Strong understanding of storage and networking fundamentals.
  • Comfortable with Ansible and GitOps.
  • Applied understanding of software engineering principles.
  • Strong problem-solving and software troubleshooting skills.
  • Ability to design solutions and implement features independently, and to work in small teams.


Additional Information

Arista Networks is an equal opportunity employer. Arista makes all hiring and employment-related decisions in a non-discriminatory manner without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other factor determined to be unlawful under applicable federal, state, or local law. All your information will be kept confidential according to EEO guidelines.

Top Skills

Ansible
Elasticsearch
GitOps
Go
GCP
Grafana
JavaScript
Kubernetes
Linux
MySQL
Python
Redis
Shell Scripting

The Company
HQ: Santa Clara, CA
29 Employees
Year Founded: 2004

What We Do

Arista Networks is a leader in data-driven, client-to-cloud networking for data center, campus, and routing environments. Arista’s award-winning platforms deliver availability, agility, automation, analytics, and security.

We've created this space to keep you updated on Arista channel and partner news.

Similar Jobs

CrowdStrike

Engineer III - Cloud (Remote)

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Remote or Hybrid
22 Locations
10000 Employees
120K-180K Annually

CrowdStrike

Detection Content Release Engineer III (Remote, Mountain, Central or Eastern US)

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Remote or Hybrid
44 Locations
10000 Employees
120K-180K Annually

CrowdStrike

Product Manager

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Remote or Hybrid
38 Locations
10000 Employees
140K-215K Annually

CrowdStrike

Cloud Engineer

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Remote or Hybrid
28 Locations
10000 Employees
120K-180K Annually

Similar Companies Hiring

Standard Template Labs
Software • Information Technology • Artificial Intelligence
New York, NY
10 Employees
PRIMA
Travel • Software • Marketing Tech • Hospitality • eCommerce
US
15 Employees
Scotch
Software • Retail • Payments • Fintech • eCommerce • Artificial Intelligence • Analytics
US
25 Employees
