Site Reliability Engineer

Ashburn, VA
In-Office
Senior level
Cloud • Information Technology
The Role
As a Site Reliability Engineer, you will design and implement monitoring solutions, establish monitoring frameworks, automate incident management, and integrate monitoring into IT processes to enhance system reliability.
Summary Generated by Built In
As a Site Reliability Engineer (SRE), you will be an integral part of the team at LightEdge Solutions. This position reports to the DevOps Manager and is responsible for the reliable operation of the organization’s systems and services. You will play a key role in defining our monitoring strategy and vision across multiple products, and you will work with a variety of teams to improve the accuracy of our monitoring systems.

Responsibilities

  • Monitoring and Observability: Design and implement monitoring solutions to track the performance, availability, and health of various systems and services. Establish robust monitoring frameworks, set up alerts, and analyze system metrics to identify and resolve issues proactively.
  • Establish and align metrics, including SLAs, SLOs, and SLIs, to closely tie system performance to business objectives, ensuring that the site reliability engineering efforts support the overall goals and customer satisfaction.
  • Use AIOps techniques to automate incident management and response. Develop and maintain automated incident response systems that detect and mitigate issues automatically. This includes automated incident triage, remediation, and escalation workflows that minimize manual intervention and improve response times.
  • Leverage the IT service management platform’s capabilities to integrate monitoring into incident management, change management, and other operational processes, enhancing the efficiency and effectiveness of site reliability engineering practices.
  • Work closely with IT functional owners and SMEs.
  • Perform complex systems design, proof of concept, implementation, and integration functions.
  • Develop detailed designs for, execute, and troubleshoot strategic solutions that support effective monitoring, alerting, escalation, automation, reporting, and event correlation.

Education and Experience

  • 5 years of hands-on experience with enterprise monitoring solutions
  • Must possess knowledge of network switches, server hardware, storage, and virtualization technologies
  • Understanding of VMware Infrastructure
  • Experience working with a variety of monitoring systems such as Zabbix, vRealize Operations Manager, Nagios, and ScienceLogic
  • Experience and proficiency in integrating with ServiceNow or similar IT service management platforms.
  • Experience with managing automations within a monitoring environment.
  • Ability to provide guidance with design, maintenance, and improvements to enterprise level monitoring solutions.
  • Excellent verbal and written communication skills, with the ability to present complex ideas and designs to both technical and non-technical stakeholders.
  • Experience with design, implementation, and support of monitoring tools in a complex, multi-platform environment.
  • Strong understanding of monitoring requirements for storage, network, and compute servers.

Top Skills

AIOps
Nagios
ScienceLogic
ServiceNow
VMware
vRealize Operations Manager
Zabbix

The Company
HQ: Des Moines, IA
180 Employees

What We Do

LightEdge Solutions is a leading provider of fully managed network and business services for small and medium-sized businesses.
