Lytx

Staff SRE

Reposted 3 Days Ago

Hiring Remotely in Virginia, USA

Remote

184K-233K Annually

Senior level

Information Technology

The Role

Lead technical strategy for observability, operational intelligence, and reliability. Architect telemetry and automation platforms, drive AIOps and large-scale IaC, lead incident response, mentor senior engineers, and standardize SLO/SLI and reliability practices across AWS cloud-native environments.

Summary Generated by Built In

Why Lytx:

The Site Reliability Engineering team is responsible for the availability, reliability, observability, and resilience of infrastructure and related automation across the entire fleet of on-prem servers and the organization’s expanding cloud footprint. This team’s responsibilities are critical to the organization’s business continuity. If you love crafting new solutions and building scalable cloud and on-prem infrastructure, then this role may be an excellent match for you!

You’ll get to:

Build tools and frameworks to monitor systems and ensure the highest level of uptime in production environments.
Mentor the SRE team on best practices. Develop a culture of innovation.
Take the lead in enhancing our 24/7 on-call and incident management processes. Build and maintain runbooks. Contribute to the design and documentation of cloud services and SOPs.
Influence service design by working closely with Architects, DBAs, Developers, DevOps, and Data Engineers to build reliability, scalability, and cost optimization into services early in the development process.
Lead blameless post-mortems. Take ownership of publishing RCA documents for internal and external consumption.
Lead initiatives with Service Owners to define the SLOs and build SLIs to ensure systems are meeting the SLAs.
Research and evaluate new cloud technologies and vendor offerings to enhance product stability and manageability.
Reduce operational toil and maintain a high degree of automation by adopting IaC-first and GitOps principles.
Acquire and maintain a significant understanding of Lytx production services to ensure timely resolution of production incidents.

You’ll Need:

8+ years of experience as an SRE in an AWS environment at a medium- to large-scale organization.
6+ years of hands-on experience implementing and managing Observability tools (Prometheus, New Relic, Grafana, etc.)
High degree of proficiency in programming, preferably using Python, Groovy, and Bash.
Hands-on experience managing database technologies (SQL and NoSQL).
5+ years of experience building infrastructure deployment pipelines using Git, Terraform, Helm, Jenkins/Jenkins X/Argo CD, etc.
Proficient in designing production environments in the AWS cloud using various AWS services (VPCs, EKS, IAM, AMI, EC2, CloudWatch, CloudTrail, Control Tower, GuardDuty, MSK, S3, Glacier, Gateways, Direct Connect, Route 53, RDS, ALBs, Auto Scaling, etc.)
Extensive experience with Linux systems and various protocols and technologies (HTTP, REST, TCP/IP, SSL, DNS, SMTP, SSH, NTP, Load Balancing, SQL/NoSQL, Message Brokers, Nginx, Vault, ELK, etc.)
Hands-on experience with Kubernetes and various container and cloud native technologies.
Significant experience participating in, implementing, and managing a 24/7 on-call rotation for an SRE team, including creating runbooks, building support procedures, and proactively monitoring systems across geographical locations.
Ability to work well under pressure within a technically challenging environment.

Preferred Experience:

Hands-on experience managing sophisticated networks in the AWS cloud (Direct Connect, Transit Gateway, VPNs, BGP, firewalls, CDNs)
Hands-on experience managing cloud databases (AWS RDS, MongoDB, Elasticsearch, Snowflake)
Certifications: multiple AWS certifications, Kubernetes, Linux, programming, CI/CD.

Benefits:

Medical, dental and vision insurance
Health Savings Account
Flexible Spending Accounts
Telehealth
401(k) and 401(k) match
Life and AD&D insurance
Short-Term and Long-Term Disability
FTO or PTO
Employee Well-Being program
11 paid holidays plus 1 inclusive holiday per year
Volunteer Time Off
Employee Referral program
Education Reimbursement Program
Employee Recognition and Appreciation program
Additional perk and voluntary benefit programs

Salary is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. This position is also eligible for an incentive compensation plan. The expected hiring salary for this position is:

$183,500.00 - $232,500.00

You’re driven to succeed and so are we. At Lytx, our mission is to protect a world in motion, and we do it by building technology and partnerships that help keep people safe on the road. The way we work is guided by our shared values: Deliver for the customer, Responsibility in every outcome, Innovate with purpose, Velocity with excellence, and Elevate each other.

If you’re looking for meaningful work, a team that challenges and supports you, and the chance to grow your career while making a real impact, we’d love to meet you.

Together, we’re helping make roadways safer and saving lives!

Lytx, Inc. is proud to be an equal opportunity employer. We’re committed to building a diverse and inclusive workforce and do not discriminate based on race, color, religion, sex, sexual orientation, gender identity or expression, gender, genetic information, uniformed service, national origin, age, veteran status, disability, pregnancy, or any other status protected by federal or state law. We are committed to providing reasonable accommodation for candidates with disabilities who need assistance during the hiring process. To request a reasonable accommodation, please email [email protected].  Lytx conducts background checks on applicants who receive a conditional offer of employment in accordance with applicable local, state, federal and regional laws. Qualified applicants with arrest or conviction records will be considered. Background check results may potentially result in the withdrawal of a conditional offer of employment and will be made in accordance with all applicable local, state, federal and regional laws.

Skills Required

8-10+ years SRE, platform engineering, or cloud infrastructure experience supporting large-scale production environments.
Demonstrated experience leading architecture, reliability strategy, or operational platforms across multiple teams.
Proven track record operating 24/7 production environments, incident leadership, and postmortem practices.
Deep expertise designing and operating large-scale AWS environments (VPC, EC2, EKS/ECS, RDS/DynamoDB, S3, ALB/NLB, IAM, KMS, Route 53, multi-account).
Experience designing resilient, fault-tolerant systems using multi-AZ/multi-region patterns, graceful degradation, rate limiting, and capacity management.
Senior-level experience with observability platforms and telemetry (New Relic, Datadog, Prometheus, Grafana, OpenTelemetry) and low-noise alerting.
Experience defining telemetry standards, instrumentation strategies, centralized dashboards, and improving operational signal quality (correlation, noise reduction).
Experience implementing or evaluating AIOps capabilities (anomaly detection, event correlation, predictive alerting, automated remediation).
Expert-level Infrastructure-as-Code with Terraform and/or CloudFormation, reusable modules, and GitOps workflows.
Strong scripting/programming skills (Python, Go, Bash, or similar) for automation and operational tooling.
Expert understanding of Linux systems, networking (TCP/IP, DNS, TLS), and distributed system behavior.
Expert with Kubernetes and cloud-native architecture patterns.
Demonstrated ability to influence technical direction without direct authority and mentor senior engineers.

The Company

Framingham, MA

790 Employees

Year Founded: 1998

What We Do

Learn how Lytx video telematics can help you improve safety, efficiency, and DOT compliance in your fleet. Start improving your fleet operations today.