As technology organizations scale, so does operational friction. IT support teams become overloaded with repetitive tickets — account lockouts, access requests, provisioning tasks, and standard “ask IT” issues that drain time and attention from higher-value work.
EverOps partners directly with enterprise engineering and IT organizations to solve complex operational challenges from within their environments. We don’t patch symptoms — we eliminate root causes.
We are seeking a Lead Site Reliability Engineer to own and execute a comprehensive IT support automation strategy designed to significantly reduce ticket volume and human intervention.
The Challenge
This is not a reactive support role.
This is a systems-level engineering role focused on:
Eliminating tickets before they are created
Automating resolution paths when tickets do occur
Building durable automation frameworks across SaaS and internal platforms
Removing systemic friction across the IT lifecycle
You will operate heavily within the IT support domain, addressing areas such as:
Account lockouts and access management
Provisioning and deprovisioning workflows
Device and asset lifecycle management
Standard internal IT requests
SaaS integrations and workflow orchestration
The expectation is leadership-level ownership. You will define the automation roadmap, architect solutions, and drive initiatives from intake through deployment with measurable outcomes.
The Mission
As a Lead SRE, your mission is to:
Reduce human intervention across IT support workflows
Build automation systems that scale without increasing headcount
Architect reliable, observable, production-grade automation services
Establish engineering standards for automation development
Mentor junior engineers while maintaining direct ownership of delivery
Success is measured in outcomes:
Reduced ticket creation rates
Increased fully automated resolution percentages
Improved user satisfaction while lowering operational burden
This role requires deep technical capability combined with strong execution discipline and cross-functional influence.
What You’ll Do
1. Root-Cause Ticket Elimination
Analyze ticket trends and identify systemic failure patterns
Redesign workflows to remove recurring pain points
Replace reactive fixes with preventative engineering solutions
Partner with IT and engineering stakeholders to prioritize high-leverage automation opportunities
Design and implement automation workflows across multiple SaaS platforms
Integrate with third-party and internal APIs (e.g., identity providers, collaboration tools, asset systems, ticketing platforms)
Architect resilient API integrations including:
Authentication & authorization flows (OAuth2, SAML, token management)
Rate limiting and retry strategies
Error handling and observability
Build self-service systems that allow users to resolve common requests without human escalation
When no off-the-shelf solution exists, you will:
Build lightweight microservices or serverless functions (Python or Go preferred)
Develop internal middleware, proxies, or orchestration services
Create background automation jobs (cron-style processes)
Containerize and deploy services using modern DevOps practices
You will make thoughtful build-vs-buy decisions, balancing speed, maintainability, and long-term scalability.
4. Reliability, Observability & Production Standards
Automation must be as reliable as any production system.
You will:
Implement Infrastructure as Code (Terraform, Pulumi, or similar)
Maintain CI/CD pipelines for automation services
Design monitoring, logging, and alerting frameworks
Define SLIs/SLOs to measure automation reliability
Ensure automation services are secure, observable, and resilient
This is not scripting — this is platform-grade engineering.
5. Lead-Level Ownership & Execution
This role requires operating as a single-threaded owner for major initiatives.
You will:
Define solution architecture from concept to deployment
Set timelines and milestones autonomously
Conduct feasibility validation in development environments
Communicate proactively with stakeholders
Re-scope tactically to maintain forward momentum when blocked
Deliver measurable impact — not just activity
You are expected to think systemically, move with urgency, and drive initiatives to completion without requiring micro-management.
You Have
Experience
8+ years in SRE, Platform Engineering, DevOps, or Automation Engineering
Proven experience designing enterprise-scale automation systems
Strong exposure to IT support domains (access, provisioning, identity, device lifecycle, SaaS operations)
API & Integration Expertise
Deep experience designing and consuming REST APIs
Strong understanding of authentication and authorization patterns
Experience orchestrating workflows across multiple SaaS platforms
Programming & Automation
Strong proficiency in Python or Go
Experience building production-ready services
Advanced scripting for orchestration and automation logic
Cloud & Infrastructure
Strong familiarity with at least one major cloud provider (AWS, GCP, or Azure)
Containerization and Kubernetes exposure
Infrastructure as Code experience
Systems Thinking
Networking fundamentals
Identity and access concepts
Understanding of asset lifecycle management
Experience leading technical initiatives from idea through deployment
Ability to mentor junior engineers
Strong written and verbal communication skills
Comfortable influencing cross-functional stakeholders
Data-driven decision-making approach
You think in terms of leverage, scale, and long-term impact.
What Success Looks Like
Within 6–12 months, you will have:
Eliminated entire categories of recurring IT tickets
Implemented durable automation frameworks across core IT workflows
Increased automated resolution rates quarter over quarter
Reduced manual provisioning and access overhead
Established scalable, observable automation systems that continue to compound value
Your impact will be visible in metrics — not anecdotes.
Nice to Have
Experience integrating AI/LLM capabilities into workflow automation
Familiarity with ITSM frameworks
Background building internal self-service platforms
Experience presenting technical strategy to senior leadership
Experience operating in high-scale, compliance-sensitive environments
100% Remote Workplace
Unlimited Paid Time Off
Equity – Become a true owner of the company
401K with company contribution and sponsored healthcare
Professional Growth – Access to training and certification programs
What We Do
Introducing a New Kind of Partner:
THE EMBEDDED SERVICE PROVIDER
A PARTNER THAT CAN PERFORM COMPLEX DELIVERY AS PART OF YOUR TEAM
Companies often struggle to find partners that can perform complex deliveries and services and co-own problems from within their organization. Enter the Embedded Service Provider: an ESP performs a service from within the client team structure.
THE EVEROPS TECHPOD
For IT Operations, Production DevOps and Identity
Our TechPod model is what allows us to take on complex parts of your technology from within your team structure. As part of every contract, you get all TechPod elements:
- Pod Leader
- Architect
- Engineering
- Project work as part of the monthly cost
- Operations
ENGINEERED OPERATIONS
The foundation of our TechPods is our Engineered Operations group: the relentless pursuit of applying engineering and automation to operations functions. All clients benefit from:
- EverOps Labs - Speeds architecting and validates deployments
- EverOps GitOps models
- EverOps Alternative Compute models
- EverOps ZeroTrust models for corp & engineering
- EverOps Cloud Governance models
- EverOps Deployment Automation
- EverOps Site Reliability Engineering
- EverOps NOC Automation: Monitoring -> Alerting -> Slack / PagerDuty
- EverOps Site build & PM templates
