Astreya

IT Infrastructure Operations Engineer - Lead

Reposted 6 Days Ago

Atlanta, GA, USA

In-Office

114K-180K Annually

Senior level

Information Technology

The Role

Oversee infrastructure automation, manage engineering teams, improve system resilience, lead incident management, and implement Infrastructure-as-Code tools.

Summary Generated by Built In

About the Job

This role focuses on building and operating highly reliable infrastructure and automation supporting physical security systems. The Infra Automation Engineer applies principles such as SLIs, SLOs, error budgets, and toil reduction to improve system resilience and operational efficiency. The role works closely with Google leadership to deliver secure, scalable, and automated infrastructure.

Key Responsibilities

● Lead, mentor, and manage a team of Automation Engineers, fostering a culture of ownership, collaboration, and continuous improvement
● Partner with client IT leadership to define, implement, and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical infrastructure services
● Manage error budgets to maintain optimal balance between feature development velocity and system stability
● Act as the primary escalation point for severe incidents (Sev 1/2) and ensure effective incident management, communication, and resolution
● Facilitate blameless post-mortem analysis for all major incidents and drive systemic improvements in tooling and infrastructure resilience
● Manage the team's project backlog, prioritize work, and ensure balance between reliability engineering and toil reduction initiatives
● Drive automation strategies to reduce manual operational tasks with measurable targets (e.g., 50% reduction in manual server build time)
● Oversee Infrastructure-as-Code implementations using Ansible, Terraform, Puppet, or Chef for configuration management and drift remediation
● Ensure robust observability through standardized monitoring, alerting, and centralized logging across all managed infrastructure
● Manage 24x5 on-call rotations and ensure adequate team coverage for incident response and support
● Collaborate with cross-functional stakeholders to gather requirements, define project scope, and integrate SRE practices into existing workflows
● Drive Mean Time To Repair (MTTR) reduction through automation, improved runbooks, and proactive reliability engineering

Required Skills & Experience

● Experience: 8+ years in Site Reliability Engineering or Infrastructure Engineering, with a minimum of 3 years in a technical leadership role managing engineering teams
● Technical Proficiency: Strong hands-on experience with Linux/Windows server administration, Infrastructure-as-Code tools (Terraform, Ansible, Chef, Puppet), and scripting languages (Python, Bash, PowerShell)
● SRE Practices: Deep understanding of principles including SLIs, SLOs, error budgets, toil elimination, and experience implementing observability stacks (Prometheus, Grafana, ELK, or Datadog)
● Incident Management: Proven track record in leading incident response, conducting blameless post-mortems, and driving systemic reliability improvements across complex infrastructure environments
● Networking & Security: Solid understanding of networking fundamentals, Cisco device administration, and experience with network automation protocols (NETCONF/RESTCONF) and security compliance frameworks
● Leadership & Communication: Excellent communication and stakeholder management skills with demonstrated ability to mentor teams, manage backlogs, and balance competing priorities in

Salary Range

$114,000.00 - $180,000.00 USD (Salary)

Please note that the salary information provided herein is base pay only (gross); it does not include other forms of compensation which may or may not apply to this specific position, namely, performance-based bonuses, benefits-related payments, or other general incentives - none of which are guaranteed, may be subject to specific eligibility requirements, and are wholly within the discretion of Astreya to remit.
Further, the salary information noted above is a range that consists of a minimum and maximum rate of pay for this specific position. Where an applicant or employee is placed on this range will depend and be contingent on objective, documented work-related considerations like education, experience, certifications, licenses, preferred qualifications, among other factors.

Astreya offers comprehensive benefits to all Regular, Full-Time Employees, including:

Medical provided through UHC (PPO, HSA, Surest options) / Medical provided through Kaiser (HMO option only) for California employees only
Dental provided through UHC
Nationwide Vision provided by UHC
Flexible Spending Account for Health & Dependent Care
Pre-Tax Account for Commuter Benefit/Parking & Transit (location-specific)
Continuing Education and Professional Development via various integrated platforms, e.g. Udemy and Coursera
Corporate Wellness Program provided by Goomi Group
Employee Assistance Program
Wellness Days
401k Plan
Basic and Supplemental Life Insurance
Short Term & Long Term Disability
Critical Illness, Critical Hospital, and Voluntary Accident Insurance
Tuition Reimbursement (available 6 months after start date, capped)
Paid Time Off (accrued and prorated, maximum of 120 hours annually)
Paid Holidays
Any other statutory leaves, paid time, or other ancillary benefits required under state and federal law

Skills Required

8+ years in Site Reliability Engineering or Infrastructure Engineering
Minimum 3 years in a technical leadership role managing engineering teams
Strong hands-on experience with Linux/Windows server administration
Experience with Infrastructure-as-Code tools (Terraform, Ansible, Chef, Puppet)
Proficient in scripting languages (Python, Bash, PowerShell)
Deep understanding of SLIs, SLOs, and reliability engineering practices
Proven record in incident management and conducting post-mortems
Solid understanding of networking fundamentals and automation protocols
Excellent communication and stakeholder management skills

Astreya Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Astreya and has not been reviewed or approved by Astreya.

Healthcare Strength — Health coverage includes multiple medical plan options plus dental and vision, complemented by FSAs, an EAP, disability and life insurance, and wellness programs. Feedback suggests these offerings provide solid core protection across many roles.
Wellbeing & Lifestyle Benefits — Client-site amenities at some large tech campuses can add non-cash value such as meals or on-site perks that enhance the day-to-day experience. Wellness Days and access to learning resources and tuition reimbursement further support overall wellbeing.
Flexible Benefits — Choice among medical plan types and tax-advantaged accounts enables some customization to individual needs. Some roles also offer remote or flexible work, adding practical flexibility to the total package.

The Company

HQ: San Francisco, California

1,958 Employees

Year Founded: 2001

What We Do

Astreya is the leading IT solutions provider for some of the world's most recognizable and innovative organizations. Our journey started in 2001 in the heart of Silicon Valley and reaches thirty-three countries with over 2,200 IT professionals. We enable businesses to make better decisions, achieve operational efficiency, and gain a competitive edge. The Astreya advantage is centered around focus and clear vision, world-class talent, and innovative technology: Creativity is in our DNA. Our dedicated Software and Service Innovation teams bring best-in-class technology and tools to bear for our clients.