Customer Reliability Engineer - Infrastructure

9 Locations
In-Office or Remote
125K-130K Annually
Senior level
Artificial Intelligence • Big Data • Cloud • Software • Analytics • Infrastructure as a Service (IaaS) • Big Data Analytics
Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life.
The Role
Operate, monitor, and maintain Astronomer's managed Airflow platform and underlying cloud/Kubernetes infrastructure. Troubleshoot customer environments, participate in on-call rotation, build monitoring/automation, improve observability, and work directly with customers to meet SLAs and drive reliability.
Summary Generated by Built In

Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life and is the company behind Astro, the industry-leading unified DataOps platform powered by Apache Airflow®. Astro accelerates building reliable data products that unlock insights, unleash AI value, and power data-driven applications. Trusted by more than 800 of the world's leading enterprises, Astronomer lets businesses do more with their data. To learn more, visit www.astronomer.io.

About this role

The Astronomer Customer Reliability Engineering (CRE) team is responsible for the success of our customers' usage of our managed Airflow service.

The CREs are responsible for operating, monitoring, and maintaining the platform to ensure availability, predictability, and reliable operations.

As an infrastructure specialist within the team, you will focus on the reliability of the underlying cloud infrastructure and Kubernetes clusters. This entails responding to incidents, whether raised by a customer or by our monitoring system, and then taking further steps to ensure problems are permanently resolved or monitored. As owners of the observability platform, CRE has unlimited potential to improve the reliability of the product and deliver the best possible outcome for our customers.

This role is directly customer-facing and exposes you to a wide range of problems and requirements. CREs get the opportunity to interface with customers from a variety of industries and across different cloud providers, all with different expectations. Your contributions will directly impact customers' success with Astronomer products, and you will be able to help make meaningful improvements to the customer experience.

 
What you get to do:
  • Provide solutions to customers to make them successful using our products.

  • Troubleshoot customer environments and engage in active triaging with customers.

  • Participate in the on-call rotation for weekend coverage.

  • Provide feedback to the product development teams on customer needs and pain points.

  • Build out our monitoring and alerting systems.

  • Build and maintain automation to ensure daily operational tasks are handled as efficiently as possible.

  • Help direct the architecture of the products and contribute where possible.

  • Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide “white glove” guidance on the path to production.

  • Participate remotely within a fully distributed team.

  • Enhance and enrich customer documentation.

  • Work with the latest technology and multi-cloud implementations.

 
What you bring to the role:
  • 5 years of experience, preferably with large, complex cloud infrastructures operating at scale

  • 3 years of experience with Kubernetes

  • Experience managing a production distributed system with at least one major cloud provider (one or more of AWS, GCP, Azure)

  • Strong Linux experience

  • Knowledge of how to operate distributed systems and monitor them for issues

  • Previous experience in handling customer issues (internal or external)

  • Strong communication skills

  • DevOps or CI/CD experience

  • Python scripting

  • Good troubleshooting skills

 
Bonus points if you have:
  • Experience as a Site Reliability Engineer

  • Worked with Kubernetes Custom Resources

  • Depth of knowledge with Azure

  • Airflow/Big Data Orchestration experience

  • IaC experience

 

The estimated total compensation for this role ranges from $125,000 - $130,000 based on leveling and geography, along with an equity component and a comprehensive benefits package. This range is merely an estimate; actual compensation may deviate from this range based on skills, experience, and qualifications.

#LI-Fulltime

#LI-Remote

At Astronomer, we value diversity. We are an equal opportunity employer: we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Skills Required

  • 5 years of experience
  • Experience with large, complex cloud infrastructures operating at scale
  • 3 years of experience with Kubernetes
  • Experience managing a production distributed system on AWS, GCP, or Azure
  • Strong Linux experience
  • Knowledge of operating and monitoring distributed systems
  • Previous experience handling customer issues (internal or external)
  • Strong communication skills
  • DevOps or CI/CD experience
  • Python scripting
  • Good troubleshooting skills
  • Experience as a Site Reliability Engineer
  • Worked with Kubernetes Custom Resources
  • Depth of knowledge with Azure
  • Airflow / Big Data orchestration experience
  • Infrastructure as Code (IaC) experience

The Company
HQ: New York, New York
344 Employees
Year Founded: 2018

What We Do

Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life and is the company behind Astro, the industry-leading unified DataOps platform powered by Apache Airflow®. Astro accelerates building reliable data products that unlock insights, unleash AI value, and power data-driven applications. Trusted by more than 800 of the world's leading enterprises, Astronomer lets businesses do more with their data. To learn more, visit www.astronomer.io. Apache® and Apache Airflow® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by the use of these marks. All other trademarks are the property of their respective owners.
