Site Reliability Engineer II

Reposted 24 Days Ago
Hiring Remotely in India
Remote
Mid level
Big Data • Software • Analytics
The Role
As a Site Reliability Engineer II, you will manage incident response, enhance observability, automate operations, and collaborate across teams to maintain system reliability.
Summary Generated by Built In

Data is at the core of modern business, yet many teams struggle with its overwhelming volume and complexity. At Atlan, we’re changing that. As the world’s first active metadata platform, we help organisations transform data chaos into clarity and seamless collaboration.

From Fortune 500 leaders to hyper-growth startups, from automotive innovators redefining mobility to healthcare organisations saving lives, and from Wall Street powerhouses to Silicon Valley trailblazers — we empower ambitious teams across industries to unlock the full potential of their data.

Recognised as leaders by Gartner and Forrester and backed by Insight Partners, Atlan is at the forefront of reimagining how humans and data work together. Joining us means becoming part of a movement to shape a future where data drives extraordinary outcomes.

Why this role matters 🔗

As a key member of Atlan’s Platform & Reliability Engineering Team, your core responsibility will be to strengthen our alert management and incident response capabilities, ensuring every customer experience remains fast, reliable, and uninterrupted.

Whether you’re handling production incidents, automating operational workflows, or enhancing observability and monitoring, your work will directly contribute to Atlan’s mission of empowering modern data teams with a resilient and seamless platform.

At Atlan, we’re building high-performance, reliability-driven engineering teams across every function — and this role is foundational. We’re looking for curious, self-driven engineers who thrive under pressure, love solving real-world reliability challenges, and are passionate about keeping systems stable as we scale globally.

We value engineers who use data, automation, and deep systems thinking to make reliability a core part of how we build and operate: not just a function, but a culture.

Your Mission at Atlan 🌟
  • Own and operate end-to-end reliability for critical systems — from alert triage and incident resolution to long-term preventive improvements.

  • Proactively manage incidents within defined SLAs (60 mins for Critical, 180 mins for High) and ensure smooth collaboration across teams during resolution.

  • Enhance observability by improving monitoring systems, refining alerts, and reducing noise to focus on what truly matters.

  • Automate operations and incident workflows to eliminate manual toil, improving speed, consistency, and reliability.

  • Collaborate across teams — work with Platform, Observability, and Product Engineering teams to strengthen uptime and service stability.

  • Contribute to documentation and playbooks, ensuring that every incident drives learning, process improvement, and team efficiency.

What makes you a great fit 😍
  • Proven experience managing alerts, incidents, and root cause analyses in production environments.

  • Hands-on knowledge of cloud platforms (AWS, GCP, or Azure) and Kubernetes — including networking, deployments, and troubleshooting.

  • Familiarity with monitoring and observability tools such as Prometheus, Grafana, ELK/EFK, or Datadog.

  • Ability to automate repetitive operational tasks using scripting languages such as Python or Bash/shell.

  • Strong communication and collaboration skills — especially in distributed or remote-first teams.

  • A mindset of ownership, curiosity, and calm under pressure — you thrive in incident response and turn challenges into learning opportunities.

Why you’ll love working here 💙
  • Real impact from Day 1: Your work directly shapes reliability for thousands of users across the globe.

  • Modern tech stack: Work with cutting-edge tools — Kubernetes, Terraform, Prometheus, Datadog, and more.

  • Learning culture: Collaborate with world-class platform engineers and senior SREs who believe in mentorship and continuous growth.

  • Autonomy & trust: Freedom to experiment, improve, and own your work end-to-end.

  • Clear growth path: Grow from SRE II → Senior SRE → Senior SRE II → Staff SRE → Principal SRE as you expand your technical depth and ownership scope.

Join us if you want to...
  • Help build the backbone of Atlan’s global data platform.

  • Turn reactive operations into proactive reliability.

  • Be part of a culture that treats reliability not as a checklist — but as a craft.

Why Is Atlan for You?

At Atlan, we believe the future belongs to the humans of data. From curing diseases to advancing space exploration, data teams are powering humanity's greatest achievements. Yet, working with data can be chaotic—our mission is to transform that experience. We're reimagining how data teams collaborate by building the home they deserve, enabling them to create winning data cultures and drive meaningful progress.

Joining Atlan means:

  1. Ownership from Day One: Whether you're an intern or a full-time teammate, you’ll own impactful projects, chart your growth, and collaborate with some of the best minds in the industry.

  2. Limitless Opportunities: At Atlan, your growth has no boundaries. If you’re ready to take initiative, the sky’s the limit.

  3. A Global Data Community: We’re deeply embedded in the modern data stack, contributing to open-source projects, sponsoring meet-ups, and empowering team members to grow through conferences and learning opportunities.

As a fast-growing, fully remote company trusted by global leaders like Cisco, Nasdaq, and HubSpot, we’re creating a category-defining platform for data and AI governance. Backed by top investors, we’ve achieved 7X revenue growth in two years and are building a talented team spanning 15+ countries.

If you’re ready to do your life’s best work and help shape the future of data collaboration, join Atlan and become part of a mission to empower the humans of data to achieve more, together.

We are an equal opportunity employer
At Atlan, we’re committed to helping data teams do their lives’ best work. We believe that diversity and authenticity are the cornerstones of innovation, and by embracing varied perspectives and experiences, we can create a workplace where everyone thrives. Atlan is proud to be an equal opportunity employer and does not discriminate based on race, color, religion, national origin, age, disability, sex, gender identity or expression, sexual orientation, marital status, military or veteran status, or any other characteristic protected by law.

Top Skills

AWS
Azure
Bash
Datadog
EFK
ELK
GCP
Grafana
Kubernetes
Prometheus
Python
Shell
Terraform

The Company
HQ: New York, NY
192 Employees
Year Founded: 2018

What We Do

Built by a data team for data teams, Atlan is the active metadata platform for the modern data stack. It stitches together metadata from various sources (Snowflake, dbt, Databricks, Looker, Tableau, Postgres, etc.) to create a unified data discovery, cataloging, lineage, and governance experience across all your data assets, from columns and queries to metrics and dashboards. Atlan facilitates a two-way movement of metadata, bringing context back into the tools and workflows that your data team uses every day — for example, in your BI tool when you wonder what a metric on the dashboard means.

A pioneer in the space, Atlan was named a Leader in Forrester Wave™: Enterprise Data Catalogs for DataOps in 2022 and was recognized by Gartner seven times in 2021, including as a Cool Vendor in DataOps and in the inaugural Market Guide for Active Metadata Management. Today, we power pioneering data teams like WeWork, Plaid, Postman, Unilever, and Ralph Lauren. We recently raised a Series B, backed by top investors (including Insight Partners, Sequoia, and Salesforce Ventures) and founders & CEOs from the modern data stack (including Snowflake, Looker, and Stitch).

For more information, visit http://www.atlan.com/ or follow us on Twitter at AtlanHQ.
