TERN Group

Site Reliability Engineer

Posted 23 Days Ago

Hiring Remotely in United States

Remote

175K-200K Annually

Senior level

Artificial Intelligence • Healthtech • HR Tech • Software

The Role

Own the Heroku-to-GCP migration, maintain Postgres and data pipelines, optimize high‑traffic code paths, build monitoring/alerting, lead incident response and post‑mortems, reduce costs and scale proactively, and coach other infrastructure engineers.

Summary Generated by Built In

Tern's user base is about to triple. Large host agencies are coming on board this year, and the infrastructure needs to be ready before they arrive. If you want to own the migration, build the monitoring, and be the person every other engineer at Tern depends on, this is your role.

ABOUT TERN

Tern is a venture-backed software company on a mission to reshape the $127B travel agency industry by giving power back to the entrepreneurs who built it.

Nearly 98% of travel agencies are small businesses. These businesses have been chronically underserved by technology. We're here to change that. Our platform helps travel advisors run more efficient, professional, and profitable operations, giving them the modern infrastructure they need to lead the next chapter of travel.

But the impact goes beyond business. Travel advisors help clients move more intentionally through the world. When a traveler works with an advisor, they're more likely to avoid overtouristed hotspots and more likely to spend their dollars in places where they can do real good. That's the kind of travel we want more of.

At Tern, we believe in small business. We believe in the power of travel. And we're building the future of both.

SITE RELIABILITY ENGINEER

Everything Tern ships rides on the infrastructure underneath it. We're a Ruby on Rails application on Heroku, migrating to Google Cloud Platform, with a Postgres core and a data pipeline through Fivetran, BigQuery, and Hex. It's a solid foundation. It won't scale on its own.

We're preparing for a major step-up in load (large host agencies coming on board, user base tripling), and this role builds the infrastructure that holds under that growth. You'll own the migration to GCP, own the monitoring and alerting that keeps production reliable, and own the hot paths that need to get faster before volume climbs. Do it well and every other engineer at Tern moves faster and sleeps better. This is a force multiplier role at the foundation of the product.

This is a player-coach role. You'll start as the hands-on technical lead for infrastructure and grow into coaching and managing alongside it. That's the expectation from day one.

Why Tern?

Tern is building the platform travel advisors run their entire business on, and we believe the next great aggregator in travel gets built right now, on AI. The window to win this market is open, and we intend to take it. With capital and real momentum behind us, we're growing the engineering team this summer to put fuel behind what's already working.

This role owns a real part of how Tern works, the kind of work the rest of the team depends on. We hire engineers with high agency and the trust to execute independently, set direction through their work, and lift the standard of everyone around them. We hire for proven execution, so we want to see specific evidence of what you've shipped and the impact it had, in your resume and in the room.

Engineering Standards

Ship regularly. Every engineering pair ships a working, tested feature every week. Fast and good are not in tension. Testing is part of the build.
We own our code and how users experience it. We watch what we ship in production, stay close to support, AppSignal, Bugsnag, and Canny, and own our fixes end to end.
We make the people around us faster and better, through code and design reviews that teach, through clear written work, and by unblocking our colleagues quickly.
Every team is an AI team. We use AI fluently in daily work, from design review to test generation to debugging. We run Claude Code with a deep library of custom skills and agents and unlimited token usage, and we're actively building agentic and MCP-based tooling on top of our own systems. This is how we move fast and build well.

Our stack

Tern is a Ruby on Rails application with a Hotwire front end, backed by a Postgres database and hosted on Heroku, though we are migrating to Google Cloud Platform. Our data flows through to BigQuery, where we build reporting in Hex. Claude Code, with our own library of custom skills and agents, is part of daily development. You don't need to have used every piece, but you should be fluent enough to be productive quickly and excited to work this way.

WHAT YOU'LL DO

Own the migration from Heroku to Google Cloud Platform: architecture, execution, and a cutover that doesn't surprise anyone
Build and maintain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure
Optimize the hot paths that matter most: key backend code paths and our heaviest third-party syncs, so performance holds as volume climbs
Own monitoring, alerting, cost reduction, and proactive scaling: surface problems early, keep spend sane, and stay ahead of growth rather than reacting to it
Lead incident response and write post-mortems that turn an outage into a permanent fix and a smarter team
Set the operational bar across engineering and pull others up to it

WHAT YOU HAVE

Production reliability ownership: Track record of personally owning production reliability at meaningful scale. Concrete stories of incidents you led, fixed, and prevented from recurring, not just participated in. This is a primary responsibility, not something you've done on the side.
Infrastructure migrations: Real experience owning a cloud migration end to end, not just contributing to one. Fluent in GCP (or a comparable cloud), infrastructure-as-code, and the failure modes of distributed systems.
Observability and proactive operations: You build monitoring and alerting that surfaces problems before users find them. You know what to instrument, what to alert on, and what's just noise.
High agency: You find the highest-leverage reliability problem and go fix it without being assigned to it. You don't wait for an outage to justify the work.
AI in your working habits: Specific examples of how AI has made your debugging, automation, or operational workflows faster or more reliable.

BONUS POINTS

GCP migration experience, specifically from Heroku or another PaaS
Experience with Fivetran, BigQuery, or Hex in a production data pipeline
Has managed or coached infrastructure engineers

WHAT WE VALUE @ TERN

🌱 We're always leveling up. Whether you're deepening your craft, learning from a teammate, or embracing a new challenge, growth is core to our identity.

🧭 We act with optimistic agency. We take initiative, seek clarity, and move forward, even when the path isn't obvious. Through every peak and valley, we lead with curiosity, laughter, kindness, and resolve.

💪 We expect operational excellence. We ship value to our users every single week. We believe that compounding habits lead to sustainable productivity, consistency, and mutual trust.

👂 We deeply understand our users. At every level of the organization, we obsess about understanding those we serve and the industry we operate in.

➕ We embrace the power of "and", and "not now". We challenge trade-offs by asking better questions. We break hard problems into small pieces and tackle them with intention. We also know when to make the hard call and say "not right now".

📣 We speak up and move forward. Everyone at Tern has a voice and a responsibility to use it. We invite healthy tension, share dissenting views early, and challenge each other with curiosity, not ego.

🏃 We move fast and sweat the details. Velocity matters and relentless progress beats perfection every time. But speed isn't chaos: we stay aligned, own our outcomes, and care deeply about quality.

🤣 We take the work seriously, but not ourselves. We hire kind, driven people who elevate the room. If you've got a big ego or take yourself too seriously, you won't last.

HOW WE HIRE

Our interview is built to surface demonstrated execution, and we keep it deliberately light rather than a long gauntlet
We screen resumes for specific, verifiable things you shipped and the impact they had, not responsibilities or titles
We go deep on one or two things you personally built and shipped. We want the real story: the ambiguity, the dead ends, what broke, and how you owned the fix.
We give a practical exercise that reflects the actual work, with AI tools available and expected
And throughout, we look for evidence that working with you made other engineers better at their jobs

We take the work seriously, not ourselves. If you're here to grow, build something industry-changing, and raise the bar for the people around you, we'd love to meet you.

WHY JOIN TERN?

Be part of a mission-driven team transforming the travel planning space
Work with a supportive, curious, and creative team
Own the infrastructure layer of a product used by thousands of travel advisors running their businesses
Competitive salary, equity, and benefits package

Tern is committed to building a team that represents people from many different backgrounds and life experiences, reflecting a worldview shared with the users and customers we serve across the world. We prefer that you apply, so think of our postings as the start of the conversation. Take the chance: you may be a wonderful addition to our Tern team, even if you don't fully match every requirement in the job description.

Skills Required

Production reliability ownership (led incidents, implemented fixes, prevented recurrence)
End-to-end cloud migration ownership (GCP or comparable)
Fluency with Google Cloud Platform or comparable cloud and infrastructure-as-code
Experience maintaining PostgreSQL databases in production
Designing and operating observability: monitoring, alerting, instrumentation
High agency and autonomous ownership of reliability work
Practical use of AI tools in debugging, automation, or operational workflows (concrete examples)
Experience with Fivetran, BigQuery, or Hex in production data pipelines
Experience migrating from Heroku or another PaaS to GCP
Managed or coached infrastructure engineers