Site Reliability Engineer

Posted 3 Days Ago
Hiring Remotely in United States
Remote
175K-200K Annually
Senior level
Artificial Intelligence • Healthtech • HR Tech • Software
The Role
Own the Heroku-to-GCP migration, maintain Postgres and data pipelines, optimize high‑traffic code paths, build monitoring/alerting, lead incident response and post‑mortems, reduce costs and scale proactively, and coach other infrastructure engineers.

Tern's user base is about to triple. Large host agencies are coming on board this year, and the infrastructure needs to be ready before they arrive. If you want to own the migration, build the monitoring, and be the person every other engineer at Tern depends on, this is your role.

ABOUT TERN

Tern is a venture-backed software company on a mission to reshape the $127B travel agency industry by giving power back to the entrepreneurs who built it.

Nearly 98% of travel agencies are small businesses. These businesses have been chronically underserved by technology. We're here to change that. Our platform helps travel advisors run more efficient, professional, and profitable operations, giving them the modern infrastructure they need to lead the next chapter of travel.

But the impact goes beyond business. Travel advisors help clients move more intentionally through the world. When a traveler works with an advisor, they're more likely to avoid overtouristed hotspots and more likely to spend their dollars in places where they can do real good. That's the kind of travel we want more of.

At Tern, we believe in small business. We believe in the power of travel. And we're building the future of both.

SITE RELIABILITY ENGINEER

Everything Tern ships rides on the infrastructure underneath it. We're a Ruby on Rails application on Heroku, migrating to Google Cloud Platform, with a Postgres core and a data pipeline through Fivetran, BigQuery, and Hex. It's a solid foundation. It won't scale on its own.

We're preparing for a major step-up in load (large host agencies coming on board, user base tripling), and this role builds the infrastructure that holds under that growth. You'll own the migration to GCP, own the monitoring and alerting that keeps production reliable, and own the hot paths that need to get faster before volume climbs. Do it well and every other engineer at Tern moves faster and sleeps better. This is a force multiplier role at the foundation of the product.

This is a player-coach role. You'll start as the hands-on technical lead for infrastructure and grow into coaching and managing alongside it. That's the expectation from day one.

Why Tern?

Tern is building the platform travel advisors run their entire business on, and we believe the next great aggregator in travel gets built right now, on AI. The window to win this market is open, and we intend to take it. With capital and real momentum behind us, we're growing the engineering team this summer to put fuel behind what's already working.

This role owns a real part of how Tern works, the kind of work the rest of the team depends on. We hire engineers with high agency and the trust to execute independently, set direction through their work, and lift the standard of everyone around them. We hire for proven execution, so we want to see specific evidence of what you've shipped and the impact it had, in your resume and in the room.

 
Engineering Standards
  1. Ship regularly. Every engineering pair ships a working, tested feature every week. Fast and good are not in tension. Testing is part of the build.

  2. We own our code and how users experience it. We watch what we ship in production, stay close to support, AppSignal, Bugsnag, and Canny, and own our fixes end to end.

  3. We make the people around us faster and better, through code and design reviews that teach, through clear written work, and by unblocking our colleagues quickly.

  4. Every team is an AI team. We use AI fluently in daily work, from design review to test generation to debugging. We run Claude Code with a deep library of custom skills and agents and unlimited token usage, and we're actively building agentic and MCP-based tooling on top of our own systems. This is how we move fast and build well.

Our stack

Tern is a Ruby on Rails application with a Hotwire front end, backed by a Postgres database and hosted on Heroku, though we are migrating to Google Cloud Platform. Our data flows through to BigQuery, where we build reporting in Hex. Claude Code, with our own library of custom skills and agents, is part of daily development. You don't need to have used every piece, but you should be fluent enough to be productive quickly and excited to work this way.

 
WHAT YOU'LL DO
  • Own the migration from Heroku to Google Cloud Platform: architecture, execution, and a cutover that doesn't surprise anyone

  • Build and maintain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure

  • Optimize the hot paths that matter most: key backend code paths and our heaviest third-party syncs, so performance holds as volume climbs

  • Own monitoring, alerting, cost reduction, and proactive scaling: surface problems early, keep spend sane, and stay ahead of growth rather than reacting to it

  • Lead incident response and write post-mortems that turn an outage into a permanent fix and a smarter team

  • Set the operational bar across engineering and pull others up to it

WHAT YOU HAVE
  • Production reliability ownership: Track record of personally owning production reliability at meaningful scale. Concrete stories of incidents you led, fixed, and prevented from recurring, not just participated in. This is a primary responsibility, not something you've done on the side.

  • Infrastructure migrations: Real experience owning a cloud migration end to end, not just contributing to one. Fluent in GCP (or a comparable cloud), infrastructure-as-code, and the failure modes of distributed systems.

  • Observability and proactive operations: You build monitoring and alerting that surfaces problems before users find them. You know what to instrument, what to alert on, and what's just noise.

  • High agency: You find the highest-leverage reliability problem and go fix it without being assigned to it. You don't wait for an outage to justify the work.

  • AI in your working habits: Specific examples of how AI has made your debugging, automation, or operational workflows faster or more reliable.

BONUS POINTS
  • GCP migration experience, specifically from Heroku or another PaaS

  • Experience with Fivetran, BigQuery, or Hex in a production data pipeline

  • Has managed or coached infrastructure engineers

WHAT WE VALUE @ TERN

🌱 We're always leveling up. Whether you're deepening your craft, learning from a teammate, or embracing a new challenge, growth is core to our identity.

🧭 We act with optimistic agency. We take initiative, seek clarity, and move forward, even when the path isn't obvious. Through every peak and valley, we lead with curiosity, laughter, kindness, and resolve.

💪 We expect operational excellence. We ship value to our users every single week. We believe that compounding habits lead to sustainable productivity, consistency, and mutual trust.

👂 We deeply understand our users. At every level of the organization, we obsess about understanding those we serve and the industry we operate in.

➕ We embrace the power of "and" and "not now." We challenge trade-offs by asking better questions. We break hard problems into small pieces and tackle them with intention. We also know when to make the hard call and say "not right now."

📣 We speak up and move forward. Everyone at Tern has a voice and a responsibility to use it. We invite healthy tension, share dissenting views early, and challenge each other with curiosity, not ego.

🏃 We move fast and sweat the details. Velocity matters and relentless progress beats perfection every time. But speed isn't chaos: we stay aligned, own our outcomes, and care deeply about quality.

🤣 We take the work seriously, but not ourselves. We hire kind, driven people who elevate the room. If you've got a big ego or take yourself too seriously, you won't last.

HOW WE HIRE
  • Our interview is built to surface demonstrated execution, and we keep it deliberately light rather than a long gauntlet

  • We screen resumes for specific, verifiable things you shipped and the impact they had, not responsibilities or titles

  • We go deep on one or two things you personally built and shipped. We want the real story: the ambiguity, the dead ends, what broke, and how you owned the fix.

  • We give a practical exercise that reflects the actual work, with AI tools available and expected

  • And throughout, we look for evidence that working with you made other engineers better at their jobs

We take the work seriously, not ourselves. If you're here to grow, build something industry-changing, and raise the bar for the people around you, we'd love to meet you.

 
WHY JOIN TERN?
  • Be part of a mission-driven team transforming the travel planning space

  • Work with a supportive, curious, and creative team

  • Own the infrastructure layer of a product used by thousands of travel advisors running their businesses

  • Competitive salary, equity, and benefits package

Tern is committed to building a team that represents people from many different backgrounds and life experiences, reflecting the users and customers we serve across the world. We prefer that you apply, so think of our postings as the start of the conversation. Take the chance: you may be a wonderful addition to our Tern team, even if you don't fully match every requirement in the job description.

Skills Required

  • Production reliability ownership (led incidents, implemented fixes, prevented recurrence)
  • End-to-end cloud migration ownership (GCP or comparable)
  • Fluency with Google Cloud Platform or comparable cloud and infrastructure-as-code
  • Experience maintaining PostgreSQL databases in production
  • Designing and operating observability: monitoring, alerting, instrumentation
  • High agency and autonomous ownership of reliability work
  • Practical use of AI tools in debugging, automation, or operational workflows (concrete examples)
  • Experience with Fivetran, BigQuery, or Hex in production data pipelines
  • Experience migrating from Heroku or another PaaS to GCP
  • Managed or coached infrastructure engineers

The Company
133 Employees
Year Founded: 2023
