Hex Jobs

Infra Engineer, Observability Whisperer

Hex

Sorry, this job was removed at 12:11 a.m. (CST) on Friday, Feb 20, 2026

3 Locations

Remote or Hybrid

Artificial Intelligence • Big Data • Software • Analytics • Business Intelligence • Big Data Analytics

Hex empowers thousands of people to ask & answer new questions of data, and seamlessly share the results with everyone.

The Role

About the role

We build products people genuinely love. Our features are impactful, our business is growing, and … it’s pretty great!

We love Datadog too, but let’s be honest: we have been operating on “ship it first, check the bill later.” The bill has grown to the point where we're looking for someone who can help us manage it in a more scalable way. We need a hero. A detective. Someone with a deep-seated love for logs, metrics, and most importantly, savings.

You are not just an Infra Engineer; you are an economic covert ops specialist. Your glorious mission is to make our Datadog spend dramatically and sustainably go down. We're talking down down. The bill should look like it's been body-slammed by a professional wrestler.

You will be embedded within the Infrastructure team, and will have the autonomy to look across every service to streamline and purge that which needs streamlining and purging. As you rack up wins, you'll increasingly become the person we introduce at company meetings as “The reason we could spend $$ on that nice company offsite.”

What you will do

Mitigation of myriad metrics: Hunt down and decommission all high-cardinality custom metrics that no one actually uses, replacing them with sane, aggregated alternatives, or build a system that insulates us from this risk area entirely.
Liberation from legions of logs: Audit the log ingestion for every service. You'll work with engineering teams to tune logging levels, apply intelligent sampling and exclusion filters at the source (i.e., the agent), and implement better categorization and archiving strategies.
Analysis of Performance Monitoring (APM): Analyze our APM and trace ingestion and ensure it’s smartly used. You'll champion distributed tracing strategies that are both informative and economical.
Standardization: Use automation to enforce cost-saving policies across our entire fleet, ensuring developers can't accidentally check in a new, expensive monitoring configuration.
Evangelization: Be the champion for cost-aware engineering. Create internal documentation, run "Datadog Dojo" workshops, and embed the mindset of "monitor what matters" across the entire engineering organization.
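
To give a flavor of the first item above, here is a minimal sketch in Python of one way to spot high-cardinality custom metrics: count the distinct tag combinations seen per metric and flag anything over a threshold. Everything in it is an assumption for illustration: the metric names, tags, sample data, and threshold are hypothetical, and it does not call any Datadog API.

```python
from collections import defaultdict

# Hypothetical sample of (metric name, tag set) pairs. In practice these
# would come from some usage export, which this sketch does not model.
SAMPLES = [
    ("checkout.latency", frozenset({"env:prod", "user_id:1"})),
    ("checkout.latency", frozenset({"env:prod", "user_id:2"})),
    ("checkout.latency", frozenset({"env:prod", "user_id:3"})),
    ("kernel.starts", frozenset({"env:prod"})),
    ("kernel.starts", frozenset({"env:prod"})),
]


def cardinality_by_metric(samples):
    """Count the distinct tag combinations seen for each metric name."""
    seen = defaultdict(set)
    for name, tags in samples:
        seen[name].add(tags)
    return {name: len(combos) for name, combos in seen.items()}


def flag_high_cardinality(samples, threshold):
    """Return, sorted, the metrics with more than `threshold` tag combinations."""
    counts = cardinality_by_metric(samples)
    return sorted(name for name, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # The per-user tag makes checkout.latency grow with the user count.
    print(flag_high_cardinality(SAMPLES, threshold=2))
```

The point of the sketch is the shape of the check, not the numbers: an unbounded tag such as a per-user ID is what drives the count up, and the right threshold depends on how the metric is actually used.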

About you

3+ years as an Infrastructure, DevOps, or Site Reliability Engineer.
Expert-level, obsessive knowledge of Datadog's pricing model and platform architecture. You know how to read the usage report better than you know your own credit card statement.
Deep proficiency with AWS and Kubernetes.
Strong programming skills for infrastructure automation.
The courage to tell a founder or principal engineer that their favorite metric is financially irresponsible.

Bonus:

Experience with other monitoring/observability tools (Prometheus, Grafana, Honeycomb, Splunk) and a view on whether we should be using any of them to displace some Datadog functionality.
Experience implementing OpenTelemetry standards and agents for cost-effective vendor neutrality.
A proven track record of actually reducing cloud costs, not just talking about it.

Our stack

Our product is a web-based notebook and app authoring platform. Our frontend is built with TypeScript and React, using a combination of Apollo GraphQL and Redux for managing application state and data. On the backend, we also use TypeScript to power an Express/Apollo GraphQL server that interacts with Postgres, Redis, and Kubernetes to manage our database and Python kernels. Our backend is tightly integrated with our infrastructure and CI/CD, where we use a combination of Terraform, Helm, and AWS to deploy and maintain our stack.

In addition to our unique culture, Hex proudly offers a competitive total rewards package, including but not limited to, market-benched salary & equity, comprehensive health benefits, and flexible paid time off.

The salary range for this role is: Variable, depends on how much $$ you save

The salary range shown may be a reflection of additional factors such as geographical location and skill ranges/levels we’re open to. Placement in the salary range will be decided upon completion of the interview process, taking into account factors like leaving room for growth, internal fairness & parity, your demonstrated skills, and the depth of your experience. Our Recruiting team will be able to provide more details during the interview process.

By submitting an application the candidate consents to the use of their personal information in accordance with the Hex Privacy policy: https://learn.hex.tech/docs/trust/privacy-policy.

The Company

HQ: San Francisco, CA

160 Employees

Year Founded: 2019

What We Do

Hex is changing the way people work with data. Our platform makes analytics workflows more powerful, collaborative, and shareable. Hex solves key pain points with today's data and analytics tooling, and is loved by thousands of users all over the world for the beautiful UI, new superpowers, and boundless flexibility. We are a tight-knit crew of engineers, designers, and data aficionados. Our roadmap is full of big ideas and little details, and we would love your help bringing them to life. Hex has raised over $100M from great VCs and angels, giving us many years of runway and the ability to pay competitive salaries, offer great benefits, and provide meaningful equity.

Why Work With Us

We’re a team of builders. We spend our days designing and developing beautiful products, growing and supporting our user base, and helping each other succeed and learn. To do that, we've developed a culture focused on quality, craft, and speed -- with very few meetings, bias toward action, and taking care of each other through feedback and support.

Hex Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

We have a truly hybrid workforce, with many employees working fully remotely, and others coming into one of our two offices (SF or NYC) 2-3 days per week.

Typical time on-site: Flexible

HQ: San Francisco, CA

New York, NY
