At LangChain, our mission is to make intelligent agents ubiquitous. We build the foundation for agent engineering in the real world, helping developers move from prototypes to production-ready AI agents that teams can rely on. We began as widely adopted open-source tools and have grown to also offer a platform for building, evaluating, deploying, and operating agents at scale.
With $125M raised at Series B from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we’re at a stage where we continue to develop new products, growth is accelerating, and every team member has meaningful impact on what we build and how we work together. LangChain is a place where your contributions can shape how this technology shows up in the real world.
Today, our platform includes LangSmith (Observability, Evaluation, Deployment, Fleet, and Sandboxes), our open source frameworks (LangChain, LangGraph, and Deep Agents), and the newly launched LangSmith Engine for autonomous agent improvement. We have 100M+ monthly open source downloads and 6,000+ active LangSmith customers. 5 of the Fortune 10, and 35% of the Fortune 500 overall, use LangSmith in production, including teams at Klarna, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, LinkedIn, Monday.com, Nvidia, and Bridgewater.
About the team
The LangSmith team owns and builds LangChain's core platform for observability, evaluation, and production reliability of AI systems. From tracing and annotation to run rules, evaluations, and beyond, this team owns LangSmith end-to-end. If you want to define what great AI observability looks like at production scale, this is where that work gets done.
About the role
This role sits at the core of LangSmith: you'll own the ingestion systems, query systems, and the API, SDK, and CLI surfaces that thousands of development teams use every day. You'll work at the intersection of distributed systems and developer experience, on infrastructure that teams across the industry depend on.
What you'll do:
Build and scale critical systems: design and operate high-throughput, data-intensive ingestion and trace-query systems supporting LangSmith, built on SmithDB, our purpose-built database for agent observability. Build monitoring, alerting, and automated recovery so the pipeline stays resilient.
Set API, SDK, and CLI standards: define and enforce the standards, tooling, and CI that power SDK generation across Python, TypeScript, Go, and Java; keep our developer surfaces consistent, high-quality, and self-served across feature teams.
Own integrations: build new integrations and maintain existing ones so it's easy to use LangSmith with any AI framework, agent, or tool, keeping us framework-agnostic.
Solve complex problems: debug performance bottlenecks, optimize database queries, and architect solutions for distributed-system challenges.
Respond to incidents: participate in an on-call rotation focused on post-incident learning, automation, and prevention.
Many of these will apply to you — we don't expect every box checked.
Platform engineering: hands-on experience designing and running data-intensive systems at scale
Developer experience: a track record of building high-quality, widely-adopted CLIs, SDKs, or API standards that developers actually enjoy using
Database expertise: production experience with OSS datastores (PostgreSQL, Redis)
Backend languages: strong backend software engineering skills with production-level experience in Go, Python, or TypeScript
Infrastructure expertise: solid knowledge of cloud object storage, Kubernetes, containerized infrastructure, and cloud platforms (GCP, AWS)
Observability mastery: hands-on experience with observability stacks (Datadog, Prometheus/Grafana, OpenTelemetry, or similar)
Operational mindset and high agency: "you build it, you run it, you own it," with a focus on sustainable practices
Experience: 5+ years building and operating production systems, developer-facing APIs, or both
Strong experience with Java
Knowledge of columnar file and memory formats and OLAP databases
Background in high-growth startups
Location: This role is fully remote within Europe, excluding France.
Compensation & Benefits
We offer competitive compensation that includes base salary, meaningful equity, and benefits such as health and dental coverage, flexible vacation, a 401(k) plan, and life insurance. Actual compensation will vary based on role, level, and location. For team members in the EU and UK, we provide locally competitive benefits aligned with regional norms and regulations.
Compensation Philosophy:
We offer competitive compensation that includes base salary, variable compensation for relevant roles, meaningful equity, benefits, and perks. Actual compensation and offerings will vary based on role, level, and location. Team members in the EU, UK, and APAC receive locally competitive benefits aligned with regional norms and regulations.
Benefits
Benefits include medical, dental, and vision coverage, flexible vacation, a 401(k) plan, meals on in-office days in the US, and more.
Skills Required
- Designing and operating data-intensive, high-throughput ingestion and trace-query systems
- Building and maintaining CLIs, SDKs, and API standards (developer experience)
- Production experience with PostgreSQL
- Production experience with Redis
- Production-level backend development experience in Go, Python, or TypeScript
- Experience defining SDK generation and tooling across multiple languages
- Knowledge of cloud object storage and containerized infrastructure
- Kubernetes experience and familiarity with cloud platforms (GCP or AWS)
- Hands-on experience with observability stacks (Datadog, Prometheus/Grafana, OpenTelemetry)
- Operational mindset: on-call participation, monitoring, alerting, incident response and automation
- 5+ years building and operating production systems or developer-facing APIs
- Strong experience with Java
- Knowledge of columnar file/memory formats and OLAP databases
- Background in high-growth startups
What We Do
LangChain is the platform for building reliable agents. Our products power top engineering teams — from fast-growing startups like Lovable, Mercor, and Clay to global brands including AT&T, Home Depot, and Klarna. LangGraph is a low-level orchestration framework for building controllable agents and long-running workflows. It’s used in production by teams at Replit, Uber, LinkedIn, GitLab, and more. LangSmith offers unified evaluation and monitoring to help developers debug, evaluate, and improve their agents at scale. LangChain provides hundreds of integrations and composable components, making it easy to connect with the latest models, tools, and databases — with minimal engineering overhead. Together, these tools help teams build, deploy, and manage enterprise-grade agents, faster.