Thinking Machines Lab
Jobs at Thinking Machines Lab
Recently posted jobs
Artificial Intelligence • Information Technology
Build, operate, and maintain research infrastructure (evaluation frameworks, RL training systems, experiment tracking, visualization). Develop scalable distributed pipelines, ensure reproducibility and observability, and partner with researchers and infrastructure teams to accelerate ML research and tooling adoption.
Artificial Intelligence • Information Technology
Build and operate ML research infrastructure: evaluation frameworks, RL training systems, experiment tracking, distributed evaluation pipelines, observability, and reproducibility tooling. Partner with researchers to identify bottlenecks, drive adoption, and integrate tools across infrastructure, data, and product stacks.
Artificial Intelligence • Information Technology
The Executive Business Partner will manage calendars, coordinate travel, support recruiting, and track projects for technical leaders in a fast-paced startup environment.
Artificial Intelligence • Information Technology
Lead financial strategy and modeling for major functions, own budgeting and forecasting, build compute cost measurement, analyze performance and investment trade-offs, partner cross-functionally, create executive and Board materials, and monitor key metrics to drive financial insights that inform strategic decisions.
Artificial Intelligence • Information Technology
Own and debug low-level GPU network fabric (RDMA/RoCE, NVLink/NVSwitch) for large-scale training/inference. Build host instrumentation, triage cross-cloud networking issues, and drive escalations with cloud providers to maintain interconnect reliability at scale.
Artificial Intelligence • Information Technology
Diagnose and remediate hardware, firmware, and OS issues across large GPU clusters. Own drivers, kernel interfaces, diagnostics, and firmware lifecycle. Automate reliability monitoring, analyze error rates, engage vendors and manage RMAs, and write postmortems to reduce failures and improve fleet reliability for large-scale AI experiments.
Artificial Intelligence • Information Technology
Serve as the San Francisco office front desk and hospitality lead: greet and check in visitors, manage badges and building access, coordinate mail, vendors, and deliveries, support interviews and events, maintain meeting rooms and common areas, and proactively improve workplace visitor experience.
Artificial Intelligence • Information Technology
The Research Product Manager will drive complex research products, translate technical ideas into plans, collaborate across disciplines, and communicate progress.
Artificial Intelligence • Information Technology
The role focuses on designing and implementing data curation techniques for AI pre-training datasets, ensuring quality, collaboration, and continuous improvement of data processing systems.
Artificial Intelligence • Information Technology
Lead and scale core accounting functions: month-end close, financial reporting, internal controls, audit readiness, payroll, AP/P2P, indirect tax, cash management, and accounting team building while implementing scalable systems and automation.
Artificial Intelligence • Information Technology
The Site Reliability Engineer will drive reliability for the Tinker platform, focusing on incident response, monitoring, and ensuring system resilience while collaborating across teams.
Artificial Intelligence • Information Technology
The Forward Deployed Engineer will support customers by diagnosing issues, improving products, building tools, and informing Tinker's roadmap.
Artificial Intelligence • Information Technology
The Research Engineer will design, optimize, and scale systems for large AI models, enhancing performance, reliability, and efficiency in model inference and deployment.
Artificial Intelligence • Information Technology
The role involves designing and optimizing distributed training systems for large models to enhance research efficiency and productivity in AI.
Artificial Intelligence • Information Technology
Design and optimize large-scale distributed training infrastructure, focusing on numerical methods to enhance efficiency and stability in AI model training.
Artificial Intelligence • Information Technology
Develop and maintain full-stack applications, with a focus on APIs, UX, system reliability, security, and cross-team collaboration.
Artificial Intelligence • Information Technology
Design, build, and operate GPU supercomputing environments, automate large GPU clusters, and collaborate with researchers to optimize performance and resources.
Artificial Intelligence • Information Technology
The Software Engineer at Tinker will develop platform systems for billing, permissions, and data management. Responsibilities include designing authorization layers, managing end-to-end billing infrastructure, collaborating with teams, and ensuring compliance with enterprise requirements.
Artificial Intelligence • Information Technology
The Software Engineer, Data Infrastructure will design scalable systems for data processing and orchestration, collaborating with teams to enhance data quality and accelerate research.
Artificial Intelligence • Information Technology
Design and build infrastructure for scalable reinforcement learning, optimize RL training pipelines, and collaborate with researchers on production-grade solutions.



