Researcher: Inference

Sorry, this job was removed at 06:08 p.m. (CST) on Thursday, Sep 11, 2025
San Francisco, CA
In-Office
Artificial Intelligence • Software
The Role
About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. We pair deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.

What You'll Do

  • Conduct cutting-edge research to improve the efficiency, scalability, and robustness of inference for state-of-the-art AI models across various modalities, including audio, text, and vision.

  • Design and optimize inference pipelines to balance performance, latency, and resource utilization in diverse deployment environments, from edge devices to cloud systems.

  • Develop and implement novel techniques for efficient model execution, including quantization, pruning, sparsity, distillation, and hardware-aware optimizations.

  • Explore speculative decoding methods, caching strategies, and other advanced techniques to reduce latency and computational overhead during inference.

  • Investigate trade-offs between model quality and inference efficiency, designing architectures and workflows that meet real-world application requirements.

  • Prototype and refine methods for stateful inference, streaming inference, and task-specific conditioning to enable new capabilities and use cases.

  • Collaborate closely with cross-functional teams to ensure inference research seamlessly integrates into production systems and applications.

What You'll Bring

  • Deep expertise in optimizing inference for machine learning models, with a strong understanding of techniques such as speculative decoding, model compression, low-precision computation, and hardware-specific tuning.

  • Strong programming skills in Python, with experience in frameworks like PyTorch, TensorFlow, or ONNX, and familiarity with inference deployment tools such as TensorRT or TVM.

  • Knowledge of hardware architectures and accelerators, including GPUs, TPUs, and edge devices, and their impact on inference performance.

  • Experience in designing and evaluating scalable, low-latency inference pipelines for production systems.

  • A solid understanding of the trade-offs between model accuracy, latency, and computational efficiency in deployment scenarios.

  • Strong problem-solving skills and a passion for exploring innovative techniques to push the boundaries of real-time and resource-constrained inference.

Nice-to-Haves

  • Experience with speculative decoding and other emerging techniques for improving inference performance.

  • Familiarity with stateful or streaming inference techniques.

  • Background in designing hybrid architectures or task-specific models optimized for inference.

  • Early-stage startup experience or a track record of developing and deploying efficient inference systems in fast-paced R&D environments.

Our culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.

🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.

Our perks

🍽 Lunch, dinner and snacks at the office.

🏥 Fully covered medical, dental, and vision insurance for employees.

🏦 401(k).

✈️ Relocation and immigration support.

🦖 Your own personal Yoshi.


The Company
33 Employees
Year Founded: 2023

What We Do

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Try Sonic at https://play.cartesia.ai and join our Discord at https://discord.com/invite/gAbbHgdyQM.
