Software Engineer, Inference

Reposted 12 Days Ago
San Francisco, CA
In-Office
150K-230K Annually
Mid level
Software
Production-Grade Unstructured Document Extraction
The Role
Develop and optimize low-latency inference services for OCR and multimodal models, focusing on performance engineering and model serving. Implement autoscaling and capacity planning while building performance dashboards.

Overview


Pulse is tackling one of the most persistent challenges in data infrastructure: extracting accurate, structured information from complex documents at scale. Our approach to document understanding combines intelligent schema mapping with fine-tuned extraction models, and it succeeds where legacy OCR and other parsing tools consistently fail.

We are a small, fast-growing team of engineers in San Francisco serving Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies. We are backed by tier-1 investors.

What makes our tech special is our multi-stage architecture:

  • Layout understanding with specialized component detection models

  • Low-latency OCR models for targeted extraction

  • Advanced reading-order algorithms for complex structures

  • Proprietary table structure recognition and parsing

  • Fine-tuned vision-language models for charts, tables, and figures
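Of the stages listed above, reading order is the simplest to illustrate. The Python sketch below shows a naive top-to-bottom, left-to-right baseline over detected layout regions. The `Region` type, `naive_reading_order`, and the `line_tolerance` default are hypothetical names chosen for illustration. This is not Pulse's proprietary algorithm; it is the kind of heuristic that breaks on complex structures such as multi-column pages.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A detected layout region (hypothetical type); pixel coordinates, origin at top-left."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def naive_reading_order(regions: List[Region], line_tolerance: float = 10.0) -> List[Region]:
    """Order regions top-to-bottom, then left-to-right within each visual row.

    A simple baseline heuristic, not a production algorithm: it misorders
    multi-column pages, where one column should be read fully before the next.
    """
    rows: List[List[Region]] = []
    for region in sorted(regions, key=lambda r: r.y0):
        # Join the current row if this region starts within line_tolerance of
        # the row's first region; otherwise start a new row.
        if rows and abs(region.y0 - rows[-1][0].y0) <= line_tolerance:
            rows[-1].append(region)
        else:
            rows.append([region])
    ordered: List[Region] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda r: r.x0))  # left to right within a row
    return ordered


regions = [
    Region("body", 320, 105, 600, 140),
    Region("title", 40, 20, 600, 60),
    Region("body", 40, 100, 300, 140),
]
print([(r.label, r.x0) for r in naive_reading_order(regions)])
# → [('title', 40), ('body', 40), ('body', 320)]
```

The two body regions start 5 px apart vertically, so they are treated as one row and read left to right after the title.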

If you are passionate about the intersection of computer vision, NLP, and data infrastructure, your work at Pulse will directly impact customers and shape the future of document intelligence.

What we are looking for

  • Able to work in our San Francisco office 5 days a week

  • Eager to learn and adapt quickly

  • Prior startup or founding experience is a plus

About the Role
Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own profiling, batching, and autoscaling across single-tenant and multi-tenant environments.

Responsibilities

  • Build inference services with smart batching and caching

  • Optimize kernels, tokenization, and model graphs

  • Evaluate tradeoffs between vLLM, TensorRT-LLM, and Triton

  • Implement autoscaling and admission control with clear SLOs

  • Own performance dashboards and capacity planning
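The batching and admission-control responsibilities above can be illustrated with a minimal asyncio sketch. Concurrent requests are collected into a single model call until a batch-size cap or a short wait deadline is reached, and a bounded queue rejects new work when full instead of letting latency grow without limit. `MicroBatcher`, `OverloadedError`, `model_fn`, and all parameter defaults are hypothetical. This is neither how vLLM, TensorRT-LLM, or Triton implement batching nor a description of Pulse's services.

```python
import asyncio
from typing import Any, Callable, List


class OverloadedError(Exception):
    """Raised when admission control rejects a request because the queue is full."""


class MicroBatcher:
    """Hypothetical dynamic batcher: groups concurrent requests into one model call.

    A batch is flushed when it reaches max_batch_size or when max_wait_s has
    elapsed since its first request arrived. The bounded queue is a simple form
    of admission control: excess requests fail fast instead of waiting.
    """

    def __init__(self, model_fn: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 8, max_wait_s: float = 0.005, max_queue: int = 64):
        self.model_fn = model_fn  # assumed: list of inputs -> list of outputs, same order
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)

    async def submit(self, item: Any) -> Any:
        fut = asyncio.get_running_loop().create_future()
        try:
            self.queue.put_nowait((item, fut))
        except asyncio.QueueFull:
            raise OverloadedError("queue full; rejecting request") from None
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request arrives
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                outputs = self.model_fn([item for item, _ in batch])
            except Exception as exc:  # propagate a model failure to every caller in the batch
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():  # skip callers that were cancelled while waiting
                    fut.set_result(out)


async def main() -> None:
    def double(xs: List[int]) -> List[int]:  # stand-in for a batched model call
        return [2 * x for x in xs]

    batcher = MicroBatcher(double, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    print(await asyncio.gather(*(batcher.submit(i) for i in range(4))))
    worker.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```

`max_wait_s` trades a small amount of latency for larger batches, which usually improves GPU throughput. For simplicity, the sketch calls `model_fn` directly on the event loop. A real service would run the model call off the loop (for example via `loop.run_in_executor` or a separate GPU worker) so that queueing and admission control stay responsive.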

Requirements

  • 3+ years in performance engineering or ML systems

  • Strong Python, plus C++ or CUDA exposure

  • Experience with GPU profiling and model serving

Nice to have

  • Experience reducing p95 latency and cost in production ML systems
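p95 usually refers to 95th-percentile latency: 95% of requests complete at or below that value. Below is a minimal sketch of the nearest-rank percentile definition. The function name and the sample latencies are made up for illustration. Monitoring systems often interpolate or estimate percentiles from histograms, so their numbers can differ slightly.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank of the percentile sample
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 14, 13, 15,
                12, 14, 13, 11, 15, 12, 14, 13, 16, 240]
print(percentile_nearest_rank(latencies_ms, 50), percentile_nearest_rank(latencies_ms, 95))
# → 13 16
```

With 20 samples, p95 selects the 19th-smallest value (16 ms here). The single 240 ms outlier therefore does not affect p95, but it dominates any higher percentile such as p99.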

Sponsorship
Sponsorship available.

Compensation and benefits
Competitive base salary plus equity, performance-based bonus, relocation assistance for Bay Area moves, daily meal stipend, medical, vision, and dental coverage.

Top Skills

C++
CUDA
GPU Profiling
Python
TensorRT
Triton

The Company
HQ: San Francisco, California
26 Employees
Year Founded: 2024
