AI Inference Engineer

Reposted 12 Days Ago
Burlingame, CA
In-Office
Senior level
Hardware • Machine Learning • Software
The Role
The AI Inference Engineer will port, optimize, and benchmark AI models for Quadric's platform while providing support and documentation.

Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive and autonomous-vehicle systems. Unlike other NPUs and neural network accelerators in the industry today, which can only accelerate a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

Role

The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platform. In this role you will [1] port AI models to the Quadric platform; [2] optimize model deployment for efficient inference; and [3] profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture, and AI toolchains/frameworks.

Responsibilities
  • Quantize, prune, and convert models for deployment (a generic sketch of this flow follows this list)
  • Port models to the Quadric platform using the Quadric toolchain
  • Optimize inference deployments for latency and throughput
  • Benchmark and profile model performance and accuracy
  • Collaborate across related areas of the AI inference stack to support team and business priorities
  • Develop tools to scale and speed up deployment
  • Make improvements to the SDK and runtime
  • Provide technical support and documentation to customers and the developer community
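
As a purely illustrative sketch of the quantize-and-convert work described above (the Quadric toolchain itself is proprietary and not shown), the following uses generic PyTorch APIs named in the requirements; the placeholder model, input shape, and file name are hypothetical, not part of the Quadric workflow.

    # Illustrative only: generic post-training dynamic quantization and ONNX
    # export with PyTorch. A vendor toolchain would replace the export step
    # in practice; model, shapes, and file name are hypothetical placeholders.
    import torch
    import torch.nn as nn

    # Placeholder FP32 model standing in for a real network to be deployed.
    model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

    # Post-training dynamic quantization (PTQ): Linear weights stored as int8.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Convert the FP32 graph to ONNX so a downstream compiler or toolchain
    # can consume it (the dynamically quantized module is not exported here).
    dummy_input = torch.randn(1, 256)
    torch.onnx.export(model, dummy_input, "model_fp32.onnx", opset_version=17)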

Requirements
  • Bachelor’s or Master’s degree in Computer Science or Electrical Engineering
  • 5+ years of experience with AI/LLM model inference and deployment frameworks/tools
  • Experience with model quantization (PTQ, QAT) and related tools
  • Experience with model accuracy measurement
  • Experience with model inference performance profiling (see the benchmarking sketch after this list)
  • Experience with at least one of the following frameworks: ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, llama.cpp
  • Proficiency in C/C++ and Python
  • Demonstrated strong problem-solving, debugging, and communication skills
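
As a purely illustrative sketch of the inference performance profiling experience listed above, the following times an exported ONNX model with ONNX Runtime; the model path, input shape, and run counts are hypothetical placeholders.

    # Illustrative only: mean-latency benchmark of an exported ONNX model
    # with ONNX Runtime. Path, input shape, and counts are placeholders.
    import time
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession("model_fp32.onnx", providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    x = np.random.rand(1, 256).astype(np.float32)

    # Warm up so one-time initialization does not skew the measurement.
    for _ in range(10):
        sess.run(None, {input_name: x})

    # Time repeated runs and report mean per-inference latency.
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {input_name: x})
    elapsed = time.perf_counter() - start
    print(f"mean latency: {1000 * elapsed / runs:.2f} ms over {runs} runs")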

Benefits
  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Life Insurance (Basic, Voluntary & AD&D)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Short Term & Long Term Disability
  • Training & Development
  • Work From Home
  • Free Food & Snacks
  • Stock Option Plan

Top Skills

C++
Hugging Face Transformers
llama.cpp
Neural Compressor
ONNX Runtime
Python
PyTorch
vLLM

The Company
HQ: Burlingame, CA
38 Employees
Year Founded: 2017

What We Do

Quadric has built a unified hardware/software architecture optimized for on-device machine learning inference. Only the Quadric GPNPU (general-purpose neural processing unit) delivers high ML inference performance while also running C++ code, without forcing the developer to artificially partition application code between two or three different kinds of processors. Quadric's GPNPU is a licensable processor IP core that scales from 1 to 64 TOPS and seamlessly intermixes scalar, vector, and matrix code.
