Technical Lead, Runtime Software/Hardware (Spatial AI Accelerator)

San Jose, CA, USA
In-Office
Senior level
Artificial Intelligence • Information Technology • Software • Database • Generative AI
The Role
Lead the design and implementation of runtime systems for an AI accelerator, overseeing architecture, performance tuning, and team mentorship.

Who we are:

Persimmons is building the infrastructure that will power the next decade of AI. Founded in 2023 by veteran technologists from the worlds of semiconductors, AI systems, and software innovation, we’re on a mission to enable smarter devices, more sustainable data centers, and entirely new applications the world hasn’t imagined yet.

Why join us:

We’re growing fast and looking for bold thinkers, builders, and curious problem-solvers who want to push the limits of AI hardware and software. If you're ready to join a world-class team and play a critical role in making a global impact, we want to talk to you.

Summary of Role:

Persimmons.ai seeks a multidisciplinary Technical Lead for runtime software/hardware and compiler integration, focused on our next-generation custom spatial AI accelerator. You will architect and guide the runtime system bridging compiler, host, driver, device firmware, and control hardware, enabling high-performance, robust, and scalable execution of modern AI workloads.

This is a hands-on and technical leadership role spanning system design, cross-stack engineering, technical mentorship, and collaboration with compiler, ML framework, and hardware teams.

What you’ll do:

  • Architect, design, and implement the runtime stack for Persimmons' custom spatial accelerator, covering host drivers, device runtime, and hardware/firmware control loops.
  • Lead technical direction and decisions for the runtime–hardware interface, device work/command-queue infrastructure, and memory management.
  • Coordinate with compiler/backend, ML systems, and hardware architects to ensure seamless end-to-end ML model execution.
  • Define and co-design hardware support features essential to the runtime: queueing structures, synchronization primitives, interrupt/event signaling, and the dispatch and orchestration of ML workloads on the spatial execution fabric (a minimal host-side sketch follows this list).
  • Drive performance analysis, development tools for tracing, bottleneck identification, and runtime-level optimizations for latency, throughput, and hardware utilization.
  • Build and mentor a cross-disciplinary engineering team focused on runtime and system validation—establishing best practices, technical standards, and robust software-hardware collaboration.
  • Champion efficient tooling, simulation/emulation environments, and test infrastructure for system validation and robust runtime dev/debug.

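For context on the command/queue and synchronization work described above, the sketch below shows one minimal host-side pattern: commands submitted to a queue, a device-side consumer (simulated here by a host thread) draining them in order, and a monotonically increasing fence value used for completion signaling. This is illustrative only; the Command/CommandQueue types and the simulated device loop are hypothetical, not Persimmons' actual runtime interface.

// Illustrative sketch: host-side command queue with a completion fence.
// All names here are hypothetical.
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

struct Command {
    std::function<void()> work;   // stand-in for an encoded device packet
    uint64_t fence_value;         // completion fence signaled by the "device"
};

class CommandQueue {
public:
    // Host side: enqueue a command and return the fence value to wait on.
    uint64_t submit(std::function<void()> work) {
        std::lock_guard<std::mutex> lock(mutex_);
        uint64_t fence = ++last_submitted_;
        pending_.push(Command{std::move(work), fence});
        cv_.notify_one();  // "doorbell": wake the device-side consumer
        return fence;
    }

    // Host side: block until the device has retired the given fence.
    void wait(uint64_t fence) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return last_completed_ >= fence; });
    }

    // Device side (simulated by a host thread): drain commands in order.
    void device_loop() {
        for (;;) {
            Command cmd;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [&] { return stop_ || !pending_.empty(); });
                if (stop_ && pending_.empty()) return;
                cmd = std::move(pending_.front());
                pending_.pop();
            }
            cmd.work();  // "execute" the command
            {
                std::lock_guard<std::mutex> lock(mutex_);
                last_completed_ = cmd.fence_value;  // signal completion
            }
            cv_.notify_all();  // interrupt/event analogue back to the host
        }
    }

    void shutdown() {
        { std::lock_guard<std::mutex> lock(mutex_); stop_ = true; }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Command> pending_;
    uint64_t last_submitted_ = 0;
    uint64_t last_completed_ = 0;
    bool stop_ = false;
};

int main() {
    CommandQueue queue;
    std::thread device([&] { queue.device_loop(); });

    uint64_t fence = 0;
    for (int i = 0; i < 3; ++i)
        fence = queue.submit([i] { std::cout << "dispatch kernel " << i << "\n"; });

    queue.wait(fence);   // host blocks until the last command retires
    queue.shutdown();
    device.join();
}

A production runtime would replace the simulated device loop with doorbell writes, DMA-visible ring buffers, and interrupt-driven fence completion, but the submit/wait/fence structure sketched here is the same shape.
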
Requirements

*We do not expect candidates to meet all of the requirements listed below; strong candidates will demonstrate expertise in several key areas.*

  • Deep experience architecting runtime software, device firmware, hardware interfaces, or control systems for AI accelerators and/or high-performance SoCs.
  • Hands-on expertise developing drivers, resource managers, command/queue control, and dispatch and synchronization primitives (queues, barriers, event notifications) for custom hardware.
  • Strong understanding of C/C++ multi-threaded programming and concurrent system design, including experience developing and debugging software that leverages threads, synchronization primitives, and parallel runtime constructs to maximize hardware utilization and performance in latency- and throughput-sensitive environments.
  • Solid understanding of hardware–software co-design principles: memory hierarchies, DMA engines, interconnects, job scheduling, on-device synchronization.
  • Experience integrating kernel libraries into device runtime stacks: connecting optimized compute kernels (such as SIMD operations and common AI operator libraries) to runtime software through well-defined APIs, efficient scheduling, and memory/resource management (see the illustrative sketch after this list).
  • Experience with modern large language model (LLM) inference servers and serving stacks (e.g., vLLM, TensorRT-LLM, Triton Inference Server, Hugging Face Text Generation Inference, Ray Serve), including their architecture, runtime scheduling, memory management, batching, streaming, and distributed deployment. Understanding of how runtime design, kernel integration, and hardware acceleration impact performance, scalability, and latency in LLM serving workloads.
  • Experience with system-level performance tuning, debugging complex hardware–software interactions, and building scalable test/validation infrastructure.
  • Strong proficiency and 5+ years of experience in C/C++; familiarity with hardware description languages (Verilog/VHDL/SystemVerilog) or firmware development is a strong plus.
  • Demonstrated fluency with modern AI tools and workflows (e.g., leveraging AI assistants for research, analysis, or productivity).
  • Drive for innovation—keeping up with new architectures, techniques, and runtime models in ML or spatial computing.

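As a companion to the kernel-library integration bullet above, the sketch below shows one common pattern for exposing optimized compute kernels through a well-defined runtime API: a registry that maps operator names to kernel entry points and dispatches by name at execution time. The KernelRegistry, KernelFn signature, and add_f32 kernel are hypothetical examples under assumed conventions, not an actual Persimmons or vendor API.

// Illustrative sketch: binding compute kernels to operator names behind a
// small runtime-facing API. Names and signatures are hypothetical.
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// A deliberately small kernel signature: two inputs, one output, element count.
using KernelFn = void (*)(const float* a, const float* b, float* out, std::size_t n);

// Runtime-side registry mapping operator names to kernel entry points.
class KernelRegistry {
public:
    void register_kernel(const std::string& op, KernelFn fn) { table_[op] = fn; }

    // The runtime resolves and dispatches by name at graph-execution time.
    void dispatch(const std::string& op, const float* a, const float* b,
                  float* out, std::size_t n) const {
        auto it = table_.find(op);
        if (it == table_.end()) throw std::runtime_error("unknown op: " + op);
        it->second(a, b, out, n);
    }

private:
    std::unordered_map<std::string, KernelFn> table_;
};

// A reference "optimized" kernel; a real library entry would target the device.
static void add_f32(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

int main() {
    KernelRegistry registry;
    registry.register_kernel("add_f32", add_f32);

    std::vector<float> a{1, 2, 3}, b{4, 5, 6}, out(3);
    registry.dispatch("add_f32", a.data(), b.data(), out.data(), out.size());

    for (float v : out) std::cout << v << " ";  // prints: 5 7 9
    std::cout << "\n";
}

In practice the bare function-pointer table would be replaced by kernel descriptors carrying launch parameters, memory residency requirements, and scheduling hints, but the registration/dispatch split is the core integration idea.
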
Benefits
  • Competitive salary and benefits package.
  • Flexible PTO.
  • 401(k).

Please note: Our organization does not accept unsolicited candidate submissions from external recruiters or agencies. Any such submissions, regardless of form (including but not limited to email, direct messaging, or social media), shall be deemed voluntary and shall not create any express or implied obligation on the part of the organization to pay any fees, commissions, or other compensation. Direct contact of employees, officers, or board members regarding employment opportunities is strictly prohibited and will not receive a response.

The Company
HQ: San Jose, California
17 Employees
Year Founded: 2023

What We Do

From custom silicon to intelligent algorithms, we’re breaking through the bottlenecks holding AI back, delivering orders-of-magnitude more performance anywhere it’s needed. Persimmons, Inc. is on a mission to redefine what’s possible in AI inference computing. We design breakthrough full-stack solutions, from custom silicon to cutting-edge algorithms that shatter today’s performance limits and unlock AI’s true potential. Our next-generation inference platform delivers orders-of-magnitude higher compute efficiency, scaling effortlessly from the edge to the largest HPC environments. By fusing high-performance architecture with seamless software integration and proprietary, optimized algorithms, we make Generative AI faster and dramatically more cost-effective.

Founded in 2023 by veteran technologists from the worlds of semiconductors, AI systems, and software innovation, Persimmons is building the infrastructure that will power the next decade of AI, enabling smarter devices, more sustainable data centers, and entirely new applications the world hasn’t imagined yet. We’re growing fast and looking for bold thinkers, builders, and problem-solvers who want to push the limits of AI hardware and software. If you’re ready to make a global impact, join us!
