quadric.io

Senior Performance Architect

Burlingame, CA, USA

In-Office

$110K-$270K Annually

Senior level

Hardware • Machine Learning • Software

The Role

The Senior Performance Architect will analyze and optimize performance across software and hardware, implement solutions, and collaborate with technical teams to improve product outcomes.

Quadric has created an innovative general purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today that can only accelerate a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

As a Senior Performance Architect, you will be the critical link between software and hardware, responsible for understanding how code executes on Quadric's architecture and identifying opportunities for optimization. You will analyze workloads from high-level C++ and Python down through generated assembly to pinpoint performance bottlenecks. This is a hands-on role: beyond analysis, you will prototype solutions yourself, whether that means writing optimized code, modifying compiler passes, or building proof-of-concept implementations to validate proposed fixes before handing off to the appropriate team for productization.

This role requires regular work from the Quadric office in Burlingame, CA, a minimum of 2–3 days per week, with some weeks requiring more days onsite based on business needs. Candidates must be able to commute to the office.

Responsibilities

Analyze application performance across the full stack: C++/Python source, compiler output, assembly, and hardware execution
Identify and localize performance bottlenecks to specific code regions, assembly sequences, or architectural limitations
Implement proof-of-concept fixes and optimizations to validate proposed solutions before broader rollout
Develop and maintain profiling infrastructure, benchmarks, and performance regression tests
Collaborate with compiler engineers to improve code generation and optimization passes
Work with hardware architects to identify microarchitectural improvements and validate performance models
Create performance models that predict workload behavior and guide optimization priorities
Document findings and communicate performance insights to both technical and non-technical stakeholders
Support customer engagements by analyzing their workloads and recommending optimizations

Requirements

BS/MS in Computer Science, Computer Engineering, or Electrical Engineering with 5+ years of performance analysis experience
Strong proficiency in C++ and Python; ability to read, reason about, and write optimized code at the assembly level
Hands-on mentality: comfortable implementing proof-of-concept solutions, not just identifying problems
Deep understanding of computer architecture: pipelines, caches, memory hierarchies, SIMD/vector execution
Experience with profiling tools (perf, VTune, custom trace analysis) and performance debugging methodologies
Ability to trace performance issues from application behavior down to microarchitectural root causes
Strong analytical and problem-solving skills with attention to detail
Excellent communication skills; ability to explain complex performance issues to diverse audiences
Experience working cross-functionally with compiler, runtime, and hardware teams

Nice to Have

Experience with ML/AI workloads and frameworks (PyTorch, TensorFlow, ONNX)
Background in compiler development or code generation
Experience with GPU, DSP, or custom accelerator architectures
Familiarity with cycle-accurate simulation and performance modeling tools

Expected Outcomes in First 12 Months

Establish systematic performance analysis methodology and tooling for Quadric's software stack
Identify and drive resolution of top performance bottlenecks in key customer workloads
Build performance models that accurately predict workload behavior within 10-15% of actual measurements
Become the go-to expert for performance questions spanning the hardware/software boundary

Benefits

At Quadric, we value Integrity, Humility, and Happiness. What we expect from one another is simple and clear: Initiative, Collaboration, and Completion. We are a collaborative team focused on building something extraordinary in the edge computing space.

Competitive salary and meaningful equity
Medical, dental, and vision plan options starting on day one
401(k) retirement plan
Flexible paid time off (unlimited, non-accrual) to support work-life balance
When working in-office, enjoy company-provided lunches and a stocked kitchen
Convenient office location within walking distance of the Caltrain station
Support for commuting, including monthly parking or Caltrain passes
Downtown Burlingame office location, close to shops, cafes, and local amenities
A politics-free, highly collaborative environment where talented people can do their best work and make an immediate impact
The opportunity to build long-term career relationships in a company that values strong personal connections alongside professional excellence

The base salary range for this position is $110,000 to $270,000. This range reflects the full span of levels and geographies at which Quadric hires for this role. The actual base salary offered will depend on a number of factors, including the specific level of the role, years and depth of relevant experience, technical skills and competencies, the criticality of the role to the business, internal equity, and work location. In addition to base salary, this role is eligible for equity and a discretionary annual performance bonus as applicable to the role and level.

Quadric also offers the generous benefits package outlined above and other programs designed to support your health and wellbeing.

Founded in 2016 and based in downtown Burlingame, California, Quadric is building the world’s first supercomputer designed for the real-time needs of edge devices. Quadric aims to empower developers in every industry with superpowers to create tomorrow’s technology, today. The company was co-founded by technologists from MIT and Carnegie Mellon, who were previously the technical co-founders of the Bitcoin computing company 21.

Quadric is proud to be an equal opportunity employer. We are committed to creating an inclusive environment where people from all backgrounds can do their best work. We consider all qualified applicants without regard to race, color, religion, sex, gender identity or expression, sexual orientation, national origin, age, disability, veteran status, or any other protected characteristic under applicable law.

If this role resonates with you, we encourage you to apply even if your experience does not perfectly match every qualification. We value potential, curiosity, and a willingness to learn just as much as direct experience. Skills and growth come in many forms, and we would love to hear your story.

By submitting an application, you acknowledge that Quadric will collect and process your personal information as part of the hiring process. Please review our Privacy Policy to understand how we handle your data.

The Company

HQ: Burlingame, CA

38 Employees

Year Founded: 2017

What We Do

Quadric has built a unified hardware/software architecture optimized for on-device machine learning inference. Only the Quadric GPNPU (general purpose neural processing unit) delivers high ML inference performance while also running C++ code without forcing the developer to artificially partition application code between two or three different kinds of processors. Quadric's GPNPU is a licensable processor IP core that scales from 1 to 64 TOPS and seamlessly intermixes scalar, vector, and matrix code.