AI Infrastructure & Experience Engineer

Posted 10 Days Ago
Mountain View, CA, USA
In-Office
70-79 Hourly
Mid level
Marketing Tech • Business Intelligence
The Role
Deploy, optimize, and integrate LLMs and multimodal models on local GPU/ARM64 hardware. Develop custom CUDA kernels, tune inference (TTFT, tokens/sec), connect backends to orchestration layers, build prototypes and frontends, and implement device communication protocols for local AI compute.

FocusKPI is seeking an AI Infrastructure & Experience Engineer to join one of our clients, a high-tech SaaS company. 

Work Location: Mountain View, CA (Onsite role, 5 days/week onsite)
Duration: 4-month contract 
Pay Range: $70-$79/hr
**No C2C resumes are considered**
 

Position Responsibilities:

  • Inference Optimization: Deploy and tune multiple LLMs and generative multimodal models on local inference hardware. Optimize performance metrics (TTFT, tokens/sec) via model quantization, caching strategies, and architecture-specific adjustments.
  • Systems Engineering & CUDA: Apply deep knowledge of the CUDA environment to build custom kernels that maximize utilization of low-cost GPU compute.
  • Orchestration & Integration: Seamlessly bridge inference backends with orchestration layers (LiteLLM, Ollama, etc.) and frontends like OpenWebUI.
  • Rapid Prototyping: Build functional, high-fidelity demos showcasing model memory capabilities, agentic workflows, and context-aware web search.
  • Peripheral Connectivity: Implement communication protocols to bridge local AI compute with peripheral devices, including smart TVs, household appliances, and XR hardware.

Requirements/Technical Qualifications:

  • Recent experience in model optimization is required
  • Hardware & Compute: Proven experience with NVIDIA ecosystems and ARM64 architecture.
  • Systems Programming: Advanced proficiency in C++, Python, and Rust. Deep familiarity with CUDA and the ability to author/debug custom CUDA kernels for compute-intensive tasks.
  • AI/ML Frameworks: Extensive experience with modern inference engines (llama.cpp, TensorRT-LLM, Ollama) and orchestration frameworks (LiteLLM).
  • Software Engineering: Robust understanding of asynchronous programming (FastAPI), containerization (Docker/Kubernetes), sandbox environments, and API design for low-latency communication.
  • Full-Stack Prototyping: Ability to quickly spin up modern frontend UIs (React, Next.js, or similar) to present AI-driven intelligence to end users.
  • Communication Protocols: Familiarity with WebSockets, gRPC, and REST for device-to-device communication in a local network environment.
  • Overall mandatory skills: recent model optimization experience, inference optimization, NVIDIA ecosystems, custom CUDA kernel development, ARM64 architecture, Python.

Ideal Candidate Profile:

  • A minimum of 3 years of relevant industry experience is required
  • The "Builder" Mindset: You are energized by the prospect of building proofs-of-concept in days rather than months. You thrive in environments where speed and creativity are paramount.
  • Problem Solver: You approach unsolved, messy engineering challenges with enthusiasm rather than trepidation.
  • Architectural Vision: You see the "big picture" of how AI becomes part of consumers' daily lives, not just how the model generates text.
  • Agile & Adaptable: You are comfortable working in a fast-paced environment where priorities shift based on the results of rapid experimentation.
  • Degree in Computer Science, Machine Learning, or Artificial Intelligence Specialization preferred, but not required


Thank you!

FocusKPI Hiring Team

Founded in 2010, FocusKPI, Inc. (FocusKPI) is a data science and technology firm specializing in predictive analytics practice and methodologies. FocusKPI is a US company headquartered in Silicon Valley, California, with an East Coast office in Boston, Massachusetts.

Skills Required

  • Recent experience in model optimization
  • Proven experience with NVIDIA ecosystems
  • Proven experience with ARM64 architecture
  • Advanced proficiency in C++, Python, and Rust
  • Deep familiarity with CUDA and ability to author/debug custom CUDA kernels
  • Extensive experience with inference engines (llama.cpp, TensorRT-LLM, Ollama)
  • Experience with orchestration frameworks (LiteLLM)
  • Robust understanding of asynchronous programming (FastAPI)
  • Containerization experience (Docker, Kubernetes)
  • Ability to build frontends with React or Next.js
  • Familiarity with WebSockets, gRPC, and REST
  • Minimum of 3 years of relevant industry experience
  • Onsite work in Mountain View, CA (5 days/week)
  • Degree in Computer Science, Machine Learning, or AI specialization

The Company
HQ: Santa Clara, CA
31 Employees
Year Founded: 2010

What We Do

FocusKPI brings deep domain experience in business and marketing analytics to enable our clients to unlock growth-driving insights from data. We help our clients develop action-oriented analytics and data science products that are customized to company-specific needs and integrated into their platforms for ongoing use. Our Accelerators, a toolbox of frameworks and models built over 10+ years, fast-track projects by capitalizing on our experience.

Capabilities: Predictive Analytics; AI / Machine Learning; Measurement; Text Analysis

Key Industries Served: Retail; Media; B2B & B2C Sales, Marketing, and Merchandising; Software & Applications
