Lead Machine Learning Engineer, Inference & Performance

Hiring Remotely in USA
Remote
159K-250K Annually
Senior level
Artificial Intelligence • Big Data • Cloud • Information Technology • Machine Learning
The Role
Design, optimize, and operate production LLM inference and training pipelines. Improve latency, throughput, and GPU utilization using techniques like batching, quantization, FlashAttention, and kernel-level profiling. Deploy and autoscale multiple models on shared GKE GPU clusters, consult with clients on performance and cost requirements, and carry prototypes to robust, scalable production services.
Summary Generated by Built In
About Egen: 
 
Egen is a fast-growing and entrepreneurial company with a data-first mindset. We bring together the best engineering talent working with the most advanced technology platforms, including Google Cloud and Salesforce, to help clients drive action and impact through data and insights. We are committed to being a place where the best people choose to work so they can apply their engineering and technology expertise to envision what is next for how data and platforms can change the world for the better. We are dedicated to learning, thrive on solving tough problems, and continually innovate to achieve fast, effective results. If this describes you, we want you on our team.
 
Want to learn more about life at Egen? Check out these resources in addition to the job description.
 
Meet Egen
Life at Egen
Culture and Values at Egen
Career Development at Egen
Benefits at Egen
 
About the opportunity: 
 
As a Senior AI Engineer, you will be at the forefront of our Generative AI initiatives. We treat AI as a software engineering discipline. You will be responsible for the full lifecycle of our AI features—specifically document intelligence and RAG pipelines—taking them from initial prototype to robust, scalable production services. You will solve for real-world constraints like latency, error handling, and cost optimization.
 
You’ll collaborate with a diverse range of clients to translate business needs into high-performance AI architectures. This role requires a blend of deep technical expertise in LLMs and a disciplined Software Engineering approach to ensure our solutions are robust, ethical, and scalable.

What You Will Do:

  • Optimize Inference: Build and tune production LLM serving with vLLM and SGLang—maximizing throughput and minimizing latency through batching, paged attention, quantization, and KV-cache strategies

  • Profile & Accelerate Training: Instrument and profile training runs to find bottlenecks, then resolve them with the right attention implementations (e.g. FlashAttention) tuned to the underlying hardware (H200, GB200)

  • Engineer for the Hardware: Apply a working understanding of GPU architecture and attention internals to choose the right approach per accelerator, rather than relying on defaults

  • Serve at Scale: Deploy and operate multiple models within shared GPU clusters on GKE, with autoscaling, efficient bin-packing, and graceful handling of mixed workloads

  • Drive Efficiency: Own GPU utilization as a first-class metric—measure it, improve throughput-per-dollar, and continuously raise the ceiling on what our fleet can deliver

  • Collaborate & Consult: Work directly with clients to understand performance, latency, and cost requirements, and translate them into pragmatic serving and training architectures

Your Technical Toolkit:

  • Core Languages: Mastery of Python and shell scripting; comfort reading and reasoning about lower-level (CUDA-adjacent) performance code is a strong plus

  • Inference Frameworks: Hands-on experience with vLLM, SGLang, or comparable high-performance serving stacks

  • GPU & Model Internals: Solid grasp of GPU architecture, the fundamentals of LLM inference, and the attention mechanism—including where the bottlenecks live and how FlashAttention and similar techniques address them across hardware generations (H200, GB200)

  • Profiling: Fluency with profiling tools to diagnose training and inference bottlenecks (compute-bound vs. memory-bound, kernel-level analysis)

  • Infrastructure: Strong Kubernetes (GKE) experience—deploying and autoscaling multiple models on shared GPU clusters on Google Cloud

  • Mindset: A strong software engineering foundation—you write clean, maintainable code, measure before optimizing, and understand the full SDLC

Basic Qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field

  • 5+ years of experience in ML/AI engineering, with a meaningful portion focused on performance, infrastructure, or systems

  • Proven track record of deploying and optimizing models in a production environment

  • Demonstrated experience profiling and improving GPU utilization for training and/or inference

  • Experience with Classic Machine Learning (neural nets, training, tuning) is a strong plus

  • Knowledge of Data Engineering and SQL

Personal Attributes:

  • Ownership: You take pride in your work and see optimizations through from profile to production

  • Curiosity: Hardware and serving frameworks change fast; you are a lifelong learner who stays ahead of the curve

  • Rigor: You measure before you optimize and let data, not intuition, guide where you spend effort

  • Consultative Spirit: You enjoy interacting with clients and can translate technical complexity into business value

  • Ethics: You prioritize responsible AI development and data privacy

Compensation & Benefits:
 
This role is eligible for our competitive salary and comprehensive benefits package to support your well-being:
- Comprehensive Health Insurance
- Paid Leave (Vacation/PTO)
- Paid Holidays
- Sick Leave
- Parental Leave 
- Bereavement Leave
- 401(k) Employer Match
- Employee Referral Bonuses
 
Check out our complete list of benefits here: https://egen.ai/people/#benefits
 
Important: All roles are subject to standard hiring verification practices, which may include background checks, employment verification, and other relevant checks.
 
EEO and Accommodations:
 
Egen is an equal opportunity employer and is committed to inclusion, diversity, and equity in the workplace. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veterans’ status, or any other characteristic protected by federal, state, or local laws. Egen will also consider qualified applicants with criminal histories, consistent with legal requirements. Egen welcomes and encourages applications from individuals with disabilities. Reasonable accommodations are available for candidates during all aspects of the selection process. Please advise the talent acquisition team if you require accommodations during the interview process.

Skills Required

  • Bachelor's or Master's degree in Computer Science, Engineering, or related technical field
  • 5+ years of experience in ML/AI engineering with focus on performance, infrastructure, or systems
  • Proven track record deploying and optimizing models in production
  • Demonstrated experience profiling and improving GPU utilization for training and/or inference
  • Mastery of Python and shell scripting
  • Hands-on experience with vLLM, SGLang, or comparable high-performance serving stacks
  • Strong Kubernetes (GKE) experience deploying and autoscaling models on shared GPU clusters
  • Solid grasp of GPU architecture, LLM inference fundamentals, and attention mechanisms (e.g., FlashAttention)
  • Fluency with profiling tools to diagnose training and inference bottlenecks (compute vs memory bound, kernel analysis)
  • Knowledge of Data Engineering and SQL
  • Comfort reading and reasoning about lower-level (CUDA-adjacent) performance code
  • Experience with classic machine learning (neural nets, training, tuning)
  • Experience tuning for accelerators such as H200 or GB200

The Company
HQ: Naperville, IL
240 Employees
Year Founded: 2000

What We Do

Egen is a data engineering and cloud modernization firm partnering with leading Chicagoland companies to launch, scale, and modernize industry-changing technologies. We are catalysts for change who create digital breakthroughs at warp speed. Our team of cloud and data engineering experts is trusted by top clients in pursuit of the extraordinary. Our mission is to be an enabler of amazing possibilities for companies looking to use the power of cloud and data. We want to stand shoulder to shoulder with clients, as true technology partners, and make sure they succeed at what they have set out to do. We want to be disruptors, game-changers, and innovators who have played an important part in moving the world forward.
