ML Infrastructure Engineer
Menlo Park, CA | On-Site | Full-Time/Direct Hire
Looking for ML Infra experts (Bay Area preferred) with deep experience in CUDA, GPU optimization, vLLM, and LLM inference. Pure language focus, no vision/audio.
Client Opportunity | Through Phizenix
Phizenix, a certified minority and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models—built for faster generation, multimodal integration, and scalable enterprise deployment.
We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance, distributed systems that bring advanced LLMs into production.
Responsibilities
Design and manage distributed infrastructure for ML training at scale
Optimize model serving systems for low-latency inference
Build automated pipelines for data processing, model training, and deployment
Implement observability tools to monitor performance in production
Maximize resource utilization across GPU clusters and cloud environments
Translate research requirements into robust, scalable system designs
Qualifications
Master's or PhD in Computer Science, Engineering, or a related field (or equivalent experience)
Strong foundation in software engineering, systems design, and distributed systems
Experience with cloud platforms (AWS, GCP, or Azure)
Proficient in Python and at least one systems-level language (C++/Rust/Go)
Hands-on experience with Docker, Kubernetes, and CI/CD workflows
Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective
Understanding of GPU programming and high-performance infrastructure
Experience with large-scale ML training clusters and GPU orchestration
Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)
Experience with distributed training strategies (e.g., data/model/pipeline parallelism)
Familiarity with orchestration tools like Kubeflow or Airflow
Background in performance tuning, system profiling, and MLOps best practices
At Phizenix, we’re committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let’s build the future—together.
What We Do
We provide Talent Solutions for the AI Era. Our mission is to connect businesses with exceptional talent and consulting solutions that align with their culture and values. We offer AI consulting services that help businesses leverage cutting-edge artificial intelligence. We help discover, design, and deploy AI solutions that streamline operations, boost productivity, and unlock new growth opportunities. Our team of AI experts, strategists, and technology specialists works closely with organizations to integrate AI-driven solutions that fit their unique goals and challenges. From automation and data analytics to predictive modeling and AI-based customer experiences, we provide end-to-end support for businesses embarking on their AI transformation journey.