Founding Engineer, AI Infra

Posted Yesterday
8 Locations
Remote or Hybrid
Senior level
Angel or VC Firm • Artificial Intelligence
The Role
Design, build, and operate end-to-end training and inference infrastructure for large language and multimodal models. Improve efficiency (memory, parallelism, kernel optimizations), ensure robust scalable training and RL pipelines, optimize low-latency/high-throughput serving (quantization, caching, speculative decoding), manage multi-GPU and multi-cloud orchestration, and productionize new algorithms with strong observability and reproducibility.
Summary Generated by Built In
About Goaly

At Goaly, our mission is to make custom AI affordable for every business. Our founding team comes from the front lines of top AI labs and tech giants (Meta MSL, TikTok AI, Google DeepMind, xAI, Microsoft Research, etc.), where we built large-scale training infrastructure powering trillion-parameter models and scaled GenAI models to a global user base. Now, we are building something we wish we had before: a platform that makes training and adapting custom AI affordable for all modern companies, not just Big Tech. Our north star is ambitious: for a domain-specific task, reach 90% of SOTA performance at less than 10% of the cost. To get a taste of what we are doing, see our first tech blog.


About the Role

You will sit at the intersection of systems engineering and applied ML, building specialized infrastructure that keeps large language and multimodal models fast, reliable, and cost-effective. You will partner with research, product, and infra teams to ship production-ready platforms for training and serving AI at scale.


Key Responsibilities

  • Efficiency & performance: Improve LLM training and inference efficiency through better memory utilization, optimized parallelism, and kernel-level innovations (e.g. FlashAttention, CUDA/Triton).
  • Training & RL robustness: Build scalable, stable training and RL pipelines with strong reproducibility, observability, and debuggability.
  • Serving & inference optimization: Design and tune high-throughput, low-latency model serving systems, including quantization, caching, and speculative decoding.
  • Scalability & infrastructure: Own end-to-end training and inference infrastructure — from data ingestion and checkpointing to multi-GPU and multi-cloud orchestration.
  • Production enablement: Work closely with researchers and product engineers to turn new algorithms into reliable, production-ready systems.

Requirements

  • 5+ years building or operating ML infrastructure at scale, ideally supporting large language or multimodal models.
  • Deep understanding of GPU architecture, distributed training frameworks (PyTorch, DeepSpeed, Megatron, Ray), and parallelism strategies.
  • Hands-on experience running inference stacks (vLLM, SGLang, TGI, Triton) and optimizing them via low-level profiling.
  • Strong software engineering fundamentals in Python and one of C++, Rust, or Go, with clean, reliable code shipped to production.
  • Working knowledge of modern data pipelines, feature stores, and vector databases used in production AI systems.
  • Comfort automating infrastructure with Kubernetes, Terraform or Pulumi, and observability stacks (Prometheus, Grafana, OpenTelemetry).


Bonus Points

  • Experience deploying open-source LLMs (Llama 3, Qwen, DeepSeek) or training custom foundation models.
  • Contributions to ML systems tooling (compilers, kernels, inference runtimes) or open-source infrastructure projects.
  • Background in reinforcement learning, evaluation harnesses, or alignment tooling that hardens production AI systems.

The Company
20 Employees

What We Do

CX2 (Cox Exponential) is the early stage venture vehicle for Cox Enterprises. Founded by technologists with decades of experience leading startups and building AI/ML products, CX2 partners with exceptional entrepreneurs and early stage companies to maximize their business potential.
