Lead AI Application Engineer (Infrastructure & LLMOps)

Posted Yesterday
4 Locations
Remote
Senior level
Agency
The Role
Lead the design, build, and operation of a multi-tenant AI platform and LLMOps infrastructure across cloud and on-premises environments. Own vector DBs, feature stores, model hosting (Kubernetes + GPU orchestration), and developer self-service tooling. Standardize LLM-as-a-service offerings and enable squads with APIs, templates, and documentation.
Summary Generated by Built In

At TechBiz Global, we provide recruitment services to the top clients in our portfolio.

We are currently looking for a dedicated Lead AI Application Engineer to join one of our clients' teams. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.

Key Responsibilities

1. Build & Run the Shared AI Platform

  • Architect and maintain a multi-tenant AI Platform that supports the full ML lifecycle across cloud and on-premises environments.
  • Ensure high availability, low latency, and cost-efficiency for all shared AI resources.
  • Implement LLMOps/MLOps best practices, including automated deployment pipelines for models.

2. Curate the AI Services Catalogue

  • Develop and expose "as-a-service" capabilities: Inference-as-a-Service, Embeddings-as-a-Service, and RAG-as-a-Service.
  • Standardize how squads interact with LLMs, providing unified APIs and abstraction layers to prevent vendor lock-in.

3. Manage AI Data Infrastructure

  • Own the deployment and scaling of Vector Databases (e.g., Pinecone, Milvus, Weaviate) and Feature Stores (e.g., Feast, Tecton, Hopsworks).
  • Optimize data retrieval patterns to support real-time AI applications and agentic workflows.
  • Oversee Model Hosting environments, utilizing Kubernetes (K8s) and GPU orchestration to manage compute resources efficiently.

4. Enable Developer Self-Service

  • Build and maintain a Self-Service Portal or CLI that allows product squads to provision AI environments, models, and data stores independently.
  • Reduce "Time-to-Inference" for new features by providing pre-configured templates and blueprints.
  • Conduct internal workshops and provide documentation to empower squads to use the platform effectively.

Must-Have Technical Skills

  • Infrastructure: Deep experience with Kubernetes (K8s), Docker, and Terraform/Pulumi.
  • Hybrid Cloud: Proven experience managing workloads across AWS/Azure/GCP and on-premises environments (NVIDIA AI Enterprise, OpenShift).
  • AI/ML Tooling: Hands-on experience with vLLM, TGI (Text Generation Inference), or NVIDIA Triton for model serving.
  • Databases: Expertise in Vector DBs and traditional SQL/NoSQL databases.
  • Languages: High proficiency in Python, plus Go or Rust for platform tooling.

Experience

  • 8+ years in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
  • 2+ years specifically focused on building AI/ML infrastructure or platforms.
  • Experience building Internal Developer Platforms (IDP) is a massive plus.


The Company
20 Employees

Similar Jobs

ServiceNow

Consultant

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Remote or Hybrid
Warsaw, Warszawa, Mazowieckie, POL
29000 Employees

DuckDuckGo

Senior Data Scientist

Information Technology
Remote
14 Locations
393 Employees
179K-179K Annually

Dropbox

Software Engineer

Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
Remote
Poland
2500 Employees
333K-451K Annually

Dropbox

Integration Engineer

Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
Remote
Poland
2500 Employees
223K-301K Annually

Similar Companies Hiring

Caxy
Software • Mobile • Enterprise Web • Artificial Intelligence • Agency
Chicago, IL
45 Employees
Digible
Social Media • PropTech • Marketing Tech • Digital Media • Artificial Intelligence • Agency • AdTech
PH
145 Employees
Fora
Agency • On-Demand • Professional Services • Sales • Software • Travel • Hospitality
New York, NY
200 Employees
