Architect - Platform Engineer

Posted 3 Days Ago
Hiring Remotely in USA
Remote
Expert/Leader
Artificial Intelligence • Big Data • Machine Learning
The Role
Design and scale GenAI/LLM infrastructure for multi-GPU environments. Perform GPU profiling and optimization, manage Slurm/OpenShift/Kubernetes clusters, enable NVIDIA GPU stack, build IaC templates, CI/CD automation, and support production deployments and client engagements for GenAI workloads.
Summary Generated by Built In

While technology is the heart of our business, a global and diverse culture is the heart of our success. We love our people, and we take pride in offering them a culture built on transparency, diversity, integrity, learning, and growth.
If you are interested in an environment that encourages you to innovate and excel, not just in your professional life but in your personal life too, you would enjoy your career with Quantiphi!

About Quantiphi:

Quantiphi is an award-winning, AI-First global digital engineering company that helps the world’s leading Fortune 1000 organizations transform bold ideas into measurable business impact. We go beyond building innovative AI technologies; we solve the problems that matter most to our clients.

Since our founding in 2013, Quantiphi has built a proven track record of turning complex challenges into meaningful outcomes across industries.

Headquartered in Boston, with more than 4,000 professionals worldwide, we partner with global enterprises to deliver large-scale digital, cloud, and AI-driven transformation. #SolvingWhatMatters

We are an Elite and Premier partner to Google Cloud, AWS, NVIDIA, Snowflake, and other leading technology platforms, and our work has been recognized across the industry, including:

  • 3 NVIDIA Partner of the Year awards

  • 3 AWS AI/ML Partner of the Year awards

  • 21x Google Cloud Partner of the Year awards in the past 10 years

  • 3 Snowflake Partner of the Year awards

  • Rated Leaders by Gartner, Forrester, IDC, ISG, Everest Group and other leading analyst firms

Quantiphi delivers first-in-class AI solutions across Life Sciences, Healthcare, Banking, Financial Services, CPG, Manufacturing, Energy, High-Tech, Telecommunications, and other industries, powered by cutting-edge Generative AI and Agentic AI accelerators.

For more details, visit: Website or LinkedIn Page.

Role: Architect - Platform Engineer

Experience Level: 10+ yrs

Work Location: US East/Canada (Remote)

Role Overview:

We are looking for a highly skilled Architect - Platform Engineer to design, optimize, and scale infrastructure for GenAI and LLM workloads. This role is ideal for someone with deep hands-on experience in GPU profiling, distributed training, and high-performance compute environments. You will work with architects from other specialties, such as data engineering, software engineering, and ML engineering, to create platforms, solutions, and applications that cater to the latest trends.

You’ll play a key role in building out GenAI platform foundations, supporting production-grade deployments, and partnering closely with data science, MLOps, and application teams to bring cutting-edge AI solutions to life.

Key Responsibilities:

  • Design and implement scalable infrastructure for LLM and GenAI workloads across multi-GPU environments

  • Perform GPU profiling, benchmarking, and performance optimization for distributed training workloads

  • Manage and schedule compute-intensive jobs using Slurm-based clusters and OpenShift/Kubernetes environments

  • Enable and optimize the NVIDIA GPU stack (CUDA, cuDNN, NCCL, Triton, RAPIDS, etc.)

  • Collaborate with cross-functional teams to deploy models in research and production environments

  • Build and support GenAI pipelines (fine-tuning, RAG, multi-modal inferencing, LLMOps)

  • Develop reusable infrastructure templates using tools like Terraform and Helm

  • Contribute to internal innovation (PoCs, workshops) and support client-facing delivery engagements

  • Develop and deliver automation software required for building & improving the functionality, reliability, availability, and manageability of applications and cloud platforms

  • Champion and drive the adoption of Infrastructure as Code (IaC) practices and mindset

  • Design, architect, and build self-service, self-healing, synthetic monitoring, and alerting platforms and tools

  • Automate development and testing processes through CI/CD pipelines (Git, Jenkins, SonarQube, Artifactory, Docker containers)

  • Build a container hosting platform using Kubernetes

  • Introduce new cloud technologies, tools, and processes to keep innovating in the commerce area and drive greater business value

  • Lead technical discussions on architecture design and troubleshooting with clients, and proactively provide solutions as required

Basic Qualifications:

  • Strong experience with Slurm and distributed training environments

  • Hands-on expertise with Red Hat OpenShift and/or Kubernetes

  • Deep knowledge of the NVIDIA GPU ecosystem (CUDA, cuDNN, NCCL, Nsight, Triton/TensorRT)

  • Strong foundation in Linux systems, performance tuning, and multi-GPU optimization

  • Experience deploying GenAI workloads (LLM fine-tuning, RAG pipelines, multi-modal systems)

  • Familiarity with Infrastructure-as-Code tools (Terraform, Ansible)

  • Experience with cloud GPU environments (GCP, Azure, AWS, OCI) and/or on-prem GPU clusters

  • Serve as a mentor or guide for senior resources and team leads

  • Lead the technical discussion regarding architecture design

Other Qualifications (OQs):

  • Experience with NVIDIA NIMs, DGX systems, or GPU-accelerated containers

  • Knowledge of LLMOps frameworks and MLOps integration

  • Familiarity with vector databases and retrieval systems for RAG architectures

  • Comfortable working in client-facing environments and collaborating with AI solution teams

Healthcare Domain Experience (Nice to Have):

  • Experience working with FHIR R4, HL7 v2, or SMART on FHIR

  • Integration with EHR systems (e.g., Epic)

  • Understanding of HIPAA compliance and healthcare data privacy

  • Exposure to clinical workflows, CDS Hooks, or patient-facing applications

  • Experience building clinical decision support systems or healthcare interoperability solutions

What’s in it for YOU at Quantiphi:

  • Make an impact at one of the world’s fastest-growing AI-first digital engineering companies.

  • Up-skill and discover your potential as you solve complex challenges in cutting-edge areas of technology alongside passionate, talented colleagues.

  • Work where innovation happens - work with disruptive innovators in a research-focused organization with 60+ patents filed across various disciplines.

  • Stay ahead of the curve: immerse yourself in breakthrough AI, ML, data, and cloud technologies and gain exposure working with Fortune 500 companies.


If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!

Skills Required

  • Strong experience with Slurm and distributed training environments.
  • Hands-on expertise with Red Hat OpenShift and/or Kubernetes.
  • Deep knowledge of the NVIDIA GPU ecosystem (CUDA, cuDNN, NCCL, Nsight, Triton, TensorRT).
  • Strong foundation in Linux systems, performance tuning, and multi-GPU optimization.
  • Experience deploying GenAI workloads (LLM fine-tuning, RAG pipelines, multi-modal systems).
  • Familiarity with Infrastructure-as-Code tools (Terraform, Ansible).
  • Experience with cloud GPU environments (GCP, Azure, AWS, OCI) and/or on-prem GPU clusters.
  • Develop and deliver automation and CI/CD pipelines (Git, Jenkins, SonarQube, Artifactory, Docker).
  • Develop reusable infrastructure templates and Helm charts; champion IaC practices.
  • Serve as a mentor or guide for senior resources and lead technical architecture discussions.
  • Manage and schedule compute-intensive jobs using Slurm-based clusters and OpenShift/Kubernetes.
  • Build a container hosting platform using Kubernetes and design self-service, monitoring, and alerting tools.
  • Experience with NVIDIA NIMs, DGX systems, or GPU-accelerated containers.
  • Knowledge of LLMOps frameworks and MLOps integration.
  • Familiarity with vector databases and retrieval systems for RAG architectures.
  • Comfortable working in client-facing environments and collaborating with AI solution teams.
  • Healthcare domain experience (FHIR R4, HL7 v2, SMART on FHIR, Epic integration, HIPAA).

Quantiphi Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Quantiphi and has not been reviewed or approved by Quantiphi.

  • Flexible Benefits: Hybrid and work-from-home options are commonly available and perceived as meaningful perks that increase overall package value. Flexibility by team and role often enhances day-to-day experience even when cash pay is not top-tier.
  • Healthcare Strength: U.S. materials indicate medical coverage that includes dental and vision, and employee accounts align with having these plans in place. The presence of core health benefits contributes to a baseline of security across key locations.
  • Parental & Family Support: Paid parental leave is available in the U.S., with examples citing generous leave lengths. Family-focused policies appear alongside other flexibility features.

The Company
HQ: Marlborough, MA
3,494 Employees
Year Founded: 2013

What We Do

Quantiphi is an award-winning AI-first digital engineering company driven by the desire to solve transformational problems at the heart of business. Quantiphi solves the toughest, most complex business problems by combining deep industry experience, disciplined cloud and data-engineering practices, and cutting-edge artificial intelligence research to achieve quantifiable business impact at unprecedented speed.
