Senior Kubernetes Developer - OPS00016

Posted 5 Days Ago
5 Locations
In-Office or Remote
Senior level
Information Technology • Software
The Role
Join a remote team as a Senior Kubernetes Developer: optimize and enhance Kubernetes platforms for HPC/AI workloads, develop custom operators, and manage GPU resources.

At Dev.Pro, we partner with businesses worldwide, from startups to Fortune 500 companies, across fintech, retail, hospitality, and beyond.

With a remote‑first mindset and a team in 55+ countries, we focus on aligning technical expertise with client needs, communicating clearly, and staying adaptable as priorities shift. This commitment to ownership and flexibility helps us create lasting partnerships — so you can focus on what you do best.

About this opportunity

We invite a skilled Kubernetes Developer to join our fully remote, international team. In this role, you’ll build and optimize the Kubernetes orchestration platform and develop custom operators to run HPC/AI workloads efficiently on GPU clusters. You’ll enhance infrastructure performance and reliability, create internal tools to improve the developer experience, and ensure multi-tenant HPC workloads remain secure and compliant.

What's in it for you:

• Work on cutting-edge GPU infrastructure and next-gen HPC/AI workloads

• Build a Slurm-on-Kubernetes product from scratch and shape its architecture

• Collaborate with a top-tier international team and grow through continuous learning and conference participation


Is that you?

• 3+ years of hands-on Kubernetes experience in production

• Experience with HPC schedulers (Slurm, PBS, LSF, Volcano)

• Strong background in GPU resource management and distributed systems

• Experience with cloud/hybrid cloud architectures (AWS, GCP, Azure, on-prem GPU clusters)

• Knowledge of Kubernetes operators, CRDs, scheduling, networking, and storage

• Deep knowledge of HPC job scheduling and workload orchestration

• Expertise in IaC (Terraform, Helm, or GitOps: ArgoCD/Flux) and monitoring & observability (Prometheus, Grafana, Jaeger, ELK)

• Programming skills in Go, Python, Bash/Shell

• Familiarity with PyTorch, TensorFlow, distributed training, and model serving

• Skills in Linux administration, performance tuning, and advanced networking (RDMA, InfiniBand, TCP/IP, DNS, load balancing)

• Experience in storage management and optimization for large datasets


Key responsibilities and your contribution

In this role, you'll design, develop, and manage Kubernetes platforms for GPU-intensive AI/HPC workloads.


• Design and build a Slurm-like orchestration layer on Kubernetes for HPC/AI workloads

• Develop custom operators and controllers for GPU job scheduling and execution

• Integrate batch schedulers with Kubernetes to provide a hybrid HPC/Cloud product

• Implement advanced GPU resource management

• Build internal tools and a self-service platform to simplify AI/HPC job deployment and management

• Build a cloud-native platform for AI training, inference, and HPC workloads

• Optimize scheduling to improve GPU utilization and reduce queue times

• Monitor GPU clusters, troubleshoot production issues, and ensure high availability, fault tolerance, and disaster recovery

• Develop CI/CD pipelines for GPU-intensive workloads

• Implement best practices for multi-tenant GPU clusters with AI/HPC workloads

• Ensure compliance with data sovereignty and international regulations

• Maintain secure container, runtime, and workload isolation policies

Top Skills

ArgoCD
AWS
Azure
Bash
DNS
ELK
Flux
GCP
Go
Grafana
Helm
InfiniBand
Jaeger
Kubernetes
Linux
LSF
PBS
Prometheus
Python
RDMA
Slurm
TCP/IP
Terraform
Volcano

The Company
Charlotte, North Carolina
848 Employees
Year Founded: 2011

What We Do

Dev.Pro helps innovative technology companies scale their business by leveraging our software engineering expertise to support them every step of the way.

It was founded by entrepreneurs and technologists, with the goal of helping technology-driven companies to develop their innovative software products and grow their businesses.

We started as an American company in 2011 and now have offices in different locations. Some of our development centers are located in Ukraine, and we support Ukrainian specialists by providing them with career opportunities around the world. Over the past few years, we have also been hiring specialists from many different countries, and we continue to do so as the company expands globally.

True to our roots, we remain creative and nimble, tailoring our engagement with clients to meet their specific needs. Some come to us for our engineering expertise, some for the rapid delivery, and some for cost efficiency. But what truly sets us apart is the alliance we forge with our clients over time, aligning our success with theirs.
