Platform Engineer

Posted 5 Days Ago
San Francisco, CA
120K-180K Annually
5-7 Years Experience
Artificial Intelligence • Cloud • Hardware • Machine Learning • Other • Software • Infrastructure as a Service (IaaS)
We build infrastructure for machine learning
The Role
As a Platform Engineer at Voltage Park, you will maintain critical servers and systems, develop software for automation and front/back end tasks, write automation scripts for backend orchestration, and conduct root cause analyses for system downtimes, all while contributing to a young and agile team focused on engineering excellence.
Summary Generated by Built In

About Voltage Park On-Demand

Voltage Park’s mission is to make AI infrastructure accessible to all. Today, we own 24,000+ H100s and operate 7+ data centers across the US. We serve customers of all sizes, from small research labs to large enterprises. We’re in search of a Platform Engineer to join our On-Demand team, where you’ll help us build a platform that allows customers to flexibly rent these GPUs for as short or as long a period as they want.

Our team is small, highly motivated, and focused on engineering excellence. We all operate with a founder mentality; several of us have founded and launched prior businesses. 

All team members are hands-on and contribute directly to our team’s mission.

If you join us, you’ll be an early team member and help us shape:

  • Our future company culture

  • Our engineering practices

  • The people we hire

  • The direction & focus of our products

Note: this role is in-person and you must be based in the San Francisco Bay Area to apply. We are not able to provide sponsorship for this position.

What You’ll Do

  • Maintain servers & systems integral to our platform’s reliability 

  • Develop software — either for automation or for front/back end

  • Write automation for backend orchestration systems (MAAS, Libvirt, pfSense)

  • Track downtime and conduct root cause analyses (RCAs)

  • Write automation scripts to audit performance anomalies across our fleet of servers

Who You Are

  • 5+ years Linux administration (Ubuntu/Debian focus)

  • Strong experience with Libvirt (KVM) virtualization

  • Proficient in Python and Bash scripting

  • Experience with automation tools (preferably Ansible)

  • Solid networking knowledge

  • Experience with PostgreSQL

  • Familiarity with Ceph and NFS storage solutions

Ideal Experiences

  • Experience with GPU virtualization and PCIe passthrough

  • Knowledge of Proxmox VE, OpenStack, or OpenNebula

  • Experience with Docker and Kubernetes

  • Experience with bare metal automation (e.g., Ubuntu MAAS)

  • Monitoring experience (Prometheus, Grafana, ELK Stack)

  • Experience with infrastructure-as-code tools (e.g., Terraform)

  • Experience with Redis

  • Experience working with Python (backend), Postgres (database), and React + Tailwind (frontend)

  • Former technical founder: you’re sharp, business-minded, action-oriented, and can move quickly

Voltage Park On-Demand Team Culture

  • You are ambitious and always looking for ways to improve. We operate nearly $1B worth of assets, and the opportunity for impact is limitless. This role will give you more responsibility than you’ve ever had and hold you to higher standards than other companies you’ve worked at. Expect to do the best and most impactful work of your career at Voltage Park.

  • You’re focused on impact and don’t get lost in the weeds on details that don’t matter. You’re excited to work on whatever solves the biggest customer problems, not just the coolest technical challenges. You understand when making 80/20 trade-offs is the right thing to do, and you never compromise on your high standards when making those trade-offs.

  • You have a strong work ethic. As a startup, we are trying to change the world and take on many large, $B+ competitors. Raw hours make a huge difference when facing overwhelming odds. Having a strong work ethic is a competitive advantage.

  • You take ownership of your initiatives. When you say you'll do something, you get it done without anyone having to check in on you. You ship fully baked features end-to-end. You’re accountable for the deadlines you set, and you figure out a solution if something unexpected occurs.

  • You make trade-offs when necessary and are open to new ideas. As a startup, we have to make decisions quickly and often with incomplete information. We also face problems that have no obvious solutions. Sometimes, the best ideas sound crazy at first. You don’t dismiss your teammates’ ideas and are open to being challenged by others.

Voltage Park is an equal opportunity employer and makes employment decisions on the basis of merit. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic under federal, state, or local law. If you require an accommodation during the job application process, please notify your recruiter. 

Compensation Range: $120K - $180K

Top Skills

Python

The Company
HQ: Berkeley, CA
45 Employees
Remote Workplace
Year Founded: 2023

What We Do

The market for cutting-edge ML compute is broken. Startups, researchers and even big AI labs are scrambling to buy or rent access to the latest chips for ML training. But demand far outstrips supply, and what’s available is only accessible to the well-resourced, placing an artificial damper on innovation.

To solve this challenge, we’ve launched Voltage Park, and we’re on a mission to make machine learning infrastructure accessible to all, from large enterprises and research universities to seed-stage startups and nonprofits.

With around 24,000 NVIDIA H100 GPUs, the Voltage Park cloud is one of the most powerful collections of cutting-edge ML compute in the world. Our clusters consist of 80GB H100 SXM5 GPUs fully interconnected with 3.2 Tb/s InfiniBand. We currently offer bare-metal access for large-scale users that need peak performance. As we spin up our infrastructure, we will soon add support for short-term leases and hourly billing, along with familiar tools like Slurm, Kubernetes, and Mosaic for easy integration into existing training frameworks.

Why Work With Us

You’ll play a pivotal role as a member of the founding team that will change the face of machine learning infrastructure. As an early hire, you’ll have outsize influence in defining the company’s culture and ensuring mission success.

Voltage Park Offices

Remote Workspace

Voltage Park is a 100% remote company.

Typical time on-site: None