Solutions Architect, Platform Infrastructure (Remote)

Sorry, this job was removed at 08:01 p.m. (CST) on Friday, May 23, 2025
2 Locations
In-Office or Remote
Machine Learning • Software
The Role
At Weights & Biases, our mission is to build the best tools for AI developers. We founded our company on the insight that while there were excellent tools for developers to build better code, there were no similarly great tools to help ML practitioners build better models. Starting with our first experiment tracking product, we have since expanded our solution into a comprehensive AI developer platform for organizations focused on building their own deep learning models and generative AI applications.

Weights & Biases is a Series C company with $250M in funding and over 200 employees. We proudly serve over 1,000 customers and more than 30 foundation model builders, including OpenAI, NVIDIA, Microsoft, and Toyota.

The Solutions Architect role at Weights & Biases is a unique hybrid, blending the technical expertise of a Site Reliability Engineer (SRE) with the communication and advisory skills of a Solutions Architect. In this role, you will focus on all aspects of the Weights & Biases Platform, managing customer deployments across various cloud infrastructures and on-prem environments to ensure scalability, reliability, and operational excellence.

You will work closely with customers to debug issues, provide best practices, and help them unlock the full potential of Weights & Biases. Additionally, you will produce technical content such as blog posts, documentation updates, and internal enablement material to support the Field Engineering team. This role requires deep collaboration with Support, Product, and Engineering teams to drive product improvements based on customer insights.

Responsibilities:

  • Deployment & Operations:
      • Work with customer operations teams to provision Weights & Biases services in Dedicated Cloud, Private Cloud, and on-prem environments.
      • Manage complex infrastructure implementations, partnering with highly skilled customer engineers.
      • Monitor and ensure the reliability, performance, and scalability of customer deployments using SRE best practices.
  • Debugging & Troubleshooting:
      • Diagnose and resolve issues in customer environments, documenting resolutions to accelerate future problem-solving.
      • Provide hands-on support for containerized and distributed systems using Docker, Kubernetes, and related technologies.
  • Customer Engagement:
      • Lead technical discussions with customers, acting as a trusted advisor for infrastructure reliability and operational excellence.
      • Deliver training sessions, product demos, and workshops to help customers maximize the value of Weights & Biases.
      • Collaborate with customers to uncover desired outcomes and recommend solutions tailored to their needs.
  • Enablement & Collaboration:
      • Partner with AI Solution Engineers to streamline post-sales processes, including onboarding, adoption, and training.
      • Collaborate with Sales Engineering to ensure a seamless transition from POC to onboarding.
      • Provide insights to the Product team based on customer feedback to influence the product roadmap.

Requirements:

  • Based in the Pacific Standard Time (PST) timezone.
  • A proven track record of systematically diagnosing and resolving infrastructure issues.
  • Prior experience in a customer-facing technical role.
  • Expertise with Docker, Kubernetes, Helm charts, networking, and managed cloud services (e.g., MySQL, object stores).
  • Strong fundamentals in Infrastructure as Code (IaC), preferably Terraform.
  • Proficiency with at least one cloud platform (AWS, GCP, Azure); experience with multiple platforms is a plus.
  • Strong Linux/Unix command line experience.
  • Basic proficiency in Python and familiarity with ML workflows or tools.
  • Exceptional communication skills, both written and verbal, with the ability to simplify complex topics for diverse audiences.
  • Proven ability to prioritize and manage multiple competing tasks in a dynamic environment.

Strong Plus:

  • Deep proficiency in Kubernetes design patterns, including Operators.
  • Familiarity with data engineering and MLOps tooling.
  • Experience as an educator or facilitator for technical training sessions, workshops, or demos.
  • SaaS, web service, or distributed systems operations experience.

Our Benefits:

  • 🏝️ Flexible time off
  • 🩺 Medical, dental, and vision coverage for employees and their families
  • 🏠 Remote-first culture with in-office flexibility in San Francisco
  • 💵 Home office budget with a new high-powered laptop
  • 🥇 Truly competitive salary and equity
  • 🚼 12 weeks of parental leave (U.S. specific)
  • 📈 401(k) (U.S. specific)
  • Supplemental benefits may be available depending on your location

We encourage you to apply even if your experience doesn't perfectly align with the job description as we seek out diverse and creative perspectives. Team members who love to learn and collaborate in an inclusive environment will flourish with us. We are an equal opportunity employer and do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. If you need additional accommodations to feel comfortable during your interview process, reach out at [email protected].

The Company
HQ: San Francisco, CA
132 Employees
Year Founded: 2017

What We Do

Weights & Biases helps machine learning teams build better models faster. With a few lines of code, practitioners can instantly debug, compare and reproduce their models — architecture, hyperparameters, git commits, model weights, GPU usage, and even datasets and predictions — and collaborate with their teammates.
