Principal Software Engineer, Architecture (AI/ML)

Posted 13 Days Ago
Austin, TX
Hybrid
Expert/Leader
Cloud • Enterprise Web • Software • Infrastructure as a Service (IaaS)
DigitalOcean is the cloud of choice for developers, startups, and SMBs around the world.
The Role
The Principal Software Engineer, Architecture (AI/ML) will lead the architectural strategy for DigitalOcean's cloud services, focusing on integrating AI/ML technologies. Responsibilities include developing AI/ML models, refining pipelines, mentoring teams, and solving complex problems in cloud environments.
Summary Generated by Built In
Do you ever wonder what happens inside the cloud?

DigitalOcean (NYSE: DOCN) simplifies cloud computing so builders can spend more time creating software that changes the world. With our mission-critical infrastructure and fully managed offerings, DigitalOcean enables startups and small and medium-sized businesses (SMBs) to rapidly deploy and scale modern applications. As a remote-first organization, our employees, like our customers, are based around the world.

We want people who are passionate about staying on top of the latest cloud infrastructure and AI/ML trends, with an excellent aptitude for supporting internal employees and teams.

We are looking for a highly experienced, highly motivated Principal Software Engineer, Architecture (AI/ML) with a Computer Science, Engineering, or AI/ML background. You will be involved in the architecture, design, implementation, verification, and integration of the next generation of DigitalOcean Cloud Computing software with a strong emphasis on AI/ML-driven solutions.

What You’ll Be Doing:

  • Working at the forefront of cloud, distributed computing, and AI/ML technologies.
  • Serving as the architect driving the technical strategy and direction for our large-scale cloud services, including machine learning model deployment and orchestration.
  • Developing AI/ML models to optimize cloud infrastructure, improve system reliability, and enhance user experience.
  • Building and refining machine learning pipelines and frameworks to support scalable AI/ML solutions.
  • Owning the primary responsibility for establishing a pragmatic long-term technical direction for our software services, ensuring alignment with our customers, business goals, and internal teams.
  • Leading a team of highly passionate technical leads to evolve our service architecture, with alignment across several product technical roadmaps.
  • Leading by example through direct contribution and providing direction in establishing development and operational practices, with specific attention to AI/ML model lifecycle management.
  • Serving as the technical lead on our most demanding, cross-functional projects.
  • Actively mentoring individuals and the engineering community on advanced technical issues, including best practices in AI/ML.

What We’ll Expect From You:

Architect-level experience in the following domains:

    • Proven expertise in large-scale cloud and AI/ML services, and a deep understanding of cloud computing’s potential in enhancing AI/ML applications.
    • Demonstrated ability to lead and mentor large software and AI/ML teams.
    • Experience with web and cloud-native services is a must-have, as is experience deploying scalable AI/ML solutions in production.
    • Adept at systems thinking, with an ability to decompose complex problems into simple, straightforward solutions, including AI/ML-specific challenges like model drift and data dependency management.
    • Strong grasp of system interdependencies and limitations, and expertise in AI/ML optimization techniques for performance, scalability, and accuracy.

AI/ML Expertise:

    • Hands-on experience with AI/ML frameworks and libraries, such as TensorFlow, PyTorch, or Scikit-Learn, and model-serving tools such as TensorFlow Serving or ONNX Runtime.
    • Proven experience in developing and deploying models for performance-intensive applications at web-scale.
    • Understanding of the MLOps lifecycle, including data engineering, model training, validation, deployment, and monitoring.
    • Understanding of key HPC technologies, including RDMA, InfiniBand/RoCE, GPUDirect, and other storage technologies.
  • Knowledge in performance, scalability, enterprise system architecture, and engineering best practices with an emphasis on the integration of AI/ML.
  • Ability to leverage knowledge of open-source software, industry standards, and prior art in architecture decisions, with AI/ML considerations in mind.
  • Ability to balance technical leadership and savvy with strong business judgment to make the right decisions about technology, demonstrating simplicity and creativity.
  • Master’s degree or higher preferred in Computer Science, AI/ML, or a related field.
  • 15+ years of professional experience in web-scale system software development.
  • 5+ years of experience with an established track record in Deep Learning and Machine Learning.
  • 3+ years of recent experience as an ML engineer, data science engineer, or similar.
  • In-depth experience in two or more of the following areas: Cloud Computing, Storage, Networking, Platform-as-a-Service, Infrastructure-as-a-Service, Software-as-a-Service.
  • Excellent communication skills at all levels.

Why You’ll Like Working for DigitalOcean:

  • We are proud to work here. You’ll be a part of a cutting-edge technology company with an upward trajectory, one that is proud to simplify cloud computing so builders can spend more time creating software that changes the world. As a member of the team, you will be a Shark who thinks big, bold, and scrappy, like an owner with a bias for action and a powerful sense of responsibility for customers, products, employees, and decisions.
  • We prioritize career development. At DO, you’ll do the best work of your career. You will work with some of the smartest and most interesting people in the industry. We are a high-performance organization that will always challenge you to think big. Our organizational development team will provide you with resources to ensure you keep growing. We provide employees with reimbursement for relevant conferences, training, and education. All employees have access to LinkedIn Learning's 10,000+ courses to support their continued growth and development.
  • We care about your well-being. Regardless of your location, we will provide you with a competitive array of benefits to support your overall well-being, from a one-time work-from-home stipend to a wellness allowance to a flexible time-off policy, to name a few. While the philosophy around our benefits is the same worldwide, specific benefits may vary based on local regulations and preferences.
  • We reward our employees. The salary range for this position is between $225,000.00 and $338,000.00, based on market data, relevant years of experience, and skills. You may qualify for a bonus in addition to base salary; bonus amounts are determined based on company and individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire and the option to participate in our Employee Stock Purchase Program.
  • We value diversity and inclusion. We are an equal-opportunity employer, and recognize that diversity of thought and background builds stronger teams and products to serve our customers. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

*This is a remote role

Top Skills

PyTorch
Scikit-Learn
TensorFlow
The Company
HQ: New York, NY
900 Employees
Hybrid Workplace
Year Founded: 2012

Why Work With Us

Here you'll get to work with some of the smartest, most interesting people around, solving unique and complex technical challenges on a scale matched by few companies. If you get excited about stretching yourself in new ways and developing yourself to your fullest potential alongside amazingly supportive friends and colleagues, we want to talk to you!
