DevOps Engineer (AI/ML) - Onsite in Palo Alto, CA

Sorry, this job was removed at 06:03 p.m. (CST) on Tuesday, Oct 22, 2024
Palo Alto, CA
80K-175K Annually
Internship
Cloud • Software
The Role
Who We Are

OpenTeams is the services marketplace where open source software users can find, vet, and contract with service providers. At OpenTeams we believe in a culture of doers, learners, and collaborators. We are looking for people who are motivated, humble, curious, and respectful of others. To meet the demands of our high-growth business, we are looking for talented individuals to provide insights, solutions, and strategy to our internal leadership team and client partners.

Title: DevOps Engineer

Duration: Full-Time/Direct Hire

Location: Palo Alto, CA (On-site)

Salary: $80K - $175K DOE

You must be able to work on-site in Palo Alto, CA to be considered for this position. 

Job Overview:

Our client, which develops cutting-edge Artificial General Intelligence (AGI) solutions, is seeking a DevOps Engineer with a passion for AI/ML technologies to join their dynamic team. You will play a critical role in managing infrastructure both in the cloud and on-premise, ensuring seamless operations for internal teams and external customers. If you are curious about AI and excited to work in an environment that bridges DevOps and AI development, this role is perfect for you.


Key Responsibilities:

  • Infrastructure Management: Set up, maintain, and optimize cloud and on-premise environments for AI/ML workloads, ensuring scalability, security, and reliability.
  • Automation: Develop and maintain CI/CD pipelines for AI/ML model training, deployment, and testing across multiple environments.
  • Collaboration: Work closely with data scientists, ML engineers, and software developers to streamline the development-to-production process.
  • Machine Learning Operations (MLOps): Implement MLOps best practices to support the AI/ML team in their model lifecycle, from training to deployment and monitoring.
  • Cloud Services: Manage cloud infrastructure on platforms such as AWS, Google Cloud, or Azure, ensuring cost-efficient and high-performance resource allocation for model training and deployment.
  • On-Premise Solutions: Configure and manage on-premise hardware for training models, ensuring hardware is optimized for AI tasks (e.g., GPU/TPU configurations).
  • Monitoring & Troubleshooting: Build robust monitoring and alerting systems to proactively identify and solve issues related to infrastructure and application performance.
  • Security & Compliance: Implement and enforce security best practices across all platforms, both cloud and on-prem, including role-based access control and data encryption.
  • Customer Support: Provide technical support to customers in deploying and managing their AI workloads, assisting with integration and troubleshooting.


Qualifications:

  • 3+ years in DevOps, with a focus on managing infrastructure for data or AI-driven environments.
  • Cloud Expertise: Hands-on experience with AWS, Azure, Google Cloud, or similar platforms.
  • On-Prem Experience: Knowledge of managing and scaling on-prem hardware for AI tasks, including GPU/TPU resources.
  • Automation Tools: Experience with CI/CD pipelines, Docker, Kubernetes, and configuration management tools like Ansible, Terraform, or Puppet.
  • MLOps: Exposure to AI/ML frameworks (e.g., TensorFlow, PyTorch) and familiarity with MLOps pipelines is a plus.
  • Scripting & Programming: Proficiency in scripting languages such as Python, Bash, or Go for automating workflows.
  • Version Control: Expertise with Git and related source control tools.
  • Problem-Solving: Strong troubleshooting skills, especially in high-performance, data-heavy environments.
  • AI Enthusiast: Curiosity about AI/ML technologies, with a desire to learn and grow in the space.


Nice-to-Have Skills:

  • Experience with AI model serving frameworks like TensorFlow Serving, TorchServe, or Kubeflow.
  • Familiarity with monitoring tools such as Prometheus, Grafana, or Elastic Stack.
  • Knowledge of networking and security best practices for hybrid environments.


Why You Should Join

You'll become an important part of a collaborative, remote-first team. We are a passionate and ambitious team, with a proven record of success building multiple companies. We strive to provide a working environment that gives you room to learn and grow. OpenTeams is committed to creating a diverse and inclusive work environment and is proud to be an equal opportunity employer. 

All qualified applicants will receive equal consideration for recruitment, interviews, employment, training, compensation, promotion, and related activities without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status or any and all other protected classes and in accordance with all applicable laws.

The Company
HQ: Austin, Texas
36 Employees
On-site Workplace
Year Founded: 2019

What We Do

OpenTeams is at the forefront of open source support, offering a wide range of practice areas led by a network of Open Source Architects. With over 680 open source technologies, our team provides comprehensive services including strategy and consulting, custom development, integration, migration, and 24/7 support. Our practice areas cover various domains, such as Machine Learning Operations, Cloud Optimization, Data Science and Engineering, SaaS and Cloud Applications, Artificial Intelligence and Machine Learning, PyTorch Hardware Optimization, PyTorch Artificial Intelligence System Building, and High-Performance Systems. Each solution is staffed by experienced professionals who assist businesses in addressing specific challenges and leveraging open source technologies to achieve their goals. OpenTeams is dedicated to helping clients build better software with reliable open source support.
