Infrastructure Management and Provisioning Engineer

Madrid, Comunidad de Madrid, ESP
In-Office
Senior level
Healthtech • Biotech • Pharmaceutical
The Role
Design, deploy, and manage automated provisioning and orchestration for large-scale HPC and AI compute clusters. Own IaC and configuration-as-code practices using Ansible and GitLab CI/CD, streamline bare-metal imaging, manage OS builds with Red Hat Image Builder and NVIDIA tools, enforce patching and compliance, monitor platform reliability, and troubleshoot complex hardware-kernel-automation issues to ensure scalable, secure compute infrastructure.

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

The Position

Job description

As an Infrastructure Provisioning and Management Engineer within the Accelerated Compute Engineering (ACE) team, you will be responsible for overseeing and advancing our core infrastructure management and provisioning tech stack. This role has a strong focus on driving configuration-as-code, infrastructure-as-code (IaC), and modern automated provisioning best practices across our high-performance compute (HPC) and industry-leading AI Factory.

You will own the lifecycle, deployment, and optimization of bare-metal and virtualized compute environments that power Roche's advanced computing initiatives. By treating infrastructure strictly as code and eliminating manual configurations, you will ensure our advanced clusters are highly reproducible, securely patched, and rapidly scalable to meet the evolving demands of computational science and large-scale AI workloads.

Description of the area

Hosting and Infrastructure (HI) provides mission-critical on-premise infrastructure, cloud hosting, connectivity, and technology products that enable all functions at every Roche site to develop, innovate, connect, and deliver compliant digital products across the Roche Enterprise.

The Value Streams - Accelerated Compute Engineering (ACE) Team drives both customer success and platform success by acting as a center of excellence and delivery for the High Performance Compute and AI infrastructure that supports AI and HPC use cases across Roche. The team facilitates seamless onboarding and adoption for business vertical customers who need accelerated compute, helping these infrastructure consumers achieve rapid time-to-value on platforms optimized for high availability, seamless data transfer, flexibility, speed, and the rapidly changing needs of AI.

Job Responsibilities

Automated Provisioning & Cluster Orchestration

  • Design, deploy, and manage large-scale automated provisioning systems for multi-node HPC and AI Factory environments.

  • Own and maintain the infrastructure management and provisioning tech stack underpinning the orchestration, monitoring, and provisioning of complex GPU and CPU workloads.

  • Streamline bare-metal provisioning and node imaging pipelines to ensure minimal downtime and rapid expansion capabilities.

Infrastructure-as-Code (IaC) & Configuration Governance

  • Enforce a strict configuration-as-code and infrastructure-as-code mindset, replacing manual interventions with repeatable automation scripts.

  • Author, review, and maintain complex Ansible playbooks and roles for configuration management, patch deployment, and compliance drift remediation.

  • Establish robust CI/CD pipelines using GitLab to test, validate, and deploy infrastructure changes safely across development, staging, and production clusters.

Operating System Engineering & Lifecycle Management

  • In partnership with Enterprise OS teams, standardize and manage operating system builds, with dual proficiency across HPC and AI Factory platforms.

  • Utilize solutions such as Red Hat Image Builder and NVIDIA Base Command Manager to create optimized, compliant, and secure custom golden images tailored for AI and high-performance computing workloads.

  • Manage OS lifecycles, including kernel tuning, automated package updates, and vulnerability management, ensuring alignment with global security standards.

Platform Reliability & Collaboration

  • Implement proactive monitoring and alerting for infrastructure provisioning health, node availability, and configuration drift.

  • Address and help resolve complex, systemic infrastructure failures, contributing to post-mortem analyses to continuously improve platform resilience.

Qualifications

Education / Experience

  • Bachelor’s or an advanced degree in Computer Science, Computer Engineering, or a similar technical discipline.

  • 5+ years of experience in systems engineering, DevOps, or platform infrastructure roles, with a proven track record of managing enterprise Linux environments at scale.

  • Deep, practical knowledge of operating system internals for both RHEL and Ubuntu.

Technical & Business Skills:

  • Automation & Orchestration: Advanced capability with Ansible on the command line and experience building scalable infrastructure pipelines using GitLab CI/CD.

  • Provisioning Tooling: Experience using NVIDIA Base Command Manager (Bright Cluster Manager) and Red Hat Image Builder (or related tools like Kickstart/Satellite).

  • Modern Engineering Mindset: Strong adherence to git-based workflows, code-review methodologies, and infrastructure-as-code principles.

  • Troubleshooting Depth: Ability to isolate complex, multi-layered faults bridging hardware, kernel configurations, and automation scripts.

Leadership & Mindset:

  • Lean & Agile Mindset: Passionate about continuous improvement, eliminating technical debt, and automating repetitive tasks to achieve scale.

  • Collaboration & Communication: Strong collaborative skills with an enterprise mindset, capable of working fluidly across team boundaries to drive platform success.

  • Intellectual Curiosity: Highly self-motivated to explore and adopt emerging technologies in the fast-evolving landscape of HPC and AI infrastructure engineering.

Who we are

A healthier future drives us to innovate. Together, more than 100’000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.


Let’s build a healthier future, together.

Roche is an Equal Opportunity Employer.

Skills Required

  • Bachelor's degree in Computer Science, Computer Engineering, or similar technical discipline
  • 5+ years experience in systems engineering, DevOps, or platform infrastructure roles
  • Proven track record managing enterprise Linux environments at scale
  • Deep, practical knowledge of RHEL and Ubuntu operating system internals
  • Advanced capability with Ansible (playbooks and roles) for configuration management
  • Experience building CI/CD pipelines for infrastructure changes using GitLab CI/CD
  • Experience with NVIDIA Base Command Manager and/or Bright Cluster Manager for cluster provisioning
  • Experience with Red Hat Image Builder or related tools (Kickstart, Satellite) for OS image creation
  • Experience with bare-metal provisioning and node imaging pipelines for HPC/AI clusters
  • OS lifecycle management skills including kernel tuning, automated package updates, and vulnerability management
  • Strong troubleshooting ability across hardware, kernel configurations, and automation scripts
  • Adherence to infrastructure-as-code and configuration-as-code principles and git-based workflows
  • Experience implementing monitoring and alerting for provisioning health and node availability
  • Ability to collaborate across teams and apply Agile/Lean practices to platform reliability

Roche Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Roche and has not been reviewed or approved by Roche.

  • Retirement Support: U.S. materials describe a 401(k) with both matching and an additional company contribution, supported by formal plan documents and true‑up features. This structure is positioned as a standout element of the total package, particularly at Genentech.
  • Leave & Time Off Breadth: Time‑off provisions include substantial vacation, a year‑end shutdown, and a paid six‑week sabbatical after six years. These elements indicate a recharge‑oriented approach within the U.S. offering.
  • Healthcare Strength: Company materials emphasize comprehensive medical, dental, vision, and mental‑health resources alongside well‑being programs. Benefits pages consistently highlight breadth across core health coverage elements.

The Company
Provincia de Buenos Aires
93,797 Employees
Year Founded: 1896

What We Do

Roche is a global pioneer in pharmaceuticals and diagnostics focused on advancing science to improve people’s lives. The combined strengths of pharmaceuticals and diagnostics under one roof have made Roche the leader in personalised healthcare – a strategy that aims to fit the right treatment to each patient in the best way possible. Roche is the world’s largest biotech company, with truly differentiated medicines in oncology, immunology, infectious diseases, ophthalmology and diseases of the central nervous system. Roche is also the world leader in in vitro diagnostics and tissue-based cancer diagnostics, and a frontrunner in diabetes management. Founded in 1896, Roche continues to search for better ways to prevent, diagnose and treat diseases and make a sustainable contribution to society. The company also aims to improve patient access to medical innovations by working with all relevant stakeholders. Thirty medicines developed by Roche are included in the World Health Organization Model Lists of Essential Medicines, among them life-saving antibiotics, antimalarials and cancer medicines. Roche has been recognised as the Group Leader in sustainability within the Pharmaceuticals, Biotechnology & Life Sciences Industry ten years in a row by the Dow Jones Sustainability Indices (DJSI).
