Sr. HPC Architect - Hybrid

Sorry, this job was removed at 08:12 p.m. (CST) on Thursday, May 22, 2025
75063, Irving, TX, USA
In-Office
Artificial Intelligence • Healthtech • Biotech
Where Molecular Science Meets Artificial Intelligence – Revolutionizing Cancer Care.
The Role

At Caris, we understand that cancer is an ugly word—a word no one wants to hear, but one that connects us all. That’s why we’re not just transforming cancer care—we’re changing lives.

 

We introduced precision medicine to the world and built an industry around the idea that every patient deserves answers as unique as their DNA. Backed by cutting-edge molecular science and AI, we ask ourselves every day: “What would I do if this patient were my mom?” That question drives everything we do.

 

But our mission doesn’t stop with cancer. We're pushing the frontiers of medicine and leading a revolution in healthcare—driven by innovation, compassion, and purpose.

 

Join us in our mission to improve the human condition across multiple diseases. If you're passionate about meaningful work and want to be part of something bigger than yourself, Caris is where your impact begins.

Position Summary
The Senior HPC Architect designs and optimizes high-performance computing (HPC) systems, applying expertise in parallel programming, performance analysis, and hardware architecture to build scalable, efficient solutions for demanding computational workloads. The role often involves collaborating with software developers and hardware engineers to achieve optimal performance across complex scientific or data-intensive applications.
Job Responsibilities

System Design and Implementation:

  • Architecting and designing high-performance computing clusters, selecting appropriate hardware components like CPUs, GPUs, storage systems, and networking infrastructure.

  • Installing and configuring operating systems (typically Linux) on cluster nodes.

  • Setting up and managing distributed file systems (like Lustre, Ceph, GPFS) for large data storage and access.

  • Implementing job scheduling systems (e.g., LSF, Slurm, PBS) to manage workload distribution across the cluster.

Performance Optimization:

  • Monitoring system performance metrics (CPU utilization, memory usage, network bandwidth) to identify bottlenecks and optimize resource allocation.

  • Benchmarking applications and performing performance analysis to identify areas for improvement.

  • Tuning application code for parallel processing to leverage the power of the HPC cluster.

User Support:

  • Providing technical support to researchers and users on how to access and utilize the HPC system.

  • Training users on best practices for submitting jobs and optimizing their applications for the HPC environment.

  • Troubleshooting user issues related to application execution, data management, and system access.

System Administration:

  • Managing system updates, patching, and security configurations to maintain a stable and secure HPC environment.

  • Implementing backup and disaster recovery procedures for critical data and system configurations.

  • Monitoring system health and proactively addressing potential issues through alerts and logging systems.

Required Qualifications

  • Minimum of five years’ experience in Linux systems administration.

  • Bachelor's degree in computer science, engineering, math, or a scientific discipline with 2+ years of systems engineering experience; or 6 years’ experience in HPC architecture.

  • Hands-on HPC architecture design experience, including storage, file systems, InfiniBand, security, authentication, and compute architecture.

  • Experience using Git to manage shared software configuration code bases

  • Hands-on experience with cloud-based services (e.g. Azure, AWS, GCP).

  • Good understanding of storage administration and optimization, such as performing upgrades and defining RAID configurations.

  • Deep understanding of parallel computing concepts and programming paradigms (MPI, OpenMP, CUDA).

  • Expertise in performance analysis tools and techniques to identify and address performance bottlenecks.

  • Knowledge of HPC hardware architectures, including processors, memory subsystems, network fabrics, and interconnects 

  • Familiarity with HPC software stack components like compilers, runtime systems, job schedulers, and scientific libraries

  • Strong programming skills in languages commonly used in HPC (C, C++, Fortran)

  • Strong scripting skills (e.g., Bash, ksh, Perl, Python) for automation.

  • Experience with system administration and cluster management tools (e.g., LSF, Slurm, PBS)

  • Experience with distributed file systems (Lustre, Ceph, GPFS) 

  • Excellent communication and problem-solving abilities to effectively collaborate with cross-functional teams 

Preferred Qualifications

  • Experience in life sciences, healthcare and/or research institutions highly preferred

  • Experience building and installing scientific software and other 3rd party software applications on HPC systems

  • Experience with HPC schedulers and resource managers

  • Experience executing scientific software on HPC systems

  • Experience writing user documentation

  • Strong technical and analytical skills

  • Strong verbal and written communication skills

  • Always maintains the highest level of professionalism when interacting with internal and external customers

  • Demonstrates a high-energy, positive attitude and commitment to quality customer service

  • Contributes to a positive team environment within the center by demonstrating a strong work ethic, effectively communicating with others, and proactively anticipating center and user needs

  • Experience coordinating and running support teams

  • Related industry certifications preferred. 

Physical Demands

  • Ability to lift, move and install HPC data center hardware and supplies.

  • Standing for extended periods while performing data center related tasks.

Training

  • All job-specific, safety, and compliance training is assigned based on the job functions associated with this employee.

Other 

  • This position requires periodic travel and some evenings, weekends, and/or holidays.

  • Job may require after-hours response to emergency issues.

  • Periodically scheduled on-call duty may require after-hours response to technical emergencies not explicitly related to assigned job responsibilities.

Conditions of Employment: Individual must successfully complete the pre-employment process, which includes a criminal background check, drug screening, credit check (applicable for certain positions), and reference verification.

This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time.

 

Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, among other things, or status as a qualified individual with disability.

The Company
HQ: Irving, TX
1,700 Employees
Year Founded: 2008

What We Do

Caris Life Sciences was founded in 2008 with a simple but powerful purpose – to help improve the lives of as many people as possible. With transformative technologies informed by massive amounts of big data, we are revolutionizing healthcare to provide physicians and patients with the highest quality information about their disease – from detecting it early and determining how best to treat it, to developing the next wave of novel therapies.
