HPC Engineer/Architect

Posted 8 Days Ago
New York, NY, USA
In-Office
Expert/Leader
Information Technology • Professional Services • Software • Consulting
The Role
Support and operate large-scale Linux HPC clusters and parallel file systems (GPFS); deploy and maintain GPU-based compute infrastructure; manage job schedulers and migrate from LSF to Slurm; automate deployment, testing, CI/CD and the user lifecycle with Ansible, Python and shell scripting; tune storage and network (InfiniBand) performance; consult with researchers and maintain HPC benchmarks and monitoring.
Summary Generated by Built In

Job Title: HPC Engineer/Architect

Work Location: New York, NY 10001

Job Type: Contract

Work Type: Hybrid

Duration: 6+ Months

USC/GC/H4/H1

Job Summary:

  • Support day-to-day operations of large-scale parallel file systems

  • Deploy and maintain Linux HPC infrastructure across multiple data centers

  • Assist HPC engineers and architects with day-to-day operations and tickets

Experience:

  • 16 to 20 years

Required Skills:

  • Linux Operating Systems (RHEL/CentOS), Parallel file system (GPFS), Job Scheduler (LSF/Slurm)

  • Ansible, Python, Shell scripting

  • GPU-based compute infrastructure (including CUDA)

  • CentOS 4.5

  • HPCC

Responsibilities:

  • Design, architect and oversee implementation of Linux-based HPC clusters and storage

  • Deploy physical hardware using HPC deployment tools and configuration and orchestration tools (Ansible)

  • Parallel file system (GPFS) performance tuning, monitoring and troubleshooting

  • Perform systems benchmarking and develop automated tests for the HPC environment to ensure the reliability and efficiency of our computational infrastructure

  • InfiniBand network maintenance and troubleshooting

  • Automate and monitor the HPC user lifecycle process

  • Slurm installation, configuration, performance tuning and troubleshooting

  • Plan, design and implement a transition from the LSF scheduler to Slurm

  • Manage the Slurm scheduler and translate research policies into scheduler configurations

  • Consult with faculty and students to develop research pipelines for use on the HPC cluster

  • Develop and maintain the user lifecycle software suite in Python and implement a CI/CD pipeline

  • Test and automate upgrades of critical system applications using Ansible and shell scripts

  • Communicate effectively with clinicians, researchers, and other team members to develop technological solutions

Qualifications:

  • Experience working in a large-scale, research-based HPC environment

  • Proven experience working with distributed file storage solutions (e.g., GPFS)

  • Experience with deploying and troubleshooting Linux Operating Systems (RHEL/CentOS)

  • Experience with Scripting and Automation (Ansible, Python, Shell Scripting)

  • Solid understanding of job schedulers (LSF/Slurm)

  • Experience with GPU-based compute infrastructure (including CUDA)

Skills Required

  • 16 to 20 years of experience in HPC environments
  • Experience working in a large-scale, research-based HPC environment
  • Proven experience with parallel file systems (GPFS)
  • Experience deploying and troubleshooting Linux (RHEL/CentOS, CentOS 4.5)
  • Scripting and automation with Ansible, Python, and shell scripting
  • Solid understanding and experience with job schedulers (LSF and Slurm)
  • Experience with GPU-based compute infrastructure, including CUDA
  • Experience with HPCC systems
  • InfiniBand network maintenance and troubleshooting

The Company
0 Employees

What We Do

Comspark International Inc. is a global software and engineering consulting firm specializing in the development and maintenance of web-enabled solutions. The company provides a wide range of cutting-edge IT services, including software development, outsourcing, and enterprise application solutions such as ERP, AI, and blockchain. They assist clients across various industry domains by creating and implementing specialized technology solutions to drive business efficiency.
