AHEAD

Sr Engineer - Compute

Hiring Remotely in Gurugram, Haryana, IND

In-Office or Remote

Senior level

Cloud • Information Technology

AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation.

At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD.

We are an equal opportunity employer, and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived.

We embrace all candidates who will contribute to the diversification and enrichment of ideas and perspectives at AHEAD.

The High-Performance Computing Compute Engineer is primarily responsible for the overall health and maintenance of the physical cluster and server technologies in our managed services customers' environments. Our Compute Engineers are valued members of the Managed Services Infrastructure Practice, responsible for Tier 3 incident management, service request management, and change management infrastructure support for all Managed Services customers.

Principal Duties and Responsibilities

Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
Plan and perform software and firmware maintenance activities
Assess customer environments for performance and design issues and propose resolutions
Work across technical teams to troubleshoot complex infrastructure issues
Create and maintain detailed documentation
Serve as a subject matter expert and escalation point for compute technologies
Work with vendors to resolve compute issues
Communicate transparently with customers and internal teams
Participate in on-call rotation
Complete assigned training and certifications to further develop skills and knowledge

Education and Experience

Bachelor’s degree or equivalent in Information Systems or a related field. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education.
5+ years of advanced Linux administration and troubleshooting
5+ years managing Red Hat OpenShift, Kubernetes, and virtualization clusters
5+ years of expert level experience managing infrastructure in high-performance computing environments including configuration, troubleshooting, and best practice
2+ years of experience with Nvidia DGX preferred
Experience with HPC schedulers (e.g., SLURM, Kubernetes, PBS, Run:ai) required
Proficient in physical server environments
Experience configuring, maintaining and troubleshooting containers
Experience with storage technology (e.g., Ceph or Vast Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS)
Experience with machine learning or data science workflows in HPC/AI environments
1+ years working with monitoring platforms (e.g., Prometheus, Grafana); Elastic Observability experience is a bonus
1+ years working with an enterprise ITSM system; ServiceNow experience is a bonus
Previous experience with automation tools such as Ansible, Puppet, or Chef is a plus
Managed Services or consulting experience is required
Strong background with customer service
High-level problem-solving skills
Strong oral and written communication skills
Related Linux, Nvidia, Scheduler, Containerization, Virtualization, and Clustering certifications are a bonus

Why AHEAD:

Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between.

We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning.

India Employment Benefits include:

Comprehensive health insurance coverage for employees, with options to extend coverage to dependents

Paid time off and company holidays, along with additional leave benefits as per policy

Flexible work arrangements, supporting work-life balance

Learning and development opportunities to support continuous growth and upskilling

Employee wellness initiatives and programs focused on physical and mental well-being

Retirement and statutory benefits in line with Indian regulations

Inclusive and people-first culture, with a strong focus on collaboration and ownership

The Company

HQ: Chicago, IL

1,154 Employees

Year Founded: 2007

What We Do

AHEAD builds platforms for digital business. By weaving together cloud infrastructure, intelligent operations, and modern applications, we help enterprises deliver on the promise of digital transformation.