Staff Engineer, Engineering Compute Infrastructure and Grid Operations

Reposted 13 Days Ago
3 Locations
In-Office
128K-189K Annually
Senior level
Artificial Intelligence • Automotive • Semiconductor
We create custom semiconductor solutions that move, process, store, and secure data quickly and reliably.
The Role
The role involves managing and improving engineering compute infrastructure, focusing on job management, storage systems, and reliability within high-throughput environments.
Summary Generated by Built In

About Marvell

Marvell’s semiconductor solutions are the essential building blocks of the data infrastructure that connects our world. Across enterprise, cloud and AI, and carrier architectures, our innovative technology is enabling new possibilities. 

At Marvell, you can affect the arc of individual lives, lift the trajectory of entire industries, and fuel the transformative potential of tomorrow. For those looking to make their mark on purposeful and enduring innovation, above and beyond fleeting trends, Marvell is a place to thrive, learn, and lead. 

What You Can Expect

Job Summary

We are seeking a Senior Engineer to design, operate, and continuously improve the engineering compute infrastructure used for large-scale chip design and verification. This role is heavily focused on grid job management, storage systems, reliability, and operational excellence in high-throughput compute environments.

The ideal candidate has strong IT and systems skills, deep experience with batch schedulers and distributed storage, and a passion for diagnosing and preventing large-scale job failures that impact engineering productivity.

Key Responsibilities – Grid & Job Management

Own and evolve grid job management infrastructure used for large regressions and high-volume batch workloads.

Debug and resolve grid job failures, including scheduling issues, hung jobs, resource starvation, and intermittent infrastructure faults.

Improve job reliability through watchdogs, retries, heartbeats, timeouts, and failure detection mechanisms.

Work with job controllers and wrapper layers to ensure consistent behavior across grid environments (e.g., LSF, UGE).

Partner with IT and compute teams during grid migrations, upgrades, and expansions.

Key Responsibilities – Storage & Filesystem Infrastructure

Develop deep operational understanding of shared engineering storage systems used by compute jobs.

Diagnose and resolve issues related to I/O performance, file contention, permissions, and cross-mounted filesystems.

Identify and mitigate storage-related failure modes that cause job instability or data corruption.

Collaborate with IT teams on filesystem migrations, maintenance windows, and outage prevention.

Key Responsibilities – Reliability, Monitoring & Prevention

Proactively identify systemic issues that lead to grid instability or job loss.

Design and deploy monitoring, logging, and metrics to detect infrastructure problems early.

Perform root-cause analysis of complex, intermittent failures affecting compute, storage, or networking.

Define best practices and guardrails to prevent repeat incidents and improve overall system robustness.

Key Responsibilities – Cross-Team Collaboration

Act as a technical bridge between engineering users, tools teams, and central IT.

Translate engineering workload requirements into actionable infrastructure improvements.

Communicate clearly during incidents, maintenance events, and post-mortems.

Document operational procedures and share knowledge to reduce support burden.

What We're Looking For

Qualifications and Skills

Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.

8+ years of experience in compute infrastructure, grid operations, or large-scale engineering environments.

Strong experience with grid or batch schedulers (e.g., LSF, UGE, Slurm, PBS).

Hands-on experience debugging distributed systems and batch job failures.

Strong Linux systems knowledge, including process management and resource monitoring.

Experience with shared storage systems (NFS, enterprise filers, high-performance filesystems).

Strong scripting skills in Python, shell, or similar languages.

Preferred Qualifications

Experience supporting EDA or engineering compute workloads.

Familiarity with job controller or wrapper-based execution architectures.

Experience operating environments with thousands of concurrent batch jobs.

Exposure to cloud or hybrid compute environments.

Prior involvement in grid or filesystem migrations.

Strong incident response and post-mortem leadership skills.

Expected Base Pay Range (USD)

$128,000 - $189,370 per annum

The successful candidate’s starting base pay will be determined based on job-related skills, experience, qualifications, work location and market conditions. The expected base pay range for this role may be modified based on market conditions.

Additional Compensation and Benefit Elements 

Marvell is committed to providing exceptional, comprehensive benefits that support our employees at every stage, from internship to retirement and through life’s most important moments. Our offerings are built around four key pillars: financial well-being, family support, mental and physical health, and recognition. Highlights include an employee stock purchase plan with a 2-year lookback, family support programs to help balance work and home life, robust mental health resources to prioritize emotional well-being, and recognition and service awards to celebrate contributions and milestones. We look forward to sharing more with you during the interview process.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status.

Any applicant who requires a reasonable accommodation during the selection process should contact Marvell HR Helpdesk at [email protected].

Interview Integrity 

To support fair and authentic hiring practices, candidates are not permitted to use AI tools (such as transcription apps, real-time answer generators like ChatGPT or Copilot, or automated note-taking bots) during interviews.

These tools must not be used to record, assist with, or enhance responses in any way. Our interviews are designed to evaluate your individual experience, thought process, and communication skills in real time. Use of AI tools without prior instruction from the interviewer will result in disqualification from the hiring process.

This position may require access to technology and/or software subject to U.S. export control laws and regulations, including the Export Administration Regulations (EAR). As such, applicants must be eligible to access export-controlled information as defined under applicable law. Marvell may be required to obtain export licensing approval from the U.S. Department of Commerce and/or the U.S. Department of State. Except for U.S. citizens, lawful permanent residents, or protected individuals as defined by 8 U.S.C. 1324b(a)(3), all applicants may be subject to an export license review process prior to employment.

Marvell Technology Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Marvell Technology and has not been reviewed or approved by Marvell Technology.

  • Equity Value & Accessibility: Equity appears to be a meaningful part of total rewards through RSUs and an ESPP with a 15% discount and lookback, which can materially raise overall compensation. Stock upside is positioned as a key differentiator when company performance is strong.
  • Parental & Family Support: Paid parental/bonding leave is described as substantial, with additional disability leave for birthing parents and a flexible return-to-work program. Family-care leave, generous bereavement provisions, and family-building support (e.g., adoption/surrogacy reimbursement) further strengthen the package.
  • Healthcare Strength: Medical coverage is presented as broad with multiple plan options and preventive care covered at 100% in-network, alongside dental, vision, and structured mental-health support. Additional programs like telehealth and specialized care partners add depth to the health offering.

The Company
HQ: Santa Clara, CA
6,500 Employees
Year Founded: 1995

What We Do

Marvell specializes in semiconductor solutions that power a wide range of industries, from data centers and 5G networks to AI, automotive, and storage applications. Our cutting-edge products are designed to meet the constantly evolving demands of a connected world, enabling faster, more efficient and more secure data processing and communication. With a focus on excellence and a commitment to advancing technology, we develop solutions that drive progress and transform industries.

Why Work With Us

Life at Marvell means being a part of new innovation and enduring technology, but it's also much more. Our diverse community is strengthened through cultural events, corporate gatherings and team-building activities, fostering collaboration and making work enjoyable. At Marvell, it's not just a job; it's an enriching, community-driven experience.
