AI/ML Computing Cluster Engineer

Reposted 7 Days Ago
San Jose, CA
In-Office
100K-150K Annually
Mid level
Information Technology • Semiconductor
The Role
Develop and operate high-performance computing clusters for AI/ML workloads, ensuring scalability, performance, and reliability through collaboration with cross-functional teams.
Job Title: AI/ML Computing Cluster Engineer
Office Location: San Jose, CA
Work Model: Onsite

About SK hynix America
At SK hynix America, we're at the forefront of semiconductor innovation, developing advanced memory solutions that power everything from smartphones to data centers. As a global leader in DRAM and NAND flash technologies, we advance mobile technology, empower cloud computing, and pioneer future technologies. Our cutting-edge memory technologies are essential in today's most advanced electronic devices and IT infrastructure, enabling enhanced performance and user experiences across the digital landscape.
We're looking for innovative minds to join our mission of shaping the future of technology. At SK hynix America, you'll be part of a team that's pioneering breakthrough memory solutions while maintaining a strong commitment to sustainability. We're not just adapting to technological change – we're driving it, with significant investments in artificial intelligence, machine learning, and eco-friendly solutions and operational practices. As we continue to expand our market presence and push the boundaries of what's possible in semiconductor technology, we invite you to be part of our journey to create the next generation of memory solutions that will define the future of computing.

Job Overview:

As the AI/ML Computing Cluster Engineer, you will work on the development and operation of high-performance computing clusters supporting AI/ML workloads. You will be responsible for the development, implementation, operation, and optimization of AI data center IT environments to ensure scalability, performance, reliability, and cost-effectiveness. This role requires collaboration with cross-functional teams to align computing infrastructure with the organization's strategic direction.

Responsibilities:

Computing Cluster Infrastructure Development

  • Design and implement distributed computing cluster infrastructure to support large-scale AI/ML model training and inference jobs with a focus on transformer-based AI models.
  • Build and maintain distributed systems to ensure scalability, efficient resource allocation, and high throughput.
  • Optimize cluster performance through hardware selection, equipment configuration, network engineering, and performance analysis.
  • Deploy and operate data center networking infrastructure using software systems for automation, design validation, deployment, and operational support.
  • Implement tools and processes to maintain high uptime and ensure infrastructure reliability during both model training and inference phases.
  • Identify and resolve performance bottlenecks, improving overall system throughput and response times.

Team Leadership & Collaboration

  • Collaborate with cross-functional teams, including research, security, and benchmark test engineering teams, to integrate infrastructure with AI workflows, ensuring seamless deployment and operation.
  • Engage with technology vendors and partners to evaluate new solutions that drive innovation in AI computing infrastructure.

Qualifications:

  • Master’s degree or above in Computer Science, Electrical Engineering, or related fields.
  • 2+ years of experience in AI cluster engineering, MLOps, and benchmark testing, including GPU performance analysis, memory usage, and energy/power monitoring tools.
  • Strong familiarity with AI computing architecture, AI/ML infrastructure requirements, memory architecture and usage in AI/ML, and AI algorithm trends and best practices.
  • Expertise in optimizing resource utilization, improving system throughput, and reducing latency in both training and inference.

Benefits:

  • Top-tier health insurance at no cost to employees
  • Paid days off: PTO + Company Holidays + Happy Fridays
  • Paid Parental Leave Program
  • 401(k) Matching
  • Educational reimbursement up to $10,000 per year
  • Donation Matching and volunteering opportunities
  • Corporate discount programs
  • Free Breakfast/Lunch/Dinner provided to employees

Equal Employment Opportunity:

SKHYA is an Equal Employment Opportunity Employer. We provide equal employment opportunities to all qualified applicants and employees and prohibit discrimination and harassment of any type without regard to race, sex, pregnancy, sexual orientation, religion, age, gender identity, national origin, color, protected veteran or disability status, genetic information, or any other status protected under applicable federal, state, or local laws.


Compensation:

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. Pay within the provided range varies by work location and may also depend on job-related skills and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.

Pay Range
$100,000-$150,000 USD

Top Skills

AI
Distributed Systems
GPU
ML
MLOps
Networking
Performance Analysis
Resource Optimization
Software Automation

The Company
HQ: San Jose, CA
328 Employees
Year Founded: 1983

What We Do

Semiconductors are essential to all IT products, and their performance often determines the performance of the final product. SK hynix is a global leader in producing semiconductors such as DRAM, NAND flash, and CMOS image sensors. With these technology-driven semiconductor products, SK hynix has consistently led the industry and is now the second-largest memory chip maker worldwide.

IT devices are becoming more pervasive as imaginative, innovative new products continue to capture consumers' imagination and desires. SK hynix has strengthened its competency with leading technology and a broad business portfolio in order to satisfy customer demand. As a member of SK Group*, SK hynix aims to become the world’s best semiconductor company. SK hynix America Inc. operates as a subsidiary of SK hynix Inc.


*SK Group is one of South Korea's top five industrial conglomerates.
It has about 40 affiliated companies in fields ranging from energy, telecommunications, and finance to construction.
