AI Systems Administration Specialist

Posted 2 Days Ago
Lemont, IL, USA
In-Office
86K-166K Annually
Mid level
Marketing Tech • Energy
The Role
Administer and maintain AI/ML and HPC testbed infrastructure: install and manage Linux systems, configuration management, version control, scripting, datacenter hardware operations, networking support, documentation, and collaborate with researchers and operations teams to ensure testbed availability and sustainability.
Summary Generated by Built In

The Argonne Leadership Computing Facility (ALCF) is home to one of the world's pioneering exascale supercomputers, Aurora. With its extraordinary computing speed and advanced artificial intelligence capabilities, Aurora is set to revolutionize scientific research. ALCF is dedicated to supporting high-performance computing (HPC) and adjacent services that are crucial to the research workflow. ALCF is seeking a skilled Systems Administration Specialist to join its team and support the AI Testbed, which is tasked with evaluating emerging hardware and software platforms for Artificial Intelligence (AI) and Machine Learning (ML) for science.

In this role you can expect to:

  • Work directly with first-class systems alongside scientific staff and research colleagues within the division.
  • Serve as a systems administrator on Argonne’s AI Testbed, working with teams to install and manage a diverse array of AI and machine learning hardware and software.
  • Work directly with other subject matter experts to ensure the sustainability and availability of the testbed infrastructure.
  • Support machines in a mixed operating system environment and work efficiently with other operations groups.
  • Guide researchers in using the environment, giving you a direct impact on keeping research productive.
  • Work in a hybrid environment with 2+ days onsite in Lemont, Illinois, with the option to work fully onsite if preferred.

Position Requirements

Required skills and qualifications:

  • Experience in UNIX systems administration, especially Linux, with an emphasis on OS installation and upgrading, package building and management, common services and applications, and troubleshooting
  • Experience with Salt, Ansible or similar configuration management tools
  • Experience with Git, or other modern version control platforms
  • Effective problem-solving skills
  • Working knowledge of scripting languages, particularly Python
  • Ability to write concise documentation
  • Ability to work effectively as a member of a team
  • Flexibility in handling assignments and working on several projects simultaneously
  • Knowledge and understanding of how to safely operate within a datacenter, including tasks such as mounting and unmounting server hardware
  • Ability to handle the physical labor of installing racks and servers in a datacenter, including lifting up to 20 pounds independently and heavier loads with help from others
  • Understanding of IPv4 networking
  • Ability to model Argonne’s core values: impact, safety, respect, integrity and teamwork
  • To perform the essential functions of this position, successful applicants must provide proof of U.S. citizenship, which is required to comply with federal regulations and contract.

Preferred skills and qualifications:

  • Knowledge of AI/ML systems architecture and workflows (Groq, Cerebras, Graphcore, SambaNova)
  • Working knowledge of Kubernetes management
  • Knowledge of scientific applications
  • Knowledge of high-performance networking technologies such as InfiniBand and Slingshot
  • Knowledge of Storage Area Networking and storage arrays, such as NetApp
  • Knowledge of parallel and distributed file systems such as Lustre, and their associated hardware
  • Knowledge of high-performance computing techniques, graphics, and visualization
  • Experience with software packaging, building software from source, and dynamic linking
  • Understanding of MPI and its implementations
  • Ability to gather site requirements and represent them to design and development teams to find appropriate solutions across multiple sites
  • Ability to independently assess requirements, identify tasks, and coordinate with peers to accomplish goals
  • Experience implementing CI or CD workflows

This position can be filled at one of two levels; the selected candidate will be placed at the appropriate level (PT3 or PT4) depending on the depth and breadth of relevant knowledge and skills. The minimum requirements of the two levels are as follows:

  • PT3: Bachelor's degree and 4+ years of experience, or a Master's degree and 2+ years of experience, or equivalent. The expected pay range for this position is $86,299 - $134,626.
  • PT4: Bachelor's degree and 6+ years of experience, or a Master's degree and 4+ years of experience, or equivalent. The expected pay range for this position is $106,455 - $166,070.

Job Family

Professional Technical (PT)

Job Profile

Systems Integration Admin/Support 3

Worker Type

Regular

Time Type

Full time

The expected hiring range for this position is $86,299.00 - $134,626.05.

Please note that the pay range information is a general guideline only. The pay offered to a selected candidate will be determined based on factors such as, but not limited to, the scope and responsibilities of the position, the qualifications of the selected candidate, business considerations, internal equity, and external market pay for comparable jobs. Additionally, comprehensive benefits are part of the total rewards package.

As an equal employment opportunity employer, and in accordance with our core values of impact, safety, respect, integrity and teamwork, Argonne National Laboratory is committed to a safe and welcoming workplace that fosters collaborative scientific discovery and innovation. Argonne encourages everyone to apply for employment. Argonne is committed to nondiscrimination and considers all qualified applicants for employment without regard to any characteristic protected by law.

Argonne employees, and certain guest researchers and contractors, are subject to particular restrictions related to participation in Foreign Government Sponsored or Affiliated Activities, as defined and detailed in United States Department of Energy Order 486.1A. You will be asked to disclose any such participation in the application phase for review by Argonne's Legal Department.  

All Argonne offers of employment are contingent upon a background check that includes an assessment of criminal conviction history conducted on an individualized and case-by-case basis. Please be advised that Argonne positions require upon hire (or may require in the future) the individual to obtain a government access authorization that involves additional background check requirements. Failure to obtain or maintain such government access authorization could result in the withdrawal of a job offer or future termination of employment.

Skills Required

  • Experience in UNIX systems administration, especially Linux (OS installation, upgrades, package management, troubleshooting)
  • Experience with Salt, Ansible, or similar configuration management tools
  • Experience with Git or other modern version control platforms
  • Effective problem-solving skills
  • Working knowledge of scripting languages, particularly Python
  • Ability to write concise documentation
  • Ability to work effectively as a member of a team
  • Flexibility in handling assignments and working on several projects simultaneously
  • Knowledge and understanding of safe datacenter operations, including mounting/unmounting server hardware
  • Ability to perform physical labor installing racks and servers, including lifting up to 20 pounds independently
  • Understanding of IPv4 networking
  • Proof of U.S. citizenship required to perform essential functions
  • Experience implementing CI or CD workflows
  • Knowledge of AI/ML systems architecture and workflows (Groq, Cerebras, Graphcore, SambaNova)
  • Working knowledge of Kubernetes management
  • Knowledge of scientific applications
  • Knowledge of high-performance networking technologies such as InfiniBand and Slingshot
  • Knowledge of Storage Area Networking and storage arrays (e.g., NetApp)
  • Knowledge of parallel and distributed file systems such as Lustre
  • Knowledge of high-performance computing techniques, graphics, and visualization
  • Experience with software packaging, building from source, and dynamic linking
  • Understanding of MPI and its implementations
  • Ability to gather site requirements and represent them to design and development teams
  • Ability to independently assess requirements, identify tasks, and coordinate with peers to accomplish goals

The Company
HQ: Lemont, IL

What We Do

Argonne National Laboratory, one of the U.S. Department of Energy's national laboratories for science and engineering research, has 3,400 employees, including 1,400 scientists and engineers, three-quarters of whom hold doctoral degrees.
