SITE RELIABILITY ENGINEER

Reposted 2 Days Ago
Camden, NJ, USA
Hybrid
130K-150K Annually
Mid level
Information Technology • Logistics • Transportation • Analytics • Business Intelligence • 3PL: Third Party Logistics • Industrial
Best in Cold — Redefining Cold Storage Through Technology
Site Reliability Engineer (SRE)
Engineer Reliability into the Systems That Move the Nation’s Food Supply
Who We Are
US Cold owns and operates one of the most complex temperature-controlled logistics networks in North America. Every day, our systems coordinate the storage and movement of food at national scale across a network of state-of-the-art distribution centers, including multiple highly automated warehouse facilities.
We continue to advance our core warehouse and logistics platforms. Our current focus is on modular, event-driven, API-first, and cloud architectures. We are enhancing reliability and accelerating engineering productivity by strengthening our SRE and AI practices. This is a significant investment in innovation aimed at driving operational excellence across our facilities.
If you want to build durable systems that operate in the physical world at scale, this is that opportunity.
The Role
The Site Reliability Engineer is a founding member of US Cold’s SRE practice. This role exists to move the organization from reactive operations to engineered reliability. You will study how our most critical systems fail, particularly our Phenix WMS and facility automation interfaces, and design controls, automation, and observability that reduce incidents over time.
Success in this role means fewer false alerts, faster recovery, less manual intervention, and systems that heal themselves when possible.
You will work closely with application, infrastructure, and operations teams and participate directly in on‑call and incident response.
What You Will Own
  • Reliability of the Phenix WMS and its integration with facility automation systems (robotics, conveyors, and control interfaces)
  • Definition and implementation of SLIs and SLOs that measure meaningful system health, not just availability
  • Observability across the full stack, correlating cloud services, APIs, and on‑premise facility operations
  • Automation to eliminate operational toil, including patching, data corrections, restarts, and recovery tasks
  • Development of self‑healing behaviors for common failure modes
  • Participation in on‑call rotations and leadership of blameless post‑incident reviews
  • Design and execution of disaster recovery tests across SaaS, cloud, and on‑premise environments
This is hands‑on reliability engineering. The systems you improve will directly impact daily warehouse operations.
Technical Environment
  • Hybrid environments spanning cloud and on‑premise infrastructure
  • Azure cloud services
  • Warehouse Management Systems (Phenix WMS) and facility automation interfaces
  • Observability tooling across logs, metrics, and alerting
  • Automation using Python, PowerShell, Bash, or Ansible
  • CI/CD tools and modern deployment practices
  • Exposure to containerized and distributed systems environments

What We’re Looking For
  • 3+ years of experience in SRE, DevOps, Systems Engineering, or related roles
  • Strong Linux and Windows systems administration and troubleshooting skills
  • Hands‑on experience with automation and scripting
  • Experience designing and operating monitoring, alerting, and observability solutions
  • Practical experience working in Azure environments
  • Strong analytical skills and a bias toward eliminating root causes, not symptoms
  • Ability to collaborate across application, infrastructure, and operations teams
  • Experience supporting warehouse management systems or industrial automation platforms
  • Exposure to Kubernetes, microservices, or container orchestration
  • Hands‑on experience with infrastructure‑as‑code tools such as Terraform or Ansible
  • Understanding of distributed systems and high‑availability design
  • Experience with SRE practices such as SLO‑based operations, runbook automation, or chaos testing
Why This Role Is Different
This is not an inherited SRE function. There is no mature framework to maintain.
You will:
  • Help define what reliability means at US Cold
  • Work on systems that operate in the physical world
  • Engineer solutions that reduce toil and operational load
  • See the direct impact of your work on warehouse uptime and performance
  • Build practices that scale as the platform modernizes
This is an opportunity to grow as an SRE while helping establish the reliability foundation of a mission‑critical platform.
Compensation & Structure
  • Location: Hybrid – Camden, NJ
  • Reports to: IT – Site Reliability Engineering Manager
  • Salary Range: $130,000–$150,000
Operational Context
  • Systems operate continuously across warehouse facilities
  • Reliability failures have physical and operational consequences
  • On‑call participation is part of the role
  • Work occurs across cloud, SaaS, and on‑premise environments
Equal Opportunity Employer
This employer is required to notify all applicants of their rights pursuant to federal employment laws. For further information, please review the Know Your Rights notice from the Department of Labor.
The Company
HQ: Camden, NJ
4,300 Employees
Year Founded: 1889

What We Do

For more than 125 years, United States Cold Storage (USCS) has protected America’s food supply through a national network of 39 world‑class, temperature‑controlled facilities across 14 states. Today, that mission is driven as much by software as it is by steel and concrete.

Behind every truck, pallet, and movement is a modern, in‑house engineering organization building the mission‑critical systems that run our warehouses. Our teams develop and operate the technology that tracks inventory in real time, optimizes labor and throughput, integrates automation and robotics, and connects seamlessly with our partners’ supply chains. We work across multiple clouds and databases, leverage modern data platforms like Snowflake, and solve complex computer science problems, from optimization to routing at scale.

With more than 4,300 employees and a 200+ person IT organization operating like a modern product and engineering group, USCS offers the rare combination of stability, scale, and complex technical challenges. We do whatever it takes to help our partners profitably reach their goals, whether that’s primary cold storage or fully integrated third‑party logistics, because when the temperature matters, getting it right matters even more.

Our facilities are located in: California, Delaware, Florida, Georgia, Illinois, Indiana, Nebraska, North Carolina, Pennsylvania, Tennessee, Texas, Utah, and Virginia.

Why Work With Us

USCS is a family built on trust, teamwork, and pride. From freezer floors to engineering, everyone helps build the next generation of cold storage with practical tech that makes operations safer and smarter—because here, you don’t just keep food moving—you build how it moves next.