SITE RELIABILITY ENGINEER

United States Cold Storage

Reposted 9 Hours Ago

Camden, NJ, USA

Hybrid

130K-150K Annually

Mid level

Information Technology • Logistics • Transportation • Analytics • Business Intelligence • 3PL: Third Party Logistics • Industrial

Best in Cold — Redefining Cold Storage Through Technology

The Role

As a Site Reliability Engineer, you'll enhance reliability for Phenix WMS and automation systems, focusing on incident reduction and system health through observability and automation. Responsibilities include defining SLIs and SLOs, participating in incident response, and testing disaster recovery plans.

Summary Generated by Built In

Site Reliability Engineer (SRE)
Engineer Reliability into the Systems That Move the Nation’s Food Supply

Who We Are

US Cold owns and operates one of the most complex temperature-controlled logistics networks in North America. Every day, our systems coordinate the storage and movement of food at national scale across a network of state-of-the-art distribution centers, including multiple highly automated warehouse facilities.

We continue to advance our core warehouse and logistics platforms. Our current focus is on modular, event-driven, API-first and cloud architectures. We continue to enhance reliability and accelerate engineering productivity by strengthening our SRE and AI practices. This is a large investment in innovation to continue to drive operational excellence at our facilities.

If you want to build durable systems that operate in the physical world at scale, this is that opportunity.

The Role

The Site Reliability Engineer is a founding member of US Cold’s SRE practice.

This role exists to move the organization from reactive operations to engineered reliability. You will study how our most critical systems fail, particularly our Phenix WMS and facility automation interfaces, and design controls, automation, and observability that reduce incidents over time.

Success in this role means fewer false alerts, faster recovery, less manual intervention, and systems that heal themselves when possible.

You will work closely with application, infrastructure, and operations teams and participate directly in on‑call and incident response.

What You Will Own

Reliability of the Phenix WMS and its integration with facility automation systems (robotics, conveyors, and control interfaces)
Definition and implementation of SLIs and SLOs that measure meaningful system health, not just availability
Observability across the full stack, correlating cloud services, APIs, and on‑premise facility operations
Automation to eliminate operational toil, including patching, data corrections, restarts, and recovery tasks
Development of self‑healing behaviors for common failure modes
Participation in on‑call rotations and leadership of blameless post‑incident reviews
Design and execution of disaster recovery tests across SaaS, cloud, and on‑premise environments

This is hands‑on reliability engineering. The systems you improve will directly impact daily warehouse operations.

Technical Environment

Hybrid environments spanning cloud and on‑premise infrastructure
Azure cloud services
Warehouse Management Systems (Phenix WMS) and facility automation interfaces
Observability tooling across logs, metrics, and alerting
Automation using Python, PowerShell, Bash, or Ansible
CI/CD tools and modern deployment practices
Exposure to containerized and distributed systems environments

What We’re Looking For

3+ years of experience in SRE, DevOps, Systems Engineering, or related roles
Strong Linux and Windows systems administration and troubleshooting skills
Hands‑on experience with automation and scripting
Experience designing and operating monitoring, alerting, and observability solutions
Practical experience working in Azure environments
Strong analytical skills and a bias toward eliminating root causes, not symptoms
Ability to collaborate across application, infrastructure, and operations teams
Experience supporting warehouse management systems or industrial automation platforms
Exposure to Kubernetes, microservices, or container orchestration
Hands‑on experience with infrastructure‑as‑code tools such as Terraform or Ansible
Understanding of distributed systems and high‑availability design
Experience with SRE practices such as SLO‑based operations, runbook automation, or chaos testing

Why This Role Is Different

This is not an inherited SRE function. There is no mature framework to maintain.

You will:

Help define what reliability means at US Cold
Work on systems that operate in the physical world
Engineer solutions that reduce toil and operational load
See the direct impact of your work on warehouse uptime and performance
Build practices that scale as the platform modernizes

This is an opportunity to grow as an SRE while helping establish the reliability foundation of a mission‑critical platform.

Compensation & Structure

Location: Hybrid – Camden, NJ

Reports to: IT – Site Reliability Engineering Manager

Salary Range: $130,000 - $150,000

Operational Context

Systems operate continuously across warehouse facilities
Reliability failures have physical and operational consequences
On‑call participation is part of the role
Work occurs across cloud, SaaS, and on‑premise environments

Equal Opportunity Employer
This employer is required to notify all applicants of their rights pursuant to federal employment laws. For further information, please review the Know Your Rights notice from the Department of Labor.

United States Cold Storage Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about United States Cold Storage and has not been reviewed or approved by United States Cold Storage.

Fair & Transparent Compensation — Pay is considered fair or competitive for the work and area by many. Frontline warehouse and forklift roles often describe the pay as very fair or even strong for relatively straightforward work.
Healthcare Strength — Medical coverage begins shortly after hire and is described as excellent with low monthly premiums. The offering includes medical with prescription plans plus dental, vision, telemedicine, wellness programs, and an EAP.
Retirement Support — A 401(k) with employer match is part of the package, with profit sharing and even pension noted in some cases. These retirement elements are presented as a strong component of the total rewards.

The Company

HQ: Camden, NJ

4,300 Employees

Year Founded: 1889

What We Do

For more than 125 years, United States Cold Storage (USCS) has protected America’s food supply through a national network of 39 world‑class, temperature‑controlled facilities across 14 states. Today, that mission is driven as much by software as it is by steel and concrete. Behind every truck, pallet, and movement is a modern, in‑house engineering organization building the mission‑critical systems that run our warehouses. Our teams develop and operate the technology that tracks inventory in real time, optimizes labor and throughput, integrates automation and robotics, and connects seamlessly with our partners’ supply chains. We work across multiple clouds and databases, leverage modern data platforms like Snowflake, and solve complex computer science problems—from optimization to routing at scale. With more than 4,300 employees and a 200+ person IT organization operating like a modern product and engineering group, USCS offers the rare combination of stability, scale, and complex technical challenges. We do whatever it takes to help our partners profitably reach their goals—whether that’s primary cold storage or fully integrated third‑party logistics—because when the temperature matters, getting it right matters even more. Our facilities are located in: California, Delaware, Florida, Georgia, Illinois, Indiana, Nebraska, North Carolina, Pennsylvania, Tennessee, Texas, Utah, and Virginia.
