Site Reliability Engineering Manager
Ensure Reliability of Systems that Move the Nation's Food Supply
Who We Are:
United States Cold Storage owns and operates one of the most complex temperature-controlled logistics networks in North America. Every day, our systems coordinate the storage and movement of food at national scale across a network of state-of-the-art distribution centers, including multiple highly automated warehouse facilities.
We continue to advance our core warehouse and logistics platforms, with a current focus on modular, event-driven, API-first, cloud architectures. We are also strengthening our SRE and AI practices to improve reliability and accelerate engineering productivity. This is a major investment in innovation that supports operational excellence at our facilities.
If you want to build durable systems that operate in the physical world at scale, this is that opportunity.
The Role:
The Site Reliability Engineering Manager will design and implement the company’s SRE framework from the ground up.
You will define what reliability means at US Cold.
You will establish SLIs and SLOs.
You will modernize monitoring and incident response.
You will build the playbook others will follow.
This is both a hands-on technical role and a practice-building leadership position.
You will report to the Director of IT Operations and collaborate across Software Engineering, Customer Integration Technology, Data Engineering, Infrastructure, and Security.
What You Will Own:
- Establish the company’s first SRE practice including principles, standards, tooling, and operational processes.
- Define SLIs, SLOs, and error budgets across SaaS, on-prem, and custom services.
- Build reliability dashboards and executive-level reporting.
- Implement and evolve observability across logs, metrics, and distributed tracing.
- Mature incident response, outage management, and post-incident review processes.
- Partner with engineering to design resilient systems and reduce operational toil.
- Strengthen CI/CD reliability using safe deploy strategies such as canary and blue/green patterns.
- Implement cost visibility and cloud governance in partnership with Finance.
- Build runbooks, playbooks, and operational standards.
- Establish on-call structures and escalation clarity.
- Assist in hiring, mentoring, and developing future SRE team members.
This is foundational work. The systems and practices you design will shape how engineering operates for years.
Technical Environment:
- Azure cloud infrastructure
- Infrastructure as Code using Bicep, Terraform, or ARM
- GitHub Actions for CI/CD orchestration
- Safe deployment patterns including gated releases, canary, and blue/green
- Observability across logging, metrics, and distributed tracing
- Python scripting for automation and reliability tooling
- SaaS integrations, on-prem infrastructure, and custom-built services
What We’re Looking For:
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
- 5–7+ years in SRE, DevOps, Infrastructure, or Production Engineering.
- Hands-on ownership of production services.
- Proven experience implementing SLIs, SLOs, observability, and automation.
- Leadership in major incident response and post-incident reviews.
- Deep CI/CD expertise, particularly GitHub Actions.
- Strong Python scripting for automation and operational tooling.
- Practical knowledge of cloud cost optimization and FinOps principles.
- Ability to influence cross-functional teams.
Why This Role Is Different:
This is not an inherited SRE function.
There is no existing framework to simply maintain.
You will:
- Define the reliability bar.
- Build the operating model.
- Influence architectural decisions.
- Establish executive-level visibility into system health.
- Create a culture where reliability is engineered, not reactive.
This is an opportunity to build something durable inside a company modernizing its core technology platform.
Compensation & Structure:
- Salary Range: $160,000.00 - $190,000.00/yr.
- Bonus Eligible
- Full-time, exempt
- Reports to: Director of IT Operations
- Travel: less than 10%
- Location: Hybrid, Camden, NJ
Operational Context:
This role is primarily technical and office-based, with occasional interaction in operational environments depending on system needs.
Benefits Include:
If annual hours are attained, these benefits may apply. Medical, Dental, Vision, Prescription, Legal Insurance, Pet Discount, Critical Illness, Accident Insurance, Hospital Indemnity, Long Term Care + Permanent Life Insurance, Identity Theft Protection, Short Term Disability Insurance, Long Term Disability Insurance, Supplemental Disability Insurance, Basic Life Insurance, Accidental Death and Dismemberment Insurance, Supplemental Life Insurance, Supplemental Spouse Life Insurance, Child Life Insurance, Loan Solution, Health Flexible Spending Account, Dependent Flexible Spending Account, Telemedicine, Virtual Primary Care, Prescription Savings Plan, Prescription Specialty Copay Assistance Program, Weight Management Program, Chronic Condition Management, Care Navigator Program, 24/7 Nurse Line, Expert Medical Opinion, Precious Additions Maternity Program, Health Advocacy, Employee Assistance Program, Digital Cognitive Behavioral Therapy, Digital Physical Therapy, Behavioral and Mental Health Platforms, Auto and home discount program, Secure Travel Protection, Discount Programs, 401(k) plan, Education Assistance, Paid Time Off, Referral program & Commuter Benefit (NJ ONLY).
Physical & Operational Context:
May require physical effort associated with using the computer to access information, or occasional standing, walking, or lifting needed to carry out everyday activities. Effective communication, vision, and hearing are essential for safety and productivity. Operate scanners, tablets, radios, phones, computers, and other essential equipment as required. Additional work hours may be requested by management to help manage employee production, projects, and/or special events. Engage in frequent personal interaction and communication. Attend in-person meetings and/or training on a regular basis. Possess strong arithmetic and reading skills. Follow verbal instructions, written instructions, and company policies. Work independently and coordinate with others. Work in a fast-paced environment while managing stress and meeting productivity standards.
Additional Information:
Job functions may vary based on the area of operation. This description outlines the most common tasks required for the job. Reasonable accommodation may be provided to enable individuals with disabilities to perform essential duties. This job description may not encompass all tasks necessary to complete the role.
This employer is required to notify all applicants of their rights pursuant to federal employment laws. For further information, please review the Know Your Rights notice from the Department of Labor.
What We Do
For more than 125 years, United States Cold Storage (USCS) has protected America’s food supply through a national network of 39 world‑class, temperature‑controlled facilities across 14 states. Today, that mission is driven as much by software as it is by steel and concrete.
Behind every truck, pallet, and movement is a modern, in‑house engineering organization building the mission‑critical systems that run our warehouses. Our teams develop and operate the technology that tracks inventory in real time, optimizes labor and throughput, integrates automation and robotics, and connects seamlessly with our partners’ supply chains. We work across multiple clouds and databases, leverage modern data platforms like Snowflake, and solve complex computer science problems—from optimization to routing at scale.
With more than 4,300 employees and a 200+ person IT organization operating like a modern product and engineering group, USCS offers the rare combination of stability, scale, and complex technical challenges. We do whatever it takes to help our partners profitably reach their goals—whether that’s primary cold storage or fully integrated third‑party logistics—because when the temperature matters, getting it right matters even more.
Our facilities are located in: California, Delaware, Florida, Georgia, Illinois, Indiana, Nebraska, North Carolina, Pennsylvania, Tennessee, Texas, Utah, and Virginia.
Why Work With Us
USCS is a family built on trust, teamwork, and pride. From freezer floors to engineering, everyone helps build the next generation of cold storage with practical tech that makes operations safer and smarter—because here, you don’t just keep food moving—you build how it moves next.








