Senior Infrastructure & Reliability Engineer

Posted Yesterday
Melville, NY, USA
Hybrid
180K Annually
Senior level
Software
The Role
Design, build, and operate production Linux/VMware infrastructure and containerized workloads; own incident response and RCA; tune and operate MySQL at scale; implement IaC/CI-CD (Terraform, Ansible, Docker); improve observability (Datadog); use AI-augmented engineering tools; produce runbooks and mentor teammates.
Summary Generated by Built In
The Opportunity  
 
If innovation lives in your DNA and AI is already part of how you think, build, and operate — you're going to love what we're doing at KWI. You'll join a small, senior team with a real mandate to design, build, and run the systems that power retail at scale. We move fast, we automate aggressively, and we expect every engineer to multiply their impact with modern tooling. Your fingerprints will be on the platform every day.
 
The Company
 
We are a small team with a big vision: to be the premier provider of cloud technology solutions for retailers. KWI offers a complete, unified commerce solution from a single database, specifically designed to help specialty retailers grow their business. Our portfolio of customers includes Pandora, Bluemercury, Tom Ford and many other globally recognizable brands.  
 
We combine Point of Sale, Merchandising, Order Management, eCommerce, CRM, and Loss Prevention into one cloud-based platform. We are a values- and mission-driven organization, and we believe that if we develop and demonstrate leadership in our strategy, operations, and people, we will continue to drive product innovation and service excellence.

The impact you'll make
  • Support and operate our Linux/UNIX systems, VMware infrastructure, CI/CD pipelines, MySQL databases, and containerized workloads that serve our retail clients 24×7.
  • Own incidents end-to-end: triage alerts, drive root-cause analysis across the application, database, and network layers, and write the post-incident docs that stop recurrence.
  • Tune and operate MySQL at production scale: query analysis, replication topology, backup and recovery, and schema changes against live workloads.
  • Containerize and template services using Docker and infrastructure-as-code patterns to make deployments repeatable, declarative, and boring.
  • Improve observability across the fleet — metrics, logs, traces, and dashboards — so problems are seen before customers feel them.
  • Use modern AI-augmented engineering tools (Claude Code, MCP-based workflows, agentic automation) as a daily multiplier — to operate faster and extend what one engineer can deliver.
  • Document and mentor. Runbooks, design docs, and onboarding material aren't an afterthought here — they're how the team scales.

What you will bring
  • 5+ years operating production Linux/UNIX (RHEL, CentOS/Rocky, Debian/Ubuntu) at meaningful scale.
  • Strong MySQL operational experience — replication, performance tuning, backups, recovery, and schema migrations.
  • Hands-on VMware/vSphere experience in production environments.
  • Java application-tier troubleshooting experience — comfortable reading thread dumps, GC logs, and heap behavior.
  • Solid DevOps fundamentals: Git, CI/CD pipelines, Ansible (or similar configuration management), Terraform (or similar IaC), and Docker.
  • Networking literacy: TCP/IP, DNS, TLS, HTTP/S, load balancing, basic firewalling. You can read a tcpdump and a cert chain.
  • Comfortable scripting in Bash. Python is not required, but you should have a working understanding of programming fundamentals and be able to read, modify, and write straightforward code.
  • Strong troubleshooting instincts and the temperament to lead under pressure.
  • Real day-to-day experience using AI-augmented engineering tools (Claude, Cursor, Copilot, MCP servers, agentic workflows) — not just demos.
  • Experience with Datadog or comparable observability platforms.

As a member of the KWI team you will receive
  • Full Medical, Dental and Vision
  • Annual bonus eligible
  • Free gym in the building 
  • Generous PTO policy
  • Summer Fridays... all year round
  • Tuition Reimbursement
  • Discount from building café
  • 401(K) with a 50% company match (up to 6% of employee contribution)
  • Employee Referral Program
  • One volunteer day each year

Our work space

We understand that our teams need flexibility, which is why we follow a hybrid schedule. Our in-office days are Monday, Tuesday, and Thursday, and employees may work remotely on Wednesdays and Fridays.

We are also a collaborative group and believe that getting together in person allows our team to do their best work. Together we enjoy monthly events, bagels every Thursday, a state-of-the-art coffee machine, a full snack pantry and many more surprise and delight moments throughout the year.

Our commitment to you

At KWI, we know that cultivating diversity and fostering an inclusive work environment are critical to our impact and success. We create an environment where no individual is advantaged or disadvantaged because of their background. We offer equal opportunity employment regardless of race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability status, age, marital status, or protected veteran status.

With a commitment to maintaining a bias-free environment in which harassment is prohibited, we respect cultural diversity and comply with the laws of the places in which we operate. We expect our business partners, suppliers, clients, and all our team members to uphold these commitments.

About
KWI helps retailers maximize sales by uniting their online and in-store capabilities to deliver delightful shopper experiences. With KWI Merchandising and mobile POS, retailers can execute omnichannel flawlessly, and right at their fingertips — clienteling, endless aisle, mobile checkout with the latest payment options, inventory management, and ecommerce.

Skills Required

  • 5+ years operating production Linux/UNIX (RHEL, CentOS/Rocky, Debian/Ubuntu) at meaningful scale
  • Strong MySQL operational experience: replication, performance tuning, backups, recovery, schema migrations
  • Hands-on VMware/vSphere experience in production environments
  • Java application-tier troubleshooting: thread dumps, GC logs, heap behavior
  • DevOps fundamentals: Git, CI/CD pipelines, Ansible (or similar), Terraform (or similar IaC), and Docker
  • Networking literacy: TCP/IP, DNS, TLS, HTTP/S, load balancing, basic firewalling; able to read tcpdump and cert chains
  • Comfortable scripting in Bash; ability to read, modify, and write straightforward code (Python knowledge helpful but not required)
  • Strong troubleshooting instincts and ability to lead under pressure
  • Day-to-day experience using AI-augmented engineering tools (Claude, Cursor, Copilot, MCP servers, agentic workflows)
  • Experience with Datadog or comparable observability platforms

The Company
HQ: Greenvale, NY

What We Do

Since 1985, we have delivered ongoing innovation – from SaaS to advancements in Unified Commerce that keep us at the forefront of cloud retail solutions. We offer a complete commerce solution for specialty retailers of all sizes. Our cloud-based technology powers all the solutions you need to create unified customer experiences.
