Platform/DevOps Engineer P-136

Sorry, this job was removed at 10:04 p.m. (CST) on Thursday, Jan 15, 2026
Hiring Remotely in United States
Remote
Information Technology • Software
The Role

SMASH: Who we are
We believe in long-lasting relationships with our talent. We invest time getting to know them and understanding what they seek as their professional next step.

We aim to find the perfect match. As agents, we pair our talent with our US clients, matching not only on technical skills but also on cultural fit. Our core competency is finding the right talent fast.

This position is remote within the United States. You must have U.S. citizenship or a valid U.S. work permit to apply for this role.
Role summary
You will own infrastructure reliability, observability, and cost optimization for a production platform serving multiple customers under a 99.5% uptime SLA. This role focuses on building resilient, secure, and cost-efficient cloud infrastructure while leading incident response, monitoring, and compliance readiness initiatives.

Responsibilities

  • Ensure 99.5% uptime SLA across all production services and customer environments.

  • Design and maintain multi-region deployments to support geographic redundancy.

  • Implement automated failover mechanisms for databases, load balancers, and critical services.

  • Build and manage disaster recovery strategies, including automated backups and point-in-time recovery.

  • Lead incident detection, response, and postmortems, meeting defined SLAs for P0 issues.

  • Develop real-time observability dashboards for uptime, latency, error rates, and system health.

  • Monitor application and infrastructure performance metrics across customers.

  • Implement alerting, on-call rotations, escalation policies, and PagerDuty integrations.

  • Manage log aggregation and retention using SIEM platforms such as Splunk or Sumo Logic.

  • Support SOC 2 Type II preparation through security controls, monitoring, and documentation.

  • Implement vulnerability scanning, penetration testing coordination, and DLP controls.

  • Optimize cloud infrastructure costs through right-sizing, auto-scaling, and storage lifecycle policies.

  • Track and report infrastructure and API costs per customer, driving FinOps best practices.

  • Build automated runbooks and self-healing workflows for common incidents.

Requirements – Must-haves

  • Strong experience as a Site Reliability Engineer, DevOps Engineer, or Platform Engineer.

  • Deep expertise in AWS cloud architecture (ECS, EKS, RDS, Lambda, S3, CloudFront).

  • Proven experience with Infrastructure as Code using Terraform or CloudFormation.

  • Hands-on production experience with Kubernetes and container orchestration.

  • Strong knowledge of observability and monitoring tools (Datadog, New Relic, Prometheus, Grafana).

  • Experience managing on-call rotations, incident response, and post-incident reviews.

  • Solid understanding of security practices including SIEM, vulnerability scanning, and SOC 2 compliance.

  • Demonstrated experience in cloud cost optimization and FinOps practices.

  • Ability to operate independently and prioritize reliability in high-availability environments.

Nice-to-haves (optional)

  • Experience supporting SOC 2 Type II audits.

  • Background working in regulated or compliance-heavy environments (PHI/PII).

  • Experience implementing DLP and document scanning solutions.

  • Familiarity with AI/ML workload cost optimization.

  • Experience supporting SaaS platforms with customer-isolated environments.


The Company
HQ: Salt Lake City, UT
100 Employees
Year Founded: 2019

What We Do

We are coders, adventurers, and builders of career paths.
We help US companies build rockstar developer teams in Costa Rica and Colombia, not as contractors but as employees that fit naturally into the culture and work environment of the company. One single community from three different countries helping each other achieve more than they ever dreamed of.

Ready to pursue a fulfilling tech career? We are continuously looking for smart and skilled engineers. Check our list of job openings and apply today: https://smash.cr/jobs.html
