Sr. DevOps Engineer

TrueML

Posted Yesterday

Hiring Remotely in United States

Remote

120K-155K Annually

Senior level

Fintech • Machine Learning • Payments • Social Impact • Software • Financial Services

TrueML is a fintech company building software to create positive experiences for consumers seeking financial health.

The Role

Lead development and execution of cloud-native infrastructure and IaC, build self-service developer platforms, optimize AWS cost and HA/DR, implement observability and AIOps, author automation in Python/Go/Bash, maintain Terraform, manage Kubernetes/Docker clusters, and drive CI/CD and incident response.

Summary Generated by Built In

Why TrueML?

TrueML is a mission-driven financial software company that aims to create better customer experiences for distressed borrowers. Consumers today want personal, digital-first experiences that align with their lifestyles, especially when it comes to managing finances. TrueML’s approach uses machine learning to engage each customer digitally and adjust strategies in real time in response to their interactions.

The TrueML team includes inspired data scientists, financial services industry experts, and customer experience fanatics. Together we build technology that serves people in a way that recognizes their unique needs and preferences as human beings, and we work to ensure that nobody gets locked out of the financial system.

TrueML Products is seeking a highly experienced Sr. DevOps Engineer I to serve as a core contributor to our infrastructure and platform engineering efforts. This role is critical to executing our cloud architecture, establishing robust CI/CD pipelines, and ensuring the scalability, security, and reliability of our products.

Reporting to the Sr. Manager, DevOps, you will drive the day-to-day evolution of our internal developer platform and infrastructure-as-code (IaC) architecture. The ideal candidate is a deeply technical, hands-on engineer with a "systems-thinking" mindset. We are looking for a practitioner who thrives on solving complex distributed systems challenges and for whom using GenAI and AIOps tooling to optimize system performance, monitoring, and automation is second nature.

What You'll Do (Technical Execution & Architecture):

Implement the technical roadmap for Infrastructure as Code (IaC), CI/CD evolution, and cloud-native architecture to support TrueML’s scaling needs.
Design, develop, and maintain self-service internal platforms to reduce developer cognitive load, enabling feature teams to deploy and manage services with minimal friction at increased velocity.
Act as a core steward for cloud spend (AWS), proactively identifying and driving cost-optimization initiatives across our infrastructure.
Build and maintain infrastructure architecture that supports strict High Availability (HA) requirements and robust Disaster Recovery (DR) protocols across multiple regions.
Implement and evolve comprehensive monitoring, logging, and distributed tracing systems, leveraging AIOps to move from reactive to predictive system maintenance.

What You'll Do (Deep-dive Hands-On Engineering):

Write and review high-quality, production-grade code in languages like Python, Go, or Bash to automate complex operational tasks and system integrations.

Drive hands-on development of robust Terraform Infrastructure as Code for reliable resource provisioning.

Directly architect, optimize, and troubleshoot complex CI/CD workflows (GitHub Actions, ArgoCD, Atlantis) to maximize build-and-deploy speed and reliability.

Proactively manage, fine-tune, and scale container orchestration environments, including hands-on configuration of Ingress controllers and declarative GitOps workflows.

Manage the technical integration and API configurations between various tools in the DevOps stack (e.g., connecting Jira, VictorOps, Slack, and Observe for seamless incident flow).

What You'll Do (Collaboration & Knowledge Sharing):

Partner closely with other Senior DevOps Engineers and Engineering Managers to align infrastructure deliverables with product roadmaps, ensuring DevOps acts as an accelerator.

Collaborate with Quality Engineering and Security teams to enforce "Definition of Done" standards that include automated testing and security gates.

Provide technical guidance to junior engineers on the team, fostering a culture of continuous learning.

Who You Are:

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

6+ years of experience in DevOps, Site Reliability Engineering (SRE), or Software Engineering, working within high-performing senior engineering teams.

Expert-level mastery of AWS and hands-on experience managing multi-region, high-availability deployments.

Advanced experience with Kubernetes (K8s) and Docker, including cluster management, networking, and scaling in production environments.

High proficiency in Terraform to drive consistency and automation across all infrastructure layers (experience with Atlantis is a plus).

Deep experience designing and maintaining complex pipelines (GitHub Actions, GitLab CI, or Jenkins) and mastery of scripting languages like Python, Go, or Bash.

Hands-on experience with modern monitoring, observability, and tracing stacks (Datadog, Observe) and a firm grasp of SRE principles (SLIs/SLOs/Error Budgets).

Experience acting as an Incident Commander or critical responder for high-severity outages.

Experience integrating AI-assisted productivity tools (Cline, GitHub Copilot) into your engineering workflow to accelerate delivery, troubleshooting, and system monitoring.

What We Offer (Perks & Benefits)

Flexible vacation
Medical/dental/vision insurance
Traditional/Roth retirement savings options
Company-paid disability and life insurance
Flexible Spending Account & Limited FSA
Family-friendly parental leave, volunteer and voting time off
On-demand wellness platform access for you and 5 friends or family members
PerkSpot discount program for 900+ merchants nationwide

Remote Work, Travel Expectations & Physical Requirements:

This role supports a global, cross-functional business and operates primarily in a Remote-First environment. However, flexibility outside of standard business hours and occasional local or international travel may be necessary for global operations support, company meetings, training, offsites, and collaborative projects.

This position primarily involves computer-based work, requiring extended periods at a computer, participation in virtual meetings, and use of standard office technology. We will consider reasonable accommodations to enable individuals to perform the essential functions of the role.

Maintaining a reliable internet connection and a professional work environment is expected. The ability to protect confidential company, employee, customer, and business information while working outside of a company office is also required.

Personally Identifying Information

We collect personal information for employment purposes. We do not sell personal information. Most of the information we have is provided to us by you and/or collected as part of the employment process. For more details on how we use, share, and delete personal information, see our Privacy Policy.

Dedication to Diversity & Inclusion

We are an equal opportunity employer. We promote, value, and thrive with a diverse and inclusive team. Different perspectives contribute to better solutions and this makes us stronger every day. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, or other protected characteristics.

Skills Required

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
6+ years experience in DevOps, SRE, or Software Engineering
Expert-level AWS experience managing multi-region, high-availability deployments
Advanced Kubernetes and Docker experience including cluster management, networking, and scaling
High proficiency with Terraform for Infrastructure as Code
Hands-on experience designing and maintaining CI/CD pipelines (GitHub Actions, ArgoCD, GitLab CI, or Jenkins)
Production-grade scripting and development in Python, Go, or Bash
Experience with monitoring, observability, and distributed tracing (Datadog, Observe) and SRE principles (SLIs/SLOs/Error Budgets)
Experience acting as Incident Commander or critical responder for high-severity outages
Experience building self-service internal platforms, GitOps workflows, and configuring Ingress controllers
Experience integrating DevOps toolchain APIs (Jira, VictorOps, Slack, Observe) for incident/workflow automation
Experience integrating AI-assisted productivity tools (e.g., Cline, GitHub Copilot) into engineering workflows
Experience with Atlantis (a plus)

TrueML Compensation & Benefits Highlights

Healthcare Strength — Medical, dental, and vision coverage are offered with multiple plan options, including HSA‑eligible choices, alongside FSAs and employer‑paid life, AD&D, and short/long‑term disability. Wellbeing resources such as a 24/7 EAP and a wellness coaching app further bolster the health package.
Leave & Time Off Breadth — Paid time off is described as generous or unlimited with paid holidays and volunteer days, and a remote‑friendly setup supports flexibility in taking time away. Paid parental leave for birth or adoption is also included.
Wellbeing & Lifestyle Benefits — Perks include a home‑office stipend, retailer discounts via PerkSpot, travel assistance, and recognition rewards, complementing core benefits. These additions support day‑to‑day convenience and remote productivity.

The Company

450 Employees

Year Founded: 2013

What We Do

TrueML makes customer-first financial technology that revolutionizes the experience of consumers seeking financial health. We’re a team of inspired data scientists, financial services industry experts, and customer experience fanatics. We create experiences that serve people in a way that recognizes their unique needs and preferences as human beings, and we work to ensure that nobody gets locked out of the financial system. After more than 10 years in business, TrueML is excited to be expanding its footprint internationally. We are a growing, geographically diverse team with employees in 30 U.S. states and 7 countries, with our key talent hub in LATAM. If you’re looking for an opportunity to do impactful work, join TrueML and make a difference alongside hundreds of other inspired individuals.

Why Work With Us

Our functional teams are a diverse mix of employees from different backgrounds and geographies, with each individual bringing unique perspectives and experiences that encourage increased innovation in our products and services. Join TrueML and make a difference alongside hundreds of other inspired individuals doing impactful work.

TrueML Offices

Remote Workspace

Employees work remotely.

TrueML is excited to be a remote-first company with team members across the US, Canada, and several countries in LATAM (Mexico, Argentina, Dominican Republic, and Costa Rica). Our teams frequently collaborate and socialize digitally across borders.

Argentina (Remote Hub)

Mexico (Remote Hub)

Dominican Republic (Remote Hub)

San Francisco, CA

Costa Rica (Remote Hub)
