Production Engineer (DevOps & Backend) - Amari AI

Posted 25 Days Ago
San Francisco, CA
In-Office
Mid level
Angel or VC Firm
The Role
Ensure production reliability, own CI/CD pipelines, manage AI infrastructure and databases, build observability and alerting, and lead incident response and internal backend tooling to improve deployment frequency and reduce MTTD/MTTR.
About Us

Global trade still runs on outdated, manual workflows. We are fixing that by building AI agents for the logistics industry. Our AI works alongside humans, automating document-heavy tasks so companies can process shipments faster and with fewer errors.

We have moved past the "zero-to-one" phase and achieved clear product-market fit. We are seeing rapid traction, with >100% MoM revenue growth, and are already deployed with customers processing meaningful operational volume. We've raised $5M from First Round Capital and Pear VC and are now scaling our platform's breadth and depth. Our deeply technical team comes from Google, LinkedIn, and Salesforce, as well as top schools and AI research labs.

The Role

We are looking for a Production Engineer who lives at the intersection of software development and systems engineering. Your mission is to ensure our production environment is rock-solid, automated, and observable. You will own our CI/CD pipelines, manage our AI infrastructure, and build the internal tools that empower our development team to ship code faster and more reliably.

Key Responsibilities

1. Reliability & Infrastructure (The Core)

Availability: Own the "uptime" of our services. Design and implement self-healing systems to minimize downtime and manual intervention.

CI/CD & Deployments: Architect and manage robust deployment pipelines to ensure feature releases are seamless and reversible.

AI Infrastructure: Manage specialized pipelines for AI and human-in-the-loop systems.

Databases & Compliance: Manage database operations, performance tuning, backups, and compliance.

Scalability: Monitor system performance and proactively scale infrastructure to handle traffic spikes.

2. Observability & Metrics

Monitoring: Build and maintain comprehensive dashboards using tools like Prometheus, Grafana, or Datadog.

Alerting: Define and implement alerts on the "Golden Signals" (Latency, Traffic, Errors, and Saturation) to ensure we know about issues before our customers do.

Incident Response: Lead the "Post-Mortem" process: analyze why things broke and write code to ensure they never break the same way twice.

3. Internal Tooling & Backend Development

Custom Tooling: Use your backend skills (Python preferably) to build internal CLI tools, automated scripts, and status dashboards.

Developer Experience: Act as a bridge for the dev team, making "the right way to deploy" the "easiest way to deploy."


Technical Requirements

Backend Proficiency: Strong experience in at least one backend language (e.g., Python, Go, Java) to contribute to internal tools and understand application logic.

Infrastructure as Code (IaC): Hands-on experience with Terraform, CloudFormation, or Ansible.

Containerization: Deep knowledge of Docker and orchestration (Kubernetes/ECS).

Cloud Platforms: Good working knowledge of GCP.

CI/CD Tools: Experience with GitHub Actions, GitLab CI, or Jenkins.

Success Metrics (How We'll Measure You)

To be successful in this role, you will be responsible for improving and maintaining:

MTTD/MTTR: Mean Time to Detect and Mean Time to Recover from incidents.

Deployment Frequency: How often we can safely ship code to production.

Change Failure Rate: The percentage of deployments that result in a rollback or failure.

SLA/SLO Compliance: Meeting our uptime and performance targets for customers.
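
As a rough illustration of how the first three metrics could be computed, here is a minimal Python sketch. The `Incident` and `Deployment` record shapes, their field names, and the sample data are hypothetical and not taken from our systems. MTTR is measured here from detection to recovery, which is one common convention among several.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record (field names are assumptions)."""
    started: datetime    # when the fault actually began
    detected: datetime   # when an alert or a person noticed it
    recovered: datetime  # when service was restored


@dataclass
class Deployment:
    """Hypothetical deployment record (field names are assumptions)."""
    deployed: datetime
    failed: bool  # True if the deploy led to a rollback or failure


def mttd(incidents):
    """Mean Time to Detect: average of (detected - started)."""
    seconds = mean((i.detected - i.started).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def mttr(incidents):
    """Mean Time to Recover: average of (recovered - detected)."""
    seconds = mean((i.recovered - i.detected).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def change_failure_rate(deployments):
    """Percentage of deployments that resulted in a rollback or failure."""
    return 100.0 * sum(d.failed for d in deployments) / len(deployments)


def deployment_frequency(deployments, period_days):
    """Average number of deployments per day over the given period."""
    return len(deployments) / period_days


# Made-up sample data, for illustration only.
t0 = datetime(2024, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=34)),
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=56)),
]
deployments = [Deployment(t0, failed=(n == 0)) for n in range(4)]

print(mttd(incidents))                       # average detection delay
print(mttr(incidents))                       # average recovery time
print(change_failure_rate(deployments))      # percent of failed deploys
print(deployment_frequency(deployments, 2))  # deploys per day
```

The functions assume non-empty inputs; a real implementation would need to decide how to report a period with no incidents or no deployments.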

Is this the right fit?

You are a great fit if: You find yourself "automating away" repetitive tasks and get genuinely excited when you see a perfectly tuned Grafana dashboard. You don't just want to write code; you want to see that code survive and thrive in the wild.

Top Skills

Ansible
CloudFormation
Datadog
Docker
ECS
GCP
GitHub Actions
GitLab CI
Go
Grafana
Java
Jenkins
Kubernetes
Prometheus
Python
Terraform

The Company
HQ: Menlo Park, CA
154 Employees
Year Founded: 2013

What We Do

We’re specialists in pre-seed and seed. The startups we back go far. Best-in-class founders do not come around every day. When they do, we jump at the opportunity to work together. Our approach is to work with just a small number of best-in-class founders so we can dig in and go deep.
