What You Will Do:
- Service Reliability: Define and track SLIs/SLOs & error budgets for backend APIs and mobile release health. Hold teams accountable to reliability goals.
- Incident Management: Lead on-call rotations, coordinate incident response, run post-mortems, and eliminate root causes.
- Observability & Tooling: Own Datadog dashboards, log pipelines, crash analytics (Firebase / Sentry), and feature-flag metrics (LaunchDarkly / ConfigCat).
- Automation & Elimination of Toil: Write tools and self-healing runbooks in Kotlin, Rust, Go, or Python for rollbacks, DB failovers, chaos tests, and config drift detection.
- Capacity & Performance: Forecast load, run stress / load tests, tune JVM & Graal settings for Kotlin services, and advise on RDS & Redis scaling.
- Disaster Recovery & Chaos Engineering: Design BCP/DR playbooks; run game days to validate recovery objectives.
- Cost & FinOps: Instrument cost metrics and collaborate with Finance to keep AWS spend within agreed “cost budgets.”
- Security & Compliance Support: Monitor GuardDuty / CSPM alerts, and be prepared to participate in security incident response.
- Developer Partnership: Pair with mobile & backend engineers on instrumentation, release gates, and staged roll-outs; mentor teams in SLO thinking via brown-bag sessions.
What We Are Looking For:
- 3+ years in SRE, DevOps, or backend engineering for high-traffic services
- Proficient in at least one of Kotlin / Java, Rust, Go, or Python
- Deep Linux & networking fundamentals and hands-on AWS (ECS, ALB/NLB, RDS, S3, IAM, CloudWatch)
- Production experience with Datadog (or Prometheus / OpenTelemetry) for metrics, traces, and logs
- Incident response expertise: runbooks, RCA, post-mortems, and blameless culture
- Practical knowledge of relational database (PostgreSQL/RDS) and Redis operations
- Familiarity with Kubernetes (EKS) concepts, Helm/OPA, container networking, and rolling releases
- Excellent communication skills; able to coach developers and influence process improvements
Nice to have:
- AWS or CKA certifications
- Experience with feature-flag systems and chaos-engineering tools
- Prior work in regulated or enterprise-integrated environments (e.g., automotive, fintech)
What We Do
Drivemode enables smarter, safer, connected driving in any vehicle.
Drivemode was founded in 2014 by entrepreneurs from Zipcar and Tesla Motors who set out to fundamentally change the way consumers use technology in the car. Drivemode offers a mobile-based connected car platform through a consumer-facing Android app, driver assistance and analytics for fleet managers, and a bring-your-own-device connected car solution for automakers. The Drivemode app transforms a user’s phone into a car’s central computing device, enabling voice-to-text messaging, a music player overlay on navigation, “Do Not Disturb” mode, message auto-reply, and personalized travel recommendations. The app has an automotive-grade interface designed and developed to adhere to National Highway Traffic Safety Administration safety guidelines for driving apps. Drivemode has raised $9.2M from industry leaders. Learn more at https://drivemode.com or download Drivemode at bit.ly/getdrivemode.