Senior DevOps / SRE Engineer

Prague, CZE
Hybrid
Senior level
Artificial Intelligence • Machine Learning • Software
The Role
Lead SRE initiatives to improve reliability, scalability, and performance across distributed systems. Build automation and infrastructure, enhance observability and incident response, tune platform performance and costs, resolve complex production issues, and document runbooks. Collaborate with Platform, Security, AI Platform, and Product teams to embed SRE best practices and drive durable improvements.

We are looking for a Senior Site Reliability Engineer to join our Site Reliability Engineering (SRE) team. In this role, you'll drive the reliability, scalability, and performance of our platform, ensuring our systems remain stable as we grow. We value innovation and are seeking someone eager to bring fresh ideas – especially around building automation that reduces manual effort and improving distributed systems resilience.

This isn't a top-down organization; our engineers are the ones who flag technical challenges and design the solutions. You will collaborate closely with Platform Engineering, Security, AI Platform, and Product teams to design durable systems and make data-driven operational decisions.

What You'll Do
  • Collaborate with Engineering, Platform, and Security teams to embed SRE best practices early in system design.

  • Lead advancements in observability, monitoring, alerting, and incident-response workflows.

  • Analyze platform performance to contribute to cost-optimization, performance tuning, and resilience planning.

  • Build infrastructure and automation tooling that improves platform reliability and enhances deployment safety.

  • Diagnose and resolve complex production issues across distributed systems, and drive open post-incident reviews so failures translate into durable improvements.

  • Strengthen system consistency and author clear, concise documentation for runbooks and operational processes.

Who You Are
  • 4+ years of experience in SRE, DevOps, platform engineering, or similar production-facing roles.

  • Strong problem-solving and debugging skills in distributed systems, applied to keeping the platform stable.

  • Eager to share operational guidelines, champion SRE practices across teams, and openly discuss what we can learn from system failures.

  • Excellent communication skills (English is our default language) with a genuine, collaborative approach to working across diverse engineering teams.

  • Strong hands-on experience with cloud environments (AWS, GCP, or similar) and proficiency with infrastructure-as-code and CI/CD pipelines.

  • Familiarity with Kubernetes (or container orchestration), event-driven architectures, or supporting ML/AI workloads and GPU infrastructure.

What Success Looks Like:

Within 3 Months:

  • Fully onboarded into the Rossum ecosystem, gaining a deep understanding of our infrastructure, observability stack, and SRE processes while building relationships across the team.

  • A clear understanding of how we work with Coupa and of our shared roadmap.

  • Initial Impact Goal: Improve a small reliability issue or add value to an existing automation or monitoring area.

Within 6 Months:

  • Independently managing key responsibilities, owning recurring reliability tasks, and identifying areas for strategic improvement.

  • Actively participating in the alignment of processes within the new Coupa organizational structure.

  • Operational KPI: Implement measurable enhancements to alert quality, CI/CD reliability, or service health metrics.

Within 12 Months:

  • Recognized as a subject matter expert within the team, navigating the global Coupa ecosystem.

  • Successfully contributing to Rossum's mission at a massive scale using new global resources.

  • Long-Term Strategic Goal: Lead a major reliability or infrastructure initiative, providing technical recommendations to guide our long-term reliability strategy.

Why Join Us?

At Rossum, we're on a mission to free the world from boring manual data entry. Our AI platform helps companies save millions of hours, allowing professionals to focus on creative, impactful work.

In an exciting move for our future, we have joined forces with Coupa, the world's leading unified platform for Business Spend Management. By combining Rossum's cutting-edge document AI with Coupa's global ecosystem, we are uniquely positioned to redefine how businesses operate at a massive scale. You can read more about this exciting milestone and our shared vision in the official announcement.

What sets us apart?

  • Cutting-edge AI technology reshaping how businesses operate globally.

  • A collaborative, supportive environment where autonomy thrives.

  • Opportunities to grow in a fast-scaling company.

  • A culture that values diversity, empathy, and genuine connection.

As part of the Coupa family, you'll enjoy the agility of a fast-moving, innovation-focused team with the stability and reach of a global market leader. For you, this means an even greater opportunity to make an impact, access new global markets, and grow your career within a collaborative culture that values autonomy, diversity, and genuine connection. Together, we're not just automating data—we're giving time back to the world's professionals.

What we offer

Future with Coupa: We are currently in an integration phase, during which we are reviewing and aligning our total rewards programs. Our goal is to blend Rossum's local culture with Coupa's global standards to provide you with a long-term future featuring clear career pathways, tailored learning journeys, and world-class development opportunities.

Current Benefits:

  • Flexible working models with a base in vibrant Prague and options for a hybrid setup.

  • Competitive benefits designed to support your well-being, growth, and work-life harmony.

  • 5 weeks of vacation, 5 sick/personal days, and 2 extra weeks of paternity leave.

  • Personal development, education, and language courses budget.

  • High-end tech (MacBook, external monitor, keyboard of your choice) and a MultiSport card.

  • Team offsites, regular meetups, and a friendly, ambitious team.

Ready to make an impact in your next role? Apply now!

Skills Required

  • 4+ years of experience in SRE, DevOps, platform engineering, or similar production-facing roles
  • Strong problem-solving and debugging skills in distributed systems
  • Excellent communication skills (English)
  • Strong hands-on experience with cloud environments (AWS, GCP, or similar)
  • Proficiency with infrastructure-as-code and CI/CD pipelines
  • Familiarity with Kubernetes or other container orchestration
  • Familiarity with event-driven architectures, supporting ML/AI workloads, or GPU infrastructure

The Company
London
188 Employees
Year Founded: 2017

What We Do

Rossum solves four key steps in document-based processes: receiving documents across multiple channels, automated understanding, two-way communication to resolve exceptions, and acting on the data using in-depth integrations. In typical real-world scenarios, Rossum’s proprietary AI engine outranks narrow data extraction solutions in accuracy. Meanwhile, Rossum’s platform automates the document-based communication process end-to-end. Rossum’s goal for every use case is at minimum a 90% document processing speed increase.

What does Rossum bring to the table?

  • Zero-friction deployment: See high AI accuracy right out of the box in Rossum’s free trial and cut down on most maintenance effort thanks to cloud hosting and automated self-learning.
  • Highly customizable: Implement powerful configuration APIs while enterprise users can engage Rossum’s dedicated Global Services team.
  • Unified document gateway: Solve everything from security and compliance to IT and user training in one place by adopting a universally capable document solution.
  • End-to-end solution: Rossum’s cloud platform takes care of the entire document lifecycle from receiving to internal IT systems posting.
  • Security and compliance: Rossum is ISO 27001 certified and HIPAA compliant. The cloud service has been specifically engineered for high availability, with enterprise-grade SLAs ranging up to a 99.9% uptime guarantee and 24/7 support.
