We are looking for a Senior Site Reliability Engineer to join our Site Reliability Engineering (SRE) team. In this role, you'll drive the reliability, scalability, and performance of our platform, ensuring our systems remain stable as we grow. We value innovation and are seeking someone eager to bring fresh ideas – especially around building automation that reduces manual effort and improving distributed systems resilience.
This isn't a top-down organization; our engineers are the ones who flag technical challenges and design the solutions. You will collaborate closely with Platform Engineering, Security, AI Platform, and Product teams to design durable systems and make data-driven operational decisions.
What You'll Do
Collaborate with Engineering, Platform, and Security teams to embed SRE best practices early in system design.
Lead advancements in observability, monitoring, alerting, and incident-response workflows.
Analyze platform performance to contribute to cost-optimization, performance tuning, and resilience planning.
Build infrastructure and automation tooling that improves platform reliability and enhances deployment safety.
Diagnose and resolve complex production issues across distributed systems, and drive open post-incident reviews so failures translate into durable improvements.
Strengthen system consistency and author clear, concise documentation for runbooks and operational processes.
What You'll Bring
4+ years of experience in SRE, DevOps, platform engineering, or similar production-facing roles.
Strong problem-solving and debugging skills in distributed systems, applied to keeping the platform stable.
Eager to share operational guidelines, champion SRE practices across teams, and openly discuss what we can learn from system failures.
Excellent communication skills (English is our default language) with a genuine, collaborative approach to working across diverse engineering teams.
Strong hands-on experience with cloud environments (AWS, GCP, or similar) and proficiency with infrastructure-as-code and CI/CD pipelines.
Familiarity with Kubernetes (or other container orchestration), event-driven architectures, or supporting ML/AI workloads and GPU infrastructure.
Within 3 Months:
Fully onboarded into the Rossum ecosystem, gaining a deep understanding of our infrastructure, observability stack, and SRE processes while building relationships across the team.
Gaining a deep understanding of our synergy with Coupa and our shared roadmap.
Initial Impact Goal: Improve a small reliability issue or add value to an existing automation or monitoring area.
Within 6 Months:
Independently managing key responsibilities, owning recurring reliability tasks, and identifying areas for strategic improvement.
Actively participating in the alignment of processes within the new Coupa organizational structure.
Operational KPI: Implement measurable enhancements to alert quality, CI/CD reliability, or service health metrics.
Within 12 Months:
Recognized as a subject matter expert within the team, navigating the global Coupa ecosystem.
Successfully contributing to Rossum's mission at a massive scale using new global resources.
Long-Term Strategic Goal: Lead a major reliability or infrastructure initiative, providing technical recommendations to guide our long-term reliability strategy.
At Rossum, we're on a mission to free the world from boring manual data entry. Our AI platform helps companies save millions of hours, allowing professionals to focus on creative, impactful work.
In an exciting move for our future, we have joined forces with Coupa, the world's leading unified platform for Business Spend Management. By combining Rossum's cutting-edge document AI with Coupa's global ecosystem, we are uniquely positioned to redefine how businesses operate at a massive scale. You can read more about this milestone and our shared vision in the official announcement.
What sets us apart?
Cutting-edge AI technology reshaping how businesses operate globally.
A collaborative, supportive environment where autonomy thrives.
Opportunities to grow in a fast-scaling company.
A culture that values diversity, empathy, and genuine connection.
As part of the Coupa family, you'll enjoy the agility of a fast-moving, innovation-focused team with the stability and reach of a global market leader. For you, this means an even greater opportunity to make an impact, access new global markets, and grow your career within a collaborative culture that values autonomy, diversity, and genuine connection. Together, we're not just automating data—we're giving time back to the world's professionals.
What we offer
Future with Coupa: We are currently in an integration phase, during which we are reviewing and aligning our total rewards programs. Our goal is to blend Rossum's local culture with Coupa's global standards to provide you with a long-term future featuring clear career pathways, tailored learning journeys, and world-class development opportunities.
Current Benefits:
Flexible working models with a base in vibrant Prague and options for a hybrid setup.
Competitive benefits designed to support your well-being, growth, and work-life harmony.
5 weeks of vacation, 5 sick/personal days, and an extra 2 weeks of paternity leave.
Personal development, education, and language courses budget.
High-end tech (MacBook, external monitor, keyboard of your choice) and a MultiSport card.
Team offsites, regular meetups, and a friendly, ambitious team.
Ready to make an impact in your next role? Apply now!
Skills Required
- 4+ years of experience in SRE, DevOps, platform engineering, or similar production-facing roles
- Strong problem-solving and debugging skills in distributed systems
- Excellent communication skills (English)
- Strong hands-on experience with cloud environments (AWS, GCP, or similar)
- Proficiency with infrastructure-as-code and CI/CD pipelines
- Familiarity with Kubernetes or other container orchestration
- Familiarity with event-driven architectures, supporting ML/AI workloads, or GPU infrastructure
What We Do
Rossum solves four key steps in document-based processes: receiving documents across multiple channels, automated understanding, two-way communication to resolve exceptions, and acting on the data using in-depth integrations. In typical real-world scenarios, Rossum’s proprietary AI engine outranks narrow data extraction solutions in accuracy. Meanwhile, Rossum’s platform automates the document-based communication process end-to-end. Rossum’s goal for every use case is at minimum a 90% document processing speed increase.
What does Rossum bring to the table?
Zero-friction deployment: See high AI accuracy right out of the box in Rossum’s free trial and cut down on most maintenance effort thanks to cloud hosting and automated self-learning.
Highly customizable: Implement powerful configuration APIs while enterprise users can engage Rossum’s dedicated Global Services team.
Unified document gateway: Solve everything from security and compliance to IT and user training in one place by adopting a universally capable document solution.
End-to-end solution: Rossum’s cloud platform takes care of the entire document lifecycle, from receiving documents to posting data into internal IT systems.
Security and compliance: Rossum is ISO 27001 certified and HIPAA compliant. The cloud service has been specifically engineered for high availability, with enterprise-grade SLAs ranging up to a 99.9% uptime guarantee and 24/7 support.