Key Responsibilities
- Own the architecture, stability, scalability, and performance of the system.
- Design and implement platform features that support both synchronous low-latency and asynchronous compute-heavy algorithm execution.
- Enhance GPU management, scheduling, and resource allocation for optimal performance and cost-efficiency.
- Ensure robust Kubernetes-based deployment and observability for a highly dynamic system.
- Act as the technical bridge between Research and Application teams by translating requirements into scalable system designs.
- Collaborate closely with algorithm developers to streamline model deployment processes.
- Partner with backend engineers (primarily working in Ruby and Go) to integrate the research group's algorithms into Cloudinary services.
- Advocate for high standards in code quality, observability, testing, and security.
- Guide engineering teams as they integrate with the platform's APIs.
- Provide mentorship, support, and best practices to other engineers interacting with the platform.
- Take part in general R&D efforts, supporting a broader production environment.
- Contribute to the evolution of our platform to support a wider range of algorithmic workloads and model types.
- Help shape tooling and infrastructure for model versioning, rollout, monitoring, and testing.
- Collaborate with DevOps and Infrastructure teams to maintain operational excellence, system observability, and robust infrastructure support.
Your Qualifications
- 8+ years of experience in software engineering, with 3+ years working on infrastructure/platforms involving ML/AI, GPU, or data-heavy systems.
- Proficiency in Python and familiarity with backend languages such as Ruby and/or Go.
- Strong understanding of Kubernetes internals and experience running GPU workloads in production environments.
- In-depth knowledge of AWS services.
- Experience architecting systems that support both real-time and asynchronous processing pipelines.
- Familiarity with the ML lifecycle and MLOps practices, including CI/CD for models, monitoring, and rollback strategies.
Bonus Qualifications
- Experience working in research-driven environments or alongside data scientists, algorithm research teams, and ML engineers.
- Contributions to open-source projects related to model serving, Kubernetes operators, or ML platforms.
- Experience supporting systems with diverse user groups across engineering and research disciplines.
Why Join Us?
- Opportunity to build and scale a one-of-a-kind platform powering state-of-the-art media algorithms.
- Collaborate with world-class research, engineering, and product teams.
- Have a direct impact on product experiences used by millions of developers and end-users.
- Be part of a culture that values creativity, autonomy, and continuous improvement.
What We Do
Cloudinary’s mission is to empower companies to deliver visual experiences that inspire and connect by unleashing the full potential of their media. With more than 50 billion assets under management and 7,500 customers worldwide, Cloudinary is the industry standard for developers, creators and marketers looking to upload, store, transform, manage, and deliver images and videos online. As a result, leading brands like Atlassian, Bleacher Report, Grubhub, Hinge, NBC, Mediavine, Peloton, Petco and Under Armour are seeing significant business value in using Cloudinary, including faster time to market, higher user satisfaction, and increased engagement and conversions. For more information, visit www.cloudinary.com.






