Senior DevOps Engineer (Cloud & ML Infrastructure)

Hiring Remotely in Greece
Remote
Senior level
Analytics
The Role
As a Senior DevOps Engineer, you'll design, operate, and enhance cloud-native infrastructure with a focus on ML workloads, ensuring system reliability and performance.
At Kpler, we are dedicated to helping our clients navigate complex markets with ease. By simplifying global trade information and providing valuable insights, we empower organisations to make informed decisions in the commodities, energy, and maritime sectors.

Since our founding in 2014, we have focused on delivering top-tier intelligence through user-friendly platforms. Our team of over 700 experts from 35+ countries works tirelessly to transform intricate data into actionable strategies, ensuring our clients stay ahead in a dynamic market landscape. Join us to leverage cutting-edge innovation for impactful results and experience unparalleled support on your journey to success.


Your future position
 

As a Senior Platform Engineer, you will join the Cloud Platform team to design, operate, and evolve Kpler’s cloud-native infrastructure supporting backend, data, and ML workloads. You will operate within the existing platform engineering framework and contribute to the overall reliability, scalability, and cost efficiency of the platform. In addition, you will bring hands-on experience running ML/AI and GPU-based workloads in production, helping the team standardize and strengthen this scope as it grows. This is a senior+ individual contributor role combining operational excellence, architectural input, and hands-on execution in a 24/7 production environment.


Key Responsibilities

  • Design, operate, and improve Kpler’s cloud-native infrastructure (Kubernetes, networking, compute, storage).

  • Contribute to Infrastructure as Code, CI/CD pipelines, and platform automation.

  • Ensure high availability, reliability, and security of production systems.

  • Improve observability, monitoring, alerting, and incident response processes.

  • Reduce MTTR and failure rates through structured reliability improvements.

  • Optimize infrastructure cost and performance, including compute-intensive workloads.

  • Support and help standardize ML/GPU-based workloads within the existing platform model.

  • Collaborate closely with ML engineers, data engineers, and backend teams to ensure production-grade deployments.

  • Contribute to architectural decisions shaping the evolution of the platform.

Experience & Background

    Essential:

  • 5+ years of experience in cloud/platform engineering in production environments.

  • Strong hands-on experience with Kubernetes in production.

  • Experience with Infrastructure as Code (Terraform preferred).

  • Strong knowledge of AWS (or equivalent cloud provider).

  • Experience operating distributed systems in 24/7 environments.

  • Strong operational mindset (SLOs, monitoring, incident management).


    Desirable:

  • Proven experience running ML/AI workloads in production.

  • Experience with GPU-based workloads.

  • Exposure to LLM-based or compute-intensive systems.

  • Experience optimizing cost and performance of high-compute infrastructure.

Skills & Competencies

    Technical / Functional Skills:
  • Strong cloud platform engineering expertise (AWS preferred).

  • Advanced Kubernetes operations in production (scaling, upgrades, workload isolation, troubleshooting).

  • Solid Infrastructure as Code experience (Terraform or equivalent).

  • Strong understanding of distributed systems and cloud-native architectures.

  • Experience designing and operating CI/CD pipelines.

  • Strong observability practices (monitoring, logging, alerting, SLO definition).

  • Incident management and root cause analysis in 24/7 systems.

  • Infrastructure cost optimization and performance tuning.

  • Solid programming skills (Python or Go preferred).

  • Practical experience supporting ML/AI or GPU-based workloads in production (highly valued).

    Behavioural Competencies:

  • Ownership & Accountability – Takes end-to-end responsibility for production systems and reliability outcomes.

  • Systems Thinking – Understands architectural trade-offs and long-term impact of technical decisions.

  • Structured Problem Solving Under Pressure – Maintains clarity and effectiveness during incidents and high-stakes situations.

  • Collaboration & Autonomy – Communicates clearly in distributed teams, documents decisions effectively, and works autonomously while maintaining strong cross-team alignment.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent practical experience.

  • Strong programming skills (Python or Go preferred).

  • Solid understanding of cloud-native architecture and reliability engineering principles.

We are a dynamic company dedicated to nurturing connections and innovating solutions to tackle market challenges head-on. If you thrive on customer satisfaction and turning ideas into reality, then you’ve found your ideal destination. Are you ready to embark on this exciting journey with us?

We make things happen
We act decisively and with purpose, going the extra mile.

We build together
We foster relationships and develop creative solutions to address market challenges.

We are here to help
We are accessible and supportive to colleagues and clients with a friendly approach.


Our People Pledge

Don’t meet every single requirement? Research shows that women and people of color are less likely than others to apply if they feel they don’t match 100% of the job requirements. Don’t let the confidence gap stand in your way; we’d love to hear from you! We understand that experience comes in many different forms and are dedicated to adding new perspectives to the team.

Kpler is committed to providing a fair, inclusive, and diverse work environment. We believe that different perspectives lead to better ideas, and better ideas allow us to better understand the needs and interests of our diverse, global community. We welcome people of different backgrounds, experiences, abilities, and perspectives and are an equal opportunity employer.




Top Skills

AWS
Go
Kubernetes
Python
Terraform
The Company
HQ: Brussels
138 Employees
Year Founded: 2014

What We Do

Kpler is the leading data & analytics firm providing real-time transparency in commodity markets. Relying on a methodology that combines artificial and human intelligence, the Kpler platform provides real-time data and analytics (global flows, storage, freight) on more than 40 commodities including crude oil, refined products, LNG, LPG, and dry bulk.
