Trendyol Group

Cloud Storage Engineer (Ceph)

2 Locations

Remote or Hybrid

Senior level

eCommerce

The Role

The Cloud Storage Engineer is responsible for operating and improving Ceph clusters in a private cloud, driving DR strategies, and developing automation tools.

About the Team

At Trendyol Tech, our mission is to create a positive impact in our ecosystem by enabling commerce through technology.

We solve complex problems with data, creativity, and agility — always driven by real outcomes. With a culture built on learning, collaboration, and ownership, we grow together while building what’s next.

About the Role

This role focuses on building, operating, and continuously improving the core storage backbone of a large-scale private cloud. You will take technical leadership over Ceph, ensuring its performance, availability, and scalability as the platform grows. The position combines deep operations knowledge with modern automation and software engineering practices.

You will work on multi-site Ceph architectures, drive DR strategies, and contribute to object storage, block storage, and file system solutions consumed by OpenStack and Kubernetes environments. The role requires hands-on expertise across Ceph OSDs, MON/MGR services, RGW, CRUSH maps, placement groups, and performance tuning.

Beyond day-to-day operations, you will build tooling, improve monitoring, participate in incident response, lead capacity planning, and collaborate with other infrastructure and platform teams to align storage capabilities with broader cloud initiatives. This role is ideal for an engineer who wants deep technical ownership and the opportunity to shape the evolution of large-scale storage systems.

Responsibilities

Operate, scale, and evolve large-scale Ceph clusters used as the core storage layer of a private cloud platform.
Lead Ceph upgrades, expansions, and lifecycle operations across multi-region environments with minimal impact to production workloads.
Design and manage multi-site and geo-replicated Ceph architectures to ensure high availability, durability, and disaster recovery readiness.
Develop and maintain automation tooling using Ansible, Python, and Go to standardize cluster provisioning, configuration, and operational workflows.
Implement efficient storage tiering strategies, including hot/cold layers, cache tiers, and erasure-coded pools based on performance and cost requirements.
Troubleshoot complex distributed storage issues, perform deep root-cause analyses, and drive long-term reliability improvements.
Build observability pipelines that integrate Ceph metrics into Prometheus/Grafana and ELK/OpenSearch, and create actionable, predictive alerting mechanisms.
Collaborate closely with OpenStack, networking, compute, and platform engineering teams to ensure seamless integration between Ceph and dependent services.
Operate and harden S3-compatible object storage services using RGW, including lifecycle management, S3 API compatibility, and integration with CDN or edge caching layers.
Contribute to storage orchestration tools, Kubernetes operators, and internal CI systems for continuous validation of storage functionality and performance.

Expected Qualifications

Strong ownership mentality and the ability to independently drive complex technical projects from design to production.
Deep understanding of Linux internals, distributed systems, and large-scale storage operations.
Structured and clear problem-solving skills, especially in high-pressure or incident scenarios.
Proficiency in automation, reproducible operations, and writing clean, maintainable code.
Strong communication skills, including the ability to write documentation and RFCs and to mentor engineers.
Curiosity, adaptability, and willingness to explore new technologies such as NVMe/TCP, operators, and large-scale observability stacks.
A pragmatic engineering mindset focused on reliability, simplicity, and measurable outcomes.
Collaborative attitude and the ability to work closely with cross-functional teams in networking, compute, platform, and cloud infrastructure domains.

What We Offer

- Hybrid working model: a schedule that helps you balance flexibility and team bonding, including work-from-abroad opportunities and a summer working model.

- Customisable FlexBenefits budget: Adjust your daily meal allowance, choose your health insurance package (and extend it to your spouse or children), and pick from additional benefits like fuel support or Trendyol shopping credits.

- Well-being support: Access to location-based in-house doctors, as well as psychologist and dietitian support, and HPV vaccination provision.

- Personalised training allowance and learning opportunities: Use your annual budget for any training or conference of your choice, explore our Learning Management System (LMS) anytime, and join in-person learning sessions offered throughout the year.

- Responsibility from day one: Take full ownership from the start in a culture where every voice is heard and valued.

- A diverse, international team: Collaborate with global peers across our offices in Berlin, Amsterdam, Dubai, and beyond, in a startup-spirited and collaborative environment.

- Opportunities to grow with the best: Tackle meaningful challenges, develop through hands-on experience, and grow with the support of expert guidance and global mentoring.

- Meaningful connections beyond tasks: Be part of team rituals, events, and social activities that help us stay connected and inspired.

Take the Next Step

If this role excites you, apply today. We look forward to taking the next step with you.

Want to get to know the team better first? Explore our Career Website, LinkedIn, or YouTube to learn more about #LifeatTrendyol and how we work.

Top Skills

Ansible

Ceph

ELK

Grafana

Kubernetes

OpenSearch

OpenStack

Prometheus

Python

The Company

10,653 Employees

Year Founded: 2010

What We Do

We were founded in 2010 with a dynamic and agile start-up spirit. Since then, we have grown into a decacorn, backed by Alibaba, General Atlantic, Softbank, Princeville Capital, and several sovereign wealth funds. We believe that technology is the driver; e-commerce is the outcome.

Thanks to our dedicated team, we are one of the top five e-commerce companies in EMEA and one of the fastest-growing e-commerce companies in the world! We deliver more than 1.5 million packages every day across 27 countries. We offer our 30 million customers a flawless shopping experience. Dreaming big is in our DNA: We're gearing up to be the leading global e-commerce platform.

As a dynamic and passionate company, we are constantly growing with Trendyol Tech, one of the top R&D centres; Trendyol Express, the fastest growing delivery network; Dolap, the largest second-hand goods platform; and Trendyol Go, our instant food and grocery delivery service. And we’re not done yet! Now, we are on a journey to expand the positive impact we create to international markets. We opened our first international office in Berlin in May 2022, Amsterdam followed in October 2022, and many others are on the way.