Responsibilities
- Operate, scale, and evolve large-scale Ceph clusters used as the core storage layer of a private cloud platform.
- Lead Ceph upgrades, expansions, and lifecycle operations across multi-region environments with minimal impact to production workloads.
- Design and manage multi-site and geo-replicated Ceph architectures to ensure high availability, durability, and disaster recovery readiness.
- Develop and maintain automation tooling using Ansible, Python, and Go to standardize cluster provisioning, configuration, and operational workflows.
- Implement efficient storage tiering strategies, including hot/cold layers, cache tiers, and erasure-coded pools based on performance and cost requirements.
- Troubleshoot complex distributed storage issues, perform deep root-cause analyses, and drive long-term reliability improvements.
- Build observability pipelines that integrate Ceph metrics into Prometheus/Grafana and ELK/OpenSearch, and create actionable, predictive alerting mechanisms.
- Collaborate closely with OpenStack, networking, compute, and platform engineering teams to ensure seamless integration between Ceph and dependent services.
- Operate and harden S3-compatible object storage services using RGW, including lifecycle management, S3 API compatibility, and integration with CDN or edge caching layers.
- Contribute to storage orchestration tools, Kubernetes operators, and internal CI systems for continuous validation of storage functionality and performance.
Expected Qualifications
- Strong ownership mentality and the ability to independently drive complex technical projects from design to production.
- Deep understanding of Linux internals, distributed systems, and large-scale storage operations.
- Structured and clear problem-solving skills, especially in high-pressure or incident scenarios.
- Proficiency in automation, reproducible operations, and writing clean, maintainable code.
- Strong communication skills, including the ability to write documentation and RFCs and to mentor engineers.
- Curiosity, adaptability, and willingness to explore new technologies such as NVMe/TCP, operators, and large-scale observability stacks.
- A pragmatic engineering mindset focused on reliability, simplicity, and measurable outcomes.
- Collaborative attitude and the ability to work closely with cross-functional teams in networking, compute, platform, and cloud infrastructure domains.
What We Do
We were founded in 2010 with a dynamic and agile start-up spirit. Since then, we have grown into a decacorn, backed by Alibaba, General Atlantic, SoftBank, Princeville Capital, and several sovereign wealth funds. We believe that technology is the driver; e-commerce is the outcome. Thanks to our dedicated team, we are one of the top five e-commerce companies in EMEA and one of the fastest-growing e-commerce companies in the world! We deliver more than 1.5 million packages every day across 27 countries. We offer our 30 million customers a flawless shopping experience. Dreaming big is in our DNA: We're gearing up to be the leading global e-commerce platform. As a dynamic and passionate company, we are constantly growing with Trendyol Tech, one of the top R&D centres; Trendyol Express, the fastest-growing delivery network; Dolap, the largest second-hand goods platform; and Trendyol Go, our instant food and grocery delivery service. And we’re not done yet! Now, we are on a journey to expand the positive impact we create to international markets. We opened our first international office in Berlin in May 2022, Amsterdam followed in October 2022, and many others are on the way.






