Why work at Nebius
Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.
Where we work
Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 800 employees includes more than 400 highly skilled engineers with deep expertise across hardware and software engineering, as well as an in-house AI R&D team.
Nebius is looking for a Senior Software Engineer to join the Hardware Infrastructure Observability team. We build and run low-level monitoring for servers and data center engineering systems to ensure reliability at scale. We also design and operate maintenance and remediation systems that enable safe, predictable fleet-wide changes and keep the infrastructure healthy. You're welcome to work from our office in Amsterdam.
Key Responsibilities:
- Design and develop services and agents that provide deep visibility into a large server fleet and DC engineering systems
- Evolve our metrics, aggregation, and alerting pipelines and improve signal quality
- Build maintenance workflows and automation that keep fleets healthy
- Investigate incidents hands-on (including on-host debugging) and drive root-cause fixes
- Collaborate with hardware, networking, and DC operations to improve reliability
We expect you to have:
- 5+ years of professional software engineering experience
- Excellent knowledge of Python and Golang, or readiness to switch to these languages quickly
- Strong Linux fundamentals
- Ability to write reliable code and dig into complex problems
- Working proficiency in English
It will be an added bonus if you have:
- Solid understanding of modern server architecture and its components
- Experience with Prometheus-compatible metrics, monitoring, and alerting stacks (such as VictoriaMetrics)
- Good knowledge of computer networks
- Experience designing, developing, and running high-load distributed systems
We conduct coding interviews as part of the process.
What we offer
- Competitive salary and comprehensive benefits package.
- Opportunities for professional growth within Nebius.
- Flexible working arrangements.
- A dynamic and collaborative work environment that values initiative and innovation.
We’re growing and expanding our products every day. If you’re up to the challenge and as excited about AI and ML as we are, join us!
What We Do
Cloud platform specifically designed to train AI models








