Senior Software Engineer in Hardware Infrastructure Observability

Amsterdam, NLD
In-Office
Senior level
Artificial Intelligence • Information Technology • Consulting
The Role
Design and develop services for monitoring servers, improve metrics pipelines, automate maintenance workflows, and investigate incidents to ensure infrastructure reliability.

Why work at Nebius
Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.

Where we work
Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 1400 employees includes more than 400 highly skilled engineers with deep expertise across hardware and software engineering, as well as an in-house AI R&D team.

The Role

Nebius is looking for a Senior Software Engineer to join the Hardware Infrastructure Observability team. You're welcome to work from our office in Amsterdam. We build and run low-level monitoring for servers and data center engineering systems to ensure reliability at scale. We also design and operate maintenance and remediation systems that enable safe, predictable fleet-wide changes and keep the infrastructure healthy.

Key Responsibilities:

  • Design and develop services and agents that provide deep visibility into a large server fleet and DC engineering systems
  • Evolve our metrics, aggregation, and alerting pipelines and improve signal quality
  • Build maintenance workflows and automation that keep fleets healthy
  • Investigate incidents hands-on (including on-host debugging) and drive root-cause fixes
  • Collaborate with hardware, networking, and DC operations to improve reliability

We expect you to have:

  • 5+ years of professional software engineering experience
  • Excellent knowledge of Python and Golang, or readiness to switch to these languages quickly
  • Strong Linux fundamentals
  • Ability to write reliable code and dig into complex problems
  • Working proficiency in English

It will be an added bonus if you have: 

  • Solid understanding of modern server architecture and its components
  • Experience with Prometheus-compatible metrics, monitoring, and alerting stacks (such as VictoriaMetrics)
  • Good knowledge of computer networks
  • Experience designing, developing, and running high-load distributed systems

We conduct coding interviews as part of the process.



What we offer 

  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within Nebius.
  • Flexible working arrangements.
  • A dynamic and collaborative work environment that values initiative and innovation.

We’re growing and expanding our products every day. If you’re up to the challenge and are as excited about AI and ML as we are, join us!

Top Skills

Go
Linux
Prometheus
Python
VictoriaMetrics

The Company
473 Employees

What We Do

Cloud platform specifically designed to train AI models
