Network Reliability Engineer

Posted 9 Days Ago
Hiring Remotely in Warsaw, Warszawa, Mazowieckie, POL
In-Office or Remote
Mid level
Information Technology • Consulting
The Role
Design, build, and operate large AI/HPC infrastructure with observability, incident troubleshooting, remediation, and lifecycle management. Participate in on-call rotation, collaborate with engineering teams, maintain documentation, and promote stability, scalability, and security best practices.
 
#HPC #AI #GPU #CLUSTERS
 
YOUR DAILY ROUTINE
- Build large-scale AI infrastructure, covering monitoring, diagnosis, and remediation of production incidents
- Troubleshoot high-impact production issues in collaboration with other engineering teams
- Participate in an on-call rotation to handle incidents and ensure service continuity
- Implement and maintain observability solutions to monitor AI infrastructure and application health
- Contribute to AI infrastructure lifecycle management across different environments and countries
- Promote and apply best practices in terms of stability, resiliency, scalability, and security
- Maintain clear technical documentation for tools and procedures
- Contribute to system and tool evolution based on production feedback
- Collaborate closely with development teams to ensure infrastructure readiness
- Participate in team rituals and knowledge-sharing initiatives
 
ABOUT YOU
 
🎯 SOFT SKILLS:
- Proactive and solution-oriented mindset
- Passion for automation and continuous improvement
- Strong collaboration and communication skills
- Ability to work independently and in a team
- Willingness to mentor and share knowledge
 
💻 HARD SKILLS:
- Experience with Go or Python 
- Strong scripting skills (Bash, Python)
- Hands-on experience with Linux systems (Ubuntu/Debian)
- Hands-on experience with GPU and HPC infrastructure (preferred)
- Knowledge of networking (VLAN/LAN, TCP/IP, DNS, BGP, load-balancing, IPv6, etc.)
- Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)
- Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)
- Experience managing relational databases (MariaDB)
- Understanding of CI/CD pipelines (GitLab)
- Comfortable with English (written and spoken)
 


The Company
London
358 Employees
Year Founded: 2005

What We Do

More than a simple IT consulting group, MARGO is an alchemy of talents with constantly enriched expertise, always encouraged to be dynamic and to flourish. For more than 17 years, we have been carrying out missions of the highest technical complexity, enabling the digital acceleration of our clients, while ensuring a perpetual intellectual and collective stimulation for all our talents.

At MARGO it's Consultant First. By joining MARGO, your talent today will remain relevant tomorrow. We bring you into the group not only for what you are today but for the potential you wish to develop. At MARGO, you will be able to build your own path to excellence within our teams and our missions. Tailor-made to serve your ambition and to match your talent.

Today, MARGO has over 400 employees in 8 entities:
► MARGO Trading Systems: our full stack development offer in market finance
► MARGO Capital Markets: our business offer in market finance
► MARGO Analytics: our expertise in data science and data engineering
► A CAPELLA Consulting: our business offer dedicated to insurance
► CODE BUSTERS: our community of developers
► DELIVERED by MARGO: our 100% remote full stack development entity
► MARGO UK: expertise in complex coding and application support on the English market
► MARGO Poland: expertise in complex coding in retail banking on the Polish market
