Era4 develops, owns, and operates AI infrastructure across the UK, powered by renewable energy. By converting legacy industrial and energy sites into modern data center facilities, Era4 combines brownfield regeneration with clean, efficient, scalable compute capacity for healthcare, research, finance, enterprise, and public-sector organisations.
Role Summary:
We are seeking a Network Engineer to design, operate, and continuously improve the high-performance networking that underpins our AI infrastructure platform. This role focuses on data center interconnect, InfiniBand and high-speed Ethernet fabrics, IPv6-based architectures, SRv6-enabled service routing, automation, reliability, and scalable service delivery.
You will play a key role in delivering high-scale, software-driven networking that enables GPU clusters, Kubernetes platforms, and AI workloads to operate efficiently and securely across multiple sites.
The Position:
You will be responsible for the reliability, scalability, and performance of the network infrastructure supporting our AI platform. This includes data center Ethernet fabrics, InfiniBand-based HPC networks, storage networks, and high-speed interconnects between compute clusters and services.
The platform is designed using modern networking principles including IPv6-first architectures and segment routing (SRv6) to enable scalable service delivery, advanced traffic engineering, and flexible multi-site connectivity.
This is a hands-on infrastructure engineering role focused on network automation, platform reliability, and large-scale infrastructure delivery. You will work closely with platform engineering, Kubernetes, and AI infrastructure teams to ensure the network enables reliable as-a-service consumption of GPU and compute resources.
Key Responsibilities:
Data Center Network Architecture & Operations:
- Design, deploy, and operate high-performance data center networks supporting GPU compute clusters and AI workloads.
- Manage Ethernet-based data center fabrics and switching infrastructure across large-scale compute environments.
- Operate and maintain InfiniBand fabrics used for high-performance GPU training clusters.
- Implement IPv6-based networking architectures across compute, storage, and service layers.
- Implement high-availability and resilient network architectures for critical platform services.
- Support high-throughput, low-latency networking required for distributed AI training workloads.
- Manage lifecycle operations including deployment, upgrades, expansion, and maintenance.
High Performance Compute & InfiniBand Networking:
- Deploy and operate InfiniBand-based cluster networking for AI and HPC workloads.
- Configure and manage InfiniBand fabrics, subnet managers, and performance tuning.
- Support RDMA-based communication between GPUs and compute nodes.
- Optimize network performance for large-scale distributed training workloads.
- Troubleshoot performance across GPU, storage, and network interconnect layers.
Inter-Data Center & WAN Connectivity:
- Design and operate high-capacity connectivity between multiple data center sites.
- Implement IPv6-enabled WAN architectures and routing frameworks.
- Deploy Segment Routing over IPv6 (SRv6) for traffic engineering, service chaining, and scalable inter-site connectivity.
- Manage routing protocols and traffic engineering to ensure resilient inter-site networking.
- Implement redundant and multi-path architectures to support high availability.
- Optimize network performance for large-scale data movement, storage replication, and distributed AI workloads.
Network Automation & Infrastructure as Code:
- Implement automation-first network operations using modern tooling.
- Manage network configuration through Infrastructure-as-Code and version-controlled workflows.
- Develop automation for network provisioning and validation, integrated into CI/CD pipelines.
- Build automated processes for configuration management, validation, and lifecycle management.
- Collaborate closely with Kubernetes and platform engineering teams to support cloud-native infrastructure.
Reliability, Monitoring & Performance:
- Design and operate network observability and telemetry frameworks.
- Implement monitoring, alerting, and capacity planning across the network stack.
- Diagnose and resolve performance bottlenecks across compute, storage, and networking layers.
- Participate in incident response and root cause analysis for infrastructure events.
Security & Governance:
- Implement secure network segmentation and traffic policies across the infrastructure.
- Support multi-tenant isolation and platform security controls.
- Ensure networking architecture supports compliance, data protection, and operational governance.
Essential Experience:
- Expertise in operating large-scale data center networking environments.
- Experience with LAN, switching, routing, and modern data center network architectures.
- Knowledge of implementing IPv6 networking in production environments.
- Experience with high-performance networking environments supporting compute clusters.
- Hands-on experience with network automation and Infrastructure as Code.
- Understanding of Linux systems and automation tooling.
- Familiarity with distributed systems and cloud-native infrastructure.
- Experience implementing monitoring and observability solutions.
- Proven troubleshooting skills across network, system, and infrastructure layers.
- Proven experience with incident management and root cause analysis.
One or more of the following would be an advantage:
- Experience operating InfiniBand fabrics in HPC or AI environments.
- Experience with RDMA, RoCE, or high-speed cluster networking.
- Exposure to Segment Routing (SRv6) architectures.
- Experience with modern data center fabrics (leaf-spine, VXLAN, EVPN).
- Knowledge of AI or GPU infrastructure platforms.
- Experience integrating networking with Kubernetes environments.
- Background in DevOps, NetDevOps, or Site Reliability Engineering.
- Familiarity with service management frameworks (ITIL).
Why Join Era4:
You’ll be joining a mission-driven start-up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next-generation company operates at scale.
Diversity & Inclusion:
Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
What We Do
Carbon3.ai is building the UK’s sovereign AI platform – secure, sustainable, and designed for real-world impact. AI growth is creating new challenges, and compute power requirements are outpacing supply. At Carbon3.ai, we’re not just providing infrastructure; we’re building the foundations to overcome these challenges.

We are an energy business transforming into the UK’s sovereign choice for AI – vertically integrated from soil to software, converting legacy industrial sites into renewable-powered AI data hubs. Designed, owned, and operated by Carbon3.ai, all infrastructure and data processing are located within the UK and fully subject to UK jurisdiction and regulatory oversight. We generate our own off-grid renewable power, providing low-cost, sustainable energy comparable to Nordic levels and making AI workloads both affordable and sustainable. We own 50+ sites across the UK and are rapidly scaling them into AI data centres, enabling high-density, low-latency, sovereign AI deployment at national scale.

Whether you're training models, deploying intelligent agents, or building industry-specific solutions, Carbon3.ai accelerates your journey from concept to production. Backed by strategic partnerships with leading brands and robust investment, we’re building the infrastructure to power the UK’s most ambitious AI innovation – ensuring British enterprises can access world-class AI capabilities securely and sustainably.