Sr Linux Networking Engineer

Posted 4 Days Ago
Hiring Remotely in USA
Remote
Senior level
Cloud • Digital Media • Information Technology
Generative media platform for developers.
The Role
Design and operate large-scale networks for GPU workloads, automate network infrastructure, ensure network performance, and troubleshoot issues.
Summary Generated by Built In

You are a seasoned networking engineer who has designed, deployed, and operated high-performance networks at scale. At fal, our platform orchestrates AI inference workloads across thousands of GPUs spread over multiple data centers and cloud providers. You will own the network layer that ties it all together—ensuring that model traffic, storage I/O, and control-plane communication are fast, reliable, and secure. You think in terms of packets per second, tail latency, and fabric utilization, and you automate everything you touch.


Key Responsibilities
  • Design, build, and operate the network fabric that interconnects our GPU fleet, including spine-leaf architectures, RDMA/RoCEv2 networks for distributed inference, and overlay networks for tenant isolation.
  • Own L2/L3 network design across bare-metal and cloud environments, including BGP peering, ECMP, VXLAN/EVPN, and high-bandwidth interconnects between data centers.
  • Develop and maintain network automation using Ansible, Terraform, and custom tooling to provision, configure, and validate switches, routers, DPUs, and SmartNICs at scale.
  • Instrument deep network observability—build dashboards, alerting, and anomaly detection across our fabric using Prometheus, Grafana, and packet-level telemetry.
  • Partner with the Compute and ML Performance teams to tune network paths for AI workloads, minimizing latency for model serving and maximizing throughput for large tensor transfers.
  • Drive incident response and root-cause analysis for network-related production issues and build automation to prevent recurrence.
  • Evaluate and qualify new networking hardware and software—NICs, switches, DPUs, SONiC, Cumulus, and similar—as we scale to next-generation GPU clusters.
Requirements
  • 8+ years of experience building and operating large-scale networks, ideally in GPU cloud, HPC, or hyperscale environments.
  • Deep expertise in Linux networking internals: kernel networking stack, iptables/nftables, tc, eBPF, network namespaces, bonding/teaming, and SR-IOV.
  • Strong command of routing and switching protocols: BGP, OSPF, ECMP, VXLAN, EVPN, MPLS, and segment routing.
  • Hands-on experience with high-performance networking for AI/ML: RDMA, RoCEv2, InfiniBand, GPUDirect, and NCCL tuning.
  • Proficiency automating network infrastructure with Ansible, Python, Go, and Git.
  • Experience with network-as-code workflows.
  • Familiarity with modern network operating systems such as SONiC, Cumulus Linux, Arista EOS, or Nokia SR Linux.
  • Experience with network observability stacks: Prometheus, Grafana, sFlow/NetFlow, and packet capture tools.
Nice to Have
  • Experience with DPU/SmartNIC programming (NVIDIA BlueField, AMD Pensando) and SDN/NFV architectures.
  • Contributions to open-source networking projects (SONiC, FRR, DPDK, eBPF/XDP).
  • Experience operating networks that support Kubernetes and container-native workloads (Calico, Cilium, MetalLB).
  • Familiarity with data center physical layer design, optics, and cabling at scale.
What we offer at fal
  • Interesting and challenging work
  • Competitive salary and equity
  • A lot of learning and growth opportunities
  • We offer visa sponsorship and will help you relocate to San Francisco.
  • Health, dental, and vision insurance (US)
  • Regular team events and offsites
Location
  • Remote 

Top Skills

Ansible
Arista EOS
Cumulus Linux
eBPF
EVPN
Git
Go
GPUDirect
Grafana
InfiniBand
Linux
MPLS
Network Observability
Prometheus
Python
RDMA
RoCEv2
SONiC
Terraform
VXLAN

The Company
73 Employees

What We Do

Generative Media Cloud


