What you'll do
- 📈 Build & run observability for gateways, towers, and backend/edge services (metrics, logs, tracing, alerts; strong signal / low noise).
- 🤖 Automate ops: golden configs, zero-touch provisioning, safe canaries/rollbacks, scheduled maintenance, and self-healing where sensible.
- 🚨 Lead incidents end-to-end (runbooks, comms, mitigation, post-mortems) and drive fixes into code, configs, and process.
- 🚀 Harden deploys: progressive rollouts for firmware/agent/service changes across thousands of devices and multi-region backends.
- ⚙️ Performance tuning: reduce command/telemetry latency, smooth OTA pipelines, and de-risk noisy/unreliable links with back-pressure & retries.
- 🧭 Capacity & readiness: plan headroom for spikes and growth; run chaos engineering exercises on failover paths (cellular ↔ satellite, region failover).
- 📘 Own runbooks & SOPs that enable field teams and on-call to respond quickly and consistently.
- 🤝 Partner with Network/RF engineers on coverage/capacity changes, interference hunts, and carrier/satellite escalations.
- 🧭 Mentor teammates on SRE mindset, tools, and operational excellence.
Who we're looking for
- 🧰 SRE/large-scale ops experience (cloud + distributed systems).
- 💻 Strong automation & scripting (Python/Go/etc.) and IaC (Terraform/Ansible/etc.).
- 📡 Solid networking fundamentals (TCP/IP, routing, VPNs, firewalls) + RF awareness (LoRa/LTE/sat a plus).
- 🔭 Hands-on with observability stacks (Prometheus, Grafana, ELK, OpenTelemetry).
- 🧯 Proven incident management for high-availability systems.
- 🧩 Performance tuning for latency-sensitive, unreliable-link environments.
- 🐧 Comfortable in Linux across cloud and edge devices.
- 📊 Data-driven: able to turn noisy telemetry into decisions (SQL or Jupyter a plus).
- 🧠 Pragmatic problem-solver who balances reliability, speed, and cost.
- ➕ Bonus: experience with IoT, off-grid, or field deployments. 🏕️
Network awareness (baseline, not deep-dive) 📡
You don’t need to be a routing/RF guru; we have those. You should be comfortable with:
- 🌐 Basic L3 troubleshooting: ping/traceroute, IP/subnetting, DNS/DHCP/NAT basics, reading simple routes.
- 📶 Reading link health: interpreting RSSI/SNR (LoRa) or RSRP/SINR (LTE) at a high level; telling “the link is bad” apart from “the service is bad.”
- 🛰️ Backhaul pragmatics: understanding failover states (cellular ↔ satellite), cost/perf trade-offs, and safe config rollout patterns.
- 🗺️ Topology literacy: knowing what a gateway/tower/backhaul path looks like and where to put probes and alerts.
What We Do
We bridge deep tech into farming. Halter enables farmers to remotely shift, virtually fence and proactively monitor their cows’ health and behaviour. Can you imagine watching 500 cows or cattle walk calmly towards the milking shed or their next break? No quad bikes, no dogs, no fences. Just a mob of cows walking at their own pace. People say it looks like magic. Our customers are revolutionising farming with Halter. It's changing lives and transforming an industry.
Why Work With Us
There's something special about being surrounded by spectacular people making real change in the world. At Halter, you'll do your best work, with the best people and have the biggest impact in a culture grounded in performance.
Halter Offices
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
Our field teams are often on the road. However, we believe that in-person interaction is key to building a high-performing culture, so there will be times when we ask the team to meet in person.

