Senior Infrastructure & TechOps Engineer

Tel Aviv, ISR
Hybrid
Senior level
Software
Battle-tested tech for modern warfare
The Role
As a Senior Production Infrastructure Engineer at Kela, you'll manage production systems, build monitoring and incident response systems, automate processes, and ensure system reliability while participating in an on-call rotation.
Summary Generated by Built In
Description

As a Senior Production Infrastructure Engineer at Kela, you'll own the operational backbone that keeps our mission-critical production systems running across every production site we deploy. You'll sit inside the TechOps group on the Infrastructure team, working alongside the Professional Services and Support teams to make sure that what we build in-house actually performs - reliably, observably, and at scale - in the field.

This role is about more than keeping the lights on. You'll design the monitoring, incident response, change management, and automation that let the rest of the organization move fast without breaking production. You'll build the systems that detect problems before customers do, the workflows that resolve them when they happen anyway, and the infrastructure that lets us push firmware, configuration, and software changes to field devices safely and at scale.

What You'll Do

  • Own monitoring and observability across the production fleet. Operate the Prometheus, Grafana, and centralized logging stack so problems surface early and the right signal reaches the right person.
  • Build the incident response system - workflows, runbooks, and tooling that let TechOps resolve issues fast across infrastructure, network, and application layers - and run the post-incident process that turns each event into a permanent fix.
  • Drive change management for production: firmware updates to field devices, software releases, configuration changes. Track every rollout, measure outcomes, and make the process safer and faster over time.
  • Automate aggressively. Infrastructure provisioning, auto-healing, deployment verification, tests, health checks. If it's manual today and shouldn't be, you fix it.
  • Define and track reliability maturity. Choose the signals that matter at our stage - SLOs, error budgets, MTTR, change failure rate, deployment frequency - instrument them, and use them to focus the team's investments.
  • Build deployment pipelines that bridge our hardware and software stack, in close work with the firmware and platform teams.
  • Integrate systems and data flows across the platform so we get the full value of what we already collect.
  • Close the gaps between how systems are designed, how they actually behave, and what the team needs to operate them.
  • Participate in the TechOps on-call rotation.

Requirements

What We're Looking For

  • 4+ years of hands-on experience in production infrastructure, Ops, SRE, or DevOps, supporting distributed systems where downtime has real consequences.
  • Experience with Linux - confident using the command line and diagnosing issues in production environments.
  • Hands-on experience working with Kubernetes.
  • Hands-on experience with Prometheus, Grafana, and centralized logging.
  • Infrastructure-as-code with Terraform or Ansible.
  • Solid networking - VPN, routing, firewalls - and an understanding of the bits and bytes of complex network architectures.
  • Proven experience designing and running incident response processes in production.
  • Ability to learn new technologies fast and work across unfamiliar layers of the stack.

Who You Are

  • Technically curious. You open the box, read the source, go find the next thing to learn.
  • Creative. You find the angle nobody else saw and the solution that's both simple and right.
  • A team player. You make the people around you better and you share what you know.
  • In flow with the work. You hold complexity, switch contexts, and keep moving without losing the thread.
  • Quick to learn from mistakes - yours and other people's. Every incident is material for the next iteration.

Skills Required

  • 4+ years of hands-on experience in production infrastructure, Ops, SRE, or DevOps
  • Experience with Linux
  • Hands-on experience working with Kubernetes
  • Hands-on experience with Prometheus, Grafana, and centralized logging
  • Infrastructure-as-code with Terraform or Ansible
  • Solid networking knowledge (VPN, routing, firewalls)
  • Experience designing and running incident response processes in production

The Company
97 Employees

What We Do

Battle-tested tech for modern warfare
