Lead SRE, DevOps Group

Posted Yesterday
Tel Aviv, ISR
Hybrid
Senior level
Software
We organize and mobilize the world’s ITOps and DevOps data.
The Role
The Lead SRE is responsible for ensuring platform reliability: defining and tracking SLAs/SLOs/SLIs, owning production reliability (on-call, incident response, post-mortems), setting priorities through error budgets, embedding reliability across the stack, and using automation and AI to improve scalability and operations.
Location requirements:
This role requires working out of the Tel Aviv office three days per week.

About the Role:
As an SRE Lead at BigPanda, you will play a critical role in ensuring the reliability, scalability, and performance of the platform that powers our customers’ operations. You’ll operate at the intersection of software engineering and production operations, taking full ownership of the systems you build and run.
This role is not just about responding to incidents — it’s about fundamentally improving how our platform behaves under real-world conditions. You will drive reliability initiatives end-to-end: defining measurable service goals, shaping engineering priorities through error budgets, and implementing solutions that prevent issues before they occur.
You’ll work closely with teams across the organization, embedding reliability and observability into every layer of the stack. At the same time, you’ll leverage automation, modern infrastructure practices, and emerging AI capabilities to continuously evolve how we operate and scale.

What you will do:
Develop deep product knowledge of our platform: understand its internals, failure modes, and operational behavior well enough to own incident resolution end-to-end.
Define and track SLAs/SLOs/SLIs across critical platform services, and use error budgets to drive engineering decisions.
Own production reliability - including on-call rotations, incident response, and post-mortems - with a focus on minimizing MTTR and preventing recurrence through systemic fixes, not just firefighting.
Work hand-in-hand with engineering teams across the stack - infrastructure, application, and business layers - to embed reliability requirements everywhere.


What skills and experience you’ll bring to BigPanda:
  • 5+ years of experience as an SRE (or similar role) in a high-scale production environment, with hands-on ownership across the full stack - infrastructure and application layers.
  • Business-level reliability experience is a strong advantage.
  • Experience in designing, building, and operating cloud-native systems on AWS.
  • Hands-on experience maintaining Node.js or JVM-based applications that run with MongoDB, Elasticsearch, and Kafka.
  • Strong coding skills and a software engineering mindset - you build your own tools rather than waiting for someone else to.
  • Experience with infrastructure-as-code and modern container orchestration platforms.
  • Practical experience building or integrating AI-driven solutions (e.g., LLMs, agents, or AI-powered operational tooling).
  • A true owner - you take responsibility for systems end-to-end and proactively drive improvements without waiting for direction.
  • A problem solver who adapts flexibly to changing business needs.


About Us:
BigPanda is a fast-growing, values-driven, global company that enables Tech Ops teams to keep the digital economy running. BigPanda’s AI-driven IT operations (aka AIOps) platform transforms IT data into insight and action. By eliminating IT noise, automating incident management, and keeping our customers’ digital services up and running around the clock, we become a mission-critical part of our customers’ IT operations.
With BigPanda, some of the world’s largest enterprises including Hulu, Cisco, United, Abbott, Marriott, Expedia and many others are able to reduce costs and increase efficiencies, accelerate business velocity, and deliver extraordinary customer experiences.
BigPanda is backed by top-tier investors including Sequoia, Mayfield, Battery, Insight Partners, Advent International, and Greenfield Partners. 
We have an awesome team of motivated, knowledgeable, fun-loving, and friendly Pandas. We provide comprehensive health coverage, parental leave, competitive cash and equity compensation, and a supportive, collaborative, and innovative environment to empower you to do the best work of your career. 

Our Benefits: 
  • Competitive equity
  • Hybrid work schedule
  • Company-funded health insurance
  • 6 weeks fully paid Parental Leave
  • Critical Family Medical Leave
  • Financial planning services
  • Employee learning & development budget
  • Values-based recognition (quarterly and annually)
  • Social community & ERG programs
  • FreeFit gym package
  • Work-life harmony
  • Dog-friendly office

Skills Required

  • 5+ years of experience as an SRE or similar role in a high-scale production environment
  • Work from the Tel Aviv office three days per week
  • Experience designing, building, and operating cloud-native systems on AWS
  • Hands-on experience maintaining Node.js or JVM-based applications
  • Experience with MongoDB, Elasticsearch, and Kafka
  • Strong coding skills and a software engineering mindset (build your own tools)
  • Experience with infrastructure-as-code
  • Experience with modern container orchestration platforms (e.g., Kubernetes)
  • Practical experience building or integrating AI-driven solutions (LLMs, agents, AI operational tooling)
  • Business-level reliability experience
  • Ownership mindset and strong problem-solving/adaptability

The Company
Redwood City, CA
330 Employees
Year Founded: 2012

What We Do

BigPanda is the only Event Correlation and Automation platform built for domain-agnostic AIOps. We transform how IT teams prevent outages and resolve incidents by turning data into insights and action. Without BigPanda, IT Ops and DevOps teams struggle with manual and reactive incident response capabilities that are badly suited for the scale, complexity and velocity of modern IT environments. This results in painful outages, unhappy customers, growing IT headcount and the inability to focus on innovation. Fortune 500 enterprises such as Intel, Cisco, United, Nike, Marriott and Expedia rely on BigPanda to prevent outages, reduce costs, and give their teams time back for digital transformation. BigPanda helps organizations take a giant step towards Autonomous IT Operations by turning IT noise into insights and manual tasks into automated actions. BigPanda is backed by top-tier investors including Sequoia Capital, Mayfield, Battery Ventures, Greenfield Partners and Insight Partners. Visit www.bigpanda.io for more information.
