JFrog

Site Reliability Engineer

2 Locations

In-Office

Mid-level

Software

The Role

Maintain and improve availability, performance, scalability, and operational excellence of large-scale, multi-cloud Kubernetes SaaS environments. Troubleshoot production incidents, build backend services and automation tools (Python/Go), strengthen observability, implement SRE practices (SLOs/SLIs, postmortems), participate in on-call rotations, support resilience and disaster recovery, and evaluate AI-assisted automation to improve operations.

Summary Generated by Built In

At JFrog, we’re reinventing DevOps to help the world’s greatest companies innovate, and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit, and just all-around great people. If you’re willing to do more, your career can take off. And since software plays a central role in everyone’s lives, you’ll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust JFrog to manage, accelerate, and secure their software delivery from code to production, a concept we call “liquid software.” Wouldn't it be amazing if you could join us on our journey?

We’re hiring an SRE to help improve the availability, performance, scalability, and operational excellence of our SaaS environments. You’ll work closely with Engineering and Cloud teams to automate operations, scale JFrog’s multi-cloud, Kubernetes-based SaaS environments, strengthen observability, and improve incident response using modern SRE practices (SLOs/SLIs, error budgets, postmortems).
This role is hands-on, collaborative, and impact-focused. If you're eager to make a significant impact in a fast-paced, high-growth environment, we encourage you to apply.

As a Site Reliability Engineer at JFrog, you will…

Support the reliability, availability, performance, and scalability of JFrog’s large-scale, multi-cloud, Kubernetes-based SaaS environments
Investigate and troubleshoot production issues across distributed systems, infrastructure, Kubernetes, and cloud environments in close collaboration with Engineering teams
Design and develop backend services, internal platforms, and production engineering tools using Python, Go, or similar technologies
Improve reliability, observability, and operational readiness through SRE practices, monitoring and alerting, capacity awareness, postmortems, and safer CI/CD and production change processes
Evaluate and contribute to AI-assisted and agentic automation solutions that improve operational efficiency, troubleshooting, and production workflows
Support resilience initiatives, including disaster recovery validation, service readiness, health checks, and production readiness reviews
Participate in on-call rotations, lead incident response when needed, and drive follow-up actions to prevent recurrence
Continuously learn and evaluate new technologies that can improve reliability, automation, and operational excellence

To be a Site Reliability Engineer at JFrog, you need…

2-4 years of experience in SRE, Production Engineering, DevOps, or a similar role with hands-on production exposure
Strong troubleshooting and analytical skills, with the ability to investigate production issues in a structured and methodical way
Hands-on experience with Kubernetes-based containerized workloads
Experience with at least one public cloud provider: AWS, GCP, or Azure
Experience developing backend services, internal platforms, automation, or production engineering tools using Python, Go, or another programming language
Practical understanding of Linux fundamentals, networking concepts, HTTP, DNS, service connectivity, and production troubleshooting
Familiarity with CI/CD tools such as Jenkins, ArgoCD, GitHub Actions, or similar
Exposure to observability tools covering metrics, logs, and traces, such as Prometheus, Grafana, Coralogix, New Relic, or similar platforms
Understanding of incident management processes, alerting systems, and production support workflows
Ability to learn quickly, take ownership, communicate clearly, and work well in a collaborative production environment
Experience using AI-assisted operational workflows such as log analysis, incident summarization, triage support, or troubleshooting – an advantage
Familiarity with agentic automation frameworks such as LangGraph, LangChain, CrewAI, or similar – an advantage
Experience using AI-assisted development tools such as Cursor, Claude Code, GitHub Copilot, ChatGPT, or similar tools – an advantage

JFrog Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about JFrog and has not been reviewed or approved by JFrog.

Fair & Transparent Compensation — Pay is considered competitive overall, with many indicating they feel paid fairly relative to their roles. Compensation sentiment appears to have improved recently.
Equity Value & Accessibility — Equity grants and an employee stock purchase plan are commonly part of offers, adding meaningful value to total rewards. These components are highlighted alongside base pay as reasons packages are viewed favorably.
Healthcare Strength — U.S. medical, dental, and vision coverage are characterized as comprehensive and high quality. Employer-verified listings reinforce strong core health coverage.

The Company

HQ: Sunnyvale, California

1,603 Employees

Year Founded: 2008

What We Do

JFrog Ltd. (Nasdaq: FROG) is on a mission to create a world of software delivered without friction from developer to device. Driven by a “Liquid Software” vision, the JFrog Software Supply Chain Platform is a single system of record that powers organizations to build, manage, and distribute software quickly and securely, ensuring it is available, traceable, and tamper-proof. The integrated security features also help identify, protect against, and remediate threats and vulnerabilities. JFrog’s hybrid, universal, multi-cloud platform is available as both self-hosted and SaaS services across major cloud service providers. Millions of users and 7K+ customers worldwide, including a majority of the FORTUNE 100, depend on JFrog solutions to securely embrace digital transformation. Once you leap forward, you won’t go back!