Site Reliability Engineer II

Sorry, this job was removed at 03:46 p.m. (CST) on Wednesday, Nov 20, 2024
New Delhi, Delhi
In-Office
Cybersecurity
The Role

At SAFE Security, our vision is to be the Champions of a Safer Digital Future and the Catalysts of Change. We believe in empowering individuals and teams with the freedom and responsibility to align their goals, ensuring we all move forward together.


We operate with radical transparency, autonomy, and accountability—there’s no room for brilliant jerks. We embrace a culture-first approach, offering an unlimited vacation policy, a high-trust work environment, and a commitment to continuous learning. For us, Culture is Our Strategy—check out our Culture Memo to dive deeper into what makes SAFE unique.


Job Overview:


As a Site Reliability Engineer, you will be responsible for providing the foundation for our mission-critical cloud platform, which must maintain constant uptime, scale seamlessly, and allow new services and features to flourish.


The successful candidate will be highly self-motivated, with a passion for excellence, quality, and detail. The SRE will not only support operations but also work closely with developers and architects within SAFE to aid in product design and assist with implementation to improve stability, security, and scalability.

Core Responsibilities:

  • Operate, monitor, and triage all aspects of our production environments to meet our SLAs and SLOs as part of a 24x7 on-call team.
  • Troubleshoot complicated, cross-platform issues involving operating systems, networking, and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.
  • Design, build, and implement innovative solutions for past, present, and future issues.
  • Prepare alert-handling procedures, runbooks, and similar documentation for common tasks and incidents.
  • Automate deployment and orchestration of services into the cloud environment as well as other routine processes.
  • Actively participate in capacity planning, scale testing, and disaster recovery exercises.
  • Interact with and support partner teams, including engineering, QA, and CSE, to improve system reliability.
  • Conduct thorough RCA (Root Cause Analysis) for all production incidents: Identify root causes, document findings, publish incident summaries, and develop preventative actions to mitigate future occurrences.
  • Contribute to infrastructure architecture and non-functional requirements, ensuring they fit into a cohesive vision aligned with the rest of the platform's technology roadmap for the launch.
  • Propagate SRE culture across the organization by sharing industry best practices, standards, approaches, documentation, and code with other engineering teams.

Qualifications / Essential Skills / Experience:

  • Demonstrable experience in managing and maintaining high-availability services based on AWS cloud infrastructure (minimum 5 years).
  • Demonstrable experience with AWS cloud environments and container technologies (Docker and Kubernetes).
  • Demonstrable experience in managing and monitoring large-scale queueing technologies such as RabbitMQ or Kafka.
  • Hands-on experience in provisioning Infrastructure as Code (IaC) using Terraform Enterprise/OpenTofu/CDK.
  • Experience in CI/CD pipelines using GitHub Actions and Jenkins.
  • Valid AWS Associate-level or higher certification.
  • Experience in AWS networking (VPC, Network Firewall, NACLs, SGs, TGW, Direct Connect), Route 53, HAProxy, Fargate Firewalls.
  • At least 3 years of experience in programming/scripting in Python.
  • Experience in monitoring and analyzing infrastructure performance using standard performance monitoring tools such as Grafana/Prometheus, Datadog, Splunk, and New Relic.
  • Experience with operational tools such as PagerDuty, Jira Service Management, and Zendesk.

If you’re passionate about cyber risk, thrive in a fast-paced environment, and want to be part of a team that’s redefining security—we want to hear from you! 🚀

The Company
HQ: Palo Alto, CA
403 Employees
Year Founded: 2012

What We Do

Safe Security is a pioneer in the “Cybersecurity and Digital Business Risk Quantification” (CRQ) space. It helps organizations measure and mitigate enterprise-wide cyber risk in real time using its ML-enabled, API-first SAFE Platform, which aggregates automated signals across people, process, and technology, for both first and third parties, to dynamically predict an organization's breach likelihood (SAFE Score) and dollar Value at Risk.

Headquartered in Palo Alto, Safe Security has over 200 customers worldwide, including multiple Fortune 500 companies, and averaged an NPS of 73 in 2020.

Backed by John Chambers and senior executives from Softbank, Sequoia, PayPal, SAP, and McKinsey & Co., it was also one of the top contributors to the National Vulnerability Database (NVD) of the U.S. Government in 2019 and a MITRE ATT&CK contributor in 2020.

Since 2018, the company has also been working with MIT on joint research to develop the SAFE Scoring Algorithm. Safe Security has received several awards, including the Morgan Stanley CTO Innovation Award.
