Site Reliability Engineer II - Operational Readiness (Scale & Performance) (Hybrid)

Posted 7 Days Ago
Bengaluru, Karnataka
3-5 Years Experience
Cloud • Information Technology • Security • Software
The Role
As a Site Reliability Engineer II, you will enhance the scalability and reliability of HashiCorp's cloud products by implementing system reliability best practices, conducting load testing, and collaborating with engineering teams to ensure high performance and availability. You will also provide troubleshooting expertise and develop automated tools for incident management and testing.

The Role

As a Site Reliability Engineer for the Operational Readiness team, you will play a critical role in enhancing the scalability, performance, and reliability of HashiCorp's cloud products. With at least 3 years of experience in site reliability engineering or a related field, you will lead efforts to identify, address, and mitigate operational challenges before they impact our customers. Your expertise in load testing, performance analysis, and system hardening will ensure that our services meet the highest standards of operational excellence.

You will play a pivotal role in enhancing our operational resilience and maintaining the reliability of our enterprise and cloud-based products. With a focus on overall quality, you will be at the forefront of ensuring high availability and performance across HashiCorp’s offerings.

You will execute test plans and define system-wide strategies for product load and performance testing. You will work with a wide variety of tools and explore new avenues to ensure that all products meet the essential operational readiness criteria.

You will use advanced troubleshooting techniques, such as simulating system failures with chaos testing, to identify, organize, and advocate for novel solutions that remediate customer impact on complex interconnected systems.


Key Responsibilities

  • Implement best practices for system reliability, including proactive identification of potential failure points and the development of automated mitigations.
  • Design and execute comprehensive load testing strategies to identify performance bottlenecks and scalability limits across our cloud products.
  • Implement best practices and technologies to improve system resilience, ensuring high availability and fault tolerance through a chaos testing framework.
  • Work closely with engineering and product teams to integrate operational readiness into the development lifecycle, enhancing product stability and user satisfaction.
  • Build and refine tools and frameworks for automated testing, environment simulation, and incident reproduction, reducing manual effort and increasing test coverage.
  • Conduct in-depth analysis of testing results, documenting findings and making actionable recommendations for systemic improvements.
  • Develop and implement disaster recovery and backup strategies to ensure data integrity and system resilience.
  • Share your knowledge and expertise with team members, fostering a culture of learning and continuous improvement.


Ideal Candidate

  • 3+ years of experience in SRE, systems engineering, or non-functional testing roles, with a focus on performance testing or system scalability.
  • Commitment to exploring a career in the Site Reliability Engineering field.
  • Proficiency in at least one programming or scripting language.
  • Good understanding of CI/CD processes and of maintaining quality pipelines.
  • Experience with version control systems (e.g., Git) and agile project management methodologies.
  • Exposure to cloud technologies (AWS, Azure, or GCP) and container technologies like Nomad or Kubernetes.
  • Effective communication and collaboration skills, capable of working with cross-functional teams and articulating technical concepts to diverse audiences.
  • Experience with infrastructure as code (Terraform, CloudFormation) is a plus.
  • Understanding of monitoring and alerting systems is a plus.
  • Chaos testing experience is a plus.
  • Exposure to the disaster recovery domain is a plus.

Top Skills

Go
Java
Python
The Company
HQ: San Francisco, CA
1,200 Employees
Hybrid Workplace
Year Founded: 2012

What We Do

HashiCorp was founded by Mitchell Hashimoto and Armon Dadgar in 2012 with the goal of revolutionizing datacenter management: application development, delivery, and maintenance. The datacenter of today is very different than the datacenter of yesterday, and we think the datacenter of tomorrow is just around the corner.
