Site Reliability Engineer III (Data Platform)

Posted 2 Days Ago
Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur, MYS
In-Office
Mid level
Cloud • Information Technology • Insurance • Software • Analytics
Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently.
The Role
The Site Reliability Engineer III will enhance reliability and scalability of the data platform services, manage big data tools, support on-call responses, contribute to automation, and optimize cloud infrastructure on AWS.
Summary Generated by Built In

Summary

The Team and the Opportunity
You will join the PDO Site Reliability Engineering (Data Platform) team that looks after the reliability and operability of Guidewire’s data platform services, including large‐scale data processing, analytics, and streaming capabilities that support our AI and Insight products.
The team partners closely with product engineering, data platform, and security to design for reliability, build automation, and run services in production.
As a Site Reliability Engineer – III (Data Platform), you will be a hands-on contributor helping to run and evolve our big data stack on AWS (and potentially other public clouds/SaaS products), using software engineering to address infrastructure and application reliability challenges. You’ll collaborate with more senior SREs to improve incident response, harden critical data paths, and enable scalable, cost‐efficient operations, directly supporting PDO’s focus on operational excellence and AI/cloud/data platform adoption.

Job Description

 

What You Will Do 

  • Collaborate with senior SREs to enhance incident response, reinforce critical data paths, and facilitate scalable, cost-efficient operations in support of PDO’s objectives for operational excellence and AI/cloud/data platform adoption.

  • Support the maintenance and enhancement of the production environment for data platform services to ensure high availability and performance for mission-critical workloads.

  • Manage and optimize big data and streaming platforms like Kafka, Hadoop, Spark, and Hive on AWS, focusing on configuration, tuning, and daily operations.

  • Assist in defining SLOs, error budgets, capacity plans, and scaling strategies for data platform components to support new AI and data products.

  • Participate in on-call rotations to maintain high availability of services, leveraging tools like PagerDuty to triage alerts and provide technical responses to incidents impacting data and analytics platforms.

  • Resolve production issues through cross-team collaboration and adherence to established incident management practices.

  • Contribute to blameless post-incident reviews to develop reliability improvements, runbooks, and automation.

  • Build and improve automation and tooling using Go, Python, or scripting to standardize deployment and troubleshooting for data services.

  • Support CI/CD pipelines using TeamCity or GitHub Actions to enable safe and frequent service deployments.

  • Utilize Infrastructure as Code, including Terraform and AWS CloudFormation, to maintain repeatable cloud infrastructure.

  • Operate Kubernetes-based environments on AWS EKS, managing the lifecycle and scaling of containerized data services.

  • Implement progressive delivery strategies, such as blue/green and canary deployments, to minimize release risks.

What You Need to Succeed 

Experience and Education 

  • 4–6 years of relevant industry experience in Site Reliability Engineering, DevOps, Production Engineering, or similar roles supporting large‑scale distributed systems and/or data platforms.

  • BS/MS in Computer Science, Computer Engineering, Mathematics, or a related technical field, or equivalent practical experience.

Technical Skills 

  • Experience deploying and operating services on AWS or Azure, including on-call support.

  • Expertise with data and streaming platforms like Kafka, Hadoop, Spark, or Hive.

  • Proficiency in Go, Python, or Bash for automation; Java/Spring Boot knowledge is beneficial.

  • Skill in building tools and utilities using REST APIs or gRPC. 

  • Proficiency with CI/CD tools such as TeamCity, GitHub Actions, or Jenkins.

  • Experience with Infrastructure as Code (Terraform or CloudFormation). Familiarity with KubeVela or Crossplane is a plus.

  • Working knowledge of Kubernetes (EKS) and Docker for deployment and resource management.

  • Familiarity with AWS services like RDS, EMR, Redshift, MSK, and ECS. 

  • Experience with observability and logging tools like Datadog or ELK.

  • Understanding of distributed systems, networking, storage, and operating systems.

  • Familiarity with agile methodologies like Scrum and Kanban. 

  • Ability to solve infrastructure problems using software engineering and participate in incident response.

  • Effective collaboration skills aligned with business priorities like AI and cloud adoption. 

Bonus Points 

  • Kubernetes/AWS certifications 

  • Contributions to open source projects

About Guidewire

Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers in 40 countries, from new ventures to the largest and most complex in the world, run on Guidewire.

As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1600+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of applications that accelerate integration, localization, and innovation.

For more information, please visit www.guidewire.com and follow us on Twitter: @Guidewire_PandC.

Guidewire Software, Inc. is proud to be an equal opportunity and affirmative action employer. We are committed to an inclusive workplace, and believe that a diversity of perspectives, abilities, and cultures is key to our success. Qualified applicants will receive consideration without regard to race, color, ancestry, religion, sex, national origin, citizenship, marital status, age, sexual orientation, gender identity, gender expression, veteran status, or disability. All offers are contingent upon passing a criminal history check and other background checks where applicable to the position.

Skills Required

  • 4-6 years of experience in Site Reliability Engineering, DevOps, Production Engineering, or similar roles
  • BS/MS in Computer Science, Computer Engineering, Mathematics, or related field
  • Experience deploying services on AWS or Azure
  • Expertise with data and streaming platforms (Kafka, Hadoop, Spark, Hive)
  • Proficiency in Go, Python, or Bash for automation
  • Experience with CI/CD tools (TeamCity, GitHub Actions, Jenkins)
  • Working knowledge of Kubernetes (EKS) and Docker
  • Experience with Infrastructure as Code (Terraform or CloudFormation)

Guidewire Software Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Guidewire Software and has not been reviewed or approved by Guidewire Software.

  • Flexible Benefits: Flexible work options and distinctive global mobility programs enable remote/hybrid arrangements and short-term or longer-term cross-border work opportunities. Feedback suggests these options are a meaningful differentiator for those valuing location flexibility.
  • Leave & Time Off Breadth: Unlimited PTO in the U.S., dedicated volunteer time, and a personal 'My Day' accompany generous parental leave. These elements indicate a broad time-off offering that supports rest, community engagement, and family needs.
  • Equity Value & Accessibility: Equity grants (RSUs) and an employee stock purchase plan are positioned as significant parts of total compensation. Stock-based components can enhance overall pay, with value influenced by market conditions.

The Company
HQ: San Mateo, CA
3,400 Employees
Year Founded: 2001

What We Do

Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers, from new ventures to the largest and most complex in the world, run on Guidewire. As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record, with 1,000+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our marketplace provides hundreds of applications that accelerate integration, localization, and innovation.

Why Work With Us

We're focused on every employee's personal and professional development, and offer internal career mobility programs and growth opportunities that make Guidewire unique. Other perks like generous PTO, flexible working, our Guidewire Gives Back charitable program, and our "Work From Almost Anywhere" program support our employees' work-life balance.
