Senior Site Reliability Engineer (Data Platform)

Posted 2 Days Ago
Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur, MYS
In-Office
Senior level
Cloud • Information Technology • Insurance • Software • Analytics
Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently.
The Role
The Senior Site Reliability Engineer will enhance data platform reliability on AWS, automating processes and collaborating on incident response and architecture improvements for data services.
Summary Generated by Built In

Summary

The Team and the Opportunity
You will join the PDO Site Reliability Engineering (Data Platform) team that owns the reliability and operability of Guidewire’s data platform services, including large‑scale data processing, analytics, and streaming capabilities that underpin our AI and Insight products.
This team partners closely with product engineering, data platform, and security to design for reliability, build automation, and run services in production.
As a Senior Site Reliability Engineer (Data Platform), you will be a technical leader in running and evolving our big data stack on AWS (and potentially other public clouds), using software engineering to solve infrastructure and application reliability challenges. You’ll help advance PDO’s priorities by improving incident response, hardening critical data paths, and enabling scalable, cost‑efficient operations for our customers’ most important workloads.

Job Description

What You Will Do

  • Design and implement self‑service automation and tooling (in Go, Python, or scripting languages) to standardize and streamline deployment, operations, and troubleshooting for data platform services.

  • Implement and improve CI/CD pipelines (e.g., TeamCity, GitHub Actions) to support safe, frequent deployments, including gate promotion and automated quality checks.

  • Use Infrastructure as Code (e.g., Terraform, AWS CloudFormation) to build, harden, and maintain repeatable cloud infrastructure for data and analytics workloads.

  • Operate and improve Kubernetes‑based environments (AWS EKS), including deployment, scaling, and lifecycle management of containerized data services (e.g., Docker‑packaged microservices, streaming jobs).

  • Apply progressive delivery strategies such as blue/green and canary deployments, and support chaos engineering experiments to validate resilience and recovery mechanisms.

  • Collaborate on capacity planning and cost‑aware design for cloud resources across compute, storage, and networking layers for data‑intensive systems.

  • Build and refine end‑to‑end observability for the data platform using monitoring and logging tools (e.g., Datadog, ELK), including metrics, traces, and logs.

  • Develop meaningful dashboards and alerts to provide clear visibility into data pipeline health, customer experience, and platform performance.

  • Analyze operational data to identify reliability risks and bottlenecks, feeding insights into the roadmap and reliability backlogs.

  • Partner with product engineering, data platform, security, and other SRE teams to define and implement improvements in service architecture and operational practices that support PDO’s AI, cloud, and data platform priorities.

  • Advocate for reliability, resilience, and operational excellence in design reviews, readiness assessments, and release planning.

  • Contribute to a positive, inclusive work environment based on accountability, continuous learning, and psychological safety, consistent with Guidewire’s culture of determination, collaboration, continuous improvement, and bravery.

What You Need to Succeed 

Experience and Education 

  • 8+ years of relevant industry experience in Site Reliability Engineering, DevOps, Production Engineering, or similar roles supporting large‑scale distributed systems and data platforms.

  • BS/MS in Computer Science, Computer Engineering, Mathematics, or equivalent practical experience.

Technical Skills 

  • Strong experience with continuous deployment and operation of cloud services on public cloud (AWS), including production support and on‑call.

  • Hands-on experience running data platforms using big data and streaming technologies such as Kafka, Hadoop, Spark, and Hive on the public cloud.

  • Proficiency in at least one of Java, Go, or Python, and solid skills with scripting languages to build tools, automation, and integrations.

  • Experience building and operating microservices, including REST APIs and/or gRPC services.

  • Solid experience with CI/CD tools (e.g., TeamCity, GitHub Actions) for automated builds, tests, and deployments, including promotion gates.

  • Strong experience with Infrastructure as Code tools such as Terraform and AWS CloudFormation for provisioning and managing cloud infrastructure. Familiarity with KubeVela/Crossplane is a plus.

  • Practical knowledge of Kubernetes (e.g., AWS EKS) and Docker, including deployment patterns, service discovery, and resource management.

  • Familiarity with AWS services relevant to data and distributed systems, such as RDS, EMR, Redshift, MSK (Managed Streaming for Kafka), ECS, SNS, and SQS.

  • Expertise with monitoring, logging, and observability tools (e.g., Datadog, ELK) to instrument services and build actionable alerts and dashboards.

  • Deep understanding of distributed systems fundamentals, networking, storage, operating systems, and how they interact in complex multi‑tier environments.

  • Knowledge of capacity planning, scalability, and resilience patterns (including blue/green and canary deployments, and chaos engineering concepts).

Operational and Problem-Solving Skills

  • Demonstrated experience solving infrastructure and application problems using software engineering approaches rather than only manual operations.

  • Familiarity with agile methodologies like Scrum and Kanban.

  • Strong analytical and troubleshooting skills for complex, distributed, multi‑service environments.

  • Experience with on‑call, incident response (e.g., PagerDuty), and post‑incident review processes, with a bias for learning and continuous improvement.

Ways of Working 

  • Ability to collaborate effectively with other engineering, data, and operations teams to  understand their systems and help improve them.

  • A big‑picture perspective on systems, tools, and customer value, aligning technical decisions with PDO’s priorities around operational excellence, AI, cloud, and data platform adoption.

  • Comfort with agile development methodologies and iterative delivery in a highly collaborative environment.

  • Eagerness to learn, experiment, and grow, staying current with emerging technologies across cloud, data, and SRE practices, and applying them thoughtfully where they add real value.

Bonus Points 

  • Kubernetes/AWS certifications 

  • Contributions to open source projects  

About Guidewire

Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers in 40 countries, from new ventures to the largest and most complex in the world, run on Guidewire.

As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1600+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of applications that accelerate integration, localization, and innovation.

For more information, please visit www.guidewire.com and follow us on Twitter: @Guidewire_PandC.

Guidewire Software, Inc. is proud to be an equal opportunity and affirmative action employer. We are committed to an inclusive workplace, and believe that a diversity of perspectives, abilities, and cultures is a key to our success. Qualified applicants will receive consideration without regard to race, color, ancestry, religion, sex, national origin, citizenship, marital status, age, sexual orientation, gender identity, gender expression, veteran status, or disability. All offers are contingent upon passing a criminal history and other background checks where it's applicable to the position.

Skills Required

  • 8+ years of experience in Site Reliability Engineering, DevOps, Production Engineering, or similar roles
  • Experience with big data technologies such as Kafka, Hadoop, Spark, and Hive
  • Strong experience with cloud services on AWS
  • Proficiency in Go, Python, or Java
  • Experience with CI/CD tools and Infrastructure as Code

Guidewire Software Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Guidewire Software and has not been reviewed or approved by Guidewire Software.

  • Flexible Benefits: Flexible work options and distinctive global mobility programs enable remote/hybrid arrangements and short-term or longer-term cross-border work opportunities. Feedback suggests these options are a meaningful differentiator for those valuing location flexibility.
  • Leave & Time Off Breadth: Unlimited PTO in the U.S., dedicated volunteer time, and a personal 'My Day' accompany generous parental leave. These elements indicate a broad time-off offering that supports rest, community engagement, and family needs.
  • Equity Value & Accessibility: Equity grants (RSUs) and an employee stock purchase plan are positioned as significant parts of total compensation. Stock-based components can enhance overall pay, with value influenced by market conditions.

The Company
HQ: San Mateo, CA
3,400 Employees
Year Founded: 2001

What We Do

Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers, from new ventures to the largest and most complex in the world, run on Guidewire. As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record, with 1,000+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our marketplace provides hundreds of applications that accelerate integration, localization, and innovation.

Why Work With Us

We're focused on each and every employee's personal and professional development, and offer internal career mobility programs and growth opportunities that make Guidewire unique. Other perks like generous PTO, flexible working, our Guidewire Gives Back charitable program, and our "Work From Almost Anywhere" program support our employees' work-life balance.
