- Own the observability platform end-to-end (Prometheus, Grafana, and distributed tracing with OpenTelemetry); establish SLO/SLI frameworks and track SLAs across all systems and applications.
- Lead major incident response as an incident commander; drive root-cause analysis and systemic remediation programs.
- Lead the evolution of CWAN's cloud infrastructure on AWS, establishing scalability, resilience, and security standards across all services.
- Serve as the primary owner of the Kubernetes (EKS) platform: design cluster topology, multi-tenancy models, autoscaling strategies, and upgrade lifecycle.
- Define and enforce the organization's Infrastructure-as-Code standards using Terraform and Ansible; drive adoption of GitOps workflows.
- Build and maintain CI/CD and automated deployment pipelines for all services and applications across environments.
- Evaluate and introduce emerging technologies (eBPF, WASM, service meshes) to improve platform capabilities.
- Partner with engineering leadership to embed reliability requirements into the SDLC; champion chaos engineering and resilience testing programs.
- Mentor and grow mid-level and junior SREs across global teams through code reviews, pairing, and structured knowledge sharing.
- Drive capacity planning, cost optimization, and FinOps practices across the AWS environment.
- Contribute to the engineering roadmap and help define the long-term reliability strategy for the CWAN platform.
Qualifications
Required
- 7+ years of experience in Site Reliability Engineering, Platform Engineering, or related roles.
- Proven track record leading major incident response and driving post-incident systemic improvements.
- Strong experience building and operating observability stacks at scale.
- Hands-on experience with monitoring, logging, and tracing tools such as Grafana, Prometheus, Mimir, OpenSearch, Dynatrace/Datadog, and VictoriaMetrics.
- Demonstrated ability to mentor engineers and influence technical direction across time zones.
- Deep expertise with Kubernetes in large-scale, multi-cluster production environments.
- Advanced proficiency with AWS (EKS, RDS/Aurora, ElastiCache, Direct Connect, IAM/SCP, Cost Explorer).
- Expert-level Infrastructure-as-Code skills with Terraform (modules, remote state, Atlantis or similar).
- Hands-on experience with CI/CD platforms: Jenkins, GitHub Actions and GitLab CI.
- Experience with GitOps workflows (ArgoCD, Rancher).
- Proficiency in at least one general-purpose programming language (Python, Go, Java) for building tooling and automation.
- Experience with security best practices in cloud environments (e.g., IAM least privilege, secrets management).
Preferred
- Experience in financial services, FinTech, or other mission-critical, regulated environments.
- Hands-on experience with service mesh (Istio) and eBPF-based observability tools.
- Prior staff or principal engineer experience with cross-team technical influence in a global organization.
- AWS and Kubernetes certifications at the Professional level.
- Experience with multi-region active-active architectures and global load balancing.
What We Do
CWAN was founded on a simple belief: investment professionals deserve modern technology that actually works for them. Not legacy systems that slow them down. Not fragmented data that creates confusion. But one comprehensive platform that gives you complete visibility and crystal-clear insights. The result? Investment management that works as seamlessly as your investment strategy. Since our founding in 2004, CWAN has been the trusted technology partner powering the world’s leading institutional investors — from insurance companies, asset managers, and hedge funds to asset owners like corporations, endowments, and pension funds managing over $10 trillion in assets.
Why Work With Us
We continue to grow, fueled by a strong foundation, an ambitious vision, and a commitment to delivering exceptional value to our clients, partners, and team members around the world. What started as a bold idea in Boise, Idaho has rapidly transformed into a global presence. We’ve expanded our footprint significantly—now operating out of 24 offices
Clearwater Analytics (CWAN) Offices
Hybrid Workspace
Employees engage in a combination of remote and on-site work.