Senior Site Reliability Engineer

Posted 16 Days Ago
2 Locations
In-Office
Senior level
Software
The Role
Drive operational maturity for Kubernetes workloads on GKE, improve CI/CD pipelines, troubleshoot production issues, and standardize infrastructure resources.

Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide range of clients across many industries including financial services, professional services, telecommunications and media, consumer products, automotive, industrial automation, professional sports and entertainment, life sciences, ecommerce, and education.

Job Overview:

Drive reliability and operational maturity for Kubernetes workloads on GKE through safe rollout patterns, high-signal observability, resilient IaC, and effective incident response. Collaborate with developers to harden CI/CD pipelines and address infrastructure concerns within application code.

Key responsibilities:

  • Design and maintain resilient deployment patterns (blue-green, canary, GitOps syncs) across services.
  • Instrument and optimize logs, metrics, traces, and alerts to reduce noise and improve signal.
  • Review backend code (e.g., Django, Node.js, Go, Java) with a focus on infra touchpoints like database usage, timeouts, error handling, and memory consumption.
  • Tune and troubleshoot GKE workloads, HPA configs, network policies, and node pool strategies.
  • Improve or author Terraform modules for infrastructure resources (e.g., VPC, CloudSQL, Secrets, Pub/Sub).
  • Diagnose production issues from logs, traces, and dashboards, and lead or support incident response.
  • Reduce config drift across environments and standardize secrets, naming, and resource tagging.
  • Collaborate with developers to harden delivery pipelines, standardize rollout readiness, and clean up infra smells in code.

Key skills:

  • Have 4–6+ years of experience in backend or infra-focused engineering roles (e.g., SRE, platform, DevOps, or fullstack).
  • Can confidently write or review production-grade code and infra-as-code (Terraform, Helm, GitHub Actions, etc.).
  • Have deep hands-on experience with Kubernetes in production, ideally on GKE, including workload autoscaling and ingress strategies.
  • Understand cloud concepts like IAM, VPCs, secret storage, workload identity, and CloudSQL performance characteristics.
  • Think in systems: understand cascading failure, timeout boundaries, dependency health, and blast radius.
  • Regularly contribute to incident mitigation or long-term fixes (not just closing alerts).
  • Can influence through well-written PRs, documentation, and thoughtful design reviews.

Good to have:

  • Exposure to GitOps tooling such as ArgoCD or FluxCD.
  • Experience developing or integrating Kubernetes operators.
  • Familiarity with service-level indicators (SLIs), service-level objectives (SLOs), and structured alerting.

Tools and Expectations:

  • Datadog - Monitor infrastructure health, capture service-level metrics, and reduce alert fatigue through high-signal thresholds.
  • PagerDuty - Own the incident management pipeline. Route alerts by severity and align with business SLAs.
  • GKE / Kubernetes - Improve cluster stability and workload isolation. Define autoscaling configurations and tune for efficiency.
  • Helm / GitOps (ArgoCD/Flux) - Validate release consistency across clusters. Monitor sync status and rollout safety.
  • Terraform Cloud - Support DR planning and detect infrastructure changes through state comparisons.
  • CloudSQL / Cloudflare - Diagnose DB and networking issues. Monitor latency, enforce access patterns, and validate WAF usage.
  • Secret Management - Audit access to secrets, apply short-lived credentials, and define alerts for abnormal usage.

Orion is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, citizenship status, disability status, genetic information, protected veteran status, or any other characteristic protected by law.

Candidate Privacy Policy

Orion Systems Integrators, LLC and its subsidiaries and its affiliates (collectively, “Orion,” “we” or “us”) are committed to protecting your privacy. This Candidate Privacy Policy (orioninc.com) (“Notice”) explains:

  • What information we collect during our application and recruitment process and why we collect it;
  • How we handle that information; and
  • How to access and update that information.

Your use of Orion services is governed by any applicable terms in this Notice and our general Privacy Policy.


Top Skills

CloudSQL
Datadog
GitHub Actions
GKE
Helm
Kubernetes
PagerDuty
Terraform

The Company
HQ: Edison, NJ
3,410 Employees
Year Founded: 1993

What We Do

Orion is a leading digital transformation and product development services firm. Headquartered in Edison, NJ, we have a global team of 6,200+ associates, with engineers in 14 major delivery centers across North America, Europe, Asia Pacific and Latin America.

For over 25 years, Orion has been solving complex business problems for our clients. Our transformative business solutions are rooted in digital strategy, experience design, and engineering, empowering our clients to operate with agility at scale.

Our mission is to serve as an agile and trusted partner for business transformation initiatives, providing deep emerging technology, experience design, and domain expertise.

Our business has more than tripled over the last three years.

We have grown aggressively both organically and inorganically, adding new clients, complementary skills, domain expertise, and strengthening our global footprint.
