We are seeking a Senior Staff Site Reliability Engineer, on a long-term basis during US hours, who brings deep software engineering roots alongside SRE expertise. This individual will help shape and scale the reliability of our global cloud platform, bringing the full-stack perspective of someone who has built and shipped software and now drives reliability from the inside out.
The Role
This is a Senior Staff-level technical leadership role with organization-wide influence. You will define and drive reliability strategy across our multi-cloud infrastructure (AWS and GCP), establish architectural standards, and ensure our backend systems operate with exceptional availability, scalability, and resilience.
You will also collaborate with strategic partners and engineering teams to enable our organization as a cloud-integrated service, leading technical discussions and ensuring secure and reliable integrations.
This is a long-term position for someone who thrives at the intersection of software development and reliability engineering. The ideal candidate has hands-on development experience, understands the complete software delivery lifecycle, and brings an end-to-end systems perspective — from code commit to production operation.
What You’ll Do
- Define and drive the organization’s SRE strategy across engineering teams.
- Establish reliability standards, architectural guardrails, and production readiness frameworks.
- Initiate, participate in, and review architectural changes — leveraging development experience to ensure reliability and operability are built in, not bolted on.
- Apply SDLC knowledge to reliability decisions — engage early in design and architecture reviews to embed reliability, testability, and operability as first-class requirements.
- Proactively identify system-wide gaps — continuously assess the platform for reliability blind spots, missing observability, or architectural debt, and drive initiatives to close them without waiting to be asked.
- Bridge development and SRE teams — translate between engineering intent and operational reality, serving as a technical liaison who can read code, review PRs, and contribute to service-level design decisions.
- Design and maintain highly available, multi-region, multi-cloud systems.
- Ensure platform reliability supporting millions of IoT devices globally.
- Guide engineering teams in building fault-tolerant, scalable microservices and monolithic systems.
- Define and enforce SLIs, SLOs, and error budgets.
- Lead architecture reviews and production readiness reviews.
- Partner with strategic teams to deliver our organization as a cloud-integrated service and support partner integrations.
- Improve and streamline production release processes.
- Implement safe deployment strategies (canary, blue/green, progressive delivery).
- Build CI/CD guardrails to reduce deployment risk and improve reliability.
- Develop and mature observability strategies across infrastructure and services.
- Lead high-severity incident response, facilitate blameless postmortems, and drive systemic improvements to prevent recurring issues.
- 10+ years of combined software engineering and SRE/infrastructure experience, with a clear progression from development into reliability or platform engineering.
- Deep understanding of the complete Software Development Lifecycle (SDLC) — enabling well-informed reliability and design decisions across all phases of software delivery.
- Strong software development background — with hands-on experience building and shipping production software — enabling effective design collaboration, code-level review, and reliability-driven architectural input.
- End-to-end system comprehension — ability to reason about the full stack from device/client behavior through API layer, backend services, data stores, and infrastructure, connecting the dots across teams and domains.
- Self-directed gap identification — demonstrated initiative in spotting reliability, scalability, or process gaps and driving improvements without needing explicit direction.
- Collaborative cross-team communication — proven ability to work across engineering, product, and operations teams; comfortable influencing without authority and presenting technical decisions to both technical and non-technical stakeholders.
- Proven experience operating large-scale distributed systems in production.
- Strong hands-on expertise with AWS and GCP cloud platforms.
- Deep experience with Kubernetes in production environments.
- Advanced knowledge of Terraform, including modular design and infrastructure governance.
- Strong understanding of distributed systems, networking, and system reliability principles.
- Experience supporting Java-based monolithic systems and microservices architectures.
- Proficiency in Python for automation and tooling.
- Experience with modern observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry, etc.).
- Strong debugging, incident response, and root cause analysis skills.
- Security knowledge in transport and identity — working knowledge of SSL/TLS certificate lifecycle management, mutual TLS (mTLS) for service-to-service authentication, cipher suite selection and hardening, and TLS version enforcement across microservices and infrastructure boundaries.
- Excellent written and verbal communication skills, with experience coordinating across distributed engineering teams, facilitating technical discussions, and driving alignment on reliability decisions.
Qualifications
- This position is only for the IST Evening (3pm to midnight) or IST Night (10pm to 7am) flexible rotation shift.
- Bachelor’s degree in computer science or software engineering.
Skills Required
- 10+ years combined software engineering and SRE/infrastructure experience
- Bachelor's degree in computer science or software engineering
- Strong hands-on expertise with AWS and GCP
- Deep experience with Kubernetes in production
- Advanced knowledge of Terraform including modular design and governance
- Proficiency in Python for automation and tooling
- Experience supporting Java-based monolithic systems and microservices
- Experience with observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry)
- Strong understanding of distributed systems, networking, and reliability principles
- Experience designing multi-region, multi-cloud systems and operating large-scale distributed production systems
- Experience implementing safe deployment strategies (canary, blue/green, progressive delivery) and CI/CD guardrails
- Security knowledge of SSL/TLS lifecycle, mTLS, and related hardening
- Proven incident response, blameless postmortem, and root cause analysis experience
- Ability to work IST Evening (3pm-12am) or IST Night (10pm-7am) flexible rotation shift
- Excellent written and verbal communication; cross-team collaboration and technical influence
Arrow Electronics, Inc. Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Arrow Electronics, Inc. and has not been reviewed or approved by Arrow Electronics, Inc.
- Healthcare Strength: Healthcare offerings are positioned as robust, with multiple medical plan options and access to telemedicine, EAP, and wellbeing programs. Income-banded premium support is described as helping keep the base plan more affordable for lower earners.
- Leave & Time Off Breadth: Time-off programs include unlimited PTO for U.S. salaried employees alongside accrual-based vacation and sick programs for hourly staff. Paid parental leave is described as available with a defined fully paid period for new parents.
- Parental & Family Support: Family-focused supports include subsidized back-up childcare and eldercare days and dependent-care spending options. These offerings add practical value beyond base pay, particularly for caregivers.
Arrow Electronics, Inc. Insights
What We Do
A Fortune 500 company, ranked #133 in 2024, with over 22,000 employees worldwide, Arrow guides innovation forward for over 220,000 leading technology manufacturers and service providers. With 2023 sales of $33 billion, Arrow develops technology solutions that improve business and daily life. Arrow.com is the easiest place for innovators to create, make and manage technology.
Why Work With Us
Arrow is much more than products and services. We are a team of many backgrounds in a global ecosystem, working toward one common goal: to help customers create a better tomorrow, where innovation improves the quality of life and the benefits of technology are more accessible to all. Join us in building a better tomorrow for many!