Senior Site Reliability Engineer

Chennai, Tamil Nadu
In-Office
Senior level
Greentech • Energy
We believe that homeowners and renters are long overdue for a new approach to energy — one that puts them first.
The Role
The Senior Site Reliability Engineer will manage AWS and Kubernetes environments, automate processes, enhance platform reliability, and lead observability efforts while collaborating with various teams.
Summary Generated by Built In
Who we are

Arcadia is the technology company empowering energy innovators and consumers to fight the climate crisis. Our software and APIs are revolutionizing an industry held back by outdated systems and institutions by creating unprecedented access to the data and clean energy needed to make a decarbonized energy grid possible.

In 2014, Arcadia set out on its mission to break the fossil fuel monopoly and since then we have been knocking down the institutional barriers to unlock decarbonization. To date, we have connected hundreds of thousands of consumers and small businesses with high-quality clean energy options. Fast forward to today, and now, we’re thinking even bigger. We have launched Arcadia Platform, an industry-defining SaaS platform that empowers developers and energy innovators to deliver their own custom, personalized energy experiences, accelerating the transformation of the industry from an analog energy system into a digitized information network.

Tackling one of the world’s biggest challenges requires out-of-the-box thinking & diverse perspectives. We’re building a team of individuals from different backgrounds, industries, & educational experiences. If you share our passion for ushering in the era of the clean electron, we look forward to learning what you would uniquely bring to Arcadia! Visit www.arcadia.com.

HQ: Greenwood Village, Colorado

What we're looking for:

We are seeking an experienced Senior Site Reliability Engineer (L3) to join our SRE/Platform Engineering team in India. This role will focus on building, scaling, and maintaining our AWS- and Kubernetes-based platform, ensuring high reliability, cost efficiency, and secure operations across multiple environments. The successful candidate will work closely with Engineering, Security, DevOps, and Product teams to drive automation, improve infrastructure resilience, and elevate observability across mission-critical systems.

The ideal candidate is a self-starter and hands-on engineer who can dive deep into complex distributed systems, automate away manual processes, and proactively identify reliability gaps. They should have a proven track record of managing production-grade AWS infrastructure, Kubernetes clusters, CI/CD pipelines, and cloud security. They will collaborate daily with US-based engineering teams and cross-functional partners to ensure our platform remains scalable, secure, and cost-optimized as we continue to grow.

What you'll do:
  • Design, build, and maintain AWS infrastructure (EKS, VPC, RDS, IAM, CloudWatch, CloudTrail, GuardDuty, Load Balancers, S3, CloudFront) using Terraform and CloudFormation
  • Lead all aspects of Kubernetes operations including cluster upgrades, performance tuning, CNI troubleshooting, workload scaling, Helm chart packaging, and GitOps deployments
  • Own and evolve our CI/CD ecosystem across Jenkins (Groovy scripting), GitHub Actions, AWS CodePipeline, ArgoCD, and FluxCD
  • Improve platform reliability by reducing operational toil through automation, scripting (Python/Bash), and proactive system hardening
  • Implement and enhance observability across Prometheus, Grafana, Loki, Tempo, Datadog, and CloudWatch, ensuring actionable alerting, dashboards, and metrics aligned with SLOs/SLIs
  • Drive FinOps initiatives, identifying cost inefficiencies and working with engineering teams to implement best practices, tagging standards, budgeting, and resource right-sizing
  • Manage database operations across MySQL and PostgreSQL including backups, performance tuning, replication, and operational runbooks
  • Maintain and improve secret management using Vault, AWS Secrets Manager, and Parameter Store
  • Strengthen cloud security posture with IAM least privilege, CSPM reviews, audit readiness, GuardDuty/CloudTrail monitoring, and environment hardening
  • Troubleshoot complex production issues across networking, Kubernetes, compute, databases, and CI/CD systems
  • Collaborate daily with US-based teams for incident reviews, migrations, roadmap work, and platform enhancements
  • Contribute to the development and adoption of AI-enabled tooling such as automation, debugging assistants, MCP, and RAG pipelines (good to have, not mandatory)
  • Document runbooks, architecture diagrams, SOPs, troubleshooting guides, and operational best practices
  • Participate in on-call rotations (if required) and drive post-incident analysis and long-term fixes

What will help you succeed:

Must-haves:
  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
  • 8–10+ years of experience in SRE/DevOps/Cloud Engineering, with deep hands-on exposure to AWS and Kubernetes
  • Strong hands-on experience with:
    • Terraform & Infrastructure as Code
    • AWS core services (EKS, IAM, RDS, EC2, VPC, CloudWatch, CloudTrail, GuardDuty)
    • Jenkins + Groovy, GitHub Actions, ArgoCD, FluxCD
    • Kubernetes troubleshooting and operations
    • Prometheus/Grafana/Datadog observability stacks
  • Proven ability to operate in high-scale, high-uptime, multi-environment production systems
  • Experience building automation via Python/Bash and reducing operational toil
  • Strong understanding of incident management, root cause analysis, and reliability engineering principles
  • Experience working with globally distributed teams across multiple time zones
  • Excellent communication skills (must interact with US teams daily)
  • Ability to work independently with minimal supervision, take ownership, and drive initiatives end-to-end
  • A growth mindset, strong troubleshooting ability, and comfort with complex cloud-native environments

Good-to-haves:
  • Experience with self-hosted n8n and workflow automation platforms
  • Exposure to LLMs, RAG, vector DBs, MCP concepts
  • Experience with cloud security/DevSecOps tools (Trivy, Inspector, OPA, Kyverno)
  • Hands-on experience with FinOps platforms and cloud cost governance
  • Certifications in related fields (AWS, Kubernetes, Terraform, etc.)

Benefits
  • Competitive compensation and employee stock options
  • Hybrid/remote-first working model (India-based role, with global collaboration)
  • Flexible leave policy
  • Comprehensive medical insurance (self + family members)
  • Annual performance cycle + quarterly recognition awards
  • A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation

Eliminating carbon footprints, eliminating carbon copies.

Here at Arcadia, we cultivate diversity, celebrate individuality, and believe unique perspectives are key to our collective success in creating a clean energy future. Arcadia is committed to equal employment opportunities regardless of race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, protected veteran status, or any status protected by applicable federal, state, or local law. While we are currently unable to consider candidates who will require visa sponsorship, we welcome applications from all qualified candidates eligible to work in India.

Thank you

Top Skills

ArgoCD
AWS
Bash
CloudFormation
Datadog
FluxCD
GitHub Actions
Grafana
Groovy
Jenkins
Kubernetes
Prometheus
Python
Terraform
The Company
HQ: Washington, DC
200 Employees
Year Founded: 2014

What We Do

Arcadia's clean energy tech platform gives everyone easy access to clean energy. Arcadia members have access to a mobile-optimized online dashboard where they can manage their account, track their account activity, and view their energy usage all in one place.

Arcadia's digital transformation of the traditional energy utility gives individuals greater control over what energy they support, how much it costs, and how they pay.

Founded in 2014, the company’s platform now integrates with more than 100 utilities in all 50 states and is used by more than 350,000 people.
