Site Reliability Engineer

Reposted 3 Days Ago
Boston, MA
Hybrid
Mid level
Information Technology • Web3
The Role
The Site Reliability Engineer manages AWS Kubernetes infrastructure, ensuring operational excellence, security, and scalability, while implementing reliability improvements and best practices.

We're seeking an experienced Site Reliability Engineer to take ownership of our AWS-based Kubernetes infrastructure. You'll be responsible for the operational excellence, security, and scalability of the development and production systems that support our Enterprise Solid Server (ESS) technology for enterprise clients. You'll have significant autonomy to establish best practices, implement reliability improvements, and build the foundation for our growing infrastructure needs.

Inrupt is headquartered in Boston, MA. This role is ideally based in Boston. Our team operates on a hybrid schedule, working from the office two days a week and enjoying remote flexibility on the remaining days.


Key Responsibilities

  • Manage day-to-day operations of AWS EKS clusters across development, staging, and production environments
  • Monitor system health, triage alerts, and respond to incidents (15-minute SLO)
  • Perform regular patching, upgrades, and maintenance of infrastructure components
  • Maintain and optimize our technology stack: EKS, MSK, RDS, ArgoCD, Traefik, Sysdig, Mezmo, Terraform
  • Manage AWS services, including VPC, RDS, MSK (Kafka), S3, and networking infrastructure
  • Implement and maintain comprehensive monitoring dashboards, alerting, and centralized logging
  • Maintain Terraform-based infrastructure automation and practice GitOps principles
  • Manage data infrastructure lifecycle: RDS databases, Kafka clusters, Redis caching, S3 buckets
  • Implement security baselines, manage RBAC, and conduct vulnerability scanning and remediation
  • Design and test disaster recovery strategies with defined RTO/RPO
  • Support ArgoCD deployments and troubleshoot application deployment issues
  • Create and maintain documentation and troubleshooting guides
  • Provide architectural reviews and capacity planning aligned with business objectives
  • Optimize infrastructure costs while maintaining performance and reliability
  • Establish on-call rotation and incident response procedures with post-mortem analysis
  • Work closely with engineers to ensure operational requirements are built into our products
  • Work closely with engineers to ensure that non-functional requirements are met by the proposed architecture, design, and development choices


 

About You

Required:

  • Experience managing production Kubernetes clusters, preferably AWS EKS
  • Deep knowledge of cloud platform services (e.g., EC2, EKS, VPC, RDS, S3, IAM, CloudWatch)
  • Strong Terraform experience for infrastructure automation
  • Experience with monitoring platforms (Sysdig, Datadog, or similar) and logging systems
  • Hands-on experience with ArgoCD or similar tools
  • Strong understanding of networking: VPCs, security groups, load balancers, DNS
  • Database administration experience (PostgreSQL), including backups and performance tuning
  • Experience with message queue systems (Kafka/MSK preferred)
  • Proficiency in Python, Bash, or Go for automation
  • Excellent communication skills with the ability to explain complex technical concepts clearly
  • Ownership mindset with strong problem-solving and analytical skills
  • Experience with security best practices and compliance frameworks (SOC 2, GDPR)

Preferred:

  • Service mesh experience (Istio, Linkerd, Consul)
  • FinOps practices and cost optimization experience
  • Chaos engineering and resilience testing
  • Multi-region infrastructure experience
  • AWS certifications (Solutions Architect, DevOps Engineer, or Security)
  • CKA (Certified Kubernetes Administrator) certification
  • Experience supporting government or highly regulated industries



 

Top Skills

ArgoCD
AWS
Bash
Datadog
EKS
Go
Kafka
Kubernetes
Postgres
Python
Sysdig
Terraform

The Company
HQ: Boston, Massachusetts
200 Employees
Year Founded: 2017

What We Do

Sir Tim Berners-Lee, inventor of the World Wide Web, created Solid to realize the web as he fully envisioned it. Sir Tim co-founded Inrupt to provide enterprise-grade Solid software and services.

Inrupt’s data infrastructure software enables enterprises and governments to deploy and manage Solid-compliant solutions. Our products are the expression of decades of experience in security, compliance, and operational excellence.
