Inrupt

Site Reliability Engineer

Sorry, this job was removed at 04:01 p.m. (UTC) on Monday, Jun 22, 2026

Boston, MA, USA

Hybrid

Information Technology • Web3

The Role

We're seeking an experienced Site Reliability Engineer to take ownership of our AWS-based Kubernetes infrastructure. You'll be responsible for the operational excellence, security, and scalability of the development and production systems that support our Enterprise Solid Server (ESS) technology for enterprise clients. You'll have significant autonomy to establish best practices, implement reliability improvements, and build the foundation for our growing infrastructure needs.

Inrupt is headquartered in Boston, MA. This role is ideally based in Boston. Our team operates on a hybrid schedule, working from the office two days a week and enjoying remote flexibility on the remaining days.

Key Responsibilities

Manage day-to-day operations of AWS EKS clusters across development, staging, and production environments
Monitor system health, triage alerts, and respond to incidents (15-minute SLO)
Perform regular patching, upgrades, and maintenance of infrastructure components
Maintain and optimize our technology stack: EKS, MSK, RDS, ArgoCD, Traefik, Sysdig, Mezmo, Terraform
Manage AWS services, including VPC, RDS, MSK (Kafka), S3, and networking infrastructure
Implement and maintain comprehensive monitoring dashboards, alerting, and centralized logging
Maintain Terraform-based infrastructure automation and practice GitOps principles
Manage data infrastructure lifecycle: RDS databases, Kafka clusters, Redis caching, S3 buckets
Implement security baselines, manage RBAC, and conduct vulnerability scanning and remediation
Design and test disaster recovery strategies with defined RTO/RPO
Support ArgoCD deployments and troubleshoot application deployment issues
Create and maintain documentation and troubleshooting guides
Provide architectural reviews and capacity planning aligned with business objectives
Optimize infrastructure costs while maintaining performance and reliability
Establish on-call rotation and incident response procedures with post-mortem analysis
Work closely with engineers to ensure operational requirements are built into our products
Work closely with engineers to ensure that the proposed architecture, design, and development choices meet non-functional requirements

About You

Required:

Experience managing production Kubernetes clusters, preferably AWS EKS
Deep knowledge of cloud platform services (e.g., EC2, EKS, VPC, RDS, S3, IAM, CloudWatch)
Strong Terraform experience for infrastructure automation
Experience with monitoring platforms (Sysdig, Datadog, or similar) and logging systems
Hands-on experience with ArgoCD or similar tools
Strong understanding of networking: VPCs, security groups, load balancers, DNS
Database administration experience (PostgreSQL), including backups and performance tuning
Experience with message queue systems (Kafka/MSK preferred)
Proficiency in Python, Bash, or Go for automation
Excellent communication skills with the ability to explain complex technical concepts clearly
Ownership mindset with strong problem-solving and analytical skills
Experience with security best practices and compliance frameworks (SOC2, GDPR)

Preferred:

Service mesh experience (Istio, Linkerd, Consul)
FinOps practices and cost optimization experience
Chaos engineering and resilience testing
Multi-region infrastructure experience
AWS certifications (Solutions Architect, DevOps Engineer, or Security)
CKA (Certified Kubernetes Administrator) certification
Experience supporting government or highly regulated industries

The Company

HQ: Boston, Massachusetts

200 Employees

Year Founded: 2017

What We Do

Sir Tim Berners-Lee, inventor of the World Wide Web, created Solid to realize the web as he fully envisioned it. Sir Tim co-founded Inrupt to provide enterprise-grade Solid software and services. Inrupt’s data infrastructure software enables enterprises and governments to deploy and manage Solid-compliant solutions. Our products are the expression of decades of experience in security, compliance, and operational excellence.