Location: Remote | Time Zone: Americas (8AM–5PM PT)
About Virtasant
Virtasant is a global technology company delivering large-scale cloud, data, and AI solutions for some of the world’s leading organizations. With a remote-first model, we operate in over 130 countries and bring together top-tier talent to solve real business problems.
The Role
As a Site Reliability Engineering (SRE) Support Engineer, you’ll play a mission-critical role in diagnosing infrastructure issues, resolving complex deployment challenges, and supporting clients across production and staging environments. This role requires a strong combination of hands-on infrastructure experience, customer empathy, and cross-functional collaboration.
You’ll work directly with client engineering teams to troubleshoot, improve, and scale systems that power high-availability applications.
Key Responsibilities
- Troubleshoot complex issues across infrastructure and deployment layers (Docker, Kubernetes, AWS).
- Support CI/CD, application deployments, and container orchestration systems.
- Engage directly with client engineers and stakeholders to resolve escalated support issues.
- Analyze trends in customer incidents and propose technical/process improvements.
- Write and maintain documentation, runbooks, and internal KB articles.
- Participate in post-incident reviews and drive continuous improvement.
- Support occasional weekend maintenance windows (comp time provided).
Requirements
- 5+ years supporting production applications and web services.
- Strong experience with AWS, Kubernetes, and Docker.
- Experience troubleshooting complex distributed systems.
- Deep understanding of Linux administration and scripting (Bash, Python preferred).
- Familiarity with networking concepts: DNS, load balancing, firewalls.
- Excellent spoken and written English for technical customer communication.
- Comfortable working independently and owning issue resolution end-to-end.
- Ability to work US hours (8AM–5PM PT).
Nice to Have
- Familiarity with Spark, Kafka, and related distributed data systems.
- Experience with IaC tools like Terraform, Ansible, or Puppet.
- Observability tools: Datadog, Prometheus, Grafana, Splunk.
- Kubernetes certification (CKA, CKAD, etc.).
Who You Are
- Detail-oriented with a relentless focus on operational excellence.
- Clear communicator who can translate complex issues for technical and non-technical stakeholders.
- Empathetic, calm under pressure, and focused on driving resolution—not blame.
- Loves learning and improving—personally and technically.
Freedom to Grow. Power to Deliver.
At Virtasant, we believe talented people do their best work in environments built on trust, autonomy, and continuous learning. You’ll join a truly global team - 130+ countries strong - where you can:
- Work from anywhere with full autonomy and respect for your time.
- Learn in every direction by working on cutting-edge systems across clients and sectors.
- Collaborate globally with kind, curious, and professional teammates.
- Make real impact by solving technical challenges that matter.
We’re remote-first. Trust-based. Proudly diverse. And relentlessly focused on delivering meaningful work.
What We Do
Virtasant is a fully remote, globally distributed technology and business services company that brings together the breadth and capability of more than 4,000 technology professionals in 130 countries. We started working in the public cloud in 2007, and over the last 13 years, our team has built or re-platformed over 200 applications, with over 600 million lines of code currently under management.
In this time, our team has developed proprietary software and methodologies that help us quickly and accurately identify opportunities to maximize the speed, scale, and cost advantages that come with native cloud computing. Virtasant combines industry-leading cloud specialization with a deep bench of global experts across every aspect of the product lifecycle to deliver superior results to our clients. Virtasant offers services including cloud assessments and migrations, custom product development, information management, machine learning, remote team management, application modernization, full-lifecycle technology outsourcing, and more.