1058 | SRE / DevOps / Infrastructure Engineer

Sorry, this job was removed at 04:19 p.m. (CST) on Wednesday, Apr 08, 2026
6 Locations
Remote
Artificial Intelligence • Blockchain • Internet of Things • Machine Learning • Software
The Role

Intetics Inc., a global technology company providing custom software application development, distributed professional teams, software product quality assessment, and “all-things-digital” solutions, is seeking a highly skilled and experienced Senior DevOps Engineer to join our dynamic team on a full-time basis.
About the Project
A fast-growing tech company is building an infrastructure layer for modern AI workloads — a globally distributed platform that provides scalable, cost-efficient, and reliable access to GPU computing resources.

The platform enables customers to run production-level inference workloads across a diverse network of providers, offering the flexibility, performance, and resilience required for real-world AI applications.

Since its launch, the company has demonstrated strong traction, securing a significant Series A investment and achieving multi-million ARR within its first year of operation. As both customer demand and platform scale continue to expand, the team is actively growing its infrastructure capabilities to support the next stage of development.
About the Role
We are looking for a strong SRE / DevOps / Infrastructure Engineer to help scale and operate a distributed AI-focused infrastructure platform.

The system combines a cloud-based control layer (running on AWS, including EKS and managed MySQL) with a large fleet of GPU-powered nodes distributed across multiple external providers. These components are connected via a custom networking layer to ensure high availability and performance for production workloads.

Workloads are orchestrated with Kubernetes, while observability is built around Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry, covering metrics, logging, and tracing across the platform.

While the control layer is relatively lightweight and cloud-native, the GPU infrastructure introduces additional complexity. It spans different providers and environments, often resembling distributed on-premises setups rather than standard cloud infrastructure, and therefore requires a deeper understanding of networking, reliability, and systems behavior at scale.

This is a hands-on role focused on solving real infrastructure challenges across Kubernetes, networking, observability, and production operations.

You will join a small, high-impact infrastructure team (currently a couple of engineers) that is actively growing as the platform and customer base continue to expand. The goal is to strengthen the core infrastructure early and support further scaling.

What you’ll do

  • Build, operate, and improve the infrastructure powering Parasail’s distributed inference platform
  • Own reliability, scalability, and operational excellence across AWS-based control planes and our multi-provider GPU fleet
  • Design and maintain the networking layer connecting control planes, Kubernetes clusters, and geographically distributed GPU hosts
  • Operate and improve Kubernetes-based inference orchestration, primarily on EKS
  • Manage deployments and infrastructure changes using Helm, FluxCD, and Terraform
  • Improve observability across the platform using metrics, logs, traces, dashboards, and alerting built on Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry
  • Tune alerts, improve runbooks, and strengthen operational readiness as the system scales
  • Respond to production issues, perform root cause analysis, and implement durable fixes
  • Work closely with engineers across time zones using clear asynchronous communication and handoff practices, especially through Slack
  • Help expand Europe-based infrastructure coverage to support sustainable operations outside US business hours

Requirements
  • 5+ years of experience in SRE, DevOps, platform engineering, or infrastructure engineering
  • Strong production experience with networking and Kubernetes
  • Experience operating AWS infrastructure in production, especially EKS
  • Strong hands-on experience managing Linux hosts, clusters, and distributed systems in environments that are not fully abstracted by a major cloud provider
  • Experience with Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry
  • Experience with deployment and GitOps workflows using tools such as Helm and FluxCD
  • Experience with infrastructure as code, ideally Terraform
  • Familiarity with alert tuning, runbook development, and practical incident management in production systems
  • Strong operational judgment: able to troubleshoot independently, respond calmly to incidents, and improve systems without constant direction
  • Comfortable working in a fast-moving startup where infrastructure, product, and customer demands are changing quickly
  • Clear communicator who can work effectively in an async environment and handle shift handoffs cleanly

Nice to have

  • Experience with AI inference, ML infrastructure, or adjacent high-performance distributed systems
  • Experience operating heterogeneous GPU fleets, bare-metal infrastructure, or multi-provider compute environments
  • Experience using AI tools productively in engineering workflows

Similar Jobs

Circle (circle.so)

Senior Site Reliability Engineer

Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Remote
31 Locations
250 Employees
130K-140K Annually

Circle (circle.so)

Customer Support (Pacific Time)

Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Remote
31 Locations
250 Employees
45K-60K Annually

Circle (circle.so)

Customer Support (Oceania)

Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Remote
31 Locations
250 Employees
45K-60K Annually

Pfizer

Scientist

Artificial Intelligence • Healthtech • Machine Learning • Natural Language Processing • Biotech • Pharmaceutical
Remote or Hybrid
3 Locations
121990 Employees

The Company
HQ: Naples, FL
532 Employees
Year Founded: 1995

What We Do

Intetics Inc. is a leading American technology company providing custom software application development, creation of distributed professional teams, software product quality assessment, and “all-things-digital” solutions built with SMAC, RPA, AI/ML, IoT, blockchain, and GIS/UAV/LBS technologies.

Based on the proprietary, pioneering business models of Offshore Dedicated Team® and Remote In-Sourcing®, an advanced Technical Debt Reduction Platform (TETRA™), and measurable SLAs for software engineering, Intetics helps innovative organizations capitalize on global talent with our in-depth engineering expertise, grounded in our Predictive Software Engineering framework. Intetics' core strength lies in designing software products under incomplete specifications.

We have extensive industry expertise in Education, Healthcare, Logistics, Life Sciences, Finance, Insurance, and Communications, as well as in custom ERP, CRM, Intelligent Automation, and Geospatial solutions. Our advanced software engineering background and outstanding quality management platform, along with an unparalleled methodology for talent acquisition, team building, and talent retention, guarantee that our clients receive exceptional results for their projects. At Intetics, our outcomes do not just meet clients' expectations; they have been exceeding them for a quarter of a century.

Intetics operates from multiple offices in the USA, Europe, and Latin America, hiring the best talent available worldwide. Intetics is ISO 9001 (quality) and ISO 27001 (security) certified and is a Microsoft Gold, Amazon, and UiPath Silver partner. The company’s innovation and growth achievements are reflected in prestigious titles and awards, including Inc. 5000, Software 500, CRN 100, American Business, Deloitte Fast 50, European IT Excellence, Best European BPO, Stevie People’s Choice, Clutch and ACQ5 Awards, and the IAOP Global Outsourcing 100 and Fortune Innovative 300 lists.

Similar Companies Hiring

Bellagent
Artificial Intelligence • Machine Learning • Business Intelligence • Generative AI
Chicago, IL
20 Employees

Kepler
Fintech • Software
New York, New York
6 Employees

Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees
