Disseqt AI

Senior Platform & SRE Engineer (On-Prem & AI Systems)

Bengaluru, Bengaluru Urban, Karnataka, IND

Hybrid

Senior level

Artificial Intelligence • Enterprise Web • Software • Generative AI

Location: Bengaluru / Hybrid

Team: Platform Engineering & Infrastructure

We are building lean, agentic AI systems and enterprise-grade developer platforms for IT and DevOps teams that need reliable, secure, and cost-efficient AI deployments. Our products run in cloud, hybrid, and fully on-prem environments, helping enterprises streamline testing, monitoring, and compliance while improving operational efficiency.

As a Senior Platform & SRE Engineer (On-Prem & AI Systems), you will own the infrastructure layer that powers all our AI services. You will design scalable, secure, and fault-tolerant environments, orchestrate on-prem deployments for enterprise customers, and ensure platform reliability across both cloud and customer-VPC setups. This role sits at the intersection of infrastructure engineering, DevOps, SRE, and AI system deployment.

You will define the platform architecture, build automation, improve observability, optimize performance, and work closely with product and ML teams to enable fast, reliable delivery of our AI-driven features.

This role is for engineers who think in systems, automate everything, and thrive in environments where reliability, security, and efficiency are non-negotiable.

What You’ll Own

End-to-end infrastructure architecture for cloud and on-prem deployments
Scalable, reproducible deployments of AI, ML, and microservice workloads
SRE responsibilities: uptime, SLO/SLA definitions, incident response, postmortems
Build and manage CI/CD pipelines, GitOps workflows, automated release processes
Implement observability stacks (OpenTelemetry, Prometheus, Grafana, ELK)
Optimize platform performance, CPU-based model serving, cost efficiency
Security-first infrastructure design: secrets, IAM, isolation, least-privilege access
Create reusable Terraform/Helm/Ansible modules
Collaborate with backend, ML, and product teams on platform-level decisions
Drive operational excellence across monitoring, reliability, and scalability

What You’ll Bring

Must-Have Skills

5+ years in Infrastructure, SRE, or DevOps roles
Deep experience with on-prem deployments (VMs, proxies, firewalls, private networks)
Strong Terraform, Helm, and Kubernetes experience (EKS, GKE, self-managed clusters)
Observability expertise: Prometheus, Grafana, OpenTelemetry
CI/CD expertise: GitHub Actions, GitLab CI, ArgoCD, or similar
Strong Linux fundamentals, networking, Docker internals
Experience deploying distributed microservices in production
Ability to debug infrastructure issues end-to-end

Great-to-Have

Experience supporting AI/ML workloads, model serving, vector DBs
Familiarity with open-source LLMs and CPU-based inference optimizations
Experience with air-gapped/on-prem enterprise deployment models
Security certifications or experience with SOC 2 / enterprise compliance
Performance engineering, scalability tuning, load testing

Why This Role Matters

You will be one of the most critical hires in shaping our core platform, the foundation on which our agentic AI systems operate. Your work will determine how fast we can innovate, how reliably we can operate, and how securely we can deploy AI in enterprise environments.

You will directly influence:

Our managed and on-prem enterprise architecture
Product reliability & SLAs
Deployment experience for customers
Overall developer velocity and system scalability

This is a career-defining opportunity to build a next-generation AI platform used by enterprise IT and DevOps teams globally.

The Company

19 Employees

Year Founded: 2025

What We Do

Disseqt AI provides an AI assurance platform for the full enterprise lifecycle, specializing in the testing, monitoring, and governance of agentic AI. The company enables organizations to validate AI behavior against internal policies, conduct red teaming, and maintain audit trails to ensure reliability and compliance with regulations like the EU AI Act, helping enterprises move from experimentation to production with confidence.