Site Reliability Engineer (SRE)

Reposted 22 Days Ago
Hiring Remotely in India
Remote
Mid level
Artificial Intelligence • Cloud • Information Technology • Software • Cybersecurity
The Role
The Site Reliability Engineer (SRE) will manage cloud-native infrastructure, develop CI/CD pipelines, and ensure system reliability using best practices and automation tools.
Summary Generated by Built In

Note:
Only immediate joiners or candidates who can join within 7 days should apply. Candidates who can start working with us from 15th June 2026 will be given priority. If you have applied to CloudRaft in the last 90 days, we already have your CV/resume on file. Multiple applications from the same candidate will not be considered.

About CloudRaft
CloudRaft is a premier cloud-native consulting and engineering company that helps ambitious startups and digital-first organizations build, scale, and operate mission-critical platforms. We partner with innovators at the forefront of artificial intelligence, developer productivity, observability, digital commerce, and enterprise software—enabling them to accelerate growth with resilient, scalable, and production-ready cloud infrastructure.

Our experience spans organizations developing AI safety and governance platforms, AI Cloud, AI agent ecosystems, developer tooling, observability solutions, digital health products, customer engagement platforms, and technology-driven franchise networks. By combining deep expertise in Platform Engineering, Kubernetes, DevOps, Observability, and Cloud Native technologies, CloudRaft helps high-growth companies move faster, operate more reliably, and focus on building category-defining products.


Job Description
We are looking for passionate Site Reliability Engineers (SREs) to join our growing team. In this role, you will take end-to-end ownership of designing, building, operating, and scaling mission-critical infrastructure for our partners. You will be responsible for ensuring reliability, performance, security, and operational excellence while driving automation, improving system efficiency, and implementing innovative solutions. Working at the intersection of software engineering and operations, you will help create resilient platforms that enable fast-growing organizations to scale with confidence.


Responsibilities
  • Manage and maintain Kubernetes clusters across cloud platforms, including OpenShift, Amazon EKS, Azure AKS, and Google GKE.
  • Implement and manage CI/CD pipelines using tools such as Jenkins, GitHub Actions, Argo CD, or GitLab CI/CD.
  • Design and maintain observability stacks with tools including Prometheus, Grafana, Loki, OpenTelemetry, and related technologies. Be part of the team that supports open source projects such as Prometheus, Thanos, Mimir, CloudNativePG, Istio, and more.
  • Optimize system performance and resolve production issues. Join the on-call roster that provides 24x7 coverage for critical production systems.
  • Implement SRE principles, including Service Level Indicators (SLIs) and Service Level Objectives (SLOs), to uphold system reliability.
  • Automate infrastructure and operational tasks using programming languages such as Go or Python, and Infrastructure as Code (IaC) tools like Terraform.
  • Apply agentic AI to automate the software development lifecycle (SDLC) and to support AIOps and automation.
  • Learn about emerging technologies, including AI and GPU infrastructure.
  • Contribute to knowledge sharing through technical writing and presentations.

Qualifications
  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • 2-5 years of experience in SRE, Platform Engineering, or DevOps roles.
  • Strong expertise in Kubernetes, cloud-native technologies, and both on-premises and major cloud platforms (AWS, Azure, GCP).
  • Proficiency in programming languages such as Python, Go, or Node.js.
  • Familiarity with CI/CD tools and modern deployment practices.
  • Proficiency in one or more open source observability stacks and Infrastructure as Code (Terraform/Pulumi).
  • CKA/CKAD certification (brownie points!)
  • Excellent problem-solving abilities and communication skills.
  • Inclination toward open-source contributions is advantageous.

Benefits
- Competitive salary
- Premium health insurance and various health & wellness benefits from a leading insurance provider through Plum
- Opportunity to work on the latest AI stack and GPU infrastructure
- Collaborative and supportive work environment full of learning
- Chance to take a front seat where you lead and deliver

Skills Required

  • Bachelor's degree in Computer Science, Information Technology, or related field
  • 2-5 years of experience in SRE, Platform Engineering, or DevOps roles
  • Strong expertise in Kubernetes, cloud-native technologies, and major cloud platforms (AWS, Azure, GCP)
  • Proficiency in programming languages such as Python, Go, or Node.js
  • Familiarity with CI/CD tools and contemporary deployment practices
  • Knowledge of observability tools and Infrastructure as Code
  • AI skills including experience with Vibe Coding, AIOps and automation, understanding of LLMs and AI Agents, Prompt Engineering
  • CKA Certification

The Company
HQ: Indore, Madhya Pradesh
16 Employees
Year Founded: 2022

What We Do

CloudRaft is a trusted problem solver for startups and Fortune 500 companies. Our team crafts cutting-edge AI Cloud, GPU Cloud, and cloud native solutions. We specialize in DevOps & cloud consulting, observability, and enterprise-grade support for open source technologies like PostgreSQL and ClickHouse. With our expertise, businesses can confidently navigate their digital transformation journey.

Our Specialization:
- AI Cloud, GPU Cloud, AI Infrastructure, Enterprise AI, and Generative AI: Empowering businesses with advanced AI capabilities that enhance decision-making and operational efficiency.
- Cloud Native Solutions, Kubernetes Consulting: Crafting scalable, resilient cloud environments that adapt to your business needs.
- DevOps & Cloud Consulting: Streamlining development and operations through best practices in DevOps and cloud strategies.
- DevSecOps & Security: Ensuring robust security measures are integrated seamlessly into every stage of development.
- Observability: Providing deep insights into system performance to ensure optimal functionality and quick resolution of issues.
- Enterprise-grade Support for Open Source Technologies: Offering expert support for tools like Thanos, Prometheus, Argo CD, PostgreSQL, and ClickHouse, ensuring your open-source projects thrive.

We are committed to helping businesses navigate their digital transformation journey with confidence. Visit us at www.cloudraft.io
