Senior CloudOps Engineer

Posted 2 Days Ago
Pune, Maharashtra
Hybrid
Senior level
Artificial Intelligence • Cloud • Information Technology • Sales • Security • Software • Cybersecurity
Take Command of your Career
The Role
The Senior CloudOps Engineer will manage and automate cloud infrastructure, support ML operational pipelines, and ensure high availability, scalability, and security of services.
Summary Generated by Built In
Job Description:
We are seeking an experienced and highly specialized Senior CloudOps Engineer to manage, automate, and secure our production cloud infrastructure and Machine Learning (ML)/Large Language Model (LLM) operational pipelines. This role is strictly focused on the operations and infrastructure that support our data science and engineering teams; it is not a data science or core LLM development position.
Key Responsibilities and Required Expertise
The successful candidate will be an expert in all of the following areas, driving high availability, scalability, and security.
I. Cloud Infrastructure & Automation
  • Infrastructure as Code (IaC): Deep expertise in managing and provisioning infrastructure using Terraform.
  • Containerization & Orchestration: Advanced deployment, scaling, and management of services using Docker/Kubernetes.
  • Networking & Services: Architecting and maintaining high-performance API Layers & Microservices.
  • AWS CloudOps: Expert proficiency in AWS operational services, including EventBridge and Step Functions, for building robust automation flows.
  • Data Storage: Managing and optimizing critical AWS data services, including S3, DynamoDB, Redshift, and Kinesis.

II. MLOps Tooling & Monitoring
  • ML/LLM Tooling Support: Providing and maintaining the operational infrastructure for ML/LLM systems, including Model Registry/Versioning tools like MLflow/SageMaker.
  • Pipeline Automation (CI/CD): Designing and implementing robust CI/CD pipelines for ML/LLM deployments using tools like GitHub Actions/Jenkins.
  • Model Operations: Building the infrastructure to support Drift Detection & Retraining capabilities.
  • Monitoring & Alerting: Implementing comprehensive observability stacks using Prometheus/Grafana/CloudWatch.
  • Incident Management: Leading resolution efforts for production issues, including on-call responsibilities and expertise with PagerDuty.

III. Security & Compliance (FinOps)
  • Cloud Security: Establishing and enforcing strong security policies and best practices across the cloud environment (IAM, VPC, Secrets).
  • AWS Security Services: Expert knowledge and application of specific AWS security tools like IAM, KMS, and Secrets Manager.
  • Cost Optimization: Leading initiatives for Cost Optimization (FinOps), balancing performance and efficiency across all cloud resources.

Top Skills

AWS
CloudWatch
Docker
DynamoDB
EventBridge
GitHub Actions
Grafana
Jenkins
Kinesis
Kubernetes
MLflow
PagerDuty
Prometheus
Redshift
S3
SageMaker
Step Functions
Terraform

The Company
HQ: Boston, MA
2,400 Employees
Year Founded: 2000

What We Do

At Rapid7, our vision is to create a secure digital world for our customers, our industry, and our communities. We do this by harnessing our collective expertise and passion to challenge what’s possible and drive extraordinary impact. We’re building a dynamic and collaborative workplace where new ideas are welcome.

Protecting 11,000+ customers against bad actors and threats means we’re continuing to push the envelope - just like we’ve been doing for the past 20 years. If you’re ready to solve some of the toughest challenges in cybersecurity, we’re ready to help you take command of your career.

Join us.

Why Work With Us

With our products, research, and open source communities, we’re building a secure digital future for everyone. This means constantly learning and evolving in an industry that’s anything but stagnant. You’ll be faced with tough challenges and given the support to find creative solutions that drive our business and your career forward.

Rapid7 Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Our default working model is hybrid, with employees working three days per week in the office. This approach underpins our commitment to flexibility and adaptability while supporting our dedication to development, teamwork and customer purpose.

Typical time on-site: 3 days a week
HQ: Boston
Singapore - Regional Headquarters
Arlington
Austin, TX
Belfast, GB
Dublin
Galway
Melbourne
Tokyo
Munich
Prague
India
Reading, UK
Tampa, FL
Tel Aviv