Roles and Responsibilities
- Monitor production workloads using AWS CloudWatch and APM tools.
- Triage alerts, analyze logs, perform first/second-line fixes, and escalate to L3 as per defined runbooks.
- Perform application-level health checks, job monitoring, and validation of integrations.
- Debug and resolve issues related to Redis, PostgreSQL, and Apache Solr components.
- Validate configuration, permissions, and data flow across services.
- Operate and support services running on EC2, ECS, RDS Aurora, ElastiCache, CloudFront, and CloudWatch.
- Perform controlled infrastructure changes through standard change management processes.
- Support DR, backup validation, scaling, and environment hygiene.
- Use Terraform or AWS CDK to manage infrastructure as code.
- Support CI/CD pipelines and automation using Git and GitHub, ensuring version control and deployment consistency.
- Operate and troubleshoot Docker containers and workloads on ECS or similar orchestration platforms.
- Build and maintain dashboards in CloudWatch or APM tools like AppSignal, Datadog, or New Relic.
- Lead or assist in high-severity incidents, perform root-cause analysis, and contribute to post-incident reviews.
- Apply least-privilege IAM principles, manage secrets securely, and adhere to internal security guardrails.
- Maintain up-to-date KBs, runbooks, and handover documents to support structured 24/7 operations.
Essential skills & experience
- 3–6 years of experience in Cloud Operations, SRE, or DevOps roles supporting production environments.
- Strong practical knowledge of AWS services including CloudFront, EC2, ECS, RDS Aurora, ElastiCache, and CloudWatch.
- Experience with Redis, PostgreSQL, and Apache Solr for data caching, storage, and search indexing.
- Hands-on experience with Docker for container management and Terraform for infrastructure automation.
- Solid understanding of Git and GitHub workflows for version control and collaborative deployments.
- Knowledge of Ruby on Rails would be a bonus.
- Strong Linux administration skills including user management, file systems, certificates, and cron jobs.
- Familiarity with application-level troubleshooting (REST APIs, background jobs, logs, etc.).
- Exposure to monitoring and observability tools (AppSignal, Datadog, CloudWatch, New Relic).
- Good scripting ability in Bash or Python for automation and operational tasks.
- Experience in incident management within 24/7 operational environments.
- Strong communication, analytical, and documentation skills with an ability to work under pressure.
- Willingness and ability to work rotational 24/7 shifts from the office, including weekends and nights.
Additional information
- This is a work-from-office role in the 24/7 Operations Centre, with rotational shifts (including nights and weekends) and structured handovers. Cab transport is provided for all shifts, along with a shift allowance.
Benefits
- Keyloop provides a competitive and comprehensive benefits package designed to support employee wellbeing, work-life balance, and long-term growth.
- Health & Insurance:
  - Group Medical Insurance up to ₹7,00,000 (family floater)
  - Personal Accident cover up to ₹30,00,000 or twice the CTC
  - Life Insurance (GTL) up to ₹10,00,000 or twice the CTC
  - Annual Master Health and Lipid Profile check-ups
- Leave & Time Off:
  - 21 days Earned Leave, 12 Casual Leave, 12 Sick Leave per year
  - 12 Fixed Holidays + Birthday Off
  - Wellbeing & Volunteer Days (2 each annually)
  - Maternity (26 weeks), Paternity (4 weeks), Adoption (12 weeks), Bereavement (6 weeks)
  - Tenure-based additional leave after 3 years of service
- Allowances:
  - Shift Allowance (Morning, Afternoon, Night as per policy)
  - Standby / On-Call Allowance for weekends or holidays
  - Holiday Food Reimbursement for on-site work during holidays
  - Relocation support (up to ₹60,000 depending on location and family status)
  - Childcare reimbursement up to ₹2,000 per child (max two children, up to age 8)
- Additional Benefits:
  - Long-term leave (unpaid) for health or family exigencies
  - Compensatory off for work on public holidays or weekends
What We Do
As the largest global automotive technology company, Keyloop delivers cutting-edge solutions tailored to the modern needs of auto retailers and OEMs alike. With 40 years of automotive DNA and a deep understanding of what it takes to drive success, Keyloop solutions are delivered in over 90 countries and trusted by more than 20,000 retailers and 80 OEMs worldwide.
From the showroom to the workshop, and everything in between, its technology facilitates distinctive customer experiences across key systems, tools and departments. With modern consumers demanding increasingly high levels of service and responsiveness, Keyloop and its partners connect retailers and OEMs to consumers through every step of their journey.
Keyloop delivers a proven technology ecosystem that redesigns the automotive retail experience to cultivate lasting loyalty and optimise margins through increased efficiency, elevated experiences, and unrivalled connected data.
For more information, please visit www.keyloop.com