What’s in it for you?
- Maintain and improve the reliability, availability, and performance of high-volume, data-intensive applications.
- Design, implement, and enhance monitoring, logging, and alerting solutions at scale.
- Collaborate with development teams to optimize system architecture and reliability.
- Manage and troubleshoot distributed systems in a Linux-based production environment.
- Leverage AWS cloud services to scale infrastructure efficiently.
- Utilize Kubernetes for container orchestration, ensuring optimal resource utilization and deployment strategies.
- Implement CI/CD pipelines using GitLab to automate deployments and operational tasks.
- Use infrastructure as code (IaC) tools such as Terraform and CloudFormation for provisioning and managing cloud resources.
- Implement observability best practices using Grafana, Prometheus, Thanos, and Loki.
- Perform root cause analysis (RCA) and proactively address performance bottlenecks and system failures.
- Ensure security best practices and compliance across all infrastructure components.
We’d love to hear from you if you:
- Have 3+ years of experience in Site Reliability Engineering or related fields.
- Possess strong Linux fundamentals with a deep understanding of system internals.
- Have expertise in troubleshooting and problem-solving in distributed environments.
- Have hands-on experience with logging and monitoring solutions at scale.
- Are proficient in at least one programming language (preferably Python).
- Have strong experience with AWS services and Kubernetes.
- Have exposure to CI/CD pipelines, preferably using GitLab CI/CD.
- Have experience with infrastructure as code (Terraform, CloudFormation).
- Are familiar with observability tools such as Grafana, Prometheus, Thanos, and Loki.
Preferred Qualifications
- Experience in performance tuning and capacity planning.
- Knowledge of incident management and post-mortem analysis processes.
- Familiarity with security best practices in cloud environments.
- Experience in automating operational tasks using scripting and configuration management tools.
What We Do
Mindtickle provides a comprehensive, data-driven solution for sales readiness and enablement that fuels revenue growth and brand affinity. Its purpose-built applications, proven methodologies, and best practices are designed to drive effective sales onboarding and ongoing readiness.
With Mindtickle, revenue and sales leaders can continually assess, diagnose and develop the knowledge, skills, and behaviors required to effectively engage buyers and drive growth. Companies across a wide range of industries use Mindtickle's innovative capabilities for onboarding, training, bite-sized mobile updates, gamification-based learning, call recording, coaching and role-play to ensure world-class sales performance.