Mindvalley

Site Reliability Engineer

Posted 20 Days Ago

Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur

3-5 Years Experience

Edtech

The Role

Join Mindvalley as a Site Reliability Engineer to build and maintain a resilient, high-performance infrastructure. Responsibilities include developing and overseeing cloud infrastructure, championing site reliability, and promoting CI/CD and DevOps best practices. Required skills include Kubernetes, Prometheus metrics, Linux, Terraform, Ansible, and knowledge of cloud services such as AWS, GCP, and Azure. Ideal candidates have 3+ years of system design experience and a proactive mindset for continuous improvement.

Summary Generated by Built In

Mindvalley is the leading and most promising ed-tech company to date. We dominate the US market for Personal Growth Education. We are empowering athletes within every major US sports team and promoting successful learning strategies in major companies.
We're currently on a mission to build the most advanced and complete learning experience to enable personal growth and development for all our amazing customers. We innovate tools that induce enlightenment within every aspect of human life. We are seeking the best engineers to build the best and most advanced education platform our species has seen. We will mark our success by three goals: powering up to 100 countries, powering every Fortune 500 company, and progressing humanity towards a better future.

Join us in the mission to build and maintain a resilient, high-performance infrastructure!

We're on the lookout for a dynamic and seasoned Site Reliability Engineer (SRE). In this pivotal role, you'll join an exceptional team of SREs, ensuring the stability, scalability, and efficiency of our cloud infrastructure and applications.

Cloud Infrastructure Development:

Develop and oversee our cloud infrastructure across leading platforms such as AWS, GCP, or Azure.
Implement infrastructure as code (IaC) methodologies for streamlined provisioning and configuration management.
Stay abreast of cloud advancements and best practices, driving optimization initiatives within our cloud environment.
Collaborate closely with architects and cloud engineers to craft secure, cost-effective solutions that meet our evolving needs.

Site Reliability Champion:

Advocate for the principles of Site Reliability Engineering (SRE) within the team and throughout the organization.
Spearhead the development and deployment of automated monitoring, alerting, and incident response systems.
Cultivate a culture of proactive troubleshooting and continuous enhancement of infrastructure reliability.
Utilize metrics analysis to pinpoint bottlenecks and fine-tune performance and scalability.

CI/CD and DevOps Champion:

Champion CI/CD and DevOps best practices within the team.
Spearhead the development and deployment of automated pipelines for infrastructure deployments.
Integrate monitoring and alerting systems into the CI/CD pipeline for proactive issue identification.
Promote collaboration between SRE, development, and operations teams.

Proficient in container orchestration systems, specifically Kubernetes.
Skilled in Prometheus Metrics & Observability ecosystems.
Strong understanding of Linux and network fundamentals.
Experience with automation tools (Terraform, Ansible, Chef, Puppet).
Knowledge of cloud services (AWS, GCP, Azure) and multi-cloud environments.
Familiarity with the full Software Development Life Cycle, including both Waterfall and Agile methodologies.
Excellent teamwork and communication skills, with a knack for detail-oriented problem-solving.
Ability to work under pressure, managing critical systems with a focus on timely delivery.
A proactive mindset, always looking for ways to improve system reliability and efficiency.
Curiosity and a continuous learning attitude, embracing new technologies and methodologies to drive innovation.

Demonstrated experience (3+ years) in system design, maintenance, and troubleshooting, with a solid background in Site Reliability Engineering, DevOps, Cloud Engineering, or similar roles.
Proven track record in automating operations, including deployment, system configurations, and operational tasks, to minimize manual work and enhance efficiency.
Expertise in container orchestration systems, especially Kubernetes, to ensure scalable and reliable application deployment.
Proficient in implementing and managing monitoring tools like Prometheus for proactive issue detection and resolution.
Strong foundation in Linux and network fundamentals, ensuring secure and optimized system operations.
Experience with infrastructure as code tools (Terraform, Ansible, Chef, Puppet) for efficient system provisioning and management.
Familiarity with cloud services (AWS, GCP, Azure) and the ability to navigate and optimize multi-cloud environments.
Knowledge of the full Software Development Life Cycle, with experience in both Waterfall and Agile methodologies, to support continuous integration and delivery.
Ability to lead incident response efforts, conduct thorough post-mortem analyses, and implement preventative measures to maintain high system availability and performance.
Capacity planning and performance tuning expertise to manage growth effectively and maintain optimal service levels.
Excellent communication skills, with the ability to work closely with cross-functional teams, including direct collaboration with C-level executives and tech leadership.
A proactive, solution-oriented mindset, with a focus on continuous improvement and innovation to drive system reliability and efficiency.
Curiosity and a commitment to continuous learning, with a willingness to explore new technologies and methodologies to enhance operational excellence.

Mindvalley is an equal opportunity employer and does not discriminate on the basis of race, colour, religion, gender identity or expression, national origin, age, disability, marital status, sexual orientation, or any other legally protected status. We are committed to creating a diverse and inclusive workplace and encourage applications from all qualified individuals.

Top Skills

Ansible

Kubernetes

Terraform

The Company

Kuala Lumpur, Federal Territory

503 Employees

On-site Workplace

Year Founded: 2003

What We Do

Mindvalley is a learning experience company that publishes ideas and teachings by the best authors in personal growth, wellbeing, spirituality, productivity, mindfulness and more – and combines them with cutting-edge sophisticated learning technology within engaged and supportive communities.

Through our education platforms, online academies, mobile apps and both digital and live events, we give you access to an alternative curriculum that empowers you to kickstart your personal growth and lead an extraordinary life. Our ultimate goal is to launch a unified school for a billion people for all stages of life.