Senior DevOps Engineer (six-month contract role, with the possibility of extension based on project needs)
About Platform9:
Platform9 is a leader in simplifying enterprise private clouds. Our flagship product, Private Cloud Director, delivers an open, comprehensive private cloud platform that is developer-friendly and cost-effective. It also gives enterprise IT teams the key capabilities they need to modernize their legacy virtualized infrastructure, including:
* Production-ready virtualization: Comprehensive virtualization management, from advanced resource scheduling to powerful cluster-wide Software Defined Networking.
* Built for the enterprise: The ability to run hundreds of clusters with high availability and seamless in-place upgrades, backed by Platform9’s Always-On Assurance™.
* Integrated and extensible: The integrated platform includes Kubernetes and Kubernetes-based platform extensions, making it easier for enterprises to harness the ever-growing ecosystem of open-source infrastructure and cloud-native services.
Platform9 was founded by a team of VMware cloud pioneers and has over 20,000 nodes in production at some of the world’s largest enterprises, including Cloudera, EBSCO, Juniper Networks, and Rackspace. Platform9 is an inclusive, globally distributed company backed by prominent investors, committed to driving private cloud innovation and efficiency.
About the Role
We are seeking a highly motivated and experienced Senior DevOps Engineer to join our growing team. In this role, you will design, implement, and maintain our cloud infrastructure, ensuring high availability, scalability, and security. You will work closely with our engineering team to automate deployments, manage infrastructure as code, and troubleshoot production issues.
This is a unique opportunity to work on cutting-edge technologies and contribute to the success of a rapidly growing company. We offer a fast-paced, collaborative work environment where you can learn and grow your skills.
Responsibilities
* Design, implement, and maintain our cloud infrastructure on AWS, including Kubernetes clusters, OpenStack environments, and supporting services.
* Automate infrastructure provisioning, configuration management, and application deployments using tools like Terraform.
* Implement and manage monitoring and logging solutions using Prometheus, Grafana, and other relevant tools.
* Develop and maintain internal tooling and scripts to improve operational efficiency.
* Troubleshoot and resolve production issues related to infrastructure, applications, and performance.
* Collaborate with engineering teams to implement and maintain CI/CD pipelines.
* Participate in an on-call rotation to ensure 24/7 availability of critical services.
* Stay up to date with the latest technologies and trends in cloud computing and DevOps.
Qualifications
* 5+ years of experience in a DevOps or SRE role, with a strong understanding of cloud infrastructure and operations.
* Extensive experience with Kubernetes, including cluster administration, deployment strategies, and troubleshooting.
* Experience with OpenStack is highly desirable, but not required.
* Proficiency in infrastructure-as-code tools like Terraform or Ansible.
* Strong scripting skills in Python or similar languages.
* Strong programming skills in Golang or similar languages.
* Strong configuration management skills with Salt, Chef, or similar tools.
* Experience with observability tools like Prometheus, Cortex, Grafana, and Loki.
* Experience with CI/CD tools and best practices.
* Experience administering and debugging Linux-based operating systems.
* Excellent problem-solving and troubleshooting skills.
* Strong communication and collaboration skills.
* Strong incident management experience.
Candidates must have the necessary authorization to work in the US.
Bonus Points
* Experience with EKS (Elastic Kubernetes Service).
* Experience with Cluster API, Cluster API Provider for AWS or Kamaji.
* Experience managing on-premises infrastructure.
* Familiarity with OpenTelemetry and AI-powered observability tools.
* Experience working in a fast-paced startup environment.
What We Do
Platform9 is the open distributed cloud company, offering the power of the public cloud on infrastructure of customers’ choice—powered by Kubernetes and cloud-native technologies. Public clouds are walled gardens, and DIY is difficult and time-consuming. Platform9 offers a third option—an open and faster option—enabling a better way to go cloud-native. Platform9’s service powers 40K+ nodes across private, public and edge clouds. Innovative enterprises like Juniper, Kingfisher Plc, Mavenir, Redfin and Cloudera achieve 4x faster time-to-market, up to 90% reduction in operational costs, and 99.9% uptime. Platform9 is an inclusive, globally distributed company, backed by leading investors.