Scalable Systems is a Data, Analytics & Digital Transformation Company focused on vertical-specific innovative solutions
The Role
Scalable Systems is a USA-based Big Data, AI, and Digital Transformation Company focused on vertical-specific innovative solutions. By providing next-generation technology solutions and services, we help organizations identify risks and opportunities, achieve operational excellence, and gain an innovative edge.
Years of experience: 8+ years
Required Technologies: Python, Amazon Web Services (AWS), DevOps, CI/CD, Shell.
Required Languages: English (native or advanced).
Project Duration: Long-term.
Responsibilities:
- Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems.
- Architect and design highly scalable and available infrastructure solutions, integrating best practices in reliability engineering and automation.
- Collaborate with cross-functional teams (DevOps, Development, IT) to implement SRE principles throughout the software development life cycle.
- Establish and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services, monitoring and maintaining performance against defined targets.
- Implement and enhance observability, alerting, and incident response processes to proactively address issues and minimize downtime.
- Drive continuous improvement initiatives, identifying bottlenecks and optimizing performance across the infrastructure and application stack.
- Develop and maintain documentation related to system architecture, configuration, and procedures.
- Stay current with industry trends, recommending and adopting new tools and practices to enhance system reliability.
Requirements:
- Strong background in designing and implementing highly available and scalable infrastructure.
- Proficiency in scripting and automation using Python or Shell.
- Experience with container orchestration platforms, serverless architectures, CI/CD pipelines, and IaC implementations (Ansible and Terraform).
- Experience with Observability tools (preferred: Datadog, CloudWatch).
- In-depth knowledge of cloud computing platforms (preferred: AWS).
- Solid understanding of SRE/DevOps principles and practices.
- Excellent problem-solving skills with the ability to troubleshoot complex issues in production environments.
- Strong communication and leadership skills, fostering effective collaboration with cross-functional teams.
- Relevant certifications in SRE, DevOps, Cloud, etc., are a plus.
- Minimum 10 years of work experience in DevOps/SRE, including leadership roles.
*Scalable Systems is an Equal Opportunity Employer*
The Company