Site Reliability Engineer / Team Lead

Posted 2 Days Ago
Philippines
Senior level
Artificial Intelligence • Conversational AI
The Role
As a Site Reliability Engineer Team Lead, you will manage a team of engineers, ensuring cloud platform reliability, overseeing incident resolution, change management, and team training. Responsibilities include optimizing processes, handling escalated incidents, and developing team skills while ensuring security and high availability of systems.

Description

We are looking for an experienced Site Reliability Engineer / Team Lead to manage and coordinate a team of reliability engineers. Your primary responsibility will be ensuring the high availability, security, and reliability of our cloud platform. You will oversee incident resolution, change management, and team coordination, as well as training and developing team members.

Requirements

Change, Incident and Problem Management:

  • Oversee the resolution of complex incidents.
  • Coordinate with SREs to ensure timely incident resolution.
  • Optimize technical processes to improve change quality.
  • Track and report on incident resolution metrics.
  • Ensure that SLOs & SLIs are defined and maintained.

Customer Escalation Handling:

  • Handle escalated incidents and ensure timely resolution.
  • Ensure customer satisfaction with the resolution process.

Team Coordination:

  • Assign tasks and tickets to team members.
  • Ensure proper documentation of incidents and resolutions.
  • Provide guidance and support to SREs.
  • Advise platform team members and other stakeholders on Omilia best practices.

Quality Assurance:

  • Review and ensure the quality of incident responses and solutions.
  • Conduct regular audits of incident reports and resolutions.

Training and Development:

  • Develop training programs for new and existing team members.
  • Conduct knowledge-sharing sessions.

Platform Security, High Availability and Reliability:

  • Drive the design and development of the SRE infrastructure (including the Dev, Staging, and PreProd environments) and of maintenance tools for the full lifecycle of system development.
  • Own the disaster recovery and backup strategy to protect data integrity and business continuity.
  • Manage provisioning, configuration, and scaling for efficiency and consistency.
  • Continuously improve the Omilia Cloud Platform.

Experience Required:

  • 5-7 years of experience in SRE or related roles.
  • Experience in large-scale system architecture and automation.
  • Experience in a leadership role is an advantage.
  • Bachelor’s degree in Computer Science, Engineering, or related field.

Must-have:

  • AWS
  • Azure (a plus)
  • Kubernetes
  • Docker
  • Terraform
  • Ansible

Nice-to-have:

  • Python
  • Git
  • Shell scripting
  • Linux
  • SQL

Benefits

  • Fixed compensation;
  • Long-term employment with vacation counted in working days;
  • Professional development (courses, training, etc.);
  • Being part of successful cutting-edge technology products that are making a global impact in the service industry;
  • Proficient and fun-to-work-with colleagues;
  • Apple gear.

Omilia is proud to be an equal opportunity employer and is dedicated to fostering a diverse and inclusive workplace. We believe that embracing diversity in all its forms enriches our workplace and drives our collective success. We are committed to creating an environment where everyone feels welcomed, valued, and empowered to contribute their unique perspectives. All eligible candidates will be given consideration for employment without regard to factors such as race, color, religion, gender, gender identity or expression, sexual orientation, national origin, heredity, disability, age, or veteran status.

Top Skills

Ansible
AWS
Azure
Docker
Git
Kubernetes
Python
Shell
Terraform
The Company
354 Employees
On-site Workplace
Year Founded: 2002

What We Do

At Omilia, we are committed to providing the most human-like human-to-machine communication experiences and technologies to help large enterprises improve the customer care experience.

Having started in a small garage, Omilia now serves 1 billion conversations, in 30 languages, across 17 countries.

With one of the fastest-growing NLU solutions in the market, Omilia has been recognized as a Leader in the 2022 Gartner® Magic Quadrant™ for Enterprise Conversational AI Platforms, as well as in the IDC MarketScape for Worldwide Conversational AI Software Platforms for Customer Service 2021.

Our technology allows the enterprise to take advantage of Open-Question customer care with end-to-end Self-Service to greatly improve customer experience and significantly decrease operational costs.

In 2016, Omilia expanded to the USA and Canada, counting 33 full production deployments worldwide and case studies with proven KPIs and ROIs across various industries.
