Senior Site Reliability Engineer

Posted 9 Days Ago
6 Locations
Remote
Senior level
Artificial Intelligence • Conversational AI
The Role
Manage production clusters, develop observability solutions, ensure platform reliability, respond to incidents, and improve operational processes in collaboration with other teams.

We are looking for a Senior Site Reliability Engineer with Cloud platform experience. This individual will be part of a team responsible for operating and maintaining production clusters and developing our observability solutions. They will collaborate with team members to develop automation strategies, build monitoring and alerting, and ensure overall platform reliability. Your goal will be to become an integral part of the team, treating every platform challenge as your own and solving it accordingly.

Responsibilities

  • Ensure platform reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.
  • Act as first responder for incidents and contribute to problem management and root cause analysis.
  • Support the development team's reliability efforts and help build a solid reliability culture within the development lifecycle.
  • Develop troubleshooting documentation for production support resources.
  • Collaborate with Engineering teams to develop optimised and productive runbooks, operational documentation and automation of operational tasks.
  • Collaborate with development and cloud engineering teams to embed reliability and performance into the software delivery lifecycle.
  • Design, implement, and evolve observability solutions (metrics, logs, traces, dashboards) using tools such as Prometheus, Grafana, and ELK.
  • Participate in on-call rotations and continuously improve alert quality and response processes.
  • Champion a culture of reliability, performance, and continuous improvement across teams.

Requirements
  • Bachelor's Degree or MS in Engineering or equivalent.
  • Experience in operating at least one container orchestration cluster (Kubernetes, Docker Swarm).
  • Experience developing or maintaining software for production services at scale.
  • Experience with ELK.
  • Experience with AWS.
  • Experience with Grafana/Prometheus stack.
  • Strong scripting skills (Bash, Python or Go).
  • Excellent communication skills.
  • Thinking outside the box and anticipating challenges. It is imperative that we are not simply reactive; we must expect challenges and question technologies, procedures and thinking already in place. You will be expected to constantly review and challenge at all levels.
  • Versatility. We work with agile/lean methods. We'd much rather iterate and learn than assume we know all the answers.
  • Being a team player. You don't (always) work in isolation and are excited by the thought of using your team whilst involving product, experience design, engineering, and more in the process.

The following will be considered a plus:

  • Telephony knowledge (SIP, VoIP);
  • Experience in Linux Administration (RedHat, CentOS, AL);
  • Working knowledge of Configuration Management tools (Terraform, Ansible);
  • Experience with TCP/IP and general networking concepts;
  • RDBMS knowledge (MySQL, Postgres);
  • NoSQL knowledge (Redis).

Benefits
  • Fixed compensation;
  • Long-term employment with vacation calculated in working days;
  • Professional development (courses, training, etc.);
  • Being part of successful cutting-edge technology products that are making a global impact in the service industry;
  • Proficient and fun-to-work-with colleagues;
  • Apple gear.

Omilia is proud to be an equal opportunity employer and is dedicated to fostering a diverse and inclusive workplace. We believe that embracing diversity in all its forms enriches our workplace and drives our collective success. We are committed to creating an environment where everyone feels welcomed, valued, and empowered to contribute their unique perspectives. All eligible candidates will be given consideration for employment without regard to factors such as race, color, religion, gender, gender identity or expression, sexual orientation, national origin, heredity, disability, age, or veteran status.

Top Skills

Ansible
AWS
Bash
Docker
ELK
Go
Grafana
Kubernetes
MySQL
Postgres
Prometheus
Python
Redis
Terraform

The Company
354 Employees
Year Founded: 2002

What We Do

At Omilia we are committed to providing the most human-like human-to-machine communication experiences and technologies in order to help large enterprises improve the customer care experience.

Having started out in a small garage, Omilia now serves 1 billion conversations, in 30 languages, across 17 countries.

With one of the fastest growing NLU solutions in the market, Omilia has been recognized as a Leader in the 2022 Gartner® Magic Quadrant™ for Enterprise Conversational AI Platforms, as well as in the IDC Marketscape for Worldwide Conversational AI Software Platforms for Customer Service 2021.

Our technology allows the enterprise to take advantage of Open-Question customer care with end-to-end Self-Service to greatly improve customer experience and significantly decrease operational costs.

In 2016 Omilia expanded to the USA and Canada, counting 33 full production deployments worldwide and case studies with proven KPIs and ROIs across various industries.
