Site Reliability Engineer

Posted 8 Hours Ago
Lisboa
Mid level
Information Technology • Consulting
The Role
As a Site Reliability Engineer, you will enhance system stability and scalability, manage application performance, and address service issues. Responsibilities include building robust monitoring systems, running capacity tests, and conducting audits, while advocating for best practices in release engineering and collaborating throughout the software development lifecycle.

Company Description

Alter Solutions Portugal is an IT consultancy and promoter of digital transformation, part of the Alter Solutions Group, which was founded in Paris in 2006.

In 2022, Alter Solutions joined the act digital group, forming a global community of technology talent with a presence in thirteen countries: Germany, Belgium, Brazil, Canada, United States of America, Mexico, Morocco, Spain, France, Luxembourg, Poland, Portugal and Serbia. In 2023, we were certified as a Great Place to Work®.

In Portugal, we partner with over 120 clients and have a team of over 500 people working on projects for industries as diverse as banking, insurance, transportation, aviation, energy, and telecom.

As the headquarters of the Nearshore IT center, Alter Solutions Portugal has a dedicated team of around 30 specialized professionals, integrated into projects with several internationally renowned clients.

Job Description

We are looking for a Site Reliability Engineer responsible for improving high availability and resilience, managing load more effectively with L4 and L7 load balancers, building a dynamic and scalable infrastructure to accommodate high-volume business transactions, and setting up a monitoring system that logs performance and capacity levels to ensure high availability of applications with minimal downtime.

Main Responsibilities:

  • Design, develop, and implement systems software and scripts that improve the stability, scalability, availability, and latency of the Risk system applications.
  • Solve problems occurring in our highly available production systems, and build solutions and automation, using a combination of scripting and tooling, to prevent them from recurring.
  • Define and drive adoption of a best-in-class monitoring framework that provides end-to-end flow monitoring and effective alerting.
  • Monitor system performance and capacity levels to ensure high availability of applications with minimal downtime.
  • Build and run capacity tests to manage the growth of systems.
  • Investigate service disruptions and other service issues to identify their causes.
  • Perform regular audits of servers to check for signs of degradation or malfunction, including infrastructure hygiene and end-of-life status.
  • Conduct post-mortems of failed systems to identify and address root causes.
  • Maintain and improve IT continuity strategies, and be accountable for them.
  • Advocate release engineering best practices such as zero-downtime deployments, canary releases, and incremental rollouts.
  • Work with the Development, DevOps, and IT operations teams throughout the Software Development Life Cycle to ensure sustainable software releases.

Qualifications

  • 4-6 years of experience in an IT Operations, DevOps, Application Support, or SRE team.
  • Proven foundation in Linux administration and troubleshooting.
  • Solid knowledge of APM tools, e.g. Dynatrace or AppDynamics.
  • Good understanding of log aggregators, e.g. Splunk or ELK.
  • Solid work experience with load balancers (L4 and L7), preferably Apache HTTP Server (httpd).
  • Good understanding of TCP/IP, HTTP, networking, DNS, firewalls, and F5 load balancing.
  • Experience with Apache Tomcat servers and JVM performance troubleshooting.
  • Knowledge of Jenkins, Ansible, Docker, Kubernetes, and Terraform.
  • Knowledge of OpenStack, networking, security, or storage is desirable.
  • Solid experience in at least one scripting language; Python preferred.
  • Experience building, operating, and maintaining scalable distributed systems, and with operations automation.

Soft skills:

  • English (Fluent) – Mandatory

Additional Information

Hybrid working model in Lisbon.

Top Skills

Linux
Python
The Company
616 Employees
Remote Workplace
Year Founded: 2006

What We Do

The Alter Solutions Group is an IT consultancy group and promoter of digital transformation, founded in Paris in 2006. In 2022, Alter Solutions joined the act digital group, forming a global community of technology talent with a presence in twelve countries: Germany, Belgium, Brazil, United States of America, Canada, Morocco, Spain, France, Luxembourg, Poland, Portugal and Serbia. In 2023, we were recertified as a Great Place to Work®. Learn more about Life at Alter: https://www.linkedin.com/company/alter-solutions-group/life/altersolutionsgroup
