As a Site Reliability Engineer, you will work as an integral member of product teams, helping to build, deploy, and monitor cloud services reliably. You will contribute to complex software development projects to maintain essential, revenue-critical services. Additionally, you will actively develop code and build frameworks to monitor services deployed in production, driving reliability and performance on a large scale. You will be responsible for ensuring the reliability, availability, and performance of our Elasticsearch infrastructure.
CUSTOMER
ConnectWise is the world's leading software company dedicated to the success of IT solution providers. Its vision is to power a thriving IT ecosystem that transforms what’s possible for small and medium-sized businesses (SMBs). It does this by empowering IT solution providers with unmatched software, services, and community to help them achieve their most ambitious vision of success. IT service providers use the tools under development to automate the work they do for SMBs, such as backup and restore, security, and administrative tasks on Microsoft 365 tenants.
RESPONSIBILITIES
- Build systems and infrastructure for monitoring complex, large-scale distributed systems
- Identify stability and performance issues, and collaborate with developers to triage critical issues in production systems
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
- Devise ways to actively monitor system throughput, capacity, and reliability
- Debug complex systems and evolve a running environment without causing downtime
- Engage in service capacity planning and demand forecasting, as well as software performance analysis and system tuning
- Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization
- Monitor and troubleshoot Elasticsearch performance issues and outages
REQUIREMENTS
- Fundamental knowledge of technologies across a broad range of disciplines, including virtualization, storage, networking, server, and security
- Bachelor’s degree in computer science or equivalent work experience as a System Administrator with programming skills
- Understanding of systems and application design, including the operational trade-offs of various designs
- Experience with monitoring and logging solutions such as Prometheus, Grafana, and ELK stack
- Proficiency in scripting languages such as Python
- Experience with infrastructure-as-code tools, such as Terraform or CloudFormation
- Strong understanding of Linux system administration and networking concepts
- Demonstrable knowledge of Unix, TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures
- Experience in analyzing logs and troubleshooting large-scale, distributed systems
WOULD BE A PLUS
- Experience with instrumenting and monitoring production systems using tools such as ELK stack, Zabbix, Nagios, Statsd/Graphite, APM, etc.
- Experience with AWS infrastructure (including EC2, S3, VPC, Security Groups, RDS) and related services
- Practical knowledge of Docker, Vagrant, and configuration management tools like Ansible, Chef, or Puppet
- Experience with one or more general-purpose programming or scripting languages, including but not limited to Python, Bash, Perl, or Go
PERSONAL PROFILE
- Excellent troubleshooting and problem-solving skills
- Ability to work independently and collaboratively in a fast-paced environment
- Strong communication and interpersonal skills
- Excellent organizational and time management skills
What We Do
Sigma Software Group, an award-winning and trusted IT partner, has served customers for over 21 years, providing comprehensive IT solutions to businesses ranging from startups to established software product houses. A substantial European IT consultancy, it brings together over 2,100 professionals in 40 offices across 19 countries. With a diverse client base of more than 300 enterprises, including Fortune 500 companies, Sigma Software Group is a preferred choice for developing solutions that help businesses create cutting-edge products while meeting their unique needs.
Sigma Software Group operates as a dynamic ecosystem of tech companies, offering 25 ready-to-implement innovative products and 40+ value-added services. Furthermore, Sigma Software Group is committed to fostering innovation through initiatives such as the Sigma Software Labs business incubator, Sigma Software University, the SID Venture Partners VC Fund, UA Tech Network, Techosystem, the European Business Association, and other collaborative efforts.
Since 2015, Sigma Software Group has consistently earned recognition on the IAOP's prestigious World's Top 100 Outsourcing list. The company's accomplishments have also been acknowledged by prominent global media outlets such as Forbes, CNBC, The Times, and Reuters.