We are seeking a highly skilled Site Reliability Engineer (SRE) to join our Data & Algorithm team, where you'll be pivotal in building and maintaining resilient, scalable, and high-performing systems. You will act as the bridge between development and operations—championing reliability, reducing operational toil, and driving excellence through observability, automation, and deep system-level expertise.
This is a hands-on, high-impact role for someone who thrives in a fast-paced, multitasking environment and has a strong foundation in infrastructure, automation, and modern cloud-native tools.
Key Responsibilities:
- Design and implement resilient and scalable system architectures to ensure high availability.
- Drive the adoption and monitoring of Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all critical services.
- Develop automation tools and scripts (Python & Bash) to reduce manual interventions and operational toil.
- Troubleshoot and resolve infrastructure and application issues, especially around Kubernetes, storage modules, and containerization.
- Collaborate closely with engineering, data, and DevOps teams to implement best practices for system reliability and incident management.
- Conduct root cause analysis and post-incident reviews, implementing improvements to prevent recurrence.
- Use tools like Grafana to monitor system health, derive insights, and tune system performance.
- Manage and maintain documentation for all systems, processes, and incident responses.
- Support and troubleshoot key-value and NoSQL databases, as well as Kafka or BMQ (forked Kafka) for data streaming.
- Handle multitasking under pressure, prioritize workloads, and maintain effective communication during high-stress scenarios.
- Translate and convert data formats (CSV, JSON, etc.) using scripting to support analytics and system configurations.
Requirements:
- Strong programming/scripting skills in Python and Bash.
- Deep understanding of Kubernetes internals, containerization, and troubleshooting at the infrastructure level.
- Experience with cloud platforms such as AWS, GCP, or Azure.
- Solid background in Linux system administration and networking fundamentals.
- Proficient with tools like Git and VS Code.
- Hands-on experience with monitoring tools, especially Grafana.
- Familiarity with NoSQL databases and data streaming platforms (Kafka, BMQ).
- Strong grasp of SRE principles: SLOs, SLIs, SLA management, toil reduction, incident handling.
- Ability to multitask and thrive in high-pressure environments.
What We Do
Unison Consulting was launched in September 2012 in Singapore, a hub of the financial industry, with an innovative vision for technology. We are a boutique, next-generation technology company with strong business interests in liquidity risk, market risk, credit risk, and regulatory compliance.
Unison provides technology consulting and services to implement risk management and risk analytics systems for financial institutions. Our service suite comprises techno-functional consulting, systems integration, business intelligence, information management, custom development of IT solutions, and project management expertise.
We have expertise in the latest technologies, which helps achieve a better total cost of ownership. Our qualified professionals help you drive your unique risk management strategy, whether that means efficient monitoring, improving the institution's risk appetite, complying with regulations, or capturing growth opportunities through innovation, so that you can maximize your decision-making potential. At Unison Consulting, we view clients as partners, and we measure our success only by the success of our partners, so we put everything on the table to exceed expectations.
Our staff consists of young, energetic, and innovative consultants who are never afraid to challenge conventions and push boundaries to help our clients. For every project, no matter how large or small, we strive not only to meet your needs but to deliver a showcase in your field.








