Key responsibilities
- Platform reliability & operations:
  - Ensure the availability, resilience, and performance of the platform and supporting services.
  - Own and improve incident management, including troubleshooting, escalation handling, and follow-ups aligned to SLAs.
  - Participate in an on-call rotation, supporting production systems and driving reliability improvements from real incidents.
- Infrastructure engineering (Linux / Cloud / Kubernetes):
  - Design, deploy, configure, and manage Linux-based system architecture across environments.
  - Build and support platform implementations using AWS and other cloud technologies (compute-centric services and related infrastructure).
  - Design and implement large, complex technology projects, from initial design through production rollout and operational handover.
  - Support and maintain Kubernetes-based workloads and platform components.
- Automation & Infrastructure as Code:
  - Build tooling and solutions to automate recurring operational tasks.
  - Use Infrastructure as Code (IaC) to standardize and scale: Terraform for provisioning, Ansible for configuration management and automation.
  - Improve reliability by reducing manual steps and enabling repeatable deployments.
- CI/CD & developer enablement:
  - Manage and maintain CI/CD pipelines across 20+ repositories spanning multiple technology stacks.
  - Partner with Engineering teams to improve build/release consistency, pipeline reliability, and deployment safety.
- Observability & operational readiness:
  - Implement and enhance monitoring, logging, and alerting using tools such as Prometheus, Grafana, Zabbix, Splunk, and PagerDuty (or equivalent incident alerting/response tooling).
  - Use metrics and incident learnings to reduce noise, improve signal, and shorten time-to-detect and time-to-recover.
- Documentation & standards:
  - Produce clear, formal documentation, including configuration standards, troubleshooting runbooks, and infrastructure and architecture design documentation.
  - Contribute to internal standards that improve consistency, security, and operational maturity.
Required skills & experience
- 5+ years of hands-on experience in Linux systems administration / engineering in production environments.
- Strong working knowledge of the following (or equivalents): Linux, Kubernetes, GitLab, Terraform, Ansible.
- Experience working in Agile (Scrum) teams.
- Experience with AWS (compute-focused services) and/or Google Cloud Platform.
- Proven experience with distributed systems design, maintenance, and troubleshooting.
- Strong scripting/coding ability in at least one of: Python, Go, Bash.
- Experience with observability and incident response tooling such as: Zabbix, Splunk, Prometheus, Grafana, PagerDuty.
- Strong communication skills in English, with the ability to work effectively with customers, vendors, partners, and internal teams across levels.
- Working knowledge of datastores and messaging systems such as PostgreSQL, MongoDB, and RabbitMQ, and of web/application infrastructure components such as Apache and Nginx.
- Demonstrated ability to learn quickly, work independently, make good decisions, and collaborate as a team player in fast-changing environments.
- Strong AI-driven mindset and curiosity about emerging AI technologies.
- Hands-on experience using AI tools (e.g., LLMs, automation frameworks, AI-assisted development tools) to enhance productivity or system performance.
Nice to have
- Experience operating highly available, high-volume web services.
- Strong initiative and a self-starter attitude, with the ability to work under minimal supervision.
- Demonstrated success reducing operational toil through automation and better tooling.
- Experience improving SLOs/SLIs, error budgets, or formal reliability practices (if applicable to your background).
What We Do
From London to Singapore and from San Francisco to São Paulo, we help businesses enter new markets, explore new industries, and reach new milestones. We are driven by a deep-seated determination to be the best possible partner for our customers – giving you the support you need to capitalize on a world that’s changing at breakneck speed. Our mission is to provide innovators with a convenient and simple financial interface that enables payments to flow freely and invisibly across borders. We offer a wide range of services, including payment gateway, card acquiring, business accounts, card issuing, alternative payment methods, and more. That is why we are called Unlimit: we provide unlimited growth opportunities for our customers, freeing them from payment constraints.
Unlimit - Borderless Payments








