Site Reliability Engineer

3 Locations
Hybrid
Mid level
Fintech • Financial Services
The Role
Responsible for ensuring the reliability and performance of infrastructure and applications through monitoring, incident management, automation, and collaboration with engineering teams.
The role is remote with in-office presence in Brno 1–2 times per month.

The Company

Capital Markets Gateway LLC (CMG) is a capital markets-focused fintech transforming global equity capital markets (ECM) through data, technology, and connectivity. As the preferred source for ECM analytics and the first network connecting the buy-side and sell-side for ECM workflows, we are committed to reshaping how capital markets operate. Founded in 2017 by a team of ECM practitioners, CMG has completed three successful fundraising rounds and is backed by a group of the world’s most prestigious financial institutions. The CMG platform is currently relied upon by nearly 150 buy-side firms representing $40 trillion in AUM and 22 global investment banks. For more information, please visit www.cmgx.io.  

The Role 
 
CMG is looking for a Site Reliability Engineer (SRE) with a strong focus on monitoring, observability, and alerting to ensure the reliability, performance, and scalability of our infrastructure and applications. You will be responsible for designing, implementing, and maintaining monitoring solutions to provide visibility into system health and performance, proactively detect anomalies, and reduce incident response time.  

Our Engineering Team 

The CMG engineering team consists of domain experts who work collaboratively within a culture of cross-domain knowledge sharing. We value engineers who are passionate about modern technologies and best practices. 
Our engineers are encouraged to challenge the status quo and are constantly seeking improvement and efficiency in our codebase and platform. CMG engineers are empowered to explore solutions using bleeding-edge technologies such as AI and bring recommendations to the table. We are in a period of making impactful engineering decisions. 
As part of our process, we believe in taking the time for research and prototyping, which is critical to making the right decisions. Given the experience of our team, we have naturally adopted best practices from local development, through code review, and into production rollouts. Beyond the standard pull requests, test automation, code coverage tracking, containerization, and one-click deployments, we are constantly reviewing these foundational components to develop new best practices. 

Responsibilities 

Monitoring & Observability

  • Design, implement, and maintain monitoring and observability solutions using tools like Prometheus, the Grafana stack (Loki/Grafana/Tempo/Alertmanager), Datadog, and OpenTelemetry.  
  • Define and implement SLOs, SLIs, and error budgets to measure system reliability.  
  • Develop and optimize dashboards, alerts, and reports for system performance and business metrics. 

Alerting & Incident Management

  • Design actionable alerting strategies to minimize noise and improve MTTR.  
  • Integrate alerting systems with Jira. 
  • Establish and refine runbooks for on-call teams to handle alerts efficiently. 
  • Empower teams to maintain observability coverage and sound incident response practices.  

Performance Optimization

  • Analyze system performance metrics, identify bottlenecks, and implement optimizations to improve system efficiency, scalability, and cost-effectiveness.  
  • Help conduct load testing and capacity planning to ensure systems can handle peak traffic loads.  

Automation and Tooling

  • Identify opportunities for automation and develop tools to streamline operational processes, such as failover, configuration management, and monitoring.  
  • Implement monitoring and alerting systems within automations to detect and resolve issues proactively.  

Collaboration and Communication

  • Collaborate closely with cross-functional teams, including software engineers, operations, and infrastructure teams, to understand system requirements, provide technical guidance, and drive solutions.  
  • Communicate effectively to stakeholders about system changes, incidents, and improvements.  
  • Promote and spread SRE principles and practices across the company.  

Qualifications

  • Proven experience as a Site Reliability Engineer or similar role.  
  • Proficiency in logging, metrics, and tracing frameworks (Datadog, Loki, Prometheus, OpenTelemetry).  
  • Experience with cloud platforms (Azure preferred) and infrastructure-as-code tools (e.g., Terraform).  
  • Strong programming and scripting skills (Python, Bash).  
  • Proficiency in containerization technologies and orchestration tools (Docker, Kubernetes).  
  • Understanding of Linux-based systems, networking, and security principles related to containerized applications.  
  • Strong problem-solving and troubleshooting skills, with a passion for identifying and resolving complex technical issues.  
  • Excellent communication and collaboration abilities.  
  • Ability to thrive in a fast-paced, constantly evolving environment.  
  • Experience with PostgreSQL monitoring and optimization (nice to have).  

Our Tech Stack

  • Azure as an infrastructure provider. We are reviewing secondary cloud options.  
  • Docker + Kubernetes for microservice orchestration using Istio service mesh  
  • PostgreSQL for relational database, Elasticsearch for indexing, Redis for caching  
  • Datadog, Grafana, and OpenTelemetry for observability  
  • GitHub for our Version Control and CI (with our own runners)  
  • CD: Harness and FluxCD  
  • Terraform and Terragrunt as IaC  
  • Python and Bash for scripting infrastructure  
  • React: we’re all in on React and maintain multiple single-page React apps  
  • TypeScript – 99% of our codebase is TypeScript  
  • Latest .NET version for our backend services  
  • GraphQL: our standard for API communication is GraphQL, served by our .NET backend  

Our Values

  • We innovate with purpose  
  • We focus on outcomes vs. output  
  • We believe diverse and inclusive teams fuel innovation  
  • We are humble yet candid  
  • We do right by the customer  

What We Offer

  • Unlimited vacation   
  • Meal vouchers paid in full by the company   
  • Multisport card contribution   
  • Pension contributions   
  • Language courses   
  • Centrally located office in the heart of Brno   
  • Bi-weekly team lunches provided by the company   
  • Tech courses and conferences   
  • Top-of-the-line MacBook   
  • Company team building events   
  • Flexible working hours and the possibility to work from home  

CMG embraces our ongoing commitment to building a culture reflecting the people, perspectives, and passions it represents. We will accept nothing less than equity, inclusion, and belonging for all. With the only constant in life being change, we will always listen, learn, and improve for the betterment of our teams, customers, and communities. CMG is proud to be an Equal Opportunity Employer. 

Top Skills

.NET
Azure
Bash
Datadog
Docker
Grafana
GraphQL
Kubernetes
OpenTelemetry
PostgreSQL
Prometheus
Python
React
Terraform
TypeScript

The Company
Chicago, IL
91 Employees
Year Founded: 2015

What We Do

Capital Markets Gateway (CMG) is a financial technology firm that is modernizing the equity capital markets (ECM). CMG connects investors and underwriters via a neutral platform that delivers integrated ECM data and analytics, unrivaled transparency, and workflow efficiencies. Providing a digital system of record for firm-wide deal activity, CMG helps clients make more timely, better-informed decisions. Launched in 2017 by a team of ECM practitioners, the CMG platform is currently relied upon by nearly 100 buy-side firms representing $12 trillion in AUM and 15 investment banks.
