Site Reliability Engineer

Hiring Remotely in Romania
Remote
Mid level
Defense • Industrial • Manufacturing
The Role
Design, implement and maintain GCP infrastructure and GKE workloads using Terraform and Kubernetes. Build observability with Prometheus and Grafana, define SLIs/SLOs, handle on-call incident response, provide production support, automate operations in Python, and mentor the team to adopt SRE practices.

This job is remote for people located strictly in Romania.

We are looking for a mid‑level Site Reliability Engineer focused on GCP to help us transition from a traditional IT Support model to a modern SRE operating model. You will design and implement our GCP‑based platform (GKE, Terraform, Prometheus, Grafana, GCP Operations Suite) and act as a hands‑on guide for our existing team as we adopt SRE ways of working, with a strong focus on automation and tooling in Python. 

Responsibilities:

  • Maintain GCP infrastructure using Terraform, including GKE clusters, Compute Engine, Cloud Storage, Cloud SQL or other managed databases, VPC networking, load balancers, and Cloud DNS. 

  • Manage and operate Kubernetes workloads on GKE: deployments, services, ingresses, autoscaling, configuration, secrets and cluster upgrades.  

  • Participate in on‑call rotations for GCP services and lead or assist in incident response.

  • Design and maintain observability for GKE and GCP workloads using Prometheus for metrics collection and Grafana for dashboards and visualization.

  • Provide advanced production support for business‑critical applications (web and backend services), investigating incidents, performance issues and functional degradations together with development teams. 

  • Use metrics, logs, traces and error reports to triage and debug application issues across multiple services and components. 

  • Maintain and improve runbooks, playbooks and knowledge base articles so recurring production issues can be resolved quickly and consistently. 

  • Analyze incident and ticket trends to propose reliability improvements, automation and changes to application configuration or architecture. 

  • Define and implement SLIs and SLOs based on Prometheus metrics and GCP Operations Suite (Cloud Monitoring/Logging) and configure alerts (in Prometheus Alertmanager, Grafana, or Cloud Monitoring) that focus on real customer impact.

Qualifications:

  • 2–5 years of experience in SRE, DevOps or platform engineering operating production systems, with strong exposure to GCP.

  • Solid experience with GKE and containerized applications (deployment strategies, scaling, troubleshooting) in production. 

  • Strong Infrastructure‑as‑Code skills with Terraform for provisioning GCP resources (projects, networks, IAM, GKE, databases, etc.). 

  • Experience with Prometheus and Grafana, including:
    - setting up metrics collection (exporters, scraping configs) for applications and infrastructure;
    - building and maintaining Grafana dashboards for services, platforms, and SLOs;
    - configuring alerts (Alertmanager/Grafana/Cloud Monitoring) with appropriate thresholds and routing.

  • Good knowledge of Linux and Docker, including debugging performance, networking and security issues. 

  • Familiarity with GCP Operations Suite (Cloud Monitoring/Logging) and how to combine it with Prometheus/Grafana for a complete observability story.

  • Understanding of GCP security basics: IAM, service accounts, least‑privilege, network security and Secret Manager. 

  • Experience supporting production applications (web or backend services), including debugging issues across logs, metrics, traces and application‑level errors. 

  • Mentoring and coaching mindset: enjoys guiding colleagues through new tools and practices.
     

Schedule: 16:00-00:50 Romania time

Cadex Solutions Corporation is a holding company formed by Trivest Partners LP to build the premier provider of commercial order-to-cash management solutions. With a history spanning nearly 100 years, Cadex is uniquely positioned with in-depth experience that builds relationships alongside results. Our team of industry experts brings innovation and data insight, improves your processes with hands-on help, and provides custom solutions based on specific needs. Cadex has approximately 800 employees serving over 1,000 clients across all industries from locations including the United States, Colombia, Brazil, Romania, Italy, India, Singapore, and South Africa.

Since 2019, Cadex has been building a strong portfolio of accounts receivable management (ARM) companies, including:

  • A.G. Adjustments, formed in 1974 and headquartered in Melville, NY
  • D&S Global Solutions, formed in 1997 and fully remote
  • ABC-Amega, formed in 1929 and headquartered in Buffalo, NY
  • TranSubro, formed in 2012 and headquartered in Oceanside, NY
  • DAL, formed in 1974 and headquartered in Clifton Heights, PA
  • RCC, formed in 1970 and headquartered in Maple Grove, MN
  • IRG, formed in 1997 and headquartered in Marlborough, MA

Since our inception in 1997, D&S has been driving innovation in accounts receivable solutions, constantly shaping and expanding beyond anything previously conceived to meet clients’ needs.

Our one-of-a-kind D&S Off-Site Network team delivers the highest level of expertise in an array of languages with unmatched flexibility, clarity, and courtesy. Our experience spans years, countries, and companies of all sizes.

Our solutions are completely customizable, extend beyond any and all expectations, and stem from experience telling us that credit risk comes from any, if not all, aspects of business.

As a result, through our proprietary software, leading-edge technology, and considerable know-how, we work with you to do everything humanly possible to mitigate your credit risk efficiently and effectively, producing an ever-growing set of services we are proud to provide.

Skills Required

  • 2-5 years of experience in SRE, DevOps or platform engineering with strong exposure to GCP
  • Solid experience with GKE and containerized applications in production (deployments, scaling, troubleshooting)
  • Strong Infrastructure-as-Code skills with Terraform for provisioning GCP resources
  • Experience with Prometheus for metrics collection and Grafana for dashboards and SLOs
  • Experience configuring alerts (Alertmanager, Grafana, Cloud Monitoring) and routing
  • Strong Python skills for automation and tooling
  • Good knowledge of Linux and Docker, including debugging performance, networking and security issues
  • Familiarity with GCP Operations Suite (Cloud Monitoring/Logging) and integration with Prometheus/Grafana
  • Understanding of GCP security basics: IAM, service accounts, least-privilege, network security, Secret Manager
  • Experience supporting production web/backend applications and debugging using logs, metrics, traces and error reports
  • Willingness to participate in on-call rotations and incident response
  • Mentoring and coaching mindset to guide colleagues through new tools and practices

The Company
Year Founded: 2024

What We Do

Cadex is the world leader in the research, development, and manufacturing of “Turn-Key” impact systems and testing laboratories, specializing in helmet testing equipment technology.
