Network SRE Full Stack Observability Engineer

Posted 3 Days Ago
Hiring Remotely in Heredia, CRI
Remote
Senior level
Fintech • Financial Services
The Role
Design, build, and maintain observability for network infrastructure: monitoring, logging, tracing, dashboards, alerting, automation, and reliability engineering. Collaborate with SRE, DevOps, and Network teams to analyze telemetry, implement SLIs/SLOs, and improve network performance and uptime.
Summary Generated by Built In

Job Summary

We are seeking a highly skilled and motivated Network SRE Full Stack Observability Engineer to join our team. In this role, you will focus on building and maintaining observability solutions for our network infrastructure, ensuring reliability, scalability, and performance. You will work closely with network engineers and the SRE team to design and implement monitoring, alerting, and visualization tools that provide actionable insights into our network systems.

This is a unique opportunity to combine your expertise in full-stack development, network engineering, and site reliability engineering to enhance the observability and reliability of our critical infrastructure.

Key Responsibilities

  • Design and Implement Observability Solutions: Develop and maintain monitoring, logging, and tracing systems for network infrastructure using tools like Prometheus, Grafana, Splunk, ELK Stack, or similar platforms.
  • Full-Stack Development: Build custom dashboards, APIs, and tools to provide real-time insights into network performance and reliability.
  • Network Monitoring: Collaborate with network engineers to implement telemetry solutions for routers, switches, firewalls, and cloud networking components.
  • Incident Management: Create automated alerting systems to detect and respond to network anomalies, ensuring minimal downtime and fast recovery.
  • Automation and Scripting: Develop scripts and automation workflows using Python, Go, or similar languages to streamline observability and troubleshooting processes.
  • Data Analysis: Analyze network telemetry data to identify trends, bottlenecks, and areas for optimization.
  • Collaboration: Work closely with SRE, DevOps, and Network Engineering teams to ensure observability solutions are aligned with organizational goals.
  • Reliability Engineering: Apply SRE principles to improve the reliability and scalability of network systems, including implementing SLIs, SLOs, and error budgets.
  • Documentation: Create and maintain detailed documentation for observability tools, workflows, and best practices.

Required Qualifications

  • Education: Bachelor’s degree in Computer Science, Information Technology, Network Engineering, or a related field (or equivalent experience).
  • Experience:
    • 5+ years of experience in network engineering, site reliability engineering, or full-stack development.
    • Strong background in network observability and monitoring tools.
  • Technical Skills:
    • Proficiency in programming languages such as Python, Go, or JavaScript.
    • Experience with observability tools like Prometheus, Grafana, Splunk, ELK Stack, or Datadog.
    • Strong understanding of network protocols (TCP/IP, BGP, OSPF, DNS, etc.).
    • Knowledge of Infrastructure as Code (IaC) tools like Terraform or Ansible.
    • Understanding of fundamental concepts in cloud networking (AWS, Azure, GCP) and hybrid environments.
    • Basic familiarity with container orchestration tools (e.g., Kubernetes) and service meshes.
  • Soft Skills:
    • Strong problem-solving and analytical skills.
    • Excellent communication and collaboration abilities.
    • Ability to work in a fast-paced, dynamic environment.

Preferred Qualifications

  • Strong proficiency in programming languages such as Python, Go, or JavaScript, with the ability to develop scripts and tools for automation and observability.
  • Extensive experience with observability tools like Prometheus, Grafana, Splunk, ELK Stack, or Datadog, including setting up monitoring, alerting, and visualization workflows.
  • Experience with AI/ML-based network monitoring tools.
  • Certifications such as CCNA, CCNP, AWS Advanced Networking, or Kubernetes certifications (CKA/CKAD).
  • Familiarity with container orchestration tools (e.g., Kubernetes) and service meshes.
  • Familiarity with chaos engineering practices to test network resilience.

This role offers the opportunity to work in a dynamic, global environment, driving innovation and operational excellence in critical network domains. If you are passionate about network reliability, observability, and full-stack development, we encourage you to apply!

------------------------------------------------------

Job Family Group: Technology

------------------------------------------------------

Job Family: Systems & Engineering

------------------------------------------------------

Time Type: Full time

------------------------------------------------------

Most Relevant Skills: Please see the requirements listed above.

------------------------------------------------------

Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

 

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.

Skills Required

  • Bachelor's degree in Computer Science, IT, Network Engineering or equivalent experience
  • 5+ years of experience in network engineering, site reliability engineering, or full-stack development
  • Strong background in network observability and monitoring tools
  • Proficiency in Python, Go, or JavaScript
  • Experience with Prometheus, Grafana, Splunk, ELK Stack, or Datadog
  • Strong understanding of network protocols (TCP/IP, BGP, OSPF, DNS)
  • Knowledge of Infrastructure as Code tools like Terraform or Ansible
  • Understanding of cloud networking (AWS, Azure, GCP) and hybrid environments
  • Basic familiarity with Kubernetes and service meshes
  • Experience applying SRE principles (SLIs, SLOs, error budgets) and incident management
  • Full-stack development experience building dashboards, APIs, and observability tools

Citi Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Citi and has not been reviewed or approved by Citi.

  • Healthcare Strength: Benefits coverage is positioned as comprehensive, including health, dental, and vision insurance plus on-site clinics, prescription drug support, and disability coverage. Family-building support such as fertility assistance is described as a notable differentiator within the overall package.
  • Retirement Support: Retirement benefits are framed as strong, highlighted by a 401(k) with matching and additional plan options like a Roth 401(k). Financial support is reinforced through discounts and broader financial guidance resources tied to the benefits ecosystem.
  • Wellbeing & Lifestyle Benefits: Wellbeing support extends beyond insurance through programs like an Employee Assistance Program, counseling/legal resources, and gym or wellness reimbursement. These offerings increase the perceived total rewards value even when cash compensation sentiment varies by role.

The Company
HQ: Kwun Tong, Kowloon
223,850 Employees

What We Do

Citi's mission is to serve as a trusted partner to our clients by responsibly providing financial services that enable growth and economic progress. Our core activities are safeguarding assets, lending money, making payments and accessing the capital markets on behalf of our clients. We have 200 years of experience helping our clients meet the world's toughest challenges and embrace its greatest opportunities. We are Citi, the global bank – an institution connecting millions of people across hundreds of countries and cities.
