Site Reliability Engineer (SRE)

Sorry, this job was removed at 06:18 p.m. (CST) on Monday, Aug 04, 2025
New York City, NY
In-Office
Software • Analytics
The AI platform for financial services.
The Role

We're building AI thought partners to make people smarter and more creative, accelerating the creation and sharing of knowledge in financial services. We're unabashedly ambitious, and we're dead set on building the biggest Financial AI company in the world. Our team is lean, smart, and endlessly curious.

What You Will Own
  • Infrastructure Management: Design, deploy, and maintain cloud infrastructure on AWS and/or Azure, ensuring high availability and resilience.

  • Monitoring and Performance: Implement and manage monitoring solutions using Datadog to proactively identify and address system issues.

  • Container Orchestration: Manage Kubernetes clusters, utilizing Helm for package management and deployment automation.

  • Automation and Scripting: Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, and create automation scripts in Bash or Python to streamline operations.

  • Collaboration: Work closely with development and operations teams to promote a DevOps culture, share best practices, and ensure seamless integration and deployment processes.

  • Incident Response: Troubleshoot and resolve complex cross-platform issues related to OS, networking, and databases in a cloud-based environment.

  • Documentation: Maintain comprehensive documentation of system configurations, procedures, and troubleshooting guides.

What You Will Need
  • Bachelor’s degree in Computer Science, Information Technology, or a related field.

  • Experience

    • 3-5 years of hands-on experience with AWS and/or Azure cloud platforms, including services like EC2, S3, VPC, and Lambda.

    • 2-3 years of experience managing Kubernetes clusters in production environments.

    • 2-3 years of experience with Helm for Kubernetes package management.

    • 2-3 years of experience with Datadog or similar monitoring tools.

    • 3-5 years of experience with Linux system administration and shell scripting.

    • 2-3 years of experience with Infrastructure as Code (IaC) tools like Terraform.

  • Skills

    • Proficiency in scripting languages such as Bash and Python.

    • Strong understanding of networking fundamentals, including TCP/IP, DNS, and load balancing.

    • Experience with CI/CD pipelines and tools like Jenkins, GitLab CI, or GitHub Actions.

    • Experience with cloud-native security best practices and compliance frameworks.

    • Excellent problem-solving skills and the ability to navigate complex challenges effectively.

    • Strong communication and collaboration skills.

Bonus

  • Experience with MLOps monitoring and observability.

  • Experience with PostgreSQL, Elasticsearch, and vector databases such as Qdrant or similar technologies.

  • Experience with monitoring and security tools such as Datadog, AWS GuardDuty, CloudWatch, and CloudTrail.

  • Certifications in AWS, Azure, or Kubernetes.

  • Experience with other cloud platforms like Google Cloud Platform (GCP).

  • Experience with distributed tracing and observability tools.

Who You Are
  • You thrive in fast-paced environments. You are high-intensity and care a lot about what you do, and you're ecstatic to work at a start-up.

  • You are ambitious. You have fun solving problems that others think are impossible.

  • You are curious. You find joy in learning about AI, technology, and finance.

  • You are an owner. You are autonomous, self-directed, and comfortable working with ambiguity.

  • You are collaborative, organized, and thoughtful.

Why Join Rogo?
  • Exceptional traction: strong product-market fit (PMF) with the world's largest investment banks, hedge funds, and private equity firms.

  • World-class team: we take talent density seriously. We like working with incredibly smart, driven people.

  • Velocity: we work fast, which means you learn a lot and constantly take on new challenges.

  • Frontier technology: we're developing cutting-edge AI systems, pushing the boundaries of published research, redefining what's possible, and inventing the future.

  • Cutting-edge product: our platform is state-of-the-art and crazily powerful. We're creating tools that make people smarter, reinventing how you discover, create, and share knowledge.


The Company
HQ: New York, NY
55 Employees
Year Founded: 2021

What We Do

Artificial intelligence is transforming the global financial services industry. Rogo is the first generative AI company built to help financial firms navigate this transformation. Our mission is simple: improve how firms work by deploying bespoke AI solutions.
