Site Reliability Engineer (SRE)

Posted 16 Days Ago
New York, NY
Mid level
Software • Analytics
The Role
As a Site Reliability Engineer, you'll design and maintain cloud infrastructure, manage Kubernetes, implement monitoring solutions, and automate processes while collaborating with teams to improve operational efficiency.
Summary Generated by Built In

We're building AI thought partners to make people smarter and more creative, accelerating the creation and sharing of knowledge in financial services. We're unabashedly ambitious, and we're dead set on building the biggest Financial AI company in the world. Our team is lean, smart, and enormously ambitious. We're growing fast out of our beautiful office in NYC.

WHY JOIN ROGO?

  • Exceptional traction: strong PMF with the world's largest investment banks, hedge funds, and private equity firms.

  • World-class team: we take talent density seriously. We like working with incredibly smart, driven people.

  • Velocity: we work fast, which means you learn a lot and constantly take on new challenges.

  • Frontier technology: we're developing cutting-edge AI systems, pushing the boundaries of published research, redefining what's possible, and inventing the future.

  • Cutting-edge product: our platform is state-of-the-art and crazily powerful. We're creating tools that make people smarter, reinventing how you discover, create, and share knowledge.

Key Responsibilities:

  • Infrastructure Management: Design, deploy, and maintain cloud infrastructure on AWS and/or Azure, ensuring high availability and resilience.

  • Monitoring and Performance: Implement and manage monitoring solutions using Datadog to proactively identify and address system issues.

  • Container Orchestration: Manage Kubernetes clusters, utilizing Helm for package management and deployment automation.

  • Automation and Scripting: Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, and create automation scripts in Bash or Python to streamline operations.

  • Collaboration: Work closely with development and operations teams to propagate DevOps culture, share best practices, and ensure seamless integration and deployment processes.

  • Incident Response: Troubleshoot and resolve complex cross-platform issues related to OS, networking, and databases in a cloud-based environment.

  • Documentation: Maintain comprehensive documentation of system configurations, procedures, and troubleshooting guides.

Qualifications:

  • Education: Bachelor’s degree in Computer Science, Information Technology, or a related field.

  • Experience:

    • 3-5 years of hands-on experience with AWS and/or Azure cloud platforms, including services like EC2, S3, VPC, and Lambda.

    • 2-3 years of experience managing Kubernetes clusters in production environments.

    • 2-3 years of experience with Helm for Kubernetes package management.

    • 2-3 years of experience with Datadog or similar monitoring tools.

    • 3-5 years of experience with Linux system administration and shell scripting.

    • 2-3 years of experience with Infrastructure as Code (IaC) tools like Terraform.

  • Skills:

    • Proficiency in scripting languages such as Bash and Python.

    • Strong understanding of networking fundamentals, including TCP/IP, DNS, and load balancing.

    • Experience with CI/CD pipelines and tools like Jenkins, GitLab CI, or GitHub Actions.

    • Experience with cloud-native security best practices and compliance frameworks.

    • Excellent problem-solving skills and the ability to navigate complex challenges effectively.

    • Strong communication and collaboration skills.

Preferred Qualifications:

  • Experience with MLOps monitoring and observability.

  • Experience with PostgreSQL, Elasticsearch, and vector databases such as Qdrant or similar technologies.

  • Experience with monitoring and security tools such as Datadog, AWS GuardDuty, CloudWatch, and CloudTrail.

  • Certifications in AWS, Azure, or Kubernetes.

  • Experience with other cloud platforms like Google Cloud Platform (GCP).

  • Experience with distributed tracing and observability tools.

WHO YOU ARE

  • You thrive in fast-paced environments. You are high-intensity and care a lot about what you do, and you're ecstatic to work at a start-up.

  • You are ambitious. You have fun solving problems that others think are impossible.

  • You are curious. You find joy in learning about AI, technology, and finance.

  • You are an owner. You are autonomous, self-directed, and comfortable working with ambiguity.

  • You are collaborative, organized, and thoughtful.

Top Skills

AWS
AWS GuardDuty
Azure
Bash
CI/CD
CloudTrail
CloudWatch
Datadog
Elasticsearch
GitHub Actions
GitLab CI
Helm
Jenkins
Kubernetes
Linux
Postgres
Python
Qdrant
Terraform

The Company
HQ: New York, NY
15 Employees
On-site Workplace
Year Founded: 2021

What We Do

Rogo is an analytics platform that utilizes natural language processing to transform traditional data workflows.

Our mission is to unleash the power of data and enable everyone to become a data genius. Our platform transforms the way industry, academics, and individuals interact with data by allowing users to work with, analyze, and visualize their data in the easiest way imaginable: by simply asking in plain English.

Contact us at team@rogodata.com or apply for roles at https://boards.greenhouse.io/rogo.
