Site Reliability Engineer (SRE) Lead

Sorry, this job was removed at 07:42 p.m. (CST) on Friday, Dec 13, 2024
Hiring Remotely in USA
Remote
Internship
Fintech • Information Technology • Payments • Software • Financial Services
The Role

About us.
Trumid is a dynamic fintech revolutionizing the landscape of fixed income trading. With intelligent, easy-to-use, electronic solutions, we are rapidly growing and seeking exceptional talent to help redefine the boundaries of technology and finance.
Founded in 2014 by a team of fixed income market experts, Trumid has quickly become one of the top three corporate bond e-trading platforms in the U.S. Today, over 1,300 traders from an extensive and expanding client network of 890+ buy- and sell-side institutions transact on Trumid monthly.
With a rich history of innovation and a unique ability to innovate at scale, we collaborate closely with our clients, iterating quickly toward optimal solutions. With market share and client engagement at all-time highs and our pace of product development faster than ever, this is an exciting and transformative time at Trumid.
Our business model thrives on participation, and so does our company culture. We rely on every team member’s contribution to help us accomplish our goals. To succeed at Trumid, you must be curious, passionate about your craft, ambitious, collaborative, and driven. Learn more at www.trumid.com.

The opportunity.

Trumid is looking for a Lead Site Reliability Engineer (SRE) to ensure our systems' reliability, scalability, and performance as we continue to grow. This role offers a unique opportunity to shape our fast-growing firm's reliability practices and infrastructure. You will be crucial in optimizing our existing infrastructure, implementing new technologies, and enhancing our incident response capabilities.

As a Lead SRE, you will oversee the stability and performance of our trading platform, which serves a large and growing client base. You’ll work closely with development and DevOps teams to build scalable solutions and automate processes to enhance system reliability. You will also play a critical role in incident management, problem resolution, and capacity planning, ensuring that our systems meet our users' high expectations.

This role is ideal for someone passionate about reliability, automation, and efficiency. You will have the chance to lead initiatives that directly impact our platform's stability and user experience, ensuring that we maintain the highest levels of service availability.

Responsibilities will include:

  • Transform the SRE function to evolve, simplify, and scale existing solutions. Innovate and create new solutions and practices where needed.
  • Drive improvements in system reliability, scalability, and performance through innovative solutions and industry best practices.
  • Lead incident response efforts, including troubleshooting, resolution, and conducting post-mortem analysis to prevent future incidents.
  • Automate repetitive tasks to reduce manual intervention and improve operational efficiency.
  • Collaborate closely with software development, DevOps, and infrastructure teams to embed reliability into the development lifecycle.
  • Design, implement, and maintain highly available, scalable, and resilient infrastructure to meet the demands of our growing client base.
  • Develop and maintain monitoring, logging, and alerting frameworks to ensure system health and to identify and resolve issues preemptively.
  • Conduct capacity planning and performance tuning to support future growth.

About you.

  • SRE expert with foundational knowledge of SRE best practices.
  • Demonstrated hands-on experience managing large-scale, highly available cloud-based systems.
  • Deep understanding of cloud components in at least one of the major cloud providers (e.g., AWS, GCP, Azure), including infrastructure, services, and tooling.
  • Expertise in containerization and orchestration tools (e.g., Docker, Kubernetes) and experience with deployment strategies such as blue-green and canary deployments.
  • Strong knowledge of CI/CD pipelines and experience in integrating reliability practices within CI/CD processes.
  • Proficient with monitoring and observability tools (e.g., Prometheus, Grafana, Alertmanager) to ensure system health and to create effective alerting mechanisms.
  • Experience with Infrastructure as Code (IaC) tools like Terraform and Ansible and experience automating infrastructure deployment and management.
  • Excellent problem-solving skills, focusing on diagnosing complex issues in large-scale distributed systems.
  • Strong scripting and programming skills in Python, Bash, Go, or similar languages.
  • Strong communication and collaboration skills, capable of working effectively with cross-functional teams in a fast-paced environment.
  • Passion for reliability, automation, and continuous improvement.
  • Bachelor's degree in computer science (or equivalent) and at least 10 years of professional experience at a fast-paced, tech-oriented company. Experience with financial and trading systems is a plus but not required.

Employee Benefits.

  • Highly competitive compensation
  • Fully paid medical, dental, and vision coverage
  • Remote work
  • Team-oriented and collaborative company culture

Trumid is an equal-opportunity employer.

In compliance with New York City Pay Transparency Law, the base salary range for this role in New York City is between $220,000 and $300,000. This range does not include discretionary bonuses or other compensation or benefits offered with this job. Several factors are considered when determining a candidate’s salary.

The Company
HQ: New York, NY
153 Employees
Hybrid Workplace
Year Founded: 2014

What We Do

Building tomorrow’s credit trading network.

We’re a rapidly growing fintech bringing leading-edge technology and product design to corporate bond trading. With a start-up mentality, we’re constantly innovating and advancing, remaining nimble and agile as we grow. We combine market expertise with a diversity of thinking - experiences, backgrounds, and opinions from a variety of industries collaborating to drive innovation and bring ideas to life.

Our business model thrives on participation and connection, and so does our culture. We find joy in solving problems and working together towards common goals. Passionate, curious, and ambitious with a sense of fun? We’d love to hear from you. Visit us at www.trumid.com

Why Work With Us

Our business model thrives on participation and connection, and so does our company culture. We believe in collaborative innovation and solving for fun. We work together to achieve common goals and find joy in pushing into unexplored areas and new ways of thinking.

Trumid Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

We embrace flexible and distributed working philosophies. Depending on the position, we offer fully remote, hybrid, or in-office options, with the office located in New York.

Typical time on-site: Flexible
HQ: New York, NY
We are in the heart of Midtown, nestled between Bryant Park and Times Square. A vibrant area surrounded by great restaurants, shopping, and entertainment with the convenience of multiple public transportation options.
