Manager, Site Reliability Engineering

Hiring Remotely in USA
Remote
Mid level
Blockchain • Fintech • Cryptocurrency
At Gemini, no job is too small and no project too big as we endeavor to build the future of money.
The Role
Lead and manage the Site Reliability Engineering team to ensure the design, deployment, and maintenance of reliable, scalable infrastructure. Drive continuous improvements and collaborate with engineering teams on operational excellence and system efficiency.

About the Company

Gemini is a global crypto and Web3 platform founded by Tyler Winklevoss and Cameron Winklevoss in 2014. Gemini offers a wide range of crypto products and services for individuals and institutions in over 70 countries.

Crypto is about giving you greater choice, independence, and opportunity. We are here to help you on your journey. We build crypto products that are simple, elegant, and secure. Whether you are an individual or an institution, we help you buy, sell, and store your bitcoin and cryptocurrency. 

At Gemini, our mission is to unlock the next era of financial, creative, and personal freedom.

In the United States, we have a flexible hybrid work policy for employees who live within 30 miles of our New York City headquarters or our Seattle office. Employees within the New York and Seattle metropolitan areas are expected to work from the designated office twice a week, unless there is a job-specific requirement to be in the office every workday. Employees outside of these areas are considered part of our remote-first workforce. We believe our hybrid approach for those near our NYC and Seattle offices increases productivity through more in-person collaboration where possible.

The Department: Platform

Our Platform organization’s purpose is to enable Gemini to scale effectively and empower our engineering teams to focus on building innovative financial products and experiences for individuals around the world. Platform focuses on building a scalable and secure foundational platform that enables Engineering to deploy, validate, and operate their services in production. It also improves service resiliency, increases organizational efficiency by reducing operational toil, and increases system efficiency through architectural evolution.

The Site Reliability Engineering team engages directly with our other engineering teams to onboard them onto our platform systems, review and recommend design and architectural decisions, and guide them in implementing the tooling provided by the larger Platform organization, so that systems can scale and react to changing conditions through continuous improvement loops.

The Role: Manager, Site Reliability Engineering

In this position, you will lead a team of skilled Site Reliability Engineers responsible for the design, deployment, and maintenance of our production systems. You will play a crucial role in ensuring the reliability, scalability, and performance of our infrastructure, as well as driving continuous improvement initiatives. Your expertise in SRE practices and experience with the listed technologies will enable you to effectively guide the team towards achieving operational excellence. 

Responsibilities:

  • Lead, mentor, and manage a team of Site Reliability Engineers, fostering a culture of collaboration, innovation, and operational excellence. Provide guidance and career development opportunities to team members.
  • Develop, communicate, and execute the SRE team's strategic goals, objectives, and roadmap in alignment with the overall business objectives.
  • Oversee the design, implementation, and maintenance of highly available and scalable production systems.
  • Drive continuous improvement initiatives by identifying areas for enhancement and implementing best practices, automation, and process improvements.
  • Collaborate with cross-functional teams and departments to ensure smooth integration of applications and systems.
  • Define and enforce Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure system reliability and uptime.
  • Monitor system performance, troubleshoot issues, and ensure timely incident response, root cause analysis, and problem resolution.
  • Implement effective monitoring, logging, and alerting systems to proactively identify and mitigate potential issues.
  • Stay up-to-date with industry trends, emerging technologies, and best practices related to SRE and DevOps, and apply them to improve operational efficiency.
  • Identify potential risks to system reliability and implement strategies to mitigate them.
  • Ensure that all systems and processes comply with relevant regulations, standards, and best practices.

Minimum Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
  • Proven experience as a Site Reliability Engineer or similar role, with at least 3-5 years of hands-on experience in managing production systems.
  • Strong expertise in the listed technologies: Ansible, Concourse CI, Jenkins, GitHub Actions, EKS (Kubernetes), Linux administration, and Terraform.
  • Demonstrated experience in leading and managing a team of technical professionals for at least 2 years.
  • Solid understanding of SRE principles, including reliability, scalability, availability, and performance.
  • Proficient in scripting and automation (e.g., Python, Bash, or similar).
  • Experience with infrastructure-as-code (IaC) tools, configuration management, and CI/CD pipelines.
  • Knowledge of cloud platforms (e.g., AWS, Azure, or Google Cloud) and containerization technologies (e.g., Docker).
  • Excellent problem-solving skills and the ability to thrive in a fast-paced, dynamic environment.
  • Strong communication and leadership skills, with the ability to collaborate effectively with both technical and non-technical stakeholders.

Preferred Qualifications:

  • Relevant certifications, such as Certified Kubernetes Administrator (CKA) or AWS Certified DevOps Engineer.
  • Experience with monitoring and observability tools (e.g., Datadog, New Relic, Prometheus, Grafana, ELK Stack).
  • Familiarity with Agile methodologies and experience working in an Agile/Scrum environment.

It Pays to Work Here

 

The compensation & benefits package for this role includes:

  • Competitive starting salary
  • A discretionary annual bonus
  • Long-term incentive in the form of a new hire equity grant
  • Comprehensive health plans
  • 401(k) with company matching
  • Paid parental leave
  • Flexible time off

Salary Range: The base salary range for this role is between $172,000 and $215,000 in the State of New York, the State of California, and the State of Washington. This range does not include our discretionary bonus or equity package. When determining a candidate’s compensation, we consider a number of factors, including skill set, experience, job scope, and current market data.

At Gemini, we strive to build diverse teams that reflect the people we want to empower through our products, and we are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. Equal Opportunity is the Law, and Gemini is proud to be an equal opportunity workplace. If you have a specific need that requires accommodation, please let a member of the People Team know.

The Company
HQ: New York, NY
660 Employees
Hybrid Workplace
Year Founded: 2014

What We Do

Gemini is a licensed digital asset exchange and custodian. We built the Gemini platform so customers can buy, sell, and store digital assets (e.g., Bitcoin, Ethereum, and Zcash) in a regulated, secure, and compliant manner.

Why Work With Us

Digital assets and blockchain technology have the power to transform the world for good. This truth, along with our core values, forms the bedrock of our company and culture. We are a mission-driven, team-based, inclusive, and determined community of thought leaders who invest in each other and the long game. Join us in our mission!

