RESPONSIBILITIES:
- Lead and mentor the TechOps team responsible for infrastructure, SRE, and system administration functions.
- Oversee cloud and on-premise environments, ensuring stability, scalability, and security.
- Partner with engineering and security teams to design and implement reliable deployment and maintenance processes.
- Define and enforce infrastructure standards, SLAs, and operational best practices.
- Develop and maintain a robust monitoring and alerting strategy across all systems, applications, and services.
- Implement tools and dashboards to provide visibility into system performance, uptime, and incident trends.
- Drive continuous improvement in alert quality, minimizing noise while ensuring rapid detection of critical issues.
- Establish and track key reliability metrics (e.g., uptime, latency, MTTR, MTBF).
- Oversee incident response processes to ensure quick resolution, root-cause identification, and post-incident learning.
- Implement reliability engineering principles to reduce operational toil and prevent recurrence of major incidents.
- Collaborate with engineering teams on infrastructure scaling, redundancy, and capacity planning initiatives.
- Develop and enforce operational runbooks, maintenance schedules, and change management processes.
- Proactively identify and address potential points of failure in systems and processes.
- Evaluate and adopt new tools that enhance monitoring, automation, and overall reliability.
- Ensure system and infrastructure documentation is accurate and up to date.
QUALIFICATIONS:
- 7+ years of experience in Technical Operations, Infrastructure, or SRE roles, including at least 2 years in a leadership or management capacity.
- Proven success managing system administration, SRE, or DevOps teams in production environments.
- Strong understanding of cloud infrastructure (AWS, Azure, or GCP), networking, and Linux system administration.
- Hands-on experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, New Relic).
- Solid grasp of automation, CI/CD pipelines, and Infrastructure-as-Code (e.g., Terraform, Ansible).
- Deep knowledge of incident management processes and ITIL or reliability best practices.
- Excellent communication, leadership, and collaboration skills.
What We Do
Supernova is the technology leader in securities-based lending ("SBL") solutions that connect and empower the entire financial ecosystem. We offer the world’s first and only cloud-based, fully customizable, end-to-end software solution to automate securities-based lending from origination through the life of the loan.
Why Work With Us
At Supernova, we're all about helping investors to achieve financial wellness. And that starts with cultivating an awesome company culture where everyone enjoys working hard and celebrating...together. We envision a world where all people have the highest probability for accomplishing their goals with the least amount of risk.
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
Employees report to the office at least 4 days a week, on whichever days make the most sense for them.