Lead Site Reliability Engineer - India

Hiring Remotely in India
Remote
Senior level
Fintech • Real Estate • Software
The Role
Lead the technical direction for infrastructure systems, ensuring reliability and scalability while managing incident response and project delivery in collaboration with engineering and product teams.
Summary Generated by Built In
About Juniper Square

Private markets are one of the largest, most complex, and most underserved corners of global finance. Our mission at Juniper Square is to unlock their full potential. We’re the Operations Partner trusted by 2,300+ GPs, unifying technology, data, and fund administration services into a single platform that helps GPs move faster, make better decisions, and scale with precision. With $300B+ under administration and 700,000+ LPs on platform, we’ve built the scale to match our ambition. And with JunieAI, our purpose-built AI platform, we’re reimagining how private markets operate, embedding intelligence across every workflow. Founder-led since 2014, backed by $350M+ in funding, and now 1,000+ employees strong, we’re building a company designed to shape the future of private markets for decades to come.

Our culture is built for people who want to do ambitious, meaningful work alongside exceptionally talented teammates. We think like owners, move with urgency, and take pride in solving hard problems that truly matter to our customers and the future of private markets. We believe the best ideas come from open debate, deep collaboration, and diverse perspectives, which is why transparency is our default and why we believe feedback makes us stronger. If you’re energized by high standards, rapid growth, and the opportunity to help define a category at a pivotal moment, come join us!

Juniper Square offers employees a variety of ways to work, ranging from a fully remote experience to working full-time in one of our physical offices. We invest heavily in digital-first operations, allowing our teams to collaborate effectively across 27 U.S. states, 2 Canadian Provinces, India, Luxembourg, and England. We also have physical offices in San Francisco, New York City, Mumbai and Bangalore for employees who prefer to work in an office some or all of the time.

What you'll do

Technical Leadership & Architecture

  • Own and drive the technical direction for your team's infrastructure systems, making architectural decisions that balance reliability, scalability, and cost.

  • Design systems of moderate to high complexity using distributed systems best practices; anticipate future use cases and minimize technical debt.

  • Conduct architectural reviews and advance design patterns across the organization.

  • Identify and implement improvements to existing software architecture; define and expand design patterns to solve common platform problems.

  • Define and enforce security best practices across team-owned systems; proactively surface gaps to senior leadership.

Reliability & Operational Excellence

  • Own the reliability posture of team-owned services — establish SLOs, monitor SLAs, and hold the team accountable to them.

  • Lead incident response for complex, multi-service issues; systematically debug, identify root causes, and ensure issues do not recur.

  • Establish standards for logging, monitoring, and operationalization across all team-owned systems.

  • Foresee potential operational issues and implement preventative measures to safeguard the customer experience.

  • Participate in and help lead the on-call rotation; ensure production systems are appropriately instrumented.

Project & Delivery Ownership

  • Act as DRI (Directly Responsible Individual) for medium-to-large SRE projects spanning months and involving cross-team collaboration.

  • Partner with Engineering Managers and Product Managers to scope roadmap initiatives, break down work into actionable increments, and commit to delivery plans.

  • Negotiate scope effectively when required, ensuring adjustments align with customer needs and project goals.

  • Proactively identify and resolve project risks — dependencies, architectural drift, and staffing blockers — before they impact delivery.

Qualifications

Required Skills

  • 7-10 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering in a production cloud environment.

  • 5+ years of hands-on experience with AWS cloud services across compute, networking, storage, and security.

  • 5+ years managing Linux-oriented production environments at scale.

  • 5+ years using Infrastructure-as-Code (Terraform, CDK, CloudFormation) and/or GitOps best practices.

  • 3+ years operating and troubleshooting production Kubernetes environments.

  • 3+ years applying AWS Well-Architected Framework principles across reliability, security, performance, and cost pillars.

  • 3+ years in cloud security best practices including IAM, secrets management, network security, and compliance.

  • 3+ years working with PostgreSQL in production: performance tuning, replication, backup, and recovery.

  • Demonstrated track record of leading multi-person technical projects from scoping through delivery.

Technical Skills

  • Strong general programming skills; comfort writing automation scripts and tooling in Python, Go, or similar.

  • Deep knowledge of observability tooling — metrics, logging, distributed tracing — and how to use them to drive reliability.

  • Solid understanding of data retention, backup, and recovery processes across cloud-native systems.

  • Experience with CI/CD pipelines, release management, and deployment automation.

  • Familiarity with service mesh, API gateway patterns, and microservices architectures.

AI Fluency

  • Experience using AI-assisted workflows across the SDLC, with an emphasis on production reliability, operability, and maintainability of large-scale systems (design, deployment, monitoring, incident response).

  • Hands-on experience integrating LLMs or AI systems into production environments, with a focus on reliability, latency, observability, and failure handling (e.g., automated triage, incident copilots, runbook automation).

  • Familiarity with agent-based or workflow automation systems applied to operational use cases such as alert triage, remediation loops, system diagnostics, or automated runbook execution.

  • Demonstrated ability to apply AI tools to improve system reliability, reduce MTTR, automate operational workflows, and enhance observability and alerting systems.

  • Working knowledge of LLMs, embeddings, RAG, and their operational constraints in production systems (latency, cost, drift, safety, and observability).

  • Ability to identify opportunities where AI can meaningfully improve system reliability, on-call efficiency, incident response, and infrastructure automation.

Nice to have (SRE):

  • Experience handling model degradation, fallback strategies, and cost anomalies.

Leadership & Collaboration

  • Proven ability to lead technical discussions, drive alignment across engineering and product, and communicate decisions clearly to stakeholders.

  • Experience mentoring junior and mid-level engineers in both technical skills and professional development.

  • Able to operate independently with minimal supervision; comfortable making final technical decisions as DRI.

  • Strong communication skills in English — written and verbal — with experience influencing cross-functional partners.

Why Juniper Square
  • High-impact role at the intersection of cloud infrastructure and financial technology — your work directly underpins products managing hundreds of billions in AUM.

  • Significant growth potential: opportunity to help shape the SRE practice and prepare the platform for exponential scale.

  • A promising technology roadmap spanning capacity planning, Kubernetes migrations, and service-oriented architecture modernization.

  • Collaborative, engineering-driven culture that values quality, curiosity, and ownership.

  • Competitive compensation and benefits package.

Juniper Square Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Juniper Square and has not been reviewed or approved by Juniper Square.

  • Parental & Family Support: Paid parental and family leave is frequently characterized as generous, with structured return-to-work support referenced. Feedback suggests family-forming and caregiver supports are a notable strength.
  • Leave & Time Off Breadth: Unlimited PTO alongside paid holidays and sick time is consistently highlighted. Team practices like no-meeting days and flexible scheduling further support time away.
  • Wellbeing & Lifestyle Benefits: A digital-first, remote-friendly setup is reinforced by home-office stipends, setup reimbursements, and expanded mental-health support. Feedback suggests these perks materially improve daily work flexibility and wellbeing.

The Company
Austin, TX
217 Employees
Year Founded: 2014

What We Do

At Juniper Square, we work hard every day to set the bar high in the work we do. We bring out the very best in each other without sacrificing kindness, quality, and the willingness to learn. You will see that every function and team is given respect here, because when any one of us wins, we all win. We bring a beginner’s mind to our work, not ego—enabling learning and creative and critical thinking. Some people talk about transparency, but here we treat you like the owner that you are: We share knowledge and information with all our employees, knowing that informed teams are successful teams (and happy ones too).

Why Work With Us

Our vision is to make the world’s private capital markets more efficient, transparent, and accessible through financial technology. We have an opportunity to transform an enormous and important industry, and we feel lucky to be working with the most talented, kindest, and most ambitious colleagues of our careers. Come join us!
