Staff SRE Engineer - Data Infra

São Paulo
In-Office
Senior level
Financial Services
The Role
The Staff Systems Engineer will lead the evolution of the SRE team for the Data Platform focusing on automation, reliability, incident management, and performance optimization while mentoring junior engineers.
Summary Generated by Built In
About Us

Nu is one of the largest digital financial platforms in the world, with more than 122 million customers across Brazil, Mexico, and Colombia. Guided by our mission to fight complexity and empower people, we are redefining financial services in Latin America, and this is still just the beginning of the purple future we're building.

Listed on the New York Stock Exchange (NYSE: NU), we combine proprietary technology, data intelligence, and an efficient operating model to deliver financial products that are simple, accessible, and human.

Our impact has been recognized by global rankings such as Time 100 Companies, Fast Company’s Most Innovative Companies, and Forbes World’s Best Bank. Visit our institutional page: https://international.nubank.com.br/careers/

About the team

We are seeking an experienced and highly motivated Staff Site Reliability Engineer to join our Data Infra SRE team. This critical role will play a pivotal part in shaping the future direction of the SRE team for our Data Platform, contributing significantly to its evolution plan as we move toward a Data Mesh architecture.

While the team was initially formed under specific circumstances, our ambitious decentralization goals mean that we cannot scale effectively without heavy investment in automation. Our vision is therefore to automate with AI and AI agents, leveraging new frameworks like LangGraph alongside more classical automation approaches. For example, we aim to drastically reduce the time and effort involved in data platform crash resolution and coordination through intelligent automation. Another key initiative is to develop a "swarm of AI agents" that will act as a lubricant for the Data Platform, focusing on sophisticated anomaly detection mechanisms and predictive analytics to detect problems before they escalate and automatically notify the responsible teams. This innovative approach will allow us to achieve unprecedented levels of reliability and efficiency.

We are looking for an individual with a proactive and entrepreneurial mindset who can drive innovation and excellence in reliability engineering.

Your role

Your contributions in this role will directly address critical challenges, such as:

  • Strategic Direction and Evolution: Proactively identifying opportunities and leading initiatives to define and refine the strategic direction of the SRE team within the data platform context, specifically contributing to the Archipelago evolution plan.
  • Architectural Leadership: Providing architectural guidance and expertise for the design, implementation, and maintenance of highly reliable, scalable, and performant data infrastructure.
  • Incident Management and Resolution: Leading the effort in establishing and refining incident response protocols, ensuring efficient resolution of critical data platform issues, and driving post-incident analysis for continuous improvement.
  • Performance Optimization: Identifying and implementing solutions to optimize the performance, efficiency, and resource utilization of the data platform.
  • Automation and Tooling: Championing the development and adoption of advanced automation solutions, including leveraging AI and AI agents with new frameworks like LangGraph, for tasks such as data platform crash resolution and coordination. You will also oversee the development of more classical automation tools.
  • Proactive System Health: Designing and implementing advanced monitoring, alerting, and anomaly detection mechanisms to ensure the proactive identification and prevention of potential issues. This includes exploring the concept of a "swarm of AI agents" to act as a lubricant for the Data Platform, focusing on predictive analytics and notifying responsible teams preventatively.
  • Mentorship and Leadership: Mentoring less senior SREs, fostering a culture of reliability engineering excellence, and leading technical initiatives within the team.
Our SRE team is formally responsible for:
  • Service Level Objectives (SLO) Management: Defining, monitoring, and enforcing SLOs for critical data platform services.
  • System Observability: Implementing and maintaining comprehensive monitoring, logging, and tracing solutions across the data platform.
  • Toil Reduction: Identifying and automating repetitive manual tasks to improve team efficiency and focus on strategic initiatives.
  • Disaster Recovery and Business Continuity: Developing and testing disaster recovery plans to ensure the resilience of the data platform.
  • Capacity Planning: Forecasting resource needs and planning for infrastructure scaling to meet anticipated demand.
  • Performance Engineering: Optimizing system performance and addressing bottlenecks to ensure efficient operation.
  • Security Best Practices: Implementing and advocating for security best practices within the data platform.
  • Platform APIs: Enabling alert management on virtually any service with simple interactions.

Benefits
  • Chance of earning equity at Nubank
  • Food/Meal Card (Vale-Refeição and/or Vale-Alimentação)
  • Public Transportation Commuting Benefit (Vale-Transporte)
  • NuCare – Psychological, Financial and Legal Assistance Program
  • Life Insurance
  • Medical Plan
  • Dental Plan
  • NuLanguage – Language Course Program
  • Nucleo – Our learning platform of courses
  • Extended Parental Leave
  • Daycare Allowance
  • Parental Consultancy
  • Work-from-home Allowance
  • Gym Partnerships
  • 30 days of paid vacation
  • Relocation Assistance Package, if applicable
Work Model for this Role

Hybrid 2-3 times/week: Our hybrid work model brings us to the office at least twice a week, on strategic days designed to maximize team connection and collaboration. For more details, visit https://building.nubank.com/nu-hybrid-work-model/

Top Skills

AI
Anomaly Detection
Automation
Data Infrastructure
LangGraph
Monitoring
Predictive Analytics

The Company
HQ: São Paulo, São Paulo
13,649 Employees
Year Founded: 2013

What We Do

Nu was born in 2013 with the mission to fight complexity to empower people in their daily lives by reinventing financial services.

We are one of the world’s largest digital banking platforms, serving more than 70 million customers across Brazil, Mexico, and Colombia.

As one of the leading technology companies in the world, Nu leverages proprietary technologies and innovative business practices to create new financial solutions and experiences for individuals and SMEs that are simple, intuitive, convenient, low-cost, empowering, and human.

Guided by its mission, Nu is fostering access to financial services across Latin America.
