Arcesium

Principal Engineer - DBRE

Posted 23 Days Ago

Hyderabad, Telangana, IND

In-Office

Expert/Leader

Cloud • Fintech • Information Technology • Software • Financial Services

The Role

Lead technical direction for the DBRE platform across SQL Server, Aurora PostgreSQL, and Snowflake. Own HA/DR, replication, observability, automation, alert engineering, incident response, and reliability KPIs. Drive cross-cutting initiatives from design to production, reduce operational toil with automation, and partner with application, infra, and SRE teams on performance, capacity, and reliability strategy.

Summary Generated by Built In

Company Overview

Arcesium is a global financial technology firm that solves complex data-driven challenges faced by some of the world’s most sophisticated financial institutions. We constantly innovate our platform and capabilities to meet tomorrow’s challenges, anticipate the risks our clients encounter, and design advanced solutions to help our clients achieve transformational business outcomes.

Financial technology is a high-growth industry as change and innovation continue to disrupt the status quo and prompt major transformation. Arcesium is at a particularly interesting time in our own growth as we look to leverage our successfully established market position and expand operations in pursuit of strategic new business opportunities. We value intellectual curiosity, proactive ownership, and collaboration with colleagues, and we empower you to meaningfully contribute from day one and accelerate your professional development.

We are looking for an exceptional engineer to provide expert-level technical leadership for our Database Reliability Engineering (DBRE) platform. This is a hands-on individual contributor role that owns the architectural direction for our most complex database reliability challenges (high availability, disaster recovery, observability, and platform automation) across thousands of SQL Server, Aurora PostgreSQL, and Snowflake environments running mission-critical workloads for the world’s most sophisticated financial institutions.

What you’ll do:

Drive architectural direction for the database platform across SQL Server, Aurora PostgreSQL, and Snowflake — covering high availability, disaster recovery, replication, backup and recovery, capacity, performance, and security.
Own complex, cross-cutting initiatives such as cross-region disaster recovery, platform refresh orchestration, alerting redesign, and cost optimization, taking each from problem statement through to a deployed, owned solution.
Lead by example with exemplary code, design documents, RFCs, and runbooks, setting the standard for technical writing, code quality, and operational rigor across the DBRE team.
Reduce operational toil by engineering automation across provisioning, refresh, patching, scaling, failover, and decommissioning — treating manual operations as bugs to be eliminated.
Lead alert engineering to drive sustainable reductions in alert volume while improving signal quality, partnering with application teams on alert ownership, attribution, and SLA design.
Drive incident response and root-cause analysis for the most complex production incidents, and convert RCAs into platform-level improvements that prevent recurrence.
Define reliability KPIs (availability, MTTR, alert sustainability, SLA adherence) and build the dashboards and reporting cadence to track them.
Partner with application engineering, infrastructure, and SRE teams on schema design, query performance, data lifecycle, and shared reliability patterns, and engage senior leadership on strategy, multi-quarter roadmaps, and budget trade-offs.

What you’ll need:

A bachelor’s or master’s degree in computer science, engineering, or a related field with 9+ years of professional engineering experience, including significant time in a principal-level or equivalent individual contributor role.
Deep, hands-on expertise in at least one major relational database platform (SQL Server or PostgreSQL) including replication, HA/DR architectures, performance tuning, query optimization, and internals.
Strong working knowledge of cloud infrastructure (AWS preferred): VPC networking, EC2, EBS, FSx, IAM, RDS/Aurora, and cross-region replication.
Strong programming skills in at least one of Python, PowerShell, Go, or T-SQL — capable of writing production-quality automation, not just scripts.
A proven track record designing and delivering large-scale reliability initiatives (HA/DR, observability, automation platforms) with measurable outcomes.
Experience leading complex incident response, root-cause analysis, and post-incident improvement programs in 24x7 environments.
Experience with observability platforms (Datadog, Prometheus, Grafana), modern alerting design, infrastructure-as-code (Terraform, CloudFormation), and CI/CD pipelines (GitLab CI, Jenkins).
Exceptional verbal and written communication skills, with the ability to produce clear design documents and executive-level summaries and to influence stakeholders across engineering, infrastructure, and business teams.
Experience across multiple database platforms (SQL Server / PostgreSQL / Snowflake / Aurora) and familiarity with financial-services data domains is a bonus.

Arcesium's Personal Data Privacy Notice for Candidates is linked here.

Recruiting Security
Emails from genuine Arcesium recruiters who are employees of the company will always come from the @arcesium.com domain. In some cases, you may also be contacted by independent search firms engaged to recruit on our behalf; emails from their employees should always come from their firm's applicable domain. We'll never ask for your banking information or any payment as part of the recruiting process. If something seems off or you're contacted by an unexpected third party, please reach out to us at [email protected] (US/UK), [email protected] (India) or [email protected] (Portugal/Sweden).

Arcesium is an equal opportunity employer.

Skills Required

Bachelor's or Master's degree in Computer Science, Engineering, or related field with 9+ years of professional engineering experience, including principal-level experience.
Deep, hands-on expertise in SQL Server or PostgreSQL including replication, HA/DR architectures, performance tuning, query optimization, and internals.
Strong working knowledge of AWS: VPC, EC2, EBS, FSx, IAM, RDS/Aurora, and cross-region replication.
Production-quality programming skills in at least one of Python, PowerShell, Go, or T-SQL.
Proven track record designing and delivering large-scale reliability initiatives (HA/DR, observability, automation platforms) with measurable outcomes.
Experience leading complex incident response, root-cause analysis, and post-incident improvement programs in 24x7 environments.
Experience with observability platforms (Datadog, Prometheus, Grafana), modern alerting design, infrastructure-as-code (Terraform, CloudFormation), and CI/CD pipelines (GitLab CI, Jenkins).
Exceptional verbal and written communication skills, ability to produce clear design documents and executive-level summaries, and influence stakeholders.
Experience across multiple database platforms (SQL Server, PostgreSQL, Snowflake, Aurora) and familiarity with financial-services data domains.

Arcesium Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Arcesium and has not been reviewed or approved by Arcesium.

Healthcare Strength — Healthcare coverage is described as comprehensive, including medical, dental, vision, mental‑health programs, and protections for life, accident, and disability. The scope indicates strong support for physical and mental well‑being.
Retirement Support — Retirement offerings include a 401(k) and a pension alongside financial protections such as life and disability insurance. These elements point to meaningful long‑term financial security within total rewards.
Parental & Family Support — Parental leave is characterized as generous, with extended maternity and supportive paternity leave in some locations, plus family and adoption assistance. This family focus complements broader time‑off and caregiving supports.

The Company

HQ: New York, NY

1,500 Employees

Year Founded: 2015

What We Do

Arcesium is a global financial technology and professional services firm, delivering post-investment and enterprise data management solutions to some of the world's most sophisticated financial institutions, including hedge funds, banks, institutional asset managers, and private equity firms. Expertly designed to achieve a single source of truth throughout a client's ecosystem, Arcesium's cloud-native technology is built to systematize the most complex workflows and help clients achieve scale. Building on a platform developed and tested by the investment and technology development firm the D. E. Shaw group, Arcesium was launched as a joint venture with Blackstone Alternative Asset Management. J.P. Morgan, another large client, later joined as our third partner. Today, Arcesium services over $679 billion in global client AUM with a staff of over 1,500 software engineering, accounting, operations, and treasury professionals.