Engineering Division - Production Runtime Experience - Vice President - Bengaluru

Posted 7 Days Ago
Bengaluru, Bengaluru Urban, Karnataka, IND
In-Office
150K-250K Annually
Senior level
Fintech • Financial Services
The Role
Lead development and productionization of agentic GenAI systems and LLM-based tooling to automate diagnostics, remediation, and orchestration for large-scale runtime environments. Integrate agents with observability and incident systems, build RAG and evaluation pipelines, enforce safety/governance, optimize for scale and cost, and mentor engineering teams to deliver auditable, business-aligned reliability improvements.
Summary Generated by Built In

BUSINESS UNIT OVERVIEW 

Enterprise Technology Operations (ETO) is a Business Unit within Core Engineering focused on running scalable production management services with a mandate of operational excellence and operational risk reduction, achieved through large-scale automation, best-in-class engineering, and the application of data science and machine learning. The Production Runtime Experience (PRX) team in ETO applies software engineering and machine learning to production management services, processes, and activities to streamline monitoring, alerting, automation, and workflows.


TEAM OVERVIEW

The Machine Learning and Artificial Intelligence team in PRX applies advanced ML and GenAI to reduce the risk and cost of operating the firm’s large-scale compute infrastructure and extensive application estate. Building on strengths in statistical modelling, anomaly detection, predictive modelling, and time-series forecasting, we leverage foundation LLMs to orchestrate multi-agent systems for automated production management services. By unifying classical ML with agentic AI, we deliver reliable, explainable, and cost-efficient operations at scale.


ROLE AND RESPONSIBILITIES

In this role, you will be responsible for implementing and launching agentic GenAI solutions that reduce the risk and cost of managing large-scale production environments of varying complexity. You will address production runtime challenges by developing agentic AI solutions that can diagnose issues, reason about them, and take actions in production environments to improve productivity and resolve production support issues.

What you’ll do: 

Build agentic AI systems: Design and implement tool-calling agents that combine retrieval, structured reasoning, and secure action execution (function calling, change orchestration, policy enforcement) using the Model Context Protocol (MCP). Engineer robust guardrails for safety, compliance, and least-privilege access.

Productionize LLMs: Build an evaluation framework for open-source and foundation LLMs; implement retrieval pipelines, prompt synthesis, response validation, and self-correction loops tailored to production operations.

Integrate with runtime ecosystems: Connect agents to observability, incident management, and deployment systems to enable automated diagnostics, runbook execution, remediation, and post-incident summarization with full traceability. 

Collaborate directly with users: Partner with production engineers and application teams to translate production pain points into agentic AI roadmaps; define objective functions linked to reliability, risk reduction, and cost; and deliver auditable, business-aligned outcomes.

Safety, reliability, and governance: Build validator models, adversarial prompts, and policy checks into the stack; enforce deterministic fallbacks, circuit breakers, and rollback strategies; instrument continuous evaluations for usefulness, correctness, and risk. 

Scale and performance: Optimize cost and latency via prompt engineering, context management, caching, model routing, and distillation; leverage batching, streaming, and parallel tool-calls to meet stringent SLOs under real-world load. 

Build a RAG pipeline: Curate domain knowledge; build a data-quality validation framework; establish feedback loops and a milestone framework to maintain knowledge freshness.

Raise the bar: Drive design reviews, experimental rigor, and high-quality engineering practices; mentor peers on agent architectures, evaluation methodologies, and safe deployment patterns.
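The guardrail patterns named in these responsibilities are a least-privilege policy check, a circuit breaker, and a deterministic fallback. They can be illustrated with a minimal Python sketch. Every name in it is hypothetical: `ToolPolicy`, `CircuitBreaker`, `run_guarded`, `restart_service`, and `escalate`. The sketch assumes each agent tool is a plain Python callable. It does not describe the team's actual stack or any MCP SDK API.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Optional


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical least-privilege policy: an agent may call only allowlisted tools."""

    allowed_tools: FrozenSet[str]

    def check(self, tool_name: str) -> None:
        if tool_name not in self.allowed_tools:
            raise PermissionError(f"tool {tool_name!r} is not allowed for this agent")


@dataclass
class CircuitBreaker:
    """Opens after `max_failures` consecutive failures and stays open for `reset_after` seconds."""

    max_failures: int = 3
    reset_after: float = 30.0
    failures: int = 0
    opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close the breaker and allow calls again.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


def run_guarded(
    tool_name: str,
    tool: Callable[..., Any],
    fallback: Callable[..., Any],
    policy: ToolPolicy,
    breaker: CircuitBreaker,
    **kwargs: Any,
) -> Any:
    """Run an agent-requested tool call behind a policy check and a circuit breaker."""
    policy.check(tool_name)  # policy violations propagate; they never reach the fallback
    if breaker.is_open():
        return fallback(**kwargs)  # skip the failing tool entirely while the breaker is open
    try:
        result = tool(**kwargs)
    except Exception:
        breaker.record_failure()
        return fallback(**kwargs)  # deterministic fallback on any tool error
    breaker.record_success()
    return result


# Example: a remediation action that keeps failing, with escalation as the fallback.
def restart_service(service: str) -> str:
    raise TimeoutError("orchestrator unreachable")  # simulated failure


def escalate(service: str) -> str:
    return f"escalated {service} to on-call"


policy = ToolPolicy(allowed_tools=frozenset({"restart_service"}))
breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
for attempt in range(3):
    outcome = run_guarded("restart_service", restart_service, escalate, policy, breaker, service="payments-api")
    print(attempt, outcome, "breaker open:", breaker.is_open())
```

In this sketch, a policy violation is raised instead of being downgraded to the fallback, so a disallowed action is never silently absorbed. Only operational failures of an allowed tool trip the breaker.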


QUALIFICATIONS

A Bachelor’s degree (Master’s/PhD preferred) in a computational field (Computer Science, Applied Mathematics, Engineering, or a related quantitative discipline), with 7+ years of experience as an applied data scientist or machine learning engineer.


ESSENTIAL SKILLS 

• 7+ years of software development in one or more languages (Python, C/C++, Go, Java); strong hands-on experience building and maintaining large-scale Python applications preferred. 

• 3+ years designing, architecting, testing, and launching production ML systems, including model deployment/serving, evaluation and monitoring, data processing pipelines, and model fine-tuning workflows. 

• Practical experience with Large Language Models (LLMs): API integration, prompt engineering, fine-tuning/adaptation, and building applications using RAG and tool-using agents (vector retrieval, function calling, secure tool execution).

• Understanding of different LLMs, both commercial and open source, and their capabilities (e.g., OpenAI, Gemini, Llama, Qwen, Claude). 

• Solid grasp of applied statistics, core ML concepts, algorithms, and data structures to deliver efficient and reliable solutions. 

• Strong analytical problem-solving, ownership, and urgency; ability to communicate complex ideas simply and collaborate effectively across global teams with a focus on measurable business impact. 

Preferred: 

Proficiency building and operating on cloud infrastructure (ideally AWS), including containerized services (ECS/EKS), serverless (Lambda), data services (S3, DynamoDB, Redshift), orchestration (Step Functions), model serving (SageMaker), and infra-as-code (Terraform/CloudFormation). 


YOUR CAREER

Goldman Sachs is a meritocracy where you will be given all the tools to advance your career. At Goldman Sachs, you will have access to excellent training programmes designed to improve multiple facets of your skill portfolio. Our in-house training programme, “Goldman Sachs University”, offers a comprehensive series of courses that you can access as your career progresses. Goldman Sachs University has an impressive catalogue of courses spanning technical, business, and leadership skills.

Salary Range
The expected base salary for this New York, New York, United States-based position is $150,000-$250,000. In addition, you may be eligible for a discretionary bonus if you are an active employee as of fiscal year-end.

Benefits
Goldman Sachs is committed to providing our people with valuable and competitive benefits and wellness offerings, as it is a core part of providing a strong overall employee experience. A summary of these offerings, which are generally available to active, non-temporary, full-time and part-time US employees who work at least 20 hours per week, can be found here.

Skills Required

  • Bachelor's degree in Computer Science, Applied Mathematics, Engineering, or related quantitative field
  • 7+ years experience as an applied data scientist or machine learning engineer
  • 7+ years software development experience in one or more languages (Python, C/C++, Go, Java)
  • Hands-on experience building and maintaining large-scale Python applications
  • 3+ years designing, architecting, testing, and launching production ML systems, including deployment, evaluation, monitoring, and data pipelines
  • Practical experience with Large Language Models: API integration, prompt engineering, fine-tuning/adaptation, RAG, and tool-using agents (vector retrieval, function calling, secure tool execution)
  • Understanding of commercial and open-source LLMs (OpenAI, Gemini, Llama, Qwen, Claude) and their capabilities
  • Solid grasp of applied statistics, core ML concepts, algorithms, and data structures
  • Strong analytical problem-solving, ownership, communication, and cross-team collaboration skills
  • Proficiency building and operating on cloud infrastructure and related services (AWS, ECS/EKS, Lambda, S3, DynamoDB, Redshift, Step Functions, SageMaker) and infra-as-code (Terraform/CloudFormation)

Goldman Sachs Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Goldman Sachs and has not been reviewed or approved by Goldman Sachs.

  • Healthcare Strength: Coverage includes medical, dental, vision, disability, life and accident insurance, with multiple plan options and most premiums subsidized; coverage often starts on day one. Wellness resources, on-site health centers in some locations, and EAP access reinforce the depth of health support.
  • Parental & Family Support: Family care includes on-site childcare in some offices, expectant parent resources, and transitional programs for returning parents. Feedback suggests parental leave is very generous, with reports of around 20 weeks of paid leave and stipends for adoption, surrogacy, and fertility-related services.
  • Retirement Support: The firm provides a 401(k) plan with employer matching contributions and broad financial education to help employees plan for retirement. Resources also support saving for education and preparing for unexpected events.


The Company
HQ: New York, NY
67,118 Employees

What We Do

At Goldman Sachs, we believe progress is everyone’s business. That’s why we commit our people, capital and ideas to help our clients, shareholders and the communities we serve to grow. Founded in 1869, Goldman Sachs is a leading global investment banking, securities and investment management firm. Headquartered in New York, we maintain offices in all major financial centers around the world. More about our company can be found at www.goldmansachs.com

Similar Jobs

Zscaler

Development Engineer

Cloud • Information Technology • Security • Software • Cybersecurity
Hybrid
Bangalore, Bengaluru, Karnataka, IND
8697 Employees

TransUnion

Software Engineer

Big Data • Fintech • Information Technology • Business Intelligence • Financial Services • Cybersecurity • Big Data Analytics
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
13000 Employees

ServiceNow

Principal Software Engineer

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Hybrid
Bangalore, Bengaluru Urban, Karnataka, IND
29000 Employees

ServiceNow

Sr Mgr, Software Engrg Mgmt - Moveworks

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Hybrid
Bangalore, Bengaluru Urban, Karnataka, IND
29000 Employees

Similar Companies Hiring

Hanover Park
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
42 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees
