Observability Platform Engineer

New York, NY, USA
Hybrid
120K-150K Annually
Senior level
Financial Services
The Role
Design, build, and operate scalable observability platforms across cloud and on-prem systems. Implement telemetry (logs, metrics, traces, RUM, synthetic), SLOs, alerting, and automation; integrate with ITSM; reduce alert noise and improve incident response and reliability.
Summary Generated by Built In

Neuberger's Technology team is seeking an Observability Engineer to lead and evolve our observability strategy across cloud and on-premises environments. You will serve as the primary owner and subject matter expert for our Datadog platform — building, scaling, and operating a comprehensive monitoring solution that continuously validates service health (24/7) across business-critical systems, including external websites and key infrastructure components (e.g., firewalls, OpenShift). You will design and implement end-to-end observability solutions spanning logs, metrics, traces, Service Level Objectives (SLOs), synthetic monitoring, and Real User Monitoring (RUM) to improve reliability, accelerate incident response, and deliver clear visibility into service performance.
This is an individual contributor role with strong Datadog engineering and scripting expectations — not a pure administrator role, though prior admin experience is beneficial. You will partner closely with application, SRE/DevOps, infrastructure, and security teams and serve as the internal champion and evangelist for Datadog adoption, standards, and best practices. The environment includes an active migration from OpenView to Datadog, with workflows integrating into ServiceNow for incident routing and escalation.

What You'll Do
  • Serve as the primary Datadog platform owner — architecting, building, and maintaining scalable observability solutions across cloud and on-prem environments (Windows and Linux/Unix), with direct ownership of monitoring capabilities for key applications and services.

  • Partner closely with application, DevOps, SRE/operations, infrastructure, and security teams to translate reliability goals into actionable Datadog monitoring strategies, dashboards, SLOs, and alerting frameworks.

  • Lead and execute the migration from OpenView to Datadog, ensuring continuity of coverage and improved monitoring fidelity across all migrated services and infrastructure.

  • Develop and automate processes using Datadog's APIs, Terraform provider, and scripting (Python, PowerShell, Bash) to manage monitors, dashboards, alerts, and telemetry configuration at scale — ensuring consistency across Windows Server and Unix (Linux/Solaris) environments.

  • Implement and optimize Datadog APM, distributed tracing, log management, infrastructure monitoring, and Network Performance Monitoring (NPM) to provide full-stack visibility.

  • Build and evolve Datadog RUM and Synthetic Monitoring capabilities to track end-user experience and proactively validate availability of external-facing services and critical workflows.

  • Define and operationalize SLOs and error budgets within Datadog; drive alert noise reduction through correlation, enrichment, threshold tuning, and monitor dependency mapping.

  • Integrate Datadog with ServiceNow for incident/problem ticket routing and escalation; produce runbooks, post-incident reviews, and executive/operational dashboards to support reliability reporting.

  • Champion OpenTelemetry (OTel) adoption and drive consistent logging, metrics, and tracing standards across the engineering organization using Datadog as the central observability platform.

  • Onboard new applications and services into Datadog; provide guidance and enablement to engineering teams on instrumentation, agent deployment, and observability best practices.

  • Collaborate on platform cost optimization, data governance, and scaling strategies to ensure Datadog remains performant and cost-effective as the environment grows.

Required Skills and Experience
  • BS/BA in Computer Science, Information Systems, Engineering, or equivalent experience.

  • 5+ years in Observability, APM, SRE, or Platform Engineering — with at least 2–3 years of hands-on, production-grade Datadog experience.

  • Deep expertise across Datadog's core product suite: APM, Infrastructure Monitoring, Log Management, Synthetics, RUM, SLOs, Dashboards, Monitors, and Alerting.

  • Proficiency in both Windows Server and Unix (Linux/Solaris) environments, including agent deployment, service instrumentation, and OS-level performance analysis.

  • Strong scripting and automation skills (Python, PowerShell, Bash) with hands-on experience using the Datadog API/SDK and Terraform to manage observability configurations as code.

  • Solid understanding of distributed tracing, metrics pipelines, logging standards, and SLO/error budget frameworks within Datadog.

  • Experience integrating Datadog with cloud platforms (Azure and AWS) and centralizing cross-environment telemetry.

  • Demonstrated ability to reduce alert noise and MTTR through Datadog monitor tuning, correlation, and enrichment strategies.

  • Experience with ITSM integrations (e.g., Datadog → ServiceNow) and producing clear service maps, dependency views, and stakeholder-facing dashboards.

  • Excellent communication and stakeholder management skills, with the ability to translate technical observability concepts for non-technical audiences.

  • Strong documentation habits, attention to detail, and the ability to work both independently and collaboratively in a fast-paced environment.

Nice to Have
  • Datadog certifications (e.g., Datadog Fundamentals, APM, or Log Management).

  • Experience migrating from legacy monitoring platforms (e.g., OpenView, AppDynamics, Nagios) to Datadog.

  • Familiarity with .NET development (C#), including Datadog instrumentation patterns for .NET applications.

  • Experience in financial services or other regulated industries.

  • Familiarity with ITSM integrations and CMDB alignment for incident, problem, and change management workflows.

  • Experience with CI/CD pipeline integration, synthetic testing strategies, and Datadog-based performance/capacity analysis for latency-sensitive systems.

  • Knowledge of network monitoring concepts and Datadog Network Performance Monitoring (NPM) and Network Device Monitoring (NDM) capabilities.

Hybrid Notice

This is a hybrid position. Currently, the hybrid work schedule for this position is 2–3 days in the office. Please understand that the hybrid schedule may be modified or eliminated at any time at Neuberger's discretion.

Engineer II

Applicants must be authorized and have the right to work in the country where the role is located without the need for current or future sponsorship.

Compensation Details

The salary range for this role is $120,000-$150,000. This is the lowest to highest salary we in good faith believe we would pay for this role at the time of this posting. We may ultimately pay more or less than the posted range, and the range may be modified in the future. This range is only applicable for jobs to be performed in the job posting location. An employee’s pay position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, seniority, geographic location, business sector, performance, shift, travel requirements, sales or revenue-based metrics, market benchmarking data, any collective bargaining agreements, and business or organizational needs. This job is also eligible for a discretionary bonus, which, along with base salary and retirement contributions, is part of our total comprehensive package. We offer a comprehensive package of benefits including paid time off, medical/dental/vision insurance, retirement, life insurance and other benefits to eligible employees.

Note: No amount of pay is considered to be wages or compensation until such amount is earned, vested, and determinable. The amount and availability of any bonus, commission, production, or any other form of compensation that is allocable to a particular employee remains in the Company's sole discretion unless and until paid and may be modified at the Company’s sole discretion, consistent with the law.

Neuberger is an equal opportunity employer. The Firm and its affiliates do not discriminate in employment because of race, creed, national origin, religion, age, color, sex, marital status, sexual orientation, gender identity, disability, citizenship status or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact [email protected].

Learn about the Applicant Privacy Notice.

Skills Required

  • BS/BA in Computer Science, Information Systems, Engineering, or equivalent experience
  • 5+ years in Observability/APM/SRE/Platform Engineering delivering production-grade telemetry and reliability outcomes
  • Proficiency operating in Windows Server and Unix (Linux/Solaris) environments including service instrumentation and agent/collector deployment
  • Experience designing and operating distributed tracing, metrics and logging standards, SLOs/error budgets, and actionable alerting
  • Hands-on experience with cloud monitoring across Azure and AWS, integrating platform telemetry into centralized observability solutions
  • Hands-on experience with Observability/APM suites (OpenView, AppDynamics, Datadog) and network management tools (Network Node Manager, Network Automation, NetProfiler)
  • Scripting and automation expertise (Python, PowerShell, Bash) and familiarity with APIs/SDKs; experience using infrastructure-as-code to manage observability configurations (e.g., Terraform) and configuration formats (e.g., YAML)
  • Proven ability to reduce alert noise and MTTR through correlation, enrichment, and threshold tuning; producing service maps, dependency views, and dashboards
  • Excellent communication and stakeholder management skills
  • Ability to work independently and collaboratively; strong documentation habits and attention to detail
  • Experience with .NET development (C#)
  • Experience in financial services or other regulated industries
  • Familiarity with ITSM integrations and CMDB alignment for incident, problem, and change processes
  • Exposure to CI/CD integration, synthetic testing strategies, performance/capacity analysis, and relevant observability/cloud certifications

Neuberger Berman Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Neuberger Berman and has not been reviewed or approved by Neuberger Berman.

  • Retirement Support: Employer-funded retirement contributions are widely highlighted as a standout perk that materially strengthens total rewards. Feedback suggests this benefit often offsets concerns about lower cash compensation.
  • Healthcare Strength: Comprehensive medical, dental, and vision coverage is characterized as solid and reliable. Feedback suggests health benefits are a stable pillar of the package.
  • Leave & Time Off Breadth: PTO and paid leave are commonly viewed as supportive within the industry context. Feedback suggests time-off policies contribute meaningfully to perceived overall value.

The Company
HQ: New York, NY
2,667 Employees
Year Founded: 1939

What We Do

Neuberger Berman, founded in 1939, is a private, independent, employee-owned investment manager. The firm manages a range of strategies—including equity, fixed income, quantitative and multi-asset class, private equity, real estate and hedge funds—on behalf of institutions, advisors and individual investors globally. With offices in 25 countries, Neuberger Berman’s diverse team has over 2,400 professionals. For eight consecutive years, the company has been named first or second in Pensions & Investments Best Places to Work in Money Management survey (among those with 1,000 employees or more). In 2020, the PRI named Neuberger Berman a Leader, a designation awarded to fewer than 1% of investment firms for excellence in Environmental, Social and Governance (ESG) practices. The PRI also awarded Neuberger Berman an A+ in every eligible category for our approach to ESG integration across asset classes. For important disclosures: http://www.nb.com/linkedin

Similar Jobs

LangChain

Senior Front-end Engineer

Information Technology • Software • Database
In-Office
3 Locations
123 Employees
155K-195K Annually

LangChain

Software Engineer

Information Technology • Software • Database
In-Office
3 Locations
123 Employees
155K-195K Annually

LangChain

Senior Full-stack Engineer

Information Technology • Software • Database
In-Office
3 Locations
123 Employees
165K-198K Annually

LangChain

Full-stack Engineer

Information Technology • Software • Database
In-Office
3 Locations
123 Employees
140K-175K Annually

Similar Companies Hiring

Granted
Mobile • Insurance • Healthtech • Financial Services • Artificial Intelligence
New York, New York
23 Employees
Hanover Park
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
42 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees