Lead Data Scientist

Posted Yesterday
2 Locations
In-Office or Remote
166K-214K Annually
Senior level
Software
The Role
Lead development and deployment of NLP and transformer-based/LLM models for financial surveillance and compliance. Mentor junior staff, perform EDA, annotation and model analysis, contribute to model governance, and collaborate across product, engineering, and business stakeholders.
Summary Generated by Built In
Who are we?

Smarsh empowers its customers to manage risk and unleash intelligence in their digital communications. Our growing community of over 6,500 organizations in regulated industries counts on Smarsh every day to help them spot compliance, legal, or reputational risks in 80+ communication channels before those risks become regulatory fines or headlines. Relentless innovation has fueled our journey to consistent leadership recognition from analysts like Gartner and Forrester, and our sustained, aggressive growth has landed Smarsh in the annual Inc. 5000 list of fastest-growing American companies since 2008.

Summary
 
As a Lead Data Scientist (NLP & Financial Compliance) at Smarsh, you will spearhead the development of state-of-the-art natural language processing (NLP) and large language model (LLM) solutions that power next-generation compliance and surveillance systems. You’ll work on highly specialized problems at the intersection of NLP, communications intelligence, financial supervision, and regulatory compliance, where unstructured data from emails, chats, voice transcripts, and trade communications holds the keys to uncovering misconduct and risk.

The role involves working with other Senior Data Scientists and mentoring Associate Data Scientists as you analyze complex data, generate insights, and build solutions across a variety of tools and platforms. This role demands both technical excellence in NLP modeling and a deep understanding of financial-domain behavior, including insider trading, market manipulation, off-channel communications, material non-public information (MNPI), bribery, and other supervisory risk areas. The ideal candidate can conduct both independent and team-based research, generate insights from large datasets, and bring a hands-on, can-do attitude to handling day-to-day data requests and analysis.

This role also offers a unique opportunity to get exposure to many problems and solutions associated with taking machine learning and analytics research to production. On any given day, you will have the opportunity to interface with business leaders, machine learning researchers, data engineers, platform engineers, data scientists and many more, enabling you to level up in true end-to-end data science proficiency.

How will you contribute?

  • Collect, analyze, and interpret datasets of all sizes to uncover meaningful insights that support the development of statistical methods and machine learning algorithms.
  • Lead the design, training, and deployment of NLP and transformer-based models for financial surveillance and supervisory use cases (e.g., misconduct detection, market abuse, trade manipulation, insider communication).
  • Develop machine learning models and other analytics following established workflows, while looking for optimization and improvement opportunities.
  • Perform data annotation and quality review.
  • Conduct exploratory data analysis and model failure analysis.
  • Contribute to model governance, documentation, and explainability frameworks aligned with internal and regulatory AI standards.
  • Guide clients and prospects through machine learning model and analytics development and fine-tuning processes.
  • Provide guidance to junior team members on model development and EDA.
  • Work with Product Manager(s) to intake project and product requirements and translate them into technical tasks using the team’s tooling, techniques, and procedures.
  • Pursue continued, self-led professional development.

What will you bring?

  • Strong understanding of financial markets, compliance, surveillance, supervision, or regulatory technology
  • Experience with one or more data science and machine/deep learning frameworks and tools, including scikit-learn, H2O, Keras, PyTorch, TensorFlow, pandas, NumPy, caret, and tidyverse
  • Command of data science and statistics principles (regression, Bayesian methods, time series, clustering, precision/recall, AUROC, exploratory data analysis, etc.)
  • Strong knowledge of key programming concepts (e.g., split-apply-combine, data structures, object-oriented programming)
  • Solid statistics knowledge (hypothesis testing, ANOVA, chi-square tests, etc.)
  • Knowledge of NLP transfer learning, including word embedding models (GloVe, fastText, word2vec) and transformer models (BERT, SBERT, GPT-x, etc.) and the Hugging Face ecosystem
  • Experience with natural language processing toolkits such as NLTK, spaCy, and NVIDIA NeMo
  • Knowledge of microservices architecture and continuous delivery concepts for machine learning, and related technologies such as Helm, Docker, and Kubernetes
  • Familiarity with deep learning techniques for NLP
  • Familiarity with LLM tooling such as Ollama and LangChain
  • Excellent verbal and written communication skills
  • Proven collaborator, thriving on teamwork

Preferred Qualifications

  • Master’s degree or PhD in Computer Science, Applied Math, Statistics, or a related scientific field
  • Familiarity with cloud computing platforms (AWS, GCP, Azure)
  • Experience with automated supervision/surveillance/compliance tools

About our culture

Smarsh hires lifelong learners with a passion for innovating with purpose, humility and humor. Collaboration is at the heart of everything we do. We work closely with the most popular communications platforms and the world’s leading cloud infrastructure platforms. We use the latest in AI/ML technology to help our customers break new ground at scale. We are a global organization that values diversity, and we believe that providing opportunities for everyone to be their authentic self is key to our success. Smarsh leadership, culture, and commitment to developing our people have all garnered Comparably.com Best Places to Work Awards. Come join us and find out what the best work of your career looks like.

Smarsh Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Smarsh and has not been reviewed or approved by Smarsh.

  • Leave & Time Off Breadth: Time off is characterized by unlimited PTO, generous vacation allowances, and paid holidays, with policies that support taking time away. These elements are positioned as a core strength that helps balance lower base pay in some roles.
  • Wellbeing & Lifestyle Benefits: Perks include wellness programs, commuter and bike reimbursement, volunteer time off, peer recognition, remote-work support, and a sabbatical option. The variety of non-salary benefits contributes to a supportive work-life environment.
  • Retirement Support: A 401(k) with employer match and profit sharing is offered, with immediate vesting described for the match. This strengthens the long-term financial component of total rewards.

The Company
Redwood City, CA
1,470 Employees

What We Do

Smarsh provides cloud-based archiving and compliance solutions for companies in regulated and litigious industries.

Similar Jobs

CrowdStrike

Lead Data Scientist

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Remote or Hybrid
USA
10000 Employees
125K-180K Annually
Remote
United States
40741 Employees
142K-196K Annually

OneSix

Lead Data Scientist

Artificial Intelligence • Big Data • Machine Learning • Analytics • Business Intelligence • Consulting • Generative AI
Remote or Hybrid
2 Locations
50 Employees
160K-200K Annually

Stord

Lead Data Scientist

Logistics • Software
In-Office or Remote
2 Locations
222 Employees

Similar Companies Hiring

Hanover Park
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
42 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees
