Samba TV

Senior Ontologist - Knowledge Graph & Identity

San Francisco, CA, USA

In-Office

180K-230K Annually

Mid level

AdTech • Machine Learning

The Role

The Senior Ontologist will manage end-to-end data projects focusing on knowledge graphs and identity, mentor junior team members, and ensure high-quality code and documentation.

Summary Generated by Built In

Samba is a media intelligence company. We know what the world is watching, reading, and thinking about — in real time, at scale, across every screen. Our data exists with the consent of over a billion people, organized into the most complete picture of consumer attention ever built. The biggest brands in the world use that picture to make smarter decisions. We think it’s the most interesting data asset on the planet, because it’s the most culturally relevant.

As Senior Ontologist on Samba TV's Knowledge Graph & Identity team, you will own the design, development, and governance of the semantic data models and ontological frameworks that sit at the foundation of Samba's knowledge graph. You are the domain authority for how Samba represents and relates the entities that matter most to our business - and you ensure that representation is rigorous, scalable, and aligned with industry standards.

This is a hands-on technical role. You will spend the majority of your time designing ontologies, writing SPARQL, building knowledge graph pipelines, and working closely with data engineering and data science peers to put your models into production. You bring enough breadth in ML and AI to leverage embedding-based and LLM-augmented approaches where they strengthen the graph, and you contribute meaningfully to entity resolution and identity linking work that depends on the semantic layer you define.

This role reports to the Data Science Manager, Knowledge Graph & Identity.

What You'll Do:

Ontology Design & Governance

Own the end-to-end design, development, and versioning of Samba TV's core ontologies in RDF/RDFS/OWL - defining entity classes, properties, hierarchies, and constraints that accurately model Samba's data domain at scale
Author and maintain SHACL shapes for post-load graph validation, consistency checking, and data quality enforcement
Define and document derived-attribute schemas - genre affinity, brand affinity, topic affinity, lifecycle signals, and viewing summaries - and own the logical definitions that govern how raw events become durable graph attributes
Establish ontology design standards, change management processes, and versioning practices; evaluate alignment with W3C standards and relevant industry schemas (Schema.org, EIDR, DDEX, W3C PROV)
Lead ontology design reviews with product, data engineering, and data science stakeholders - articulating trade-offs between expressivity, scalability, and query performance clearly

Event-to-Ontology Derivation

Define the aggregation and scoring logic that transforms raw TV viewership and web activity events into the durable affinities, summaries, and inferred signals that live in the graph
Co-own derivation pipeline design with data engineering - specifying transformation logic, intermediate schemas, and validation checkpoints for Databricks/Spark pipelines that feed the materialized graph substrate
Reason carefully about what belongs in the graph vs. what should remain virtualized in the data lake - balancing query performance against storage and refresh cost

Knowledge Graph Development & AI Integration

Build and maintain production-quality knowledge graph pipelines in Python and SPARQL - well-tested, documented, and scalable to Samba's data volumes
Design and implement entity resolution and record linkage pipelines that map real-world entities (content titles, devices, audiences, advertisers) to canonical knowledge graph nodes
Develop enrichment workflows that integrate third-party data sources (metadata providers, identity vendors, web sources) into Samba's knowledge graph in a consistent, governed way
Apply embedding-based and LLM-augmented approaches to ontology mapping, entity disambiguation, and semantic similarity problems
Support content and semantic embedding pipelines that feed into the vector store and underpin GraphRAG-based AI solutions

Cross-functional Collaboration & Mentorship

Partner with data engineering and platform teams to ensure the knowledge graph is integrated, queryable, and production-ready at scale
Collaborate with product to translate business requirements into ontological and graph data model decisions
Formally mentor Ontology Engineers and junior data scientists on semantic modeling, SHACL design patterns, and graph best practices
Lead internal technical talks and workshops on ontology, knowledge graph, and semantic web topics

Who You Are:

Must-Haves

5–8 years of hands-on experience in ontology engineering, semantic data modeling, or knowledge graph development - with a demonstrable track record of production ontologies at scale
Deep expertise in W3C semantic web standards: RDF, RDFS, OWL, SPARQL 1.1, and SHACL - with hands-on experience building and validating graph schemas in a production triplestore (Amazon Neptune, Stardog, GraphDB, Jena, or equivalent)
Strong Python - production-quality, well-tested code; comfortable building data pipelines and graph processing workflows
First-principles understanding of description logics, ontology design patterns, and the practical trade-offs between OWL expressivity and triplestore scalability
Hands-on experience with entity resolution, record linkage, or deduplication at scale - mapping messy, multi-source real-world data to clean ontological representations
Bachelor's degree required in Computer Science, Information Science, Computational Linguistics, Mathematics, or a related field; Master's or PhD strongly preferred
Strong communicator - able to defend ontological modeling decisions in design reviews and explain trade-offs to non-specialist stakeholders

Strongly Preferred

Hands-on experience with Amazon Neptune or Stardog - including data virtualization (Neptune Orion or Stardog Virtual Graphs) over data lake sources
Experience designing aggregation and derivation logic that converts raw behavioral event data into durable, graph-resident derived attributes
Domain knowledge in media, entertainment, or ad tech - TV viewership (ACR/STB), digital audience modeling (device graphs, identity resolution), or ad exposure data
Familiarity with industry content and identity schemas: EIDR, Schema.org VideoObject, DDEX, or equivalent
Experience with embedding models, vector databases (Milvus, Pinecone, Weaviate), and GraphRAG architectures (LangChain/LlamaIndex)
Familiarity with GNN-based approaches to knowledge graph reasoning or entity resolution is a plus
Working knowledge of PySpark and Databricks for large-scale transformation pipelines

Samba is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We strive to empower connection with one another, reflect the communities we serve, and tackle meaningful projects that make a real impact.

Samba may collect personal information directly from you as a job applicant. Samba may also receive personal information from third parties, for example in connection with a background, employment, or reference check, in accordance with applicable law. For further details, please see Samba's Applicant Privacy Policy. For residents of the EU, Samba Inc. is the data controller.

Skills Required

Bachelor's degree in Statistics, Data Science, Computer Science, Mathematics or a related field
3-5 years of hands-on data science experience
Advanced Python, SQL, and PySpark skills
Experience with Databricks, Delta Lake, and cloud platforms
Solid command of core ML techniques
Familiarity with MLOps practices
Strong communication skills
Ability to mentor junior data scientists

The Company

HQ: San Francisco, CA

318 Employees

Year Founded: 2008

What We Do

Television remains a vibrant cultural influence and an essential source of entertainment and information worldwide. Tremendous growth in content choices, and viewing platforms that allow us to watch anything, anytime, on any screen, has actually made it harder for viewers to discover and keep up with all the great programming available. It’s also more competitive for content providers to keep your attention, and for marketers to make strong, measurable connections with their target consumers. Technology that improves the viewing experience, enables content discovery, and addresses audience fragmentation across screens will strengthen television’s business model and relevance to consumers. Data is at the center of any solution to make TV better. Samba TV's technology is built into Smart TVs and easily maps to smartphones and tablets. By recognizing what's on screen, Samba TV learns what viewers like and, using machine learning algorithms, enables discovery of shows and actors in a whole new way. Likewise, our data and measurement products are transforming the way stakeholders across the media landscape think about their business. Given the dramatic growth in streaming services, connected devices, time-shifting, and multi-screen viewership, our data products solve real problems and create a meaningful competitive advantage for our clients.