Vector Data Engineer

Posted 4 Days Ago
2 Locations
In-Office
Mid level
Healthtech • Pharmaceutical • Manufacturing
The Role
The Vector Data Engineer will design and implement semantic-search infrastructure, develop scalable pipelines for multi-omics data, and collaborate on AI applications.

At Johnson & Johnson, we believe health is everything. Our strength in healthcare innovation empowers us to build a world where complex diseases are prevented, treated, and cured, where treatments are smarter and less invasive, and solutions are personal. Through our expertise in Innovative Medicine and MedTech, we are uniquely positioned to innovate across the full spectrum of healthcare solutions today to deliver the breakthroughs of tomorrow, and profoundly impact health for humanity. Learn more at https://www.jnj.com

Job Function:

Data Analytics & Computational Sciences

Job Sub Function:

Data Science

Job Category:

Scientific/Technology

All Job Posting Locations:

Cornellà de Llobregat, Barcelona, Spain; Madrid, Spain

Job Description:

Johnson & Johnson Innovative Medicine (J&J IM), a pharmaceutical company of Johnson & Johnson, is recruiting a Vector Data Engineer. The primary location for this position is Barcelona, Spain; the secondary location is Madrid. This is a hybrid role.

Our expertise in Innovative Medicine is informed and inspired by patients, whose insights fuel our science-based advancements. Visionaries like you work in teams that save lives by developing the medicines of tomorrow.  

Join us in developing treatments, finding cures, and pioneering the path from lab to life while championing patients every step of the way. Learn more at https://www.jnj.com/innovative-medicine 

 

Position Summary: 

The Vector Data Engineer designs and implements the embedding and semantic-search infrastructure that connects discovery, translational, and clinical data into AI-ready knowledge representations. 

This role bridges multi-omics data engineering and machine-learning infrastructure, enabling scientists and agentic tools to discover biological insights through vector-based search and reasoning. 

 

Key Responsibilities: 

  • Develop scalable pipelines that convert multi-omics and clinical data (e.g., proteomics, transcriptomics, spatial omics, biomarkers) into vectorized embeddings for AI and semantic retrieval. 

  • Build and maintain vector databases and hybrid data stores using technologies such as TileDB, Weaviate, or Snowflake Cortex. 

  • Collaborate with Data Transformation Engineers to design standardized data formats suitable for embedding generation and cross-modality mapping.

  • Integrate metadata, ontology terms, and provenance into vector representations to ensure traceability and governance compliance. 

  • Partner with the AI/ML team to deploy embeddings that support agentic reasoning, semantic similarity, and cross-dataset queries.

  • Optimize indexing, retrieval, and inference performance across large-scale multi-omics data collections. 

  • Evaluate and incorporate emerging representation-learning and knowledge-graph techniques to improve data discoverability and model interoperability. 

 

Qualifications: 

  • MS/PhD in Computer Science, Computational Biology, Data Science, or related field. 

  • 3+ years of experience building or maintaining vector or semantic-retrieval infrastructure. 

  • Hands-on experience with multi-omics or biomedical data integration (e.g., RNA-seq, proteomics, clinical endpoints). 

  • Proficiency in Python and frameworks such as LangChain, Transformers, or sentence-embedding models. 

  • Familiarity with TileDB, Snowflake, Weaviate, FAISS, or other vector/array database systems. 

  • Understanding of metadata modeling, ontologies (e.g., OBO, UMLS), and FAIR data practices. 

  • Strong ability to collaborate across solution architecture, data science, and AI/ML teams. 

 

Strategic Impact: 

  • Multi-omics and clinical data assets transformed into interoperable, vectorized embeddings supporting scientific AI applications. 

  • AI applications can perform semantic queries and reasoning over governed datasets.

  • Vector database infrastructure scales efficiently and complies with governance and lineage standards. 

  • Accelerated insight generation across discovery, translational, and clinical domains. 

 

#JRDDS 

 

Top Skills

Faiss
Langchain
Python
Snowflake
Tiledb
Transformers
Weaviate
The Company
HQ: New Brunswick, NJ
143,612 Employees
Year Founded: 1886

What We Do

Profound Change Requires Boldness.

Johnson & Johnson is the largest and most broadly based healthcare company in the world. We’re producing life-changing breakthroughs every day, and have been for the last 130 years.

The combination of new technologies and your expertise enables amazing things to happen. Teams from J&J’s consumer business are creating digital tools to help people track the health of their skin. Those working in medical devices are 3-D printing artificial joints personalized for each patient, while researchers in pharmaceuticals use AI to discover lifesaving drugs. Imagine what the rest of our team of 134,000 people at 260 companies in more than 60 countries across the world is accomplishing. We redefine what it means to be a big company in today’s world.

Social Media Community Guidelines:
http://www.jnj.com/social-media-community-guidelines
