Lead Data Scientist

Prevalent AI

Reposted 22 Days Ago

Kakkanad, Ernakulam, Kerala, IND

In-Office

Senior level

Artificial Intelligence • Information Technology • Software • Database • Analytics

The Role

Lead Data Scientist to develop AI-driven security solutions, mentor a team, and oversee advanced analytics projects in cybersecurity. Role involves designing machine learning models, collaborating with business SMEs, and ensuring responsible AI practices.

Summary Generated by Built In

Company Profile:

Prevalent AI (PAI) is a Security Data Science company founded in the UK by experts recognized globally for solving the world’s toughest security problems. We apply the world’s best Security Data Science knowledge and expertise to help companies understand, deploy, and support the most advanced security solutions by developing a security architecture based on a deep understanding of Data Science, Security Tradecraft, and Big Data Technologies.

PAI’s Security Data Science (SDS) platform is a big data security analytics platform that can ingest a wide range of security telemetry data and apply advanced analytical approaches to identify and detect control weaknesses and security risks within enterprises.

The PAI team consists of Cyber Security Domain Specialists, Information Security Analysts, Data Scientists, Data Engineers, and Data Analysts focused on developing advanced security analytics solutions (Solution Development) and delivering security insights to our clients.

Prevalent AI India Pvt Ltd., a subsidiary of Prevalent AI, has offices in Infopark, Cochin, Kerala. For more information, please visit https://www.prevalent.ai

ROLE PURPOSE

As a Lead Data Scientist at Prevalent, you will lead a team in developing AI-driven solutions that power our core Security Data Science Products. You will work with diverse, large-scale data to uncover insights, build predictive and generative AI models, and solve complex business problems in the cybersecurity and third-party risk management domain.

Beyond hands-on technical work, you will help shape product strategy, drive innovation across the AI/ML stack, and mentor your team. This role offers the opportunity to experiment with cutting-edge technologies—including large language models and agentic AI—lead impactful projects, and make a real difference in Prevalent’s data-driven future.

KEY ACCOUNTABILITIES

Data Science & Machine Learning

Collaborate with business SMEs to understand requirements and translate them into data science solutions using data preparation, visualization, statistical modeling, and machine learning techniques (supervised, unsupervised, and optimization)

Design, build, and deploy predictive and classification models, including deep learning architectures (CNNs, Transformers, GNNs) suited to security data problems

Analyze and validate data for consistency; develop prototypes to demonstrate key elements of models, visualizations, and data transformations

Communicate insights and predictions through clear reports and visualizations tailored for both technical and non-technical audiences

Work closely with engineering teams to ensure accurate, production-grade implementation of data science designs through documentation, prototype code, testing, and code reviews

Generative AI & LLM Integration

Design and implement LLM-powered features such as intelligent document processing, automated risk assessment, threat summarization, and conversational interfaces

Build and optimize Retrieval-Augmented Generation (RAG) pipelines using vector databases (e.g., Pinecone, Weaviate, pgvector) and embedding models for domain-specific knowledge retrieval

Evaluate, fine-tune, and deploy foundation models (e.g., models from OpenAI and Anthropic, or open-source LLMs such as Llama/Mistral) using techniques like LoRA, RLHF, and DPO

Design agentic AI workflows and multi-step reasoning systems using frameworks such as LangChain, LangGraph, or CrewAI for complex security automation tasks

Implement prompt engineering best practices, evaluation frameworks, and guardrails to ensure reliable, safe, and auditable LLM outputs in production.

MLOps & Productionization

Own the end-to-end ML lifecycle: experiment tracking (MLflow/W&B), model registry, CI/CD for ML, automated retraining, and model versioning

Deploy and monitor models in production using cloud-native services (AWS SageMaker, GCP Vertex AI, or Azure ML) with containerized workflows (Docker, Kubernetes)

Build model monitoring and observability pipelines to track data drift, performance degradation, and model health in real time

Design and manage feature stores and data pipelines to ensure reproducibility and efficiency at scale.

LLMOps

Build and manage LLM serving infrastructure using tools like vLLM, TGI (Text Generation Inference), or Triton Inference Server for efficient, low-latency model deployment

Implement prompt versioning, management, and regression testing pipelines to ensure consistency and traceability across prompt iterations

Set up LLM observability and tracing using platforms such as LangSmith, Arize Phoenix, or Helicone to monitor latency, token usage, cost, and output quality

Optimize inference costs through strategies like semantic caching, request batching, model routing (large vs. small model tiering), and quantization

Design and maintain automated evaluation pipelines for LLM outputs, combining programmatic evals, LLM-as-judge patterns, and human-in-the-loop review workflows

Orchestrate production guardrails including content filtering, output validation, PII detection, and toxicity screening as part of the serving pipeline

Manage LLM gateway and API layer for centralized rate limiting, usage tracking, key management, fallback routing, and multi-provider abstraction.

Responsible AI & Security

Champion responsible AI practices: bias and fairness auditing, model explainability (SHAP, LIME), and compliance with AI governance frameworks

Ensure robustness against adversarial attacks, prompt injection, data leakage, and other LLM-specific security risks

Maintain documentation and audit trails for model decisions in alignment with regulatory and enterprise requirements.

TEAM LEADERSHIP

Lead, mentor, and grow a team of data scientists by setting clear goals, assigning responsibilities, conducting regular 1:1s, and tracking performance

Promote best practices in data science, solution architecture, code quality, and experimentation methodology across the team

Communicate complex data-driven insights to non-technical stakeholders and executive leadership with clarity and impact

Drive a culture of continuous learning, knowledge sharing, and innovation within the data science team

Partner with Product Management and Engineering leadership to influence product roadmap, prioritize AI/ML initiatives, and conduct build-vs-buy analysis for AI capabilities.

SKILLS & EXPERIENCE

Core Data Science & ML (Required)

8+ years of experience in Data Science or Machine Learning, with at least 2 years in a lead or senior IC role

Strong proficiency in Python and SQL; working knowledge of R, Spark, or Scala is a plus

Deep understanding of ML algorithms: logistic regression, tree-based models (XGBoost, LightGBM), SVMs, KNN, ensemble methods, and neural networks

Hands-on experience with deep learning frameworks (PyTorch, TensorFlow) and architectures (CNNs, RNNs, Transformers, attention mechanisms)

Strong foundation in NLP techniques: text classification, NER, sentiment analysis, topic modeling, and semantic search

Experience with statistical analysis, A/B testing, causal inference, and experimental design.

Generative AI & LLMs (Required)

Practical experience building applications with LLMs (GPT-4, Claude, Llama, Mistral, or equivalent)

Hands-on experience designing RAG architectures, working with vector databases, and implementing embedding-based retrieval systems

Familiarity with fine-tuning techniques (LoRA, QLoRA, PEFT), RLHF/DPO, and prompt engineering methodologies

Experience with agentic AI frameworks and multi-step LLM orchestration patterns

Understanding of LLM evaluation, red-teaming, hallucination mitigation, and production guardrails

Infrastructure & Tools (Required)

Experience with cloud platforms (AWS, GCP, or Azure) and managed ML services (SageMaker, Vertex AI, Azure ML)

Proficiency with MLOps tooling: experiment tracking (MLflow, W&B), model registries, and CI/CD for ML pipelines

Familiarity with containerization (Docker, Kubernetes) and infrastructure-as-code practices

Experience with the modern data stack: data lakehouse architectures (Databricks, Snowflake), streaming (Kafka), and feature stores

Proficiency with data visualization tools and frameworks (Tableau, Streamlit, Gradio, or D3.js) for prototyping and stakeholder communication

Nice to Have

Experience in cybersecurity, third-party risk management, or GRC (Governance, Risk, and Compliance) domains.
Contributions to open-source ML/AI projects.
Published research in ML, NLP, or AI safety.
Experience with graph neural networks or knowledge graphs for security applications.

EDUCATION

Master’s or Ph.D. in Computer Science, Data Science, Statistics, Mathematics, Engineering, or a related quantitative field. Equivalent practical experience with a strong portfolio of ML/AI work will also be considered.

Skills Required

8+ years of experience in Data Science or Machine Learning
Strong proficiency in Python and SQL
Hands-on experience with deep learning frameworks like PyTorch or TensorFlow
Practical experience building applications with LLMs (GPT-4, Claude, Llama)
Experience with cloud platforms (AWS, GCP, Azure) and managed ML services
Master's or Ph.D. in Computer Science, Data Science, Statistics, Mathematics, Engineering

The Company

HQ: London

157 Employees

Year Founded: 2017

What We Do

Prevalent AI was founded to assemble the world’s best AI and Data Science talent, a team capable of building the security analytics of the future. In a security technology landscape filled with rigid, siloed solutions and disparate data, organizations are unable to tackle threats and vulnerabilities effectively. By combining our Security Data Fabric with AI-powered Exposure Management, we provide our clients with complete clarity of their cyber risk. Our Security Data Fabric automates the integration of complex and disparate data into a single unified knowledge graph, turning data chaos into data clarity with AI-powered entity resolution. Our Exposure Management platform identifies every attack surface, contextualizes and prioritizes risk findings, and rapidly remediates exposures — so you’ll always stay one step ahead of attackers.