Lead Data Scientist

Kakkanad, Ernakulam, Kerala, IND
In-Office
Senior level
Artificial Intelligence • Information Technology • Software • Database • Analytics
The Role
The Lead Data Scientist will develop AI-driven security solutions, mentor a team, and oversee advanced analytics projects in cybersecurity. The role involves designing machine learning models, collaborating with business SMEs, and ensuring responsible AI practices.
Summary Generated by Built In

Company Profile:

Prevalent AI (PAI) is a Security Data Science company founded in the UK by experts recognized globally for solving the world’s toughest security problems. We apply the world’s best Security Data Science knowledge and expertise to help companies understand, deploy, and support the most advanced security solutions by developing a security architecture based on a deep understanding of Data Science, Security Tradecraft, and Big Data Technologies.

PAI’s Security Data Science (SDS) platform is a big data security analytics platform that can ingest a wide range of security telemetry data and apply advanced analytical approaches to identify and detect control weaknesses and security risks within enterprises.

The PAI team consists of Cyber Security Domain Specialists, Information Security Analysts, Data Scientists, Data Engineers, and Data Analysts focused on developing advanced security analytics solutions (Solution Development) and delivering security insights to our clients.

Prevalent AI India Pvt Ltd., a subsidiary of Prevalent AI, has offices in Infopark, Cochin, Kerala. For more information, please visit https://www.prevalent.ai 

ROLE PURPOSE 

As a Lead Data Scientist at Prevalent, you will lead a team in developing AI-driven solutions that power our core Security Data Science Products. You will work with diverse, large-scale data to uncover insights, build predictive and generative AI models, and solve complex business problems in the cybersecurity and third-party risk management domain. 

Beyond hands-on technical work, you will help shape product strategy, drive innovation across the AI/ML stack, and mentor your team. This role offers the opportunity to experiment with cutting-edge technologies—including large language models and agentic AI—lead impactful projects, and make a real difference in Prevalent’s data-driven future. 

KEY ACCOUNTABILITIES 

Data Science & Machine Learning 

  • Collaborate with business SMEs to understand requirements and translate them into data science solutions using data preparation, visualization, statistical modeling, and machine learning techniques (supervised, unsupervised, and optimization) 
  • Design, build, and deploy predictive and classification models, including deep learning architectures (CNNs, Transformers, GNNs) suited to security data problems 
  • Analyze and validate data for consistency; develop prototypes to demonstrate key elements of models, visualizations, and data transformations 
  • Communicate insights and predictions through clear reports and visualizations tailored for both technical and non-technical audiences 
  • Work closely with engineering teams to ensure accurate, production-grade implementation of data science designs through documentation, prototype code, testing, and code reviews 

Generative AI & LLM Integration 

  • Design and implement LLM-powered features such as intelligent document processing, automated risk assessment, threat summarization, and conversational interfaces 
  • Build and optimize Retrieval-Augmented Generation (RAG) pipelines using vector databases (e.g., Pinecone, Weaviate, pgvector) and embedding models for domain-specific knowledge retrieval 
  • Evaluate, fine-tune, and deploy foundation models (e.g., OpenAI, Anthropic, open-source LLMs such as Llama/Mistral) using techniques like LoRA, RLHF, and DPO 
  • Design agentic AI workflows and multi-step reasoning systems using frameworks such as LangChain, LangGraph, or CrewAI for complex security automation tasks 
  • Implement prompt engineering best practices, evaluation frameworks, and guardrails to ensure reliable, safe, and auditable LLM outputs in production

MLOps & Productionization 

  • Own the end-to-end ML lifecycle: experiment tracking (MLflow/W&B), model registry, CI/CD for ML, automated retraining, and model versioning 
  • Deploy and monitor models in production using cloud-native services (AWS SageMaker, GCP Vertex AI, or Azure ML) with containerized workflows (Docker, Kubernetes) 
  • Build model monitoring and observability pipelines to track data drift, performance degradation, and model health in real time 
  • Design and manage feature stores and data pipelines to ensure reproducibility and efficiency at scale

LLMOps 

  • Build and manage LLM serving infrastructure using tools like vLLM, TGI (Text Generation Inference), or Triton Inference Server for efficient, low-latency model deployment 
  • Implement prompt versioning, management, and regression testing pipelines to ensure consistency and traceability across prompt iterations 
  • Set up LLM observability and tracing using platforms such as LangSmith, Arize Phoenix, or Helicone to monitor latency, token usage, cost, and output quality 
  • Optimize inference costs through strategies like semantic caching, request batching, model routing (large vs. small model tiering), and quantization 
  • Design and maintain automated evaluation pipelines for LLM outputs, combining programmatic evals, LLM-as-judge patterns, and human-in-the-loop review workflows 
  • Orchestrate production guardrails including content filtering, output validation, PII detection, and toxicity screening as part of the serving pipeline 
  • Manage the LLM gateway and API layer for centralized rate limiting, usage tracking, key management, fallback routing, and multi-provider abstraction

Responsible AI & Security 

  • Champion responsible AI practices: bias and fairness auditing, model explainability (SHAP, LIME), and compliance with AI governance frameworks 
  • Ensure robustness against adversarial attacks, prompt injection, data leakage, and other LLM-specific security risks 
  • Maintain documentation and audit trails for model decisions in alignment with regulatory and enterprise requirements

TEAM LEADERSHIP 

  • Lead, mentor, and grow a team of data scientists by setting clear goals, assigning responsibilities, conducting regular 1:1s, and tracking performance 
  • Promote best practices in data science, solution architecture, code quality, and experimentation methodology across the team 
  • Communicate complex data-driven insights to non-technical stakeholders and executive leadership with clarity and impact 
  • Drive a culture of continuous learning, knowledge sharing, and innovation within the data science team 
  • Partner with Product Management and Engineering leadership to influence the product roadmap, prioritize AI/ML initiatives, and conduct build-vs-buy analysis for AI capabilities

SKILLS & EXPERIENCE 

Core Data Science & ML (Required) 

  • 8+ years of experience in Data Science or Machine Learning, with at least 2 years in a lead or senior IC role 
  • Strong proficiency in Python and SQL; working knowledge of R, Spark, or Scala is a plus 
  • Deep understanding of ML algorithms: logistic regression, tree-based models (XGBoost, LightGBM), SVMs, KNN, ensemble methods, and neural networks 
  • Hands-on experience with deep learning frameworks (PyTorch, TensorFlow) and architectures (CNNs, RNNs, Transformers, Attention mechanisms) 
  • Strong foundation in NLP techniques: text classification, NER, sentiment analysis, topic modeling, and semantic search 
  • Experience with statistical analysis, A/B testing, causal inference, and experimental design

Generative AI & LLMs (Required) 

  • Practical experience building applications with LLMs (GPT-4, Claude, Llama, Mistral, or equivalent) 
  • Hands-on experience designing RAG architectures, working with vector databases, and implementing embedding-based retrieval systems 
  • Familiarity with fine-tuning techniques (LoRA, QLoRA, PEFT), RLHF/DPO, and prompt engineering methodologies 
  • Experience with agentic AI frameworks and multi-step LLM orchestration patterns 
  • Understanding of LLM evaluation, red-teaming, hallucination mitigation, and production guardrails 

Infrastructure & Tools (Required) 

  • Experience with cloud platforms (AWS, GCP, or Azure) and managed ML services (SageMaker, Vertex AI, Azure ML) 
  • Proficiency with MLOps tooling: experiment tracking (MLflow, W&B), model registries, and CI/CD for ML pipelines 
  • Familiarity with containerization (Docker, Kubernetes) and infrastructure-as-code practices 
  • Experience with modern data stack: data lakehouse architectures (Databricks, Snowflake), streaming (Kafka), and feature stores 
  • Proficiency with data visualization tools and frameworks (Tableau, Streamlit, Gradio, or D3.js) for prototyping and stakeholder communication 

Nice to Have 

  • Experience in cybersecurity, third-party risk management, or GRC (Governance, Risk, and Compliance) domains
  • Contributions to open-source ML/AI projects
  • Published research in ML, NLP, or AI safety
  • Experience with graph neural networks or knowledge graphs for security applications

EDUCATION 

Master’s or Ph.D. in Computer Science, Data Science, Statistics, Mathematics, Engineering, or a related quantitative field. Equivalent practical experience with a strong portfolio of ML/AI work will also be considered. 

Skills Required

  • 8+ years of experience in Data Science or Machine Learning
  • Strong proficiency in Python and SQL
  • Hands-on experience with deep learning frameworks like PyTorch or TensorFlow
  • Practical experience building applications with LLMs (GPT-4, Claude, Llama)
  • Experience with cloud platforms (AWS, GCP, Azure) and managed ML services
  • Master's or Ph.D. in Computer Science, Data Science, Statistics, Mathematics, Engineering

The Company
HQ: London
157 Employees
Year Founded: 2017

What We Do

Prevalent AI was founded to assemble the world’s best AI and Data Science talent, a team capable of building the security analytics of the future. In a security technology landscape filled with rigid, siloed solutions and disparate data, organizations are unable to tackle threats and vulnerabilities effectively. By combining our Security Data Fabric with AI-powered Exposure Management, we provide our clients with complete clarity of their cyber risk. Our Security Data Fabric automates the integration of complex and disparate data into a single unified knowledge graph, turning data chaos into data clarity with AI-powered entity resolution. Our Exposure Management platform identifies every attack surface, contextualizes and prioritizes risk findings, and rapidly remediates exposures — so you’ll always stay one step ahead of attackers.
