Data Architect

Kakkanad, Ernakulam, Kerala, IND
In-Office
Mid level
Artificial Intelligence • Information Technology • Software • Database • Analytics
The Role
The Data Architect will develop and govern data models within the Security Data Fabric, ensuring data integrity and flexibility while collaborating with technical teams and utilizing AI tools to enhance decision-making and model design.

Job Description — Data Architect 

Security Data Fabric 

Role purpose:

At Prevalent AI, we empower organizations to take control of every risk across every attack surface. Our clients rely on our cutting-edge Security Data Fabric as the foundation for comprehensive Exposure Management, enabling enhanced decision-making. By helping clients see everything, fix what matters, and stop attacks before they happen, we’re reshaping the future of security. 

The Data Architect defines and governs the canonical data semantics of the Data Fabric and ensures that data remains trustworthy, traceable, and evolvable as usage, scale, and complexity grow. 

You own the logical data model and ontology of the fabric, and you control how it evolves. Beyond initial modelling, you define how cross-cutting concerns — lineage, provenance, data quality, retention, and versioning — are represented in the model and enforced consistently. You ensure that semantic changes are explicit, versioned, and backward-compatible, so analysts and product teams can experiment safely without eroding trust in canonical data. 

You work alongside a Technical Architect who owns platform, infrastructure, and the physical realisation of the model across stores. 

Key accountabilities:

Canonical model and ontology 

  • Own the logical data model and ontology of the Data Fabric — the definitive representation of core entities, their attributes, and the relationships between them — along with the layer boundaries within the fabric. 
  • Evaluate modelling approaches across the semantic spectrum, from property graph schemas through document schemas to more formal semantic frameworks, and recommend the approach that best fits expressiveness, tooling, and team capability. 
  • Provide guidance on how query patterns and use cases shape modelling choices. Work with product, analyst, and engineering teams so that the canonical model is effective for the queries it must serve, not only theoretically sound. 
  • Align canonical models with external standards where valuable, including OCSF, STIX, and MITRE D3FEND. Decide what to adopt, extend, or set aside. 
  • Define canonical entity and reference data: what an asset, identity, finding, threat, or control is; and the controlled vocabularies that populate their attributes. 
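To make the idea of canonical entities and controlled vocabularies concrete, here is a minimal illustrative sketch. All names here (`Asset`, `Criticality`, `HasFinding`) are hypothetical placeholders, not the fabric's actual model:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical controlled vocabulary: attribute values are drawn from a
# closed set defined by the canonical model, not free-form strings.
class Criticality(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass(frozen=True)
class Asset:
    """Illustrative canonical entity: a uniquely identified asset whose
    attributes are typed and vocabulary-constrained."""
    asset_id: str
    hostname: str
    criticality: Criticality

    def __post_init__(self):
        # Canonical identity is mandatory for every entity in the fabric.
        if not self.asset_id:
            raise ValueError("asset_id is required for canonical identity")

# Relationships are modelled the same way: typed, directional, and
# referencing entities by their canonical identifiers rather than by value.
@dataclass(frozen=True)
class HasFinding:
    asset_id: str
    finding_id: str
```

The point of the sketch is the discipline, not the syntax: entities carry stable identifiers, attributes draw on controlled vocabularies, and relationships are first-class typed objects, whichever physical store eventually realises them.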

Lineage, provenance, quality, retention, and versioning 

  • Define how lineage and provenance are represented in the canonical model, so every entity, attribute, and relationship is traceable to source, collection time, and confidence. 
  • Define how data quality is expressed in the model — through constraints, invariants, and health metrics — and how it is measured consistently across consumers. 
  • Define how sensitivity is captured as metadata on the model: classification of entities and attributes for PII, confidentiality, and regulatory category, so downstream consumers and platform controls can act on it consistently. 
  • Define how time is modelled as a first-class concern, including valid time and transaction time, and define how retention is represented and enforced. 
  • Define versioning for entities, attributes, and relationships, and the compatibility semantics that consumers can rely on. 
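The bullets above can be illustrated with a small bitemporal sketch: valid time records when a fact held in the world, transaction time records when the fabric learned it, and provenance fields tie each fact to its source. Everything here (`BitemporalFact`, the field names, `current_value`) is a hypothetical illustration, not the fabric's schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class BitemporalFact:
    """One attribute assertion about an entity, with both time axes
    and provenance metadata attached."""
    entity_id: str
    attribute: str
    value: str
    valid_from: datetime            # valid time: when the fact began to hold
    valid_to: Optional[datetime]    # None means still valid
    recorded_at: datetime           # transaction time: when the fabric recorded it
    source: str                     # provenance: originating collector
    confidence: float               # provenance: source confidence score

def current_value(facts, attribute, as_of):
    """Return the value valid at `as_of`, preferring the most recently
    recorded fact (latest transaction time) among valid candidates."""
    candidates = [
        f for f in facts
        if f.attribute == attribute
        and f.valid_from <= as_of
        and (f.valid_to is None or as_of < f.valid_to)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.recorded_at).value
```

Because every assertion carries both time axes, consumers can ask "what did we believe on date X about date Y", and retention and versioning rules can be enforced as queries over the same metadata rather than as ad hoc pipeline logic.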

Semantic change governance 

  • Own the rules and process for schema evolution and semantic change. Make decisions fast and with discipline. 
  • Run the RFC/ADR process for model changes. Ensure changes are explicit, versioned, and backward-compatible by default. 
  • Define the deprecation and migration path when breaking changes are unavoidable. 
  • Recommend and implement modelling standards, naming conventions, and documentation practices. 

AI-Native way of working 

  • Use AI tools (Claude, Codex, and equivalents) as primary instruments for everyday work: drafting model designs, exploring trade-offs, generating ADRs, producing documentation, and prototyping schema and validation artefacts. 
  • Apply AI to accelerate decision-making — surveying prior art, comparing alternatives, stress-testing designs against query patterns and edge cases — so that decisions are reached in hours or days rather than weeks. 
  • Build a personal practice of AI-assisted modelling and share it with the team. Raise the team’s floor on effective AI usage through example, review, and reusable prompts. 

Engagement and mentorship 

  • Translate business and security domain requirements into durable model decisions. Engage stakeholders with varying technical backgrounds and build consensus on contested questions. 
  • Coach and mentor engineers, analysts, and architects on data modelling practice. 

Out of scope 

  • Entity resolution logic and matching strategies, and graph inference rules. 
  • Physical storage, pipeline execution, and infrastructure design. 
  • Product-specific metrics, scoring logic, and UI representations. 

Skills and experience:

Required 

  • Data modelling depth. Strong conceptual, logical, and physical modelling across relational, document, and graph paradigms. 
  • Graph modelling. Hands-on experience modelling for property graph databases (Neo4j or equivalent). Able to reason about node vs relationship vs property decisions and their consequences. 
  • Semantic modelling literacy. Familiarity with the spectrum of approaches to formal data semantics, from property graph schemas to RDF/OWL with SHACL. Able to evaluate trade-offs between expressiveness, tooling maturity, and team capability without being dogmatic about any one approach. 
  • Query-pattern-aware modelling. Demonstrated ability to shape models around the queries and use cases they must serve, balancing semantic correctness with practical performance and developer experience. 
  • Modern data stack. Experience with lakehouse architectures (Apache Iceberg, Delta Lake, or equivalent) and at least one cloud data platform such as Databricks, Snowflake, or Synapse. 
  • Query languages. Working proficiency in SQL. Familiarity with Cypher, SPARQL, or GraphQL is valuable. 
  • Metadata and governance tooling. Practical experience with metadata management, business glossaries, and at least one metadata platform. 
  • AI-assisted working practice. Active, daily user of AI coding and reasoning tools (Claude, Codex, or equivalents). Comfortable using them to draft designs, explore alternatives, generate documentation, and prototype model artefacts. Treats AI as a force multiplier on the work, not a separate concern. 
  • Analytical and communication skills. Strong written, verbal, and presentation skills. Able to engage technical and non-technical audiences and build consensus. 

Valuable 

  • Security domain fluency. Familiarity with OCSF, vulnerability data (CVE, CVSS, EPSS, KEV, CWE), asset and identity models, compliance frameworks as data, and threat intelligence standards (STIX, TAXII, MITRE ATT&CK). 
  • Modelling tooling. Experience with Hackolade, Protégé, or equivalent tools across graph, document, and ontology modelling. 
  • Methodology breadth. Familiarity with Kimball, Inmon, and Data Vault, and judgement on when each applies. 

Education:

  • Degree in Computer Science, Engineering, or a related discipline. 
  • A Master’s degree or equivalent experience is valued. Directly relevant experience can substitute for formal qualifications. 

The Company
HQ: London
157 Employees
Year Founded: 2017

What We Do

Prevalent AI was founded to assemble the world’s best AI and Data Science talent, a team capable of building the security analytics of the future. In a security technology landscape filled with rigid, siloed solutions and disparate data, organizations are unable to tackle threats and vulnerabilities effectively. By combining our Security Data Fabric with AI-powered Exposure Management, we provide our clients with complete clarity of their cyber risk. Our Security Data Fabric automates the integration of complex and disparate data into a single unified knowledge graph, turning data chaos into data clarity with AI-powered entity resolution. Our Exposure Management platform identifies every attack surface, contextualizes and prioritizes risk findings, and rapidly remediates exposures — so you’ll always stay one step ahead of attackers.
