Senior Data Engineer

Posted 4 Days Ago
2 Locations
Remote
Senior level
Information Technology • Software • Cybersecurity

As a Senior Data Engineer, you will be the architect of our security data ecosystem. Your primary mission is to design and build high-performance data lake architectures and real-time streaming pipelines that serve as the foundation for COGNNA's Agentic AI initiatives. You will ensure that our AI models have access to fresh, high-quality security telemetry through sophisticated ingestion patterns.

Key Responsibilities

1. Data Lake & Storage Architecture

  • Architectural Design: Design and implement multi-tier Data Lakehouse architectures to support both structured security logs and unstructured AI training data.
  • Storage Optimization: Define lifecycle management, partitioning, and clustering strategies to ensure high-performance querying while optimizing for cloud storage costs.
  • Schema Evolution: Manage complex schema evolution for security telemetry, ensuring compatibility with downstream AI/ML feature engineering.

2. Real-Time & Streaming Processing

  • Streaming Ingestion: Build and manage low-latency, high-throughput ingestion pipelines capable of processing millions of security events per second in real time.
  • Unified Processing: Design unified batch and stream processing architectures to ensure consistency across historical analysis and real-time threat detection.
  • Event-Driven Workflows: Implement event-driven patterns to trigger AI agent reasoning based on incoming live data streams.

3. AI/ML Enablement & Feature Engineering

  • Vector Data Foundations: Architect the data infrastructure required to support semantic search applications and retrieval-augmented generation (RAG) variants for our generative AI models.
  • Feature Management: Design and maintain a centralized repository for ML features, ensuring consistent data is used for both model training and real-time inference.
  • AI Pipeline Orchestration: Build automated workflows to handle data preparation, model evaluation, and deployment within our cloud AI ecosystem.

4. DataOps & Systems Design

  • Infrastructure as Code: Utilize declarative tools (e.g., Terraform) to manage the entire lifecycle of our cloud data resources and AI endpoints.
  • Quality & Observability: Implement automated data quality frameworks and real-time monitoring to detect "data drift" or pipeline failures before they impact AI model performance.

Requirements
  • Experience & Education: 5+ years in Data Engineering or Backend Engineering, focused on large-scale distributed systems. B.S. or M.S. in Computer Science or a related technical field.
  • Cloud Architecture: Deep architectural mastery of the Google Cloud Platform ecosystem, particularly managed analytical warehouses, serverless compute, and identity and access management (IAM). Proven track record of deploying enterprise-scale Data Lakehouses from scratch.
  • Real-Time Mastery: Expertise in building production-grade pipelines on distributed messaging and stream processing engines (e.g., managed Apache Beam/Flink environments) capable of handling high-velocity telemetry.
  • AI Enablement: Strong understanding of how data architecture impacts AI performance. Experience building embedding pipelines, feature stores, and automated workflows for model training and evaluation.
  • Software Fundamentals: Expert-level Python and advanced SQL. Proficiency in high-performance languages like Go or Scala is highly desirable.
  • Operational Excellence: Advanced knowledge of CI/CD, containerization on Kubernetes, and managing cloud infrastructure through code to ensure reproducible environments.
Preferred Qualifications
  • Experience with dbt for modern analytics engineering.
  • Understanding of cybersecurity data standards (OCSF/ECS).
  • Previous experience in an AI-first startup or a high-growth security tech company.

Benefits

💰 Competitive Package – Salary + equity options + performance incentives
🧘 Flexible & Remote – Work from anywhere with an outcomes-first culture
🤝 Team of Experts – Work with designers, engineers, and security pros solving real-world problems
🚀 Growth-Focused – Your ideas ship, your voice counts, your growth matters
🌍 Global Impact – Build products that protect critical systems and data

Top Skills

Apache Beam
Apache Flink
dbt
Go
Google Cloud Platform
Kubernetes
Python
Scala
SQL
Terraform

The Company
50 Employees
Year Founded: 2022

What We Do

Detect the Undetectable. Defeat the Unpredictable.
