What You’ll Do:
- Design and Develop: Architect and implement high-scale data pipelines leveraging Apache Spark, Flink, and Airflow to process streaming and batch data efficiently.
- Data Lakehouse and Storage Optimization: Build and maintain data lakes and ingestion frameworks using Snowflake, Apache Iceberg, and Parquet, ensuring scalability, cost efficiency, and optimal query performance.
- Data Modeling and System Design: Design robust, maintainable data models to handle structured and semi-structured datasets for analytical and operational use cases.
- Real-time and Batch Processing: Develop low-latency pipelines using Kafka and Spark Structured Streaming, supporting billions of events per day.
- Workflow Orchestration: Automate and orchestrate end-to-end ELT processes with Airflow, ensuring reliability, observability, and recovery from failures.
- Cloud Infrastructure: Build scalable, secure, and cost-effective data solutions leveraging AWS native services (S3, Lambda, ECS, etc.).
- Monitoring and Optimization: Implement strong observability, data quality checks, and performance tuning to maintain high data reliability and pipeline efficiency.
What We’re Looking For:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in data engineering with a proven track record of designing large-scale, distributed data systems.
- Strong expertise in Snowflake and other distributed analytical data stores.
- Hands-on experience with Apache Spark, Flink, Airflow, and modern data lakehouse formats (Iceberg, Parquet).
- Deep understanding of data modeling, schema design, query optimization, and partitioning strategies at scale.
- Proficiency in Python, SQL, Scala, and Go/Node.js, with strong debugging and performance-tuning skills.
- Experience in streaming architectures, CDC pipelines, and data observability frameworks.
- Proficient in deploying containerized applications (Docker, Kubernetes, ECS).
- Familiarity with AI coding assistants such as Cursor, Claude Code, or GitHub Copilot.
Preferred Qualifications:
- Exposure to CI/CD pipelines, automated testing, and infrastructure-as-code for data workflows.
- Familiarity with streaming platforms (Kafka, Kinesis, Pulsar) and real-time analytics engines (Druid, Pinot, Rockset).
- Understanding of data governance, lineage tracking, and compliance requirements in a multi-tenant SaaS platform.
What We Do
Safe Security is a pioneer in the “Cybersecurity and Digital Business Risk Quantification” (CRQ) space. It helps organizations measure and mitigate enterprise-wide cyber risk in real time using its ML-enabled, API-first SAFE Platform, which aggregates automated signals across people, processes, and technology, for both first and third parties, to dynamically predict an organization’s breach likelihood (SAFE Score) and dollar value at risk.
Headquartered in Palo Alto, Safe Security has over 200 customers worldwide, including multiple Fortune 500 companies, and averaged an NPS of 73 in 2020.
Backed by John Chambers and senior executives from Softbank, Sequoia, PayPal, SAP, and McKinsey & Co., it was also one of the top contributors to the U.S. Government’s National Vulnerability Database (NVD) in 2019 and a MITRE ATT&CK contributor in 2020.
Since 2018, the company has also collaborated with MIT on joint research to develop its SAFE Scoring Algorithm. Safe Security has received several awards, including the Morgan Stanley CTO Innovation Award.