Senior Data Engineer

Posted 3 Days Ago
Hiring Remotely in Córdoba, ARG
Remote
Senior level
Security • Cybersecurity
The Role
Design, build, and optimize scalable data pipelines and Iceberg-based data lakes to support ML/AI detection systems. Implement feature engineering frameworks, offline/online feature stores, training data pipelines, observability, and collaborate with data scientists while mentoring junior engineers.
Summary Generated by Built In

About Us:

Proofpoint is a global leader in human- and agent-centric cybersecurity. We protect how people, data, and AI agents connect across email, cloud, and collaboration tools. Over 80 of the Fortune 100, 10,000 large enterprises, and millions of smaller organizations trust Proofpoint to stop threats, prevent data loss, and build resilience across their people and AI workflows. Our mission is simple: safeguard the digital world and empower people to work securely and confidently. Join us in our pursuit to defend data and protect people.

How We Work:

At Proofpoint, you’ll be part of a global team that breaks barriers to redefine cybersecurity, guided by our BRAVE core values:

Bold in how we dream and innovate

Responsive to feedback, challenges, and opportunities

Accountable for results and best-in-class outcomes

Visionary in future-focused problem-solving

Exceptional in execution and impact

The Role

We're seeking a Senior Data Engineer to build and maintain the ML/AI data infrastructure powering our email security platform. In this role, you'll design and optimize scalable data pipelines that enable threat detection and investigation while supporting both machine learning models and LLM-powered agents that provide context-aware security insights.

You'll work on our Detection Intelligence Platform (DIP), building feature engineering frameworks and offline/online feature stores that serve as the foundation for ML model research and context engineering for AI agents. You'll collaborate with data scientists, ML engineers, and security researchers to build data models and context stores that power our detection systems and enable human security analysts to investigate threats effectively.

Key Responsibilities:

  • Develop and maintain scalable data pipelines on AWS/Azure using technologies such as Spark, Airflow, Athena, and Kubernetes to process structured and unstructured email data at scale

  • Design and optimize Iceberg-based data lake tables and schemas for efficient storage, querying and versioning across petabyte-scale datasets distributed across data centers globally 

  • Build and manage feature engineering frameworks that support offline batch processing and online real-time feature serving for ML model training and inference 

  • Develop and maintain training data pipelines optimized for distributed ML model training, ensuring data lineage and reproducibility 

  • Collaborate with data scientists and security researchers to understand data requirements and translate them into robust, production-grade data solutions 

  • Monitor and optimize data pipeline performance, implementing observability and alerting to ensure data freshness and quality 

  • Mentor junior engineers and foster a culture of engineering excellence and knowledge sharing

Required Experience:

  • Several years of industry experience building and maintaining distributed data systems and high-scale data pipelines in a managed cloud environment (AWS / Azure / GCP) using big data processing engines such as Spark, Flink, Dask, Ray, Beam, Databricks Workflows, or similar

  • Deep proficiency in Python for developing production-grade data processing code 

  • Strong experience with Infrastructure-as-Code frameworks, particularly Terraform 

  • Solid understanding of and hands-on experience with open table formats for data lakes (Apache Iceberg, Hudi, Delta Lake) and data modeling best practices

  • Experience with AWS Athena, Glue, or similar data query and cataloging services 

  • Experience with Apache Airflow or similar workflow orchestration tools for batch and real-time pipeline management 

  • Demonstrated ability to design and implement scalable ETL/ELT pipelines handling complex data transformations 

  • Excellent communication skills and ability to collaborate effectively with technical and non-technical stakeholders

Good to have:

  • Experience with feature engineering frameworks and feature stores (e.g., Feast, Tecton, or custom solutions) 

  • Familiarity with Kubernetes for containerized data workloads and orchestration 

  • Background in building data infrastructure for machine learning and AI applications 

  • Experience with data quality frameworks and observability tools for data pipelines 

Why Proofpoint?

At Proofpoint, we believe that an exceptional career experience includes a comprehensive compensation and benefits package. Here are just a few reasons you’ll love working with us:

  • Competitive compensation

  • Comprehensive benefits

  • Career success on your terms

  • Flexible work environment

  • Annual wellness and community outreach days

  • Always-on recognition for your contributions

  • Global collaboration and networking opportunities

 

Our Culture:

Our culture is rooted in values that inspire belonging, empower purpose, and drive success, every day, for everyone.

We encourage applications from individuals of all backgrounds, experiences, and perspectives. If you need accommodation during the application or interview process, please reach out to [email protected].


How to Apply

Interested? Submit your application along with any supporting information. We can’t wait to hear from you!

Proofpoint Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Proofpoint and has not been reviewed or approved by Proofpoint.

  • Healthcare Strength: Healthcare coverage spans medical, dental, vision, life, and disability, complemented by global physical, mental, and financial health programs. Wellbeing resources such as mindfulness, resilience, and meditation courses are explicitly highlighted.
  • Leave & Time Off Breadth: Time-off provisions include PTO, paid holidays, and sick days, with parental and family medical leave available. Added flexibility appears in wellness days, a hybrid-first model, and limited work-from-anywhere periods.
  • Fair & Transparent Compensation: Pay is considered competitive in many roles and settings, with external recognition indicating strong standing relative to peers. Feedback suggests employees in several departments view compensation favorably when considering base, bonus, and benefits together.

The Company
Belfast
3,780 Employees
Year Founded: 2002

What We Do

We provide the most effective cybersecurity and compliance solutions to protect people on every channel including email, the web, the cloud, and social media.
