[VCK] Senior Data Engineer (AI Ingestion Platform)

Software Mind

Posted 2 Days Ago

Hiring Remotely in Buenos Aires, Ciudad Autónoma de Buenos Aires, ARG

In-Office or Remote

Senior level

Software

The Role

Design, build, and maintain tenant-isolated ingestion pipelines that transform emails and documents into PII-minimised, chunked content indexed into per-tenant vector stores. Implement Microsoft Graph and SharePoint/OneDrive ingestion, OCR orchestration, text extraction, vector indexing (OpenSearch/Pinecone), schema and lineage documentation, and strategies for incremental ingestion and audit traceability.

Summary Generated by Built In

Company Description

We are Software Mind, an awesome team of engineers who are ready to ramp up any top-notch company’s projects! Our aim? To always be one step ahead. Become part of a multicultural company in constant growth with an excellent work environment certified by Great Place To Work!

Job Description

About the Project

Software Mind is building a private, tenant-isolated AI assistant for the real estate title and settlement industry. The platform is a retrieval-first (RAG) system that ingests historical email, documents, and structured metadata into a per-tenant vector index, and serves grounded, cited, expert-weighted answers through a chat-style Q&A interface with single sign-on and full audit logging.

The platform is AWS-native, with a Python/FastAPI backend, a Vue.js frontend, an OpenSearch/Pinecone vector store, and OpenAI/Anthropic/Bedrock as LLM providers. You will join a senior, cross-functional, LATAM-based team where hands-on AI delivery experience, not just familiarity, is the baseline expectation.

You own the ingestion and processing backbone of the platform: the pipelines that transform raw email and document corpora into clean, PII-minimised, chunked, and indexed data in the per-tenant vector store. This is the foundational layer that the AI extraction gateway depends on; quality here directly determines system accuracy.
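To make the shape of that backbone concrete, here is a minimal Python sketch of the minimise, chunk, and tenant-scope steps. It is an illustration under stated assumptions, not the project's implementation: the regex patterns, the chunk sizes, the `tenant-<id>-chunks` naming scheme, and every function name are hypothetical, and real PII detection would need far broader coverage (for example NER-based detection).

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; real PII coverage must be much wider.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumed tenant id format: lowercase alphanumerics separated by single hyphens.
TENANT_ID_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


@dataclass(frozen=True)
class Chunk:
    index: str      # tenant-scoped index the chunk belongs to
    source_id: str  # id of the originating email or document
    position: int   # order of the chunk within the source
    text: str       # PII-minimised chunk text


def index_name(tenant_id: str) -> str:
    """Derive a per-tenant index name (naming scheme is an assumption)."""
    if not TENANT_ID_RE.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant-{tenant_id}-chunks"


def minimise_pii(text: str) -> str:
    """Replace matched PII with placeholder tokens before indexing."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def prepare_document(tenant_id: str, source_id: str, raw_text: str) -> list[Chunk]:
    """Minimise PII, chunk the result, and tag every chunk with its tenant index."""
    index = index_name(tenant_id)
    clean = minimise_pii(raw_text)
    return [Chunk(index, source_id, pos, piece)
            for pos, piece in enumerate(chunk_text(clean))]
```

Redacting before chunking means no placeholder is split across a chunk boundary, and validating the tenant id before building the index name is one simple guard against cross-tenant writes.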

Your Responsibilities

Build and own the historical email ingestion pipeline via Microsoft Graph API

Implement the SharePoint/OneDrive document ingestion pipeline with scoped folder access

Design and implement the PII minimisation pre-processing layer

Build the vector store indexing workflow (OpenSearch/Pinecone) with per-tenant data isolation

Define and implement the data processing schema; produce and maintain schema documentation

Build the OCR routing orchestrator and integrate an OCR service for scanned documents

Implement the raw text / content extraction layer for all supported document types

Define and prototype the push vs. pull ingestion strategy, from a one-time PoC through to an incremental nightly pipeline

Ensure data lineage and audit traceability are built into pipeline outputs from the outset
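The last two responsibilities above, incremental ingestion and lineage built into pipeline outputs, can be sketched in Python as a pull-style nightly run driven by a watermark. This is an illustrative sketch under stated assumptions rather than the team's design: `SourceItem`, `LineageRecord`, `run_incremental`, and the watermark approach are all hypothetical, and a production pipeline might rely on a change feed provided by the source system instead of comparing timestamps.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class SourceItem:
    item_id: str
    modified_at: datetime
    content: str


@dataclass(frozen=True)
class LineageRecord:
    tenant_id: str
    item_id: str
    content_sha256: str            # fingerprint of the exact content ingested
    source_modified_at: datetime   # source timestamp at ingestion time
    run_id: str                    # pipeline run that produced this output


def select_changed(items: Iterable[SourceItem],
                   watermark: Optional[datetime]) -> list[SourceItem]:
    """Return items modified after the watermark, oldest first."""
    changed = [i for i in items if watermark is None or i.modified_at > watermark]
    return sorted(changed, key=lambda i: i.modified_at)


def run_incremental(tenant_id: str, items: Iterable[SourceItem],
                    watermark: Optional[datetime],
                    run_id: str) -> tuple[list[LineageRecord], Optional[datetime]]:
    """Process only changed items and emit one lineage record per item."""
    changed = select_changed(items, watermark)
    records = [
        LineageRecord(
            tenant_id=tenant_id,
            item_id=item.item_id,
            content_sha256=hashlib.sha256(item.content.encode("utf-8")).hexdigest(),
            source_modified_at=item.modified_at,
            run_id=run_id,
        )
        for item in changed
    ]
    # Advance the watermark only as far as the newest item actually processed.
    new_watermark = changed[-1].modified_at if changed else watermark
    return records, new_watermark
```

Passing `watermark=None` covers the one-time backfill, and feeding the returned watermark into the next run gives the nightly incremental behaviour. Note that the strict `>` comparison would skip an item that shares its timestamp with the stored watermark, so a real pipeline would need a tie-breaking rule.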

Qualifications

Must-Have Skills & Experience

6+ years in data engineering; strong pipeline and ETL/ELT experience required

Proficiency in Python for data pipeline development

Experience with Microsoft Graph API or similar enterprise email/document APIs (M365, Exchange Online)

AWS data services: S3, DynamoDB, Glue, and/or Lambda-based event-driven processing

Familiarity with PII detection and data minimisation techniques (regex-based, NER-based, or purpose-built libraries)

Experience with vector store indexing or semantic search pipeline construction

Additional Information

Nice-to-Have

Prior experience building ingestion pipelines specifically for AI/ML, NLP, or LLM-based platforms

OCR tooling experience: AWS Textract, Tesseract, or commercial OCR services

Understanding of per-tenant data isolation patterns, tenant-scoped encryption, and row-level security

Familiarity with LangChain document loaders, embedding pipelines, or vector index management

We are accepting applications from LATAM countries

Software Mind Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Software Mind and has not been reviewed or approved by Software Mind.

Fair & Transparent Compensation — Pay is considered competitive for core hiring markets, with “good salary” cited in multiple locales. Public salary snapshots provide a baseline that helps candidates assess offers and negotiations.
Flexible Benefits — Remote or hybrid options are prominently highlighted, and a remote‑work program is publicly noted alongside positively cited work‑from‑home experiences. Flexibility around schedules and location is presented as part of the package.
Wellbeing & Lifestyle Benefits — Private medical care, language classes, sports/fitness support, and learning initiatives are listed for several Central/Eastern European locations, with occasional workation perks promoted. These lifestyle‑oriented offerings complement base pay and can enhance perceived total rewards.

The Company

HQ: Cracow

1,000 Employees

Year Founded: 1999

What We Do

Software Mind is a global digital transformation partner with operations throughout Europe, the US and LATAM. Driven by tech and empowered by people, we provide companies with software engineers and autonomous, cross-functional development teams who manage software life cycles from ideation to release and beyond. For over 20 years we’ve been enriching organizations with the talent they need to boost scalability, drive dynamic growth and bring disruptive ideas to life. Our top-notch engineering teams combine ownership with leading technologies, including cloud, AI, data science and embedded software to accelerate digital transformations and boost software delivery. A culture, driven by trust, that embraces openness, craves more and acts with respect enables our experts to create evolutive solutions that support scale-ups, unicorns and enterprise-level companies around the world.