We are looking for an onshore + remote Senior Data Engineer to join our team. You will be responsible for designing, developing, and maintaining data pipelines and structures that support our data science practice and customer-facing data products.
Responsibilities
- Design and Development: Develop, test, and maintain data pipelines using Python and orchestration tools such as Airflow or Kestra.
- Code Quality: Write clean, maintainable, and efficient code, following best practices for coding standards, security, testing, and deployment.
- Database Management: Design and optimize database tables, write efficient SQL queries, and manage database migration scripts.
- Documentation: Develop and maintain documentation covering data sources, their composition requirements, and the transformations applied from ingestion through to the final data structures, and ensure accurate tracking throughout the data lifecycle.
- Collaboration: Work with the product team, data scientists, and other stakeholders to define and implement data solutions using new and existing data sources and technologies.
- Continuous Improvement: Participate in code reviews, contribute to team learning, and stay updated with industry trends and technologies.
Requirements
Technical Qualifications
- Python
  - Demonstrable experience with data-focused libraries (Pandas, DuckDB, etc.)
  - Experience working with DAGs or equivalent structures
  - Experience with process automation in Python
  - Experience integrating with third-party APIs
- Proven Experience in a Data Engineering or Similar Role (ideally 5+ yrs)
- Understanding of Data Lineage and Lineage Strategies
- ETL Pipeline Design/Development
- Data Modeling Experience
- Experience Building Scalable Data Lakes/Warehouses
- Experience analyzing and organizing large data sets
- Experience in Event-Based Data Processing
- Strong Data Documentation Experience
- Strong SQL (Postgres RDBMS) Experience
  - Table Design and Optimization
  - Advanced Query Building and Optimization
  - Advanced Data Aggregation Strategies
- Experience with ETL/Workflow Automation & Tools (Kestra, Airflow, or similar)
- Git SCM (Gitlab)
- Experience in Regulated Industries (Healthcare, Banking, etc.)
Bonus:
- AWS (S3, Step Functions, Batch, Athena, Glue)
- Experience in Data Analysis
- Experience working with Data Science/ML teams
- Experience with Typescript
What We Do
Our mission is to improve human health by connecting and organizing the nation’s healthcare data.
Predoc is the first and only full-service, AI-native medical record retrieval and analysis platform. Predoc offers a range of services, including Medical Record Retrieval, Record Indexation (from both structured and unstructured data), and Medical Record Analysis.
We are building a reality where healthcare teams have access to the right data at the right time, enabling them to fulfill their core purpose: improving human health. We believe in amplifying clinical judgement and expertise, not replacing it.