Data Engineer (AI/ML)

Posted 8 Days Ago
Randolph, IL, USA
In-Office
101K-139K Annually
Senior level
Insurance
The Role
Design, build, and optimize scalable data pipelines for large-scale structured and unstructured healthcare data to support ML and GenAI workloads. Implement ingestion, transformation, validation, orchestration, monitoring, and data foundations (feature pipelines, embeddings/vector stores). Collaborate with architects, data scientists, and DevOps to deploy and secure cloud-based pipelines while ensuring compliance with SOC 2, HIPAA, and GDPR. Mentor junior engineers and drive best practices in version control, metadata, reproducibility, performance tuning, and cost optimization.
Summary Generated by Built In
Job Description Summary
The Data Engineer will design, build, and optimize scalable, secure data pipelines that power analytics and product platforms. For this role specifically, the focus will be on Machine Learning (ML) and Generative Artificial Intelligence (GenAI) workloads, while contributing to innovation and ensuring compliance with healthcare industry standards. This role is expected to provide strong hands-on technical expertise, collaborate across teams, and contribute to architecture decisions that align engineering practices with organizational goals.

Job Description

  • Design, build, and maintain reliable, high-performance data pipelines for large-scale structured and unstructured healthcare data. 

  • Use PySpark and modern cloud-based tools (Databricks, AWS Glue, EMR, Snowflake) to transform and process data efficiently. 

  • Support ingestion, transformation, and validation processes that ensure data consistency, integrity, and availability. 

  • Partner with Data Architects, Data Scientists, and Analysts to translate business needs into scalable engineering solutions. 

  • Collaborate with platform and DevOps teams to deploy, scale, and monitor data pipelines using Airflow and Kubernetes. 

  • Participate in code reviews, documentation, and continuous improvement efforts across the engineering team. 

  • Implement and maintain data validation frameworks to ensure pipeline accuracy and completeness. 

  • Contribute to best practices in version control, metadata management, and reproducibility. 

  • Stay current with emerging technologies in data engineering and cloud computing, recommending improvements to existing infrastructure. 

  • Participate in performance tuning, cost optimization, and scaling strategies for cloud-based data systems. 

  • Identify automation opportunities to streamline ETL/ELT processes and reduce operational overhead. 

  • Share knowledge and mentor junior team members on tools, techniques, and best practices. 

  • Promote a culture of collaboration, innovation, and continuous learning within the engineering organization. 

  • Support compliance with SOC 2, HIPAA, and GDPR by adhering to established data privacy and security practices. 

The posting range for this position is:

100,800.00 - 138,600.00

Required Education, Certifications and Experience:

Education:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field. 

Experience:

  • 5+ years of experience in data engineering, including building and managing pipelines in cloud-based environments. 

Knowledge, Skills, and Abilities:

  • Experience building and operationalizing the data foundations that support machine learning and generative AI use cases, including feature pipelines, training/inference data preparation, and retrieval-ready datasets (e.g., embeddings and vector stores).
  • Familiarity with GenAI skills and adjacent tooling (foundation models, prompt engineering, RAG, embeddings/vector databases, and GenAI orchestration frameworks).
  • Hands-on experience with AWS AI/ML and data services, including Amazon Bedrock, Bedrock Agent Core, SageMaker, Glue, and EMR.
  • Experience designing and optimizing data architectures, including data foundations that support ML and GenAI workloads.
  • Hands-on experience with workflow orchestration (Airflow) and containerization (Kubernetes). 
  • Hands-on technical expertise, cross-team collaboration, and the ability to contribute to architecture decisions.
  • Proficiency in Python, SQL, and distributed data frameworks (PySpark, Databricks, AWS Glue, EMR). 
  • Working knowledge of cloud platforms (AWS or Azure) and data warehouses (Snowflake). 
  • Familiarity with NoSQL and relational databases, as well as data modeling best practices. 
  • Strong analytical, problem-solving, and communication skills. 
  • Understanding of compliance frameworks (SOC 2, HIPAA) and secure data management principles.
  • Experience working with healthcare datasets or knowledge of healthcare standards (HIPAA, HL7, FHIR) preferred.


      The posted salary range is the lowest to highest salary we, in good faith, believe we would pay for this role at the time of this posting. We may ultimately pay more or less than the hiring range, and this hiring range may also be modified in the future. A candidate’s position within the hiring range may be based on several factors including, but not limited to, specific competencies, relevant education, qualifications, certifications, relevant experience, skills, seniority, performance, shift, travel requirements, and business or organizational needs. This job is also eligible for annual bonus incentive pay. 

      We offer a comprehensive package of benefits including paid time off, 11 holidays, medical/dental/vision insurance, generous 401(k) matching, lifestyle spending account and many other benefits to eligible employees. 

      Note: No amount of pay is considered to be wages or compensation until such amount is earned, vested, and determinable. The amount and availability of any bonus, commission, or any other form of compensation that is allocable to a particular employee remain in the Company’s sole discretion unless and until paid and may be modified at the Company’s sole discretion, consistent with the law. 

       

      Skills Required

      • Bachelor's or Master's degree in Computer Science, Engineering, or related field
      • 5+ years of experience in data engineering, building and managing cloud-based pipelines
      • Proficiency in Python
      • Proficiency in SQL
      • Experience with PySpark and distributed data frameworks
      • Experience with Databricks, AWS Glue, or EMR
      • Experience with Snowflake or data warehouse technologies
      • Experience with workflow orchestration (Airflow) and containerization (Kubernetes)
      • Hands-on experience with AWS AI/ML and data services (Amazon Bedrock, Bedrock Agent Core, SageMaker)
      • Experience building and operationalizing ML/GenAI data foundations (feature pipelines, training/inference data, embeddings/vector stores)
      • Familiarity with GenAI tooling (foundation models, prompt engineering, RAG, embeddings, vector databases)
      • Working knowledge of cloud platforms (AWS or Azure)
      • Familiarity with NoSQL and relational databases and data modeling best practices
      • Understanding of compliance frameworks and secure data management (SOC 2, HIPAA, GDPR)
      • Experience working with healthcare datasets or knowledge of healthcare standards (HIPAA, HL7, FHIR)

      The Company
      HQ: Chicago, IL
      3,161 Employees
      Year Founded: 1910

      What We Do

      Blue Cross Blue Shield Association is a national federation of 34 independent, community-based and locally operated Blue Cross and Blue Shield companies that collectively provide health care coverage for one in three Americans. BCBSA provides health care insights through The Health of America Report series and the national BCBS Health Index.
