Remote position; must be able to travel to our NJ and NY locations up to 25% of the time.
Position Summary: The Data Engineer, with previous experience in the insurance domain, designs, builds, and maintains scalable data pipelines and models that power analytics and Generative AI initiatives. The Data Engineer will collaborate closely with analysts and engineers to understand data needs, develop efficient solutions, and ensure the integrity, performance, and security of our data systems.
Essential Duties and Responsibilities:
- Analyze integration and system requirements by understanding business needs and designing effective data solutions, particularly for Guidewire Policy Center, Billing Center, and Commercial P&C data domains.
- Design, develop, and optimize ELT pipelines to ingest, transform, and load data into a Delta Lakehouse platform.
- Design, develop, and maintain data models and schemas, ensuring data quality and integrity.
- Build and maintain dashboards and reports delivering actionable business insights.
- Monitor pipeline and storage performance; troubleshoot and resolve data issues promptly.
- Collaborate with cross-functional teams, including analysts and business users, to deliver end-to-end insurance data solutions.
- Implement data governance, security, and compliance standards across platforms.
- Conduct root cause analysis for system failures and performance events; drive continuous improvements in enterprise data integration pipelines.
- Create and manage testing procedures (unit, scenario, end-to-end) to ensure pipeline reliability.
- Stay current with emerging technologies and recommend improvements to workflows.
- Mentor team members on data engineering tools, Guidewire data model concepts, and best practices.
- Build and maintain data pipelines to process structured and unstructured data (like documents and text) for Generative AI tasks, including creating embeddings and working with vector databases to support AI search features.
- Prepare and clean large datasets to ensure high-quality inputs for training and fine-tuning Generative AI models.
- Collaborate with AI and data science teams to understand data requirements and deliver scalable solutions that support model training and inference.
- Implement processes to enrich data with metadata and context to improve AI model accuracy and relevance.
- Optimize data storage and retrieval methods to support fast, low-latency responses for AI-powered applications.
- Monitor data workflows for Generative AI projects, troubleshoot issues, and ensure continuous pipeline performance.
- Bachelor’s degree in a relevant field of study required (e.g., computer science, data science, data analytics, applied mathematics).
- 5+ years of progressive work experience in IT or a related field required.
- 3+ years of experience in data engineering and analytics required, with a solid foundation in data architecture and integration, including hands-on work with complex enterprise P&C insurance data models.
- 2+ years of experience with data-centric projects within the Guidewire ecosystem required, including work with Guidewire Policy Center and related data structures.
- In-depth understanding of relational database systems (e.g., Oracle, SQL Server, MySQL), including their features and performance optimization strategies.
- Solid grasp of ETL processes, data pipeline architectures, and data integration techniques, particularly for operational source systems such as Guidewire.
- 2+ years of hands-on experience with Azure Databricks and Azure Data Factory required, including developing and optimizing data pipelines using Apache Spark.
- 2+ years of experience working with Power BI or other leading data visualization and reporting tools required.
- Proven expertise with Apache Spark, Delta Lakehouse, and data warehousing technologies required.
- Proficient in Microsoft Azure services, including:
  - Azure SQL Database
  - Azure Data Lake Storage Gen2
  - Azure Event Grid
  - Azure Key Vault
- Strong understanding of CI/CD pipelines and experience in Agile development environments.
- Demonstrated ability to troubleshoot system issues, identify root causes, and implement effective solutions quickly.
- Capable of managing multiple priorities with strong attention to detail and follow-through.
- Working knowledge of Generative AI frameworks and use cases in data engineering is a plus.
- Knowledge of data governance, metadata management, and data quality frameworks.
- Understanding of data security and privacy principles, including encryption, anonymization, and access control mechanisms.
- Proficient in the Microsoft Office suite.
- Strong understanding of the insurance domain and experience using the Guidewire CDA data model to build use-case-specific datasets.
- Data Modeling
- Expertise in Azure Databricks, Azure Data Factory, Apache Spark, and data pipeline development for scalable data engineering solutions
- Collaboration with cross-functional groups
- Strong analytical and problem-solving skills, with the ability to translate business requirements into practical data solutions.
The salary range for this role is $86,800 - $160,700. The listed annual salary range posted for this position is subject to change and may vary depending on performance, education, experience, skills, geographic location, travel requirements, demonstrated proficiency in the competencies required for the role and business needs. Base pay is just one component of GNY’s total compensation package for employees. Other rewards include eligibility for an annual discretionary bonus based on performance.
Skills Required
- Bachelor's degree in Computer Science, Data Science, Analytics, Applied Mathematics, or related field
- 5+ years progressive IT or related work experience
- 3+ years data engineering and analytics experience with enterprise P&C insurance data models
- 2+ years experience with Guidewire ecosystem, including Policy Center and related data structures
- Hands-on experience with relational databases (Oracle, SQL Server, MySQL) and performance optimization
- Experience designing and implementing ELT/ETL pipelines and data integration for operational source systems
- 2+ years hands-on experience with Azure Databricks, Azure Data Factory, and Apache Spark
- Proven expertise with Apache Spark, Delta Lakehouse, and data warehousing technologies
- 2+ years experience developing reports and dashboards with Power BI or similar visualization tools
- Proficiency with Microsoft Azure services: Azure SQL Database, ADLS Gen2, Azure Event Grid, Azure Key Vault
- Strong understanding of CI/CD pipelines and experience in Agile development environments
- Experience troubleshooting system issues, performing root cause analysis, and implementing fixes
- Knowledge of data governance, metadata management, and data quality frameworks
- Understanding of data security and privacy principles (encryption, anonymization, access controls)
- Experience building pipelines for structured and unstructured data, creating embeddings and working with vector databases for AI use cases
- Ability to create and manage testing procedures (unit, scenario, end-to-end) for pipeline reliability
- Proficient in the Microsoft Office suite
- Working knowledge of Generative AI frameworks and use cases in data engineering
- Ability to mentor team members on data engineering tools and Guidewire data model concepts
What We Do
Greater New York Insurance Companies are leading providers of middle market property and casualty insurance focusing on commercial real estate, including condominiums, co-op apartment buildings, office buildings, restaurants, light manufacturing, and other small to mid-size commercial establishments. The lead company was formed as a mutual insurance company in 1914 as an outgrowth of a real estate trade association servicing New York City property owners. Now, over 100 years after its founding, GNY writes business in 15 states, primarily in the Northeast, Mid-Atlantic, and Midwest regions of the United States.