Key Responsibilities:
- Design, develop, test, and maintain scalable ETL data pipelines using Python.
- Work extensively with Google Cloud Platform (GCP) services such as:
  - Dataflow for real-time and batch data processing
  - Cloud Functions for lightweight serverless compute
  - BigQuery for data warehousing and analytics
  - Cloud Composer for orchestration of data workflows (based on Apache Airflow)
  - Google Cloud Storage (GCS) for managing data at scale
  - IAM for access control and security
  - Cloud Run for containerized applications
- Should have experience in the following areas:
  - API framework: Python FastAPI
  - Processing engine: Apache Spark
  - Messaging and streaming data processing: Kafka
  - Storage: MongoDB, Redis/Bigtable
  - Orchestration: Airflow
- Perform data ingestion from various sources and apply transformation and cleansing logic to ensure high-quality data delivery.
- Implement and enforce data quality checks, validation rules, and monitoring.
- Collaborate with data scientists, analysts, and other engineering teams to understand data needs and deliver efficient data solutions.
- Manage version control using GitHub and participate in CI/CD pipeline deployments for data projects.
- Write complex SQL queries for data extraction and validation from relational databases such as SQL Server, Oracle, or PostgreSQL.
- Document pipeline designs, data flow diagrams, and operational support procedures.
Required Skills:
- 7–10 years of hands-on experience in Python for backend or data engineering projects.
- Strong understanding of and working experience with GCP cloud services (especially Dataflow, BigQuery, Cloud Functions, and Cloud Composer).
- Solid understanding of data pipeline architecture, data integration, and transformation techniques.
- Experience working with version control systems like GitHub and knowledge of CI/CD practices.
- Experience with Apache Spark, Kafka, Redis, FastAPI, Airflow, and Cloud Composer DAGs.
- Strong experience in SQL with at least one enterprise database (SQL Server, Oracle, PostgreSQL, etc.).
- Experience migrating data from on-premises data sources to cloud platforms.
Good to Have (Optional Skills):
- Experience working with the Snowflake cloud data platform.
- Experience deploying to GKE and Cloud Run.
- Hands-on knowledge of Databricks for big data processing and analytics.
- Familiarity with Azure Data Factory (ADF) and other Azure data engineering tools.
Additional Details:
- Excellent problem-solving and analytical skills.
- Strong communication skills and ability to collaborate in a team environment.
Education:
- Bachelor's degree in Computer Science, a related field, or equivalent experience.
What We Do
Egen is a data engineering and cloud modernization firm partnering with leading Chicagoland companies to launch, scale, and modernize industry-changing technologies. We are catalysts for change who create digital breakthroughs at warp speed. Our team of cloud and data engineering experts is trusted by top clients in pursuit of the extraordinary.
Our mission is to be an enabler of amazing possibilities for companies looking to use the power of cloud and data. We want to stand shoulder to shoulder with clients, as true technology partners, and make sure they succeed at what they have set out to do. We want to be disruptors, game-changers, and innovators who have played an important part in moving the world forward.