Python/Pyspark Data Engineer (Must have Databricks Exp)
Jersey City, NJ (Hybrid 3 Days a Week)
12+ Months
Must have 10+ years of experience
Job responsibilities
● Experience in public cloud migrations of complex systems, anticipating problems, and finding ways to mitigate risk will be key to leading numerous public cloud initiatives
● Execute creative software solutions, design, development, and technical troubleshooting, with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems
● Troubleshoot end-to-end platform issues and help provide solutions to platform build and performance issues on the AWS Cloud, ensuring the deliverables are bug-free
● Drive, support, and deliver on a strategy to build broad use of Amazon's utility computing web services (e.g., EC2, S3, RDS, CloudFront, EFS, DynamoDB, CloudWatch, EKS, ECS, MFTS, ALB, NLB)
● Design resilient, secure, and high-performing platforms in the public cloud using company best practices
Duties and responsibilities
● Collaborate with the team to build out features for the data platform and consolidate data assets
● Build, maintain, and optimize Spark-based data pipelines
● Advise, consult, and coach other data professionals on standards and practices
● Work with the team to define company data assets
● Migrate CMS’ data platform into Chase’s environment
● Partner with business analysts and solutions architects to develop technical architectures for strategic enterprise projects and initiatives
● Build libraries to standardize how we process data
● Teach and learn continuously, knowing that continuous learning is the cornerstone of every successful engineer
● Maintain a solid understanding of AWS tools such as EMR and Glue, including their pros and cons, and convey that knowledge clearly
● Implement automation on applicable processes
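To give a flavor of the "build libraries to standardize how we process data" duty above, here is a minimal sketch in plain Python. All names are hypothetical, and in practice these steps would be expressed as PySpark DataFrame transforms rather than per-record functions:

```python
# Hypothetical sketch of a tiny library that standardizes record processing.
# A production version would operate on PySpark DataFrames, not dicts.

def normalize_keys(record):
    """Lower-case and strip whitespace from every field name."""
    return {key.strip().lower(): value for key, value in record.items()}

def drop_nulls(record):
    """Remove fields whose value is None."""
    return {key: value for key, value in record.items() if value is not None}

def process(records, steps=(normalize_keys, drop_nulls)):
    """Apply a standard, ordered chain of transforms to each record."""
    for record in records:
        for step in steps:
            record = step(record)
        yield record

raw = [{" ID ": 1, "Name": "Ada", "email": None}]
print(list(process(raw)))  # → [{'id': 1, 'name': 'Ada'}]
```

The point of such a library is that every pipeline on the platform cleans data the same way, instead of each team re-implementing its own conventions.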
Mandatory Skills:
● 5+ years of experience in a data engineering position
● Proficiency in Python (or similar) and SQL
● Strong experience building data pipelines with Spark
● Strong verbal & written communication skills
● Strong analytical and problem-solving skills
● Experience with relational datastores, NoSQL datastores and cloud object stores
● Experience building data processing infrastructure in AWS
● Bonus: Experience with infrastructure as code solutions, preferably Terraform
● Bonus: Cloud certification
● Bonus: Production experience with ACID-compliant table formats such as Hudi, Iceberg, or Delta Lake
● Bonus: Familiarity with data observability solutions and data governance frameworks
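As a rough, self-contained illustration of the Python-plus-SQL proficiency listed above, the sketch below uses the stdlib sqlite3 module as a stand-in for a real relational datastore; table and column names are hypothetical:

```python
import sqlite3

# In-memory SQLite stands in for a production relational datastore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 100.0), ("acme", 50.0), ("globex", 75.0)],
)

# A typical aggregation a data engineer would later port to Spark SQL.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # → [('acme', 150.0), ('globex', 75.0)]
```

The same GROUP BY aggregation translates almost verbatim to `spark.sql(...)` on a Databricks cluster, which is why SQL fluency is listed alongside Python.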
What We Do
Driving Innovation through Advanced Data and AI Services.
We offer next-level data and AI services to help you transform your data into actionable insights for a competitive edge.
To augment your teams, we offer full-time/contract resources, plus consulting, design, and development services for turnkey projects in the following areas:
01 DATA MODERNIZATION
Define Cloud strategy, architecture & roadmap
Identify current landscape & catalog data sources
Data warehouse design & setup
Develop data governance & data quality framework
Implement data management technologies
Build data analytics capabilities
Inculcate a data-driven culture
A future-ready framework to serve all business use cases
02 DATA INTEGRATION & CLOUD MIGRATION
Design & develop
Optimized data models and data pipelines
ETL/ELT workloads & monitoring
Processes that scale up and down without performance problems
Automate data migration between upstream/downstream (on-prem) systems and Cloud
03 DATA ANALYTICS & AI
Data analytics consulting – use-case exploration and building analytics roadmap
Data preparation and creating a single version of truth
Dashboard & report development
AI/ML model selection and development
AI/ML model tuning and validation
AI/ML model scaling, integration and deployment