The Role
Design, develop, and maintain scalable data pipelines (PySpark/Apache Spark); process large-scale structured and unstructured data; build real-time analytics for cloud and edge devices; performance-tune and debug Spark jobs; collaborate with data scientists, data engineers, and firmware teams; apply distributed computing and data warehousing/lake concepts.
Summary Generated by Built In
Location: This position will be based in Bangalore/Mumbai, India
Role and Responsibilities:
- Design, develop, and maintain scalable data pipelines using PySpark and Apache Spark.
- Process and analyze large-scale structured and unstructured datasets in distributed environments.
- Build real-time analytics on cloud and edge devices.
- Solve challenging data and architectural problems using cutting-edge technology.
- Collaborate cross-functionally with data scientists, data engineering, and firmware controls teams.
Skills and Experience:
- Strong Java or Scala programming and debugging ability, with a clear understanding of design patterns; Python is a bonus
- Understanding of the internals of Kafka, Spark, Flink, Hadoop, HBase, etc. (hands-on experience with one or more preferred)
- Experience implementing data wrangling, transformation, and processing solutions, with demonstrated experience working with large datasets
- Experience in performance tuning and debugging Spark jobs
- Good understanding of distributed computing principles
- Knowledge of cloud computing platforms such as AWS, GCP, or Azure is beneficial
- Exposure to data lake and data warehousing concepts, and to SQL and NoSQL databases
- Experience with REST APIs and gRPC is good to have
- Ability to adapt quickly to new technologies, concepts, approaches, and environments
- Problem-solving and analytical skills
- Must have a learning attitude and improvement mindset
Qualifications:
- M.Tech/M.S. with an emphasis in computational or decision sciences preferred
- 3+ years of relevant experience
Skills Required
- 3+ years of relevant experience
- Strong Java or Scala programming and debugging ability with clear design patterns understanding
- Python (bonus)
- Hands-on experience with Kafka, Spark, Flink, Hadoop, HBase (one or more)
- Implementing data wrangling, transformation and processing solutions for large datasets
- Experience in performance tuning and debugging Spark jobs
- Good understanding of distributed computing principles
- Knowledge of cloud platforms (AWS, GCP, Azure)
- Exposure to data lakes and data warehousing concepts, SQL and NoSQL databases
- Experience working with REST APIs and gRPC
- M.Tech/M.S. with an emphasis in computational or decision sciences
- Ability to adapt to new technology and a continuous learning mindset
The Company