Job Summary:
The Data Lake DevOps Engineer is responsible for building, deploying, and maintaining data lake infrastructure to ensure efficient data ingestion, storage, processing, and retrieval. The engineer will collaborate with data engineers, software developers, and operations teams to create scalable data platforms and enable advanced analytics.
Key Responsibilities:
Data Lake Infrastructure Setup:
- Design, develop, and maintain scalable and high-performance data lake architectures.
- Set up and manage cloud-based or on-premises data lakes using platforms like AWS S3, Azure Data Lake Storage, Google Cloud Storage, or Hadoop-based systems.
- Configure and manage distributed storage and processing tools like Apache Hadoop, Spark, Kafka, Druid, Hive, and Presto.
Data Pipeline Automation:
- Develop automated data ingestion, ETL (Extract, Transform, Load), and data processing workflows using CI/CD (Continuous Integration/Continuous Deployment) pipelines.
- Implement infrastructure as code (IaC) using tools like Terraform, CloudFormation, or Azure Resource Manager.
- Ensure automated data backup, archiving, and disaster recovery.
Monitoring & Optimization:
- Monitor data lake infrastructure, identifying and resolving performance bottlenecks.
- Optimize storage and compute resources for cost efficiency and performance.
- Implement logging, monitoring, and alerting solutions using tools like Prometheus, Grafana, ELK Stack, or CloudWatch.
- Implement application performance monitoring (APM) tools such as AppDynamics, Datadog, or New Relic.
Security & Compliance:
- Implement security best practices, including data encryption, identity and access management (IAM), and network security.
- Ensure compliance with data privacy regulations such as GDPR, HIPAA, and CCPA.
- Conduct regular audits, vulnerability assessments, and security tests.
Collaboration & Documentation:
- Work closely with data engineers, software developers, and data scientists to understand requirements and provide appropriate infrastructure solutions.
- Document processes, configurations, and best practices for data lake management and operations.
- Provide training and support to teams for using the data lake infrastructure.
Skills & Qualifications:
Technical Skills:
- Strong experience with cloud platforms like AWS, Azure, or GCP, particularly in data storage and compute services (AWS S3, EMR, Azure Data Lake, BigQuery, etc.).
- Proficiency in infrastructure automation and configuration management tools (Terraform, Pulumi, Jenkins, Kubernetes, Docker).
- Hands-on experience with data processing and storage frameworks like Hadoop, Spark, Kafka, Druid, Hive, and Presto.
- Strong programming skills in Python, Java, .NET, or similar languages.
- Familiarity with version control tools (Git) and CI/CD tools (Jenkins, GitHub, TeamCity, ArgoCD).
Analytical & Problem Solving:
- Experience in optimizing large-scale data platforms for performance and scalability.
- Strong troubleshooting and debugging skills.
- Ability to analyze infrastructure costs and identify opportunities for cost savings.
Security:
- Solid understanding of data security best practices (IAM, VPC, encryption).
- Experience implementing role-based access control (RBAC) and secure data management.
Soft Skills:
- Strong communication skills and ability to work in a collaborative environment.
- Attention to detail and proactive problem-solving skills.
- Ability to manage multiple priorities in a dynamic, fast-paced environment.
- Self-driven and eager to learn new technologies.
Preferred Qualifications:
- Experience with streaming data tools like Apache Kafka, Kinesis, or Azure Event Hubs.
- Knowledge of machine learning platforms and tools.
- Certification in cloud platforms (AWS Certified DevOps Engineer, Azure DevOps Expert, GCP Professional Data Engineer, etc.).
Education & Experience:
- Bachelor's degree in Computer Science, Information Technology, or related field.
- 8+ years of experience in a DevOps or cloud engineering role, including a minimum of 2 to 3 years focused on data lake environments.
About Picarro:
We are the world's leader in timely, trusted, and actionable data using enhanced optical spectroscopy. Our solutions are used in a wide range of applications, including natural gas leak detection, ethylene oxide emissions monitoring, semiconductor fabrication, pharmaceuticals, petrochemicals, atmospheric science, air quality, greenhouse gas measurements, food safety, hydrology, ecology, and more. Our software and hardware are designed and manufactured in Santa Clara, California, and are used in over 90 countries worldwide. Built on more than 65 patents related to cavity ring-down spectroscopy (CRDS) technology, our solutions are unparalleled in their precision, ease of use, and reliability.
At Picarro, we are committed to fostering a diverse and inclusive workplace. All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, national origin, protected veteran status, gender identity, sexual orientation, or disability. Posted positions are not open to third-party recruiters/agencies, and unsolicited resume submissions will be considered free referrals.