What You’ll Be Doing
- Design, build, and maintain scalable, cloud-native data pipelines for batch and streaming workloads using modern tools like Airflow, Kafka, and dbt.
- Ensure data reliability, observability, and trust through robust monitoring, testing, and quality enforcement.
- Build data models and infrastructure that serve analytics, business operations, and AI/ML workloads.
- Partner with ML, analytics, and product teams to support the deployment of AI-powered features, from data ingestion to feature engineering and model serving.
- Develop infrastructure that enables reproducible ML workflows and data versioning, in collaboration with Data Science and ML Engineering.
- Build systems that power intelligent search, personalized recommendations, dynamic pricing, and other predictive features.
- Lead efforts to modernize our legacy data systems into modular, scalable, and cost-effective architectures using technologies like Snowflake, S3, Glue, and Redshift.
- Define and drive best practices around data governance, privacy, and access control.
- Champion data discoverability and self-service access across internal teams.
- Work closely with the VP of Engineering to provide architectural guidance for high-scale data solutions across the engineering organization.
- Mentor team members and promote engineering excellence through code reviews, knowledge sharing, and system design.
- Evaluate and introduce emerging tools, frameworks, and patterns that align with our AI and data strategy.
What We’re Looking For
- 8+ years of experience in software/data engineering, with a focus on distributed systems, data platforms, or cloud infrastructure.
- Demonstrated success leading large-scale data infrastructure initiatives and mentoring engineers.
- Experience supporting analytics and ML workflows in a production environment.
- Proficient in Python and SQL; familiarity with JVM-based languages is a plus.
- Strong understanding of streaming and batch processing frameworks (e.g., Kafka, Spark, Airflow).
- Experience with modern data stack tools (e.g., dbt, Snowflake, Redshift, S3, Glue).
- Familiarity with AI agent workflows, including orchestration of multi-step, goal-driven agents using tools or frameworks like LangChain, Semantic Kernel, or custom-built solutions, is a plus.
- Deep understanding of data architecture patterns, including event-driven systems, data lakes, and warehouse modeling.
- Proven ability to lead cross-functional initiatives with Product, Data Science, and Engineering.
- Excellent communication skills and a collaborative, pragmatic mindset.
- Comfortable working in an Agile environment with CI/CD pipelines and DevOps practices.
What We Do
In 2016, we founded Provi as an innovative ordering solution for the beverage alcohol industry, one that would move beyond the constant chaos: the texts, paper stacks, missed phone calls, and lost communication that have dominated the purchasing workflow between buyers and distributors for decades.
Today, Provi is better than ever. We’ve created a best-in-class ordering solution that connects beverage alcohol professionals across 49 states, and growing, with more than 750,000 product listings that make up the most expansive and trusted database of U.S. distributor portfolios. We’re making it easier for thousands of industry professionals to better serve their customers every day. And it’s all happening in one place.
Why Work With Us
Provi is a people-oriented, high-growth unicorn. Our teams are passionate and determined, built around industry veterans ranging from hospitality professionals to tech geeks and animal lovers. At Provi, our culture encourages everyone to bring their whole selves, show up authentically, and offer meaningful contributions.