Over the past two years, artificial intelligence has leaped from the confines of research and development labs — where data science experts crafted powerful yet often unheralded solutions — to the forefront of every product conversation.
To truly excel at building intelligent products, we must critically evaluate the methodologies we use to build them.
CRISP-DM Definition
CRISP-DM stands for cross-industry standard process for data mining. It was conceived by a team of hungry data mining engineers and has been the most widely used data analytics method for almost 30 years running.
How AI-Minded Data Scientists Clash With Scrum Purists
Traditional product development has long been engineering-centric, frequently overlooking the pivotal role of data science.
Having a portfolio of agile certifications on your LinkedIn profile is no longer a ticket to job security or relevance in the evolving landscape of product development. Product development teams’ use of AI surged from 40 percent in 2023 to a staggering 78 percent in 2024, according to Forbes.
There’s a significant difference, however, between using AI and mastering the art of building AI-infused products. Data scientists often find themselves at odds with agile purists, Scrum purists in particular, because the exploratory nature of data science clashes with Scrum’s structured sprints and predictable cycles.
If you believe that the future hinges on artificial intelligence and solutions powered by data science, then Scrum clearly needs an infusion of new ideas to stay relevant, if it isn’t replaced altogether.
So are there ways that Scrum and data science can play nice? Perhaps (wink, wink) there’s a methodology that actually embraces the benefits of iterative cycles, while giving data scientists the flexibility they need to innovate?
What Is CRISP-DM?
Enter CRISP-DM, an established method presented with a new opportunity — the Michael Keaton of methodologies. This time-tested framework offers a structured yet flexible approach tailored for data science projects, embracing the iterative experimentation and hypothesis testing that are the lifeblood of AI development.
At the heart of this transformation lies the unique nature of data science: its practitioners and its process. Unlike conventional software projects, which follow a relatively linear path, data science thrives on repeated cycles of trial and refinement. Data scientists often dive into vast, evolving data sets in search of hidden patterns and insights, and the path to finding them is unpredictable.
You can’t just pull the rug out from under them after two weeks, declare the sprint over and move on to something else. The work demands the flexibility to pivot based on findings, to work with the team to find new data sources and, sometimes, to throw everything away and start from scratch when the data doesn’t yield the solution they were looking for.
Data scientists can’t guarantee that a breakthrough will occur within a two-week window, nor can they always define what “done” looks like at the outset. The process of training models, tuning algorithms and validating results is often nonlinear and fraught with unexpected challenges. Rigid timelines can squash the creativity and flexibility that data science needs, leading to frustrated scientists and stifled innovation.
The 6-Phase Iterative CRISP-DM Process
Fundamentally, CRISP-DM is a six-phase iterative process that can provide a comprehensive roadmap from project start to finish.
- Business understanding: This first phase focuses on clearly understanding the project’s objectives and requirements from a business perspective. It ensures that data scientists align their efforts with organizational goals, laying a solid foundation for the project.
- Data understanding: Here we shift the focus to collecting data and familiarizing the team with its nuances. This involves exploratory data analysis to uncover initial insights, assess data quality and identify underlying patterns or anomalies.
- Data preparation: Likely the most time-consuming step, this involves cleaning and transforming raw data into a suitable format for modeling. Here we address issues like missing values, outliers and data normalization, which are critical for the success of subsequent modeling efforts.
- Modeling: The team selects various modeling techniques and applies them to the prepared data. This experimental phase may involve trying multiple algorithms, tuning parameters and iteratively refining models to improve performance.
- Evaluation: Before deployment, you must evaluate the models rigorously to ensure they meet the business objectives established in the first phase. Here we validate model performance, assessing whether all critical business issues have been sufficiently addressed and determining the next steps.
- Deployment: Finally, we deploy the model into a real-world environment. The deployment could mean integrating it into a software application, using it to inform decision-making or presenting findings to stakeholders. CRISP-DM recognizes that deployment is not the end but part of a continuous cycle that may loop back to earlier phases as new data or objectives emerge.
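The six phases are a loop rather than a one-way pipeline. As a rough sketch, we can model each phase as a state and allow the backward moves commonly drawn in the CRISP-DM reference diagram: data understanding can send the team back to business understanding, modeling back to data preparation, evaluation back to business understanding, and deployment into a new cycle. The `Phase` enum, the `TRANSITIONS` table and the `run_project` function are hypothetical illustrations, not part of any CRISP-DM tooling; the `decide` callback stands in for the team's judgment at the end of each phase.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "business understanding"
    DATA_UNDERSTANDING = "data understanding"
    DATA_PREPARATION = "data preparation"
    MODELING = "modeling"
    EVALUATION = "evaluation"
    DEPLOYMENT = "deployment"


# Hypothetical transition table: the forward step out of each phase plus
# the backward loops shown in the CRISP-DM reference diagram.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.DATA_PREPARATION, Phase.BUSINESS_UNDERSTANDING},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.EVALUATION, Phase.DATA_PREPARATION},
    Phase.EVALUATION: {Phase.DEPLOYMENT, Phase.BUSINESS_UNDERSTANDING},
    # Deployment is not the end: new data or objectives restart the cycle.
    Phase.DEPLOYMENT: {Phase.BUSINESS_UNDERSTANDING},
}


def run_project(decide, max_steps=20):
    """Walk the phases, asking a hypothetical decide(phase, options) callback
    which allowed phase comes next (or None to stop). Returns the path taken."""
    phase = Phase.BUSINESS_UNDERSTANDING
    path = [phase]
    for _ in range(max_steps):
        options = TRANSITIONS[phase]
        nxt = decide(phase, options)
        if nxt is None:  # the team decides to stop here
            break
        if nxt not in options:
            raise ValueError(f"CRISP-DM does not move from {phase} to {nxt}")
        phase = nxt
        path.append(phase)
    return path


# Example: the first model underperforms, so the team loops back to data
# preparation once before evaluating and deploying.
script = iter([
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.DATA_PREPARATION,  # loop back: the features need more work
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
])
path = run_project(lambda phase, options: next(script, None))
print(" -> ".join(p.value for p in path))
```

Encoding the loops explicitly makes the point concrete: returning to data preparation after a disappointing model is a legitimate step in the process, not a failed sprint.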
For organizations accustomed to working with iterative frameworks like Scrum, CRISP-DM provides a structured approach that complements those frameworks. It guides teams through discovery, data preparation and model development exactly when clarity and rigor are needed most: at the outset of the process.
As products rely more heavily on AI, machine learning and data science, the challenge of managing the uncertain outcomes of exploratory work intensifies. A hybrid approach, one that lets teams adapt while still following a structured, thorough process for data handling and model validation, is critical.
Blending Scrum and CRISP-DM: A Use Case
Netflix uses data science to refine its recommendation algorithm, ensuring users receive personalized content suggestions based on viewing habits, preferences and engagement patterns, according to Elizabeth Mixson with the AI Data and Analytics Network.
CRISP-DM can provide the structure for the rigorous data handling, model training and evaluation required to achieve this. Meanwhile, Netflix’s cross-functional teams use Scrum to continuously deliver features to the same recommendation system, resulting in real-time improvements to its algorithm based on user behavior.
By training teams to understand both agile principles and data science methodologies, Netflix has created an environment where data scientists, engineers and product managers work together seamlessly, enabling faster testing and deployment of new features so that its content personalization evolves alongside user needs.
So what’s the catch? We’ve discussed how Scrum and data science methodologies like CRISP-DM can work together in a hybrid approach, and in some cases complement each other well. So why isn’t this approach more commonplace?
It really comes down to stakeholder expectations. Can stakeholders accustomed to the rapid results of Scrum-driven software development embrace a mindset that values exploration and the learning that comes with it, and accept the experimentation that data science requires? That is the million-dollar, if not billion-dollar, question.