In this article, we will look at the details of the CRISP-DM methodology.
CRISP-DM stands for “CRoss-Industry Standard Process for Data Mining”. It is a process model that serves as the base for many data science projects. It is an open standard that describes common approaches used by data mining experts.
CRISP-DM was conceived in 1996 and became a European Union project under the ESPRIT funding initiative in 1997. The project was led by five companies: Integral Solutions Ltd (ISL), Teradata, Daimler AG, NCR Corporation and the insurance company OHRA. It provides a non-proprietary, technology-agnostic and structured approach for fitting a data mining project into a general problem-solving strategy.
CRISP-DM has six phases, depicted in the picture below.
The sequence of phases is adaptive: which phase comes next depends on the outcome of the previous one. The arrows indicate the dependencies between phases, and at any time the project can go back to an earlier phase for further analysis and a better understanding of the requirements.
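The movement between phases can be sketched as a small lookup table. This is only an illustration: the arrows encoded here are those of the standard CRISP-DM diagram, which shows the most frequent moves rather than every possible one, and the names `TRANSITIONS`, `can_move` and `next_phases` are hypothetical.

```python
# Moves between CRISP-DM phases, following the arrows of the standard
# CRISP-DM diagram. The dictionary layout is an illustrative choice.
TRANSITIONS = {
    "Business Understanding": ["Data Understanding"],
    "Data Understanding": ["Data Preparation", "Business Understanding"],
    "Data Preparation": ["Modelling"],
    "Modelling": ["Evaluation", "Data Preparation"],
    "Evaluation": ["Deployment", "Business Understanding"],
    "Deployment": [],
}


def can_move(current, target):
    """Return True if the diagram has an arrow from current to target."""
    return target in TRANSITIONS[current]


def next_phases(current):
    """Return the phases that can follow the current one."""
    return list(TRANSITIONS[current])


print(next_phases("Evaluation"))
print(can_move("Modelling", "Data Preparation"))
```

For example, an evaluation that shows the model misses the business goal leads back to Business Understanding instead of forward to Deployment.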
Phase 1: Business Understanding
In this phase, the needs of the business are identified.
It focuses on clearly articulating the project objectives, goals and requirements, and on defining a strategy to achieve them.
Outcome:
- A project plan with clear objectives, goals and requirements
Phase 2: Data Understanding
In this phase, the data that is needed is identified and its quality is checked.
It focuses on collecting the data and doing exploratory analysis to become familiar with it, discover initial insights and check its quality.
Outcome:
- Data insights
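A first look at data quality can be sketched with a small profiling function. This is a minimal example using only the Python standard library; the sample records and the `profile` function are hypothetical and stand in for whatever data and tooling the project actually uses.

```python
from statistics import mean, median

# Hypothetical sample records; None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "city": "Leeds"},
    {"age": None, "income": 61000.0, "city": "York"},
    {"age": 45, "income": None, "city": "Leeds"},
    {"age": 29, "income": 48000.0, "city": None},
]


def profile(rows):
    """Summarise each column: missing count, distinct values, basic stats."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        present = [v for v in values if v is not None]
        info = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Numeric columns also get simple descriptive statistics.
        if present and all(isinstance(v, (int, float)) for v in present):
            info.update(min=min(present), max=max(present),
                        mean=mean(present), median=median(present))
        summary[column] = info
    return summary


report = profile(records)
for column, info in report.items():
    print(column, info)
```

The missing counts and value ranges from such a summary feed directly into the decisions made in the next phase.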
Phase 3: Data Preparation
In this phase, the focus is on organising data for modelling.
It focuses on selecting and preparing the final dataset by cleansing and transforming the data. The tasks performed in this phase include:
- Handling missing values
- Checking and converting data types
- Transformation
- Treating outliers
- Feature selection
Outcome:
- Final dataset, ready for modelling
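Several of the preparation tasks listed for this phase can be sketched on a single column. This is a minimal, standard-library example under stated assumptions: the raw values are made up, median imputation, IQR clipping and min-max scaling are just one common choice for each task, and feature selection is not shown.

```python
from statistics import median, quantiles

# Hypothetical raw column: numbers stored as text, with a gap and one extreme value.
raw_income = ["52000", "61000", "", "48000", "55000", "950000", "50000"]


def to_float(text):
    """Data type conversion: empty strings become None (missing)."""
    return float(text) if text.strip() else None


def impute_median(values):
    """Missing values: replace None with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Outliers: clip values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Transformation: rescale values to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


converted = [to_float(v) for v in raw_income]
imputed = impute_median(converted)
clipped = clip_outliers(imputed)
prepared = min_max_scale(clipped)
print(prepared)
```

The order matters here: conversion comes first so that missing values can be detected, and outliers are clipped before scaling so that a single extreme value does not compress the rest of the range.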
Phase 4: Modelling
In this phase, the focus is on identifying which modelling techniques to apply.
A model is a representation of the data and the relationships within a given dataset.
It focuses on selecting and applying various modelling techniques. The main types of data mining models are:
- Classification
- Regression
- Association Analysis
- Clustering
- Outlier or Anomaly Detection
Outcome:
- Data Model
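As an illustration of one of the model types listed for this phase, here is a minimal regression sketch: a straight line fitted by ordinary least squares. The data, and the names `fit_line` and `predict`, are hypothetical; a real project would normally use a modelling library and compare several techniques.

```python
from statistics import mean

# Hypothetical training data: advertising spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(spend, sales)
print(model, predict(model, 6.0))
```

The fitted pair of numbers is the "model" in the sense used above: a compact representation of the relationship found in the data.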
Phase 5: Evaluation
In this phase, the focus is on evaluating which model best suits the business requirements.
It focuses on evaluating the models created and deciding how to use the results. Each model is checked to see whether it achieves the expected objectives.
Outcome:
- Plan for deployment as the next step
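Checking models against the expected objectives can be sketched as follows. Everything in this example is an assumption made for illustration: the labels, the two candidate models, the choice of accuracy, precision and recall as metrics, and the thresholds in `REQUIRED`. In practice the criteria come from the Business Understanding phase.

```python
# Hypothetical hold-out labels and predictions from two candidate models (1 = positive).
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
model_b = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]

# Hypothetical business requirement: minimum recall and precision.
REQUIRED = {"recall": 0.8, "precision": 0.75}


def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def meets_requirements(metrics, required=REQUIRED):
    """Return True if every required metric reaches its threshold."""
    return all(metrics[name] >= threshold for name, threshold in required.items())


for name, predictions in [("model_a", model_a), ("model_b", model_b)]:
    metrics = evaluate(actual, predictions)
    print(name, metrics, meets_requirements(metrics))
```

With these made-up numbers, the second model finds every positive case but raises too many false alarms, so only the first model passes the check and would move on to deployment planning.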
Phase 6: Deployment
In this phase, the focus is on how stakeholders view the results.
It focuses on deploying the final model in production and determining how the knowledge and results obtained will be used. It also covers organising, reporting and presenting the gained knowledge when needed.
Creating a model does not complete the project; the project is complete only once the model is used. Deployment can be as simple as generating a report, or it can mean initiating projects based on the predictive outcomes of the model. This decision is made by the business.
Outcome:
- Review project
I hope this article is helpful for you.