In this article, we will look at the details of the CRISP-DM methodology.

CRISP-DM stands for “CRoss-Industry Standard Process for Data Mining”. It is an open standard process model that describes common approaches used by data mining experts, and it serves as the basis for many data science projects.

CRISP-DM was conceived in 1996 and became a European Union project under the ESPRIT funding initiative in 1997. The project was led by five companies: Integral Solutions Ltd (ISL), Teradata, Daimler AG, NCR Corporation and OHRA Insurance Company. It provides a non-proprietary, technology-agnostic and structured approach for fitting a data mining project into a general problem-solving strategy.

CRISP-DM has six phases, depicted in the picture below.

The sequence of phases is adaptive: which phase comes next depends on the outcome of the previous one. Arrows indicate the dependencies between phases, and at any time the project can go back to an earlier phase for further analysis and a better understanding of the requirements.
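The phase dependencies can be sketched as a small lookup table. The transitions below follow the arrows of the commonly published CRISP-DM diagram, which is an assumption here; the names TRANSITIONS, allowed_next and can_move are illustrative only.

```python
# Allowed moves between phases. Assumption: these mirror the arrows of the
# commonly published CRISP-DM diagram; adjust them to match your own process.
TRANSITIONS = {
    "Business Understanding": ["Data Understanding"],
    "Data Understanding": ["Data Preparation", "Business Understanding"],
    "Data Preparation": ["Modelling"],
    "Modelling": ["Evaluation", "Data Preparation"],
    "Evaluation": ["Deployment", "Business Understanding"],
    "Deployment": [],
}


def allowed_next(current):
    """Return the phases that can follow the current phase."""
    return list(TRANSITIONS[current])


def can_move(current, target):
    """Check whether the process may move from current to target."""
    return target in TRANSITIONS[current]


# Example: after Evaluation the project either deploys or goes back
# to Business Understanding to revisit the requirements.
print(allowed_next("Evaluation"))
```

A table like this makes the iterative nature explicit: going back from Modelling to Data Preparation is a normal move, not a failure.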

Phase 1: Business Understanding

In this phase, the needs of the business are identified.

It focuses on clearly articulating the project objectives, goals and requirements, and on providing a strategy to achieve those objectives.

Outcome:

  • A project plan with clear objectives, goals and requirements

Phase 2: Data Understanding

In this phase, the data that is needed is identified and its quality is checked.

It focuses on collecting the data and doing exploratory analysis to become familiar with it, discover initial insights and check its quality.

Outcome:

  • Data insights
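As a rough illustration of the exploratory analysis in this phase, the sketch below profiles one column of a small dataset. The sample records and the profile helper are hypothetical, and real projects would typically use a dedicated data analysis library instead.

```python
from statistics import mean

# Hypothetical sample records; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]


def profile(rows, column):
    """Summarise one numeric column: row count, missing values, range, mean."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    summary = {"count": len(values), "missing": len(values) - len(present)}
    if present:
        # Range and mean are only defined when at least one value is known.
        summary.update(min=min(present), max=max(present), mean=mean(present))
    return summary


age_profile = profile(records, "age")
income_profile = profile(records, "income")
print(age_profile)
print(income_profile)
```

Even a summary this small answers the two questions of the phase: what the data looks like, and how much of it is missing.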

Phase 3: Data Preparation

In this phase, the focus is on organising data for modelling.

It focuses on selecting and preparing the final dataset by cleansing and transforming the data. Typical tasks in this phase are:

  • Missing Values
  • Data Types and Conversion
  • Transformation
  • Outliers
  • Feature Selection

Outcome:

  • Preparation of final dataset
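The preparation tasks listed in this phase can be sketched with one small function each. Everything here is an assumption made for illustration: the raw rows, the clipping range of 0 to 100000, and the choice of median imputation, min-max scaling and constant-column removal are just one option among many for each task.

```python
from statistics import median

# Hypothetical raw rows: numbers arrive as strings, "" marks a missing value.
raw = [
    {"age": "34", "income": "52000", "country": "NL"},
    {"age": "", "income": "61000", "country": "NL"},
    {"age": "45", "income": "58000", "country": "NL"},
    {"age": "29", "income": "990000", "country": "NL"},
]


def to_float(text):
    """Data types and conversion: parse a string, mapping "" to None."""
    return float(text) if text != "" else None


def impute_median(values):
    """Missing values: replace None with the median of the known values."""
    fill = median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def clip_outliers(values, low, high):
    """Outliers: cap values to a range agreed with the business."""
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Transformation: rescale to 0..1 (assumes values are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def drop_constant_columns(rows):
    """Feature selection (crude): drop columns with one distinct value."""
    keep = [c for c in rows[0] if len({r[c] for r in rows}) > 1]
    return [{c: r[c] for c in keep} for r in rows]


ages = impute_median([to_float(r["age"]) for r in raw])
incomes = clip_outliers([to_float(r["income"]) for r in raw], 0, 100000)
scaled_incomes = min_max_scale(incomes)
selected = drop_constant_columns(raw)
```

In this sample the missing age is filled with the median of the known ages, the extreme income is capped at the upper bound, and the country column is dropped because it carries no information.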

Phase 4: Modelling

In this phase, the focus is on identifying which modelling techniques to apply.

A model is a representation of the data in a given dataset and of the relationships within it.

It focuses on selecting and applying various modelling techniques. The main types of data mining models are:

  • Classification
  • Regression
  • Association Analysis
  • Clustering
  • Outlier or Anomaly Detection

Outcome:

  • Data Model
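To make the idea of a model as a representation of relationships in the data concrete, here is a minimal sketch of one regression technique: a straight line fitted by ordinary least squares. The training data and the function names fit_line and predict are hypothetical, and this is only one of many techniques that could be chosen in this phase.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

model = fit_line(xs, ys)
prediction = predict(model, 10)
```

The fitted pair of numbers is the model: it summarises the relationship between the two columns and can be applied to inputs that were not in the dataset.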

Phase 5: Evaluation

In this phase, the focus is on evaluating which model best suits the business requirements.

It focuses on evaluating the models that were created and deciding how to use the results. Each model is checked to see whether it achieves the expected objectives.

Outcome:

  • Plan for deployment as next step
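Checking a model against the expected objectives can be sketched as computing a few metrics and comparing them with a target. The labels, the predictions and the required recall of 0.7 below are all hypothetical; in practice the target comes out of the Business Understanding phase.

```python
def evaluate(actual, predicted):
    """Compute accuracy, precision and recall for binary labels (0 or 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class never occurs.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical held-out labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Hypothetical business requirement: catch at least 70% of positive cases.
REQUIRED_RECALL = 0.7

metrics = evaluate(actual, predicted)
meets_objective = metrics["recall"] >= REQUIRED_RECALL
```

Which metric matters is a business decision: a fraud team may care most about recall, while a marketing team may care more about precision.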

Phase 6: Deployment

In this phase, the focus is on how stakeholders view the results.

It focuses on deploying the final model in production and determining how the obtained knowledge and results will be used. The gained knowledge is organised, reported and presented when needed.

Creating a model does not mean the project is complete; the model has to be put to use. Deployment can be as simple as generating a report, or it can mean initiating new projects based on the predictive outcomes of the model. This decision is made by the business.

Outcome:

  • Review project

I hope this article is helpful to you.