In this article, we will look at basic concepts of database normalisation. Understanding of these basics, will help in designing the database.

Normalisation is the process of structuring the data in the database according to the rules (normal forms) defined to eliminate redundancies & improve data integrity. This was first proposed by Edgar F. Codd – British scientist.

Normalisation enables clarity with the data by splitting the large table into smaller ones & define relationship between them. In real world, normalization up to 3NF is applied in most of the applications as problem rarely exists to apply beyond that.

Advantages of normalization:

  • Better Organized data
  • Provides flexibility
  • Enforces data integrity
  • Reduce redundancy

Disadvantages of normalization:

  • Performance degrades applying higher normal forms – 4NF, 5NF
  • Improper decomposition of tables leads to bad database design
  • Queries will be complex with lots of table created
  • Maintenance will be tedious with large no. of tables

Below are the types of normal forms

  • First Normal Form (1NF): Eliminate Repeated Groups
    • Each column is unique meaning no repeated groups, every cell has unique values

Example:

StudentIDStudentNameClass
1NandaMaths, English
2RajEnglish
3ArjunMaths

After applying 1NF, it would be like below

Student table:

StudentIDStudentName
1Nanda
2Raj
3Arjun

Class table:

StudentIDClass
1English
1Maths
2English
3Maths
  • Second Normal Form (2NF): Eliminate partial functional dependency
    • Should be in 1NF
    • All entity depends on the unique identifier

Example:

Let’s take the above output of 1NF, there is no unique identifier for class table.

After applying 2NF, it would be like below

Student table:

StudentIDStudentName
1Nanda
2Raj
3Arjun

Class table:

ClassIDClassName
C1English
C2Maths

Student- Class table:

StudentIDClassID
1C1
1C2
2C1
3C2
  • Third Normal Form (3NF): Eliminate Transitive Dependency
    • Should be in 2NF
    • All columns in the table should be dependent on the key. If there are entries doesn’t not dependent on the key, then it should be moved to new table.
    • Database is considered normalised after applying 3NF.

Example:

Class table:

ClassIDClassNameFacultyName
C1EnglishVidya
C2MathsSanthi

     After applying 3NF, it would be like below

      Faculty table:       

FacultyIDFacultyName
F1Vidya
F2Santhi

      Class table:

ClassIDClassNameFacultyID
1EnglishF1
2MathsF2
  • Boyce-Codd Normal Form (BCNF) – Stronger version of 3NF
    • Should be in 3NF
    • For any functional dependency (A->B), A always the super key or candidate key and can’t be non-prime attribute.

Example:

StudentIDClassNameFacultyName
1EnglishVidya
2MathsSanthi
1HindiDharani
4MathsVidya

In this table, StudentID & ClassName are unique identifiers and this table follows 3NF.

After applying Boyce-Codd normal form, it would be

IDStudentIDFacultyName
11Vidya
22Santhi
31Dharani
44Vidya
         FacultyNameClassName
VidyaEnglish
SanthiMaths
DharaniHindi
VidyaMaths
  • Fourth Normal Form(4NF): Eliminate multi-values Dependency
    • Should be in Boyce-Codd Normal Form
    • For any functional dependency (A->B), A should contain only single value of B and it should not have multiple values of B

Example:

       Student- Class table:

StudentIDClassIDHobbies
1C1Swimming
1C2Swimming
2C1Singing
3C2Dancing

After apply 4NF, it would be

StudentIDClassID
1C1
1C2
2C1
3C2
StudentIDHobbies
1Swimming
2Singing
3Dancing
  • Fifth Normal Form(5NF): Eliminate Join Dependency
    • Should be in 4NF
    • There should not be any join dependency
    • Tables should be broken down into smaller ones in order to reduce redundancy

Example:

         FacultyNameClassNameYear
VidyaEnglishYear1
SanthiMathsYear1
DharaniHindiYear1
VidyaMathsYear2

After applying 5NF, it would be

Year         FacultyName
Year1Vidya
Year1Santhi
Year1Dharani
Year2Vidya
ClassName         FacultyName
EnglishVidya
MathsSanthi
HindiDharani
MathsVidya
YearClassName
Year1English
Year1Maths
Year1Hindi
Year2Maths
  • Sixth Normal Form(6NF): Decompose relation variables
    • Should be in 5NF
    • Decompose relation variables into irreducible components

Example:

         StudentIDClassIDMarks
1C190
1C280
2C3100
3C2100

             After applying 6NF, it would be

         StudentIDClassID
1C1
1C2
2C3
3C2
         StudentIDMarks
190
180
2100
3100