In this article, we will look at basic concepts of database normalisation. Understanding of these basics, will help in designing the database.
Normalisation is the process of structuring the data in the database according to the rules (normal forms) defined to eliminate redundancies & improve data integrity. This was first proposed by Edgar F. Codd – British scientist.
Normalisation enables clarity with the data by splitting the large table into smaller ones & define relationship between them. In real world, normalization up to 3NF is applied in most of the applications as problem rarely exists to apply beyond that.
Advantages of normalization:
- Better Organized data
- Provides flexibility
- Enforces data integrity
- Reduce redundancy
Disadvantages of normalization:
- Performance degrades applying higher normal forms – 4NF, 5NF
- Improper decomposition of tables leads to bad database design
- Queries will be complex with lots of table created
- Maintenance will be tedious with large no. of tables
Below are the types of normal forms
- First Normal Form (1NF): Eliminate Repeated Groups
- Each column is unique meaning no repeated groups, every cell has unique values
Example:
StudentID | StudentName | Class |
1 | Nanda | Maths, English |
2 | Raj | English |
3 | Arjun | Maths |
After applying 1NF, it would be like below
Student table:
StudentID | StudentName |
1 | Nanda |
2 | Raj |
3 | Arjun |
Class table:
StudentID | Class |
1 | English |
1 | Maths |
2 | English |
3 | Maths |
- Second Normal Form (2NF): Eliminate partial functional dependency
- Should be in 1NF
- All entity depends on the unique identifier
Example:
Let’s take the above output of 1NF, there is no unique identifier for class table.
After applying 2NF, it would be like below
Student table:
StudentID | StudentName |
1 | Nanda |
2 | Raj |
3 | Arjun |
Class table:
ClassID | ClassName |
C1 | English |
C2 | Maths |
Student- Class table:
StudentID | ClassID |
1 | C1 |
1 | C2 |
2 | C1 |
3 | C2 |
- Third Normal Form (3NF): Eliminate Transitive Dependency
- Should be in 2NF
- All columns in the table should be dependent on the key. If there are entries doesn’t not dependent on the key, then it should be moved to new table.
- Database is considered normalised after applying 3NF.
Example:
Class table:
ClassID | ClassName | FacultyName |
C1 | English | Vidya |
C2 | Maths | Santhi |
After applying 3NF, it would be like below
Faculty table:
FacultyID | FacultyName |
F1 | Vidya |
F2 | Santhi |
Class table:
ClassID | ClassName | FacultyID |
1 | English | F1 |
2 | Maths | F2 |
- Boyce-Codd Normal Form (BCNF) – Stronger version of 3NF
- Should be in 3NF
- For any functional dependency (A->B), A always the super key or candidate key and can’t be non-prime attribute.
Example:
StudentID | ClassName | FacultyName |
1 | English | Vidya |
2 | Maths | Santhi |
1 | Hindi | Dharani |
4 | Maths | Vidya |
In this table, StudentID & ClassName are unique identifiers and this table follows 3NF.
After applying Boyce-Codd normal form, it would be
ID | StudentID | FacultyName |
1 | 1 | Vidya |
2 | 2 | Santhi |
3 | 1 | Dharani |
4 | 4 | Vidya |
FacultyName | ClassName |
Vidya | English |
Santhi | Maths |
Dharani | Hindi |
Vidya | Maths |
- Fourth Normal Form(4NF): Eliminate multi-values Dependency
- Should be in Boyce-Codd Normal Form
- For any functional dependency (A->B), A should contain only single value of B and it should not have multiple values of B
Example:
Student- Class table:
StudentID | ClassID | Hobbies |
1 | C1 | Swimming |
1 | C2 | Swimming |
2 | C1 | Singing |
3 | C2 | Dancing |
After apply 4NF, it would be
StudentID | ClassID |
1 | C1 |
1 | C2 |
2 | C1 |
3 | C2 |
StudentID | Hobbies |
1 | Swimming |
2 | Singing |
3 | Dancing |
- Fifth Normal Form(5NF): Eliminate Join Dependency
- Should be in 4NF
- There should not be any join dependency
- Tables should be broken down into smaller ones in order to reduce redundancy
Example:
FacultyName | ClassName | Year |
Vidya | English | Year1 |
Santhi | Maths | Year1 |
Dharani | Hindi | Year1 |
Vidya | Maths | Year2 |
After applying 5NF, it would be
Year | FacultyName |
Year1 | Vidya |
Year1 | Santhi |
Year1 | Dharani |
Year2 | Vidya |
ClassName | FacultyName |
English | Vidya |
Maths | Santhi |
Hindi | Dharani |
Maths | Vidya |
Year | ClassName |
Year1 | English |
Year1 | Maths |
Year1 | Hindi |
Year2 | Maths |
- Sixth Normal Form(6NF): Decompose relation variables
- Should be in 5NF
- Decompose relation variables into irreducible components
Example:
StudentID | ClassID | Marks |
1 | C1 | 90 |
1 | C2 | 80 |
2 | C3 | 100 |
3 | C2 | 100 |
After applying 6NF, it would be
StudentID | ClassID |
1 | C1 |
1 | C2 |
2 | C3 |
3 | C2 |
StudentID | Marks |
1 | 90 |
1 | 80 |
2 | 100 |
3 | 100 |