Introduction to Normalization & 1st Normal Form

Hello everyone! Today we are going to discuss a new topic. It is Database Normalization. 



Normalization is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.

First of all, we need to indentify the different anomalies we have to face in databases. They are;
  • Update anomalies − If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
  • Deletion anomalies − We tried to delete a record, but parts of it was left undeleted because of unawareness, the data is also saved somewhere else.
  • Insert anomalies − We tried to insert data in a record that does not exist at all.
Normalization will remove all these anomalies and bring the database to a consistent state.

Ok! now we can move on to the steps of normalization process of databases.

First Normal Form

1st Normal Form is the Step 1 of the Normalization process. The 1st Normal form expects you to design your table in such a way that it can easily be extended and it is easier for you to retrieve data from it whenever required.

Note : If tables in a database are not even in the 1st Normal Form, it is considered as bad database design.
Rules for First Normal Form

The first normal form expects you to follow a few simple rules while designing your database, and they are:


  1. Rule 1: Single Valued Attributes - Each column of your table should be single valued which means they should not contain multiple values.
  2. Rule 2: Attribute Domain should not change - In each column the values stored must be of the same kind or type. For an example: If you have a column age to save ages of a set of people, then you cannot or you must not save 'names' of some of them in that column along with 'ages' of others in that column. It should hold only 'ages' for all the records/rows.
  3. Rule 3: Unique name for Attributes/Columns - This rule says that each column in a table should have a unique name. This is to avoid confusion at the time of retrieving data or performing any other operation on the stored data.
  4. Rule 4: Order doesn't matters - This rule says that the order in which you store the data in your table doesn't matter.

Let's do an example to understand the first normal form. Here is our table, with some sample data added to it.

Index_noNameSubject
101JohnOS, CN
103MarieJava
102VladimirC, C++
Our table already satisfies 3 rules out of the 4 rules, as all our column names are unique, we have stored data in the order we wanted to and we have not inter-mixed different type of data in columns.

But out of the 3 different students in our table, 2 have opted for more than 1 subject. And we have stored the subject names in a single column. But as per the 1st Normal form each column must contain atomic value.
Let's solve this problem. 

Here is our updated table and it now satisfies the First Normal Form.

Index_noNameSubject
101JohnOS
101JohnCN
103VladimirJava
102MarieC
102MarieC++
By doing so, although a few values are getting repeated but values for the subject column are now atomic for each record/row.

Using the First Normal Form, data redundancy increases, as there will be many columns with same data in multiple rows but each row as a whole will be unique.

Now you know what is first normal form is. In the next tutorial I will discuss the second normal form. Till then Good Luck !

Thank you!

Kalpani

Comments