Normalization: It is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity and scalability. This improvement is balanced against an increase in complexity and potential performance losses from the joining of the normalized tables at query-time.There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. Normalization is also called “Bottom-up-approach”, because this technique requires full knowledge of every participating attribute and its dependencies on the key attributes, if you try to add new attributes after normalization is done, it may change the normal form of the database design.

Redundancy:Dependencies between attributes within a relation cause redundancy.

Without Normalization Problems: Without Normalization, it becomes difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anomalies are very frequent if Database is not normalized.There’s clearly redundant information stored here.

  • Insert Anomaly– Due to lack of data i.e., all the data available for insertion such that null values in keys should be avoided. This kind of anomaly can seriously damage a database
  • Update Anomaly– It is due to data redundancy i.e. multiple occurrences of same values in a column. This can lead to inefficiency.
  • Deletion Anomaly – It leads to loss of data for rows that are not stored elsewhere. It could result in loss of vital data.

On decomposition of a relation into smaller relations with fewer attributes on normalization the resulting relations whenever joined must result in the same relation without any extra rows. The join operations can be performed in any order. This is known as Lossless Join decomposition. The resulting relations (tables) obtained on normalization should possess the properties such as each row must be identified by a unique key, no repeating groups, homogenous columns, each column is assigned a unique name etc.

Functional Dependency: The attributes of a table is said to be dependent on each other when an attribute of a table uniquely identifies another attribute of the same table.If column A of a table uniquely identifies the column B of same table then it can represented as A->B (Attribute B is functionally dependent on attribute A).

  • Partial Function Dependency ; It is a form of Functional dependency that holds on a set of attributes.Let us assume a relation R with attributes A, B, C, and D. Also, assume that the set of functional dependencies F that hold on R as follows;

                  F = {A → B, D → C}.

From set of attributes F, we can derive the primary key. For R, the key can be (A,D), a composite primary key. That means, AD → BC, AD can uniquely identify B and C. But, for this case A and D is not required to identify B or C uniquely. To identify B, attribute A is enough. Likewise, to identify C, attribute D is enough. The functional dependencies AD → B or AD → C are called as Partial functional dependencies.

  • Trivial Dependency ; The dependency of an attribute on a set of attributes is known as Trivial Dependency if the set of attributes includes that attribute.

Consider a table with two columns Student_id and Student_Name.{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency as Student_Id is a subset of {Student_Id, Student_Name}.  That makes sense because if we know the values of Student_Id and Student_Name then the value of Student_Id can be uniquely determined.Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial dependencies too.

  • Non-Trivial Dependency ; If a functional dependency X->Y holds true where Y is not a subset of X then this dependency is called Non-Trivial Dependency.

An employee table with three attributes: emp_id, emp_name, emp_address.
The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)

On the other hand, the following dependencies are trivial:
{emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]

Normalization has Five Normal Forms:







a)1NF: A relation is considered to be in first normal form if all of its attributes have domain that are indivisible or atomic.

A table is in 1NF if and only if its satisfies the following five conditions:

  • There is no top-to-bottom ordering to the rows.
  • There is no left-to-right ordering to the columns.
  • There are no duplicate rows.
  • Every row and column intersection contains exactly one value from the applicable domain.
  • All columns are regular
  • Each attribute must contain only a single value from its predefined domain.


  • Table is in 1NF (First normal form)
  • No non-prime attribute is dependent on the proper subset of any candidate key of table.
  • Based on Fully Functional dependency.

An attribute that is not part of any candidate key is known as non-prime attribute.

c)3NF: A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies. For e.g.X -> Z is a transitive dependency if the following three functional dependencies hold true:

  • X->Y
  • Y does not ->X
  • Y->Z

A table design is said to be in 3NF if both the following conditions hold:

  • Table must be in 2NF
  • Transitive functional dependency  of non-prime attribute on any super key should be removed.

An attribute that is not part of any candidate key is known as non-prime attribute.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional dependency X-> Y at least one of the following conditions hold:

  • X is a super keyof table
  • Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

d)BCNF:A relational schema R is considered to be in Boyce–Codd normal form (BCNF) if it is in 3NF, for every one of its dependencies X → Y, one of the following conditions holds true:

  • X → Y is a  non trivial functional dependency (i.e., Y is a subset of X)
  •  X is a superkey for schema R

BCNF is more restrictive than 3NF.While decomposing relation to make them in BCNF we may loose some dependencies i.e BCNF does not guarantee the dependency preservation property.

Note:A relation with only two attributes is always in BCNF


  • It should meet all the requirement of 3NF
  • Attribute of one or more rows in the table should not result in more than one rows of the same table leading to multi-valued dependencies.
  • Every relation in 4NF is in BCNF


Fifth normal form (5NF), also known as project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every non-trivial join dependency in it is implied by the candidate keys.

A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.