Sunday, 6 September 2009

What is normalization?

Basically, it's the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminate redundant data (for example, storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
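To see why redundant data is a problem, here is a minimal sketch in Python. The rows are modeled on the small example later in this post, and the new address "Street9" is made up for illustration. It shows what can happen when the same fact is stored twice and only one copy gets updated.

```python
# Denormalized rows: the company address is repeated on every row for that company.
rows = [
    {"Name": "mah", "Company": "AAA", "CompanyAddress": "Street1", "Host": "AAA.com"},
    {"Name": "mah", "Company": "AAA", "CompanyAddress": "Street1", "Host": "BBB.com"},
]

# An update that touches only one of the copies ("Street9" is a hypothetical value).
rows[0]["CompanyAddress"] = "Street9"

# Collect every address now recorded for company AAA.
addresses = {r["CompanyAddress"] for r in rows if r["Company"] == "AAA"}

# More than one address for the same company means the data contradicts itself.
inconsistent = len(addresses) > 1
print(inconsistent, sorted(addresses))
```

If the address lived in exactly one row of a separate company table, this kind of contradiction could not arise from a single update.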

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen.

First normal form (1NF) sets the very basic rules for an organized database:

- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
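As a rough illustration of the 1NF rules above, the sketch below turns a row with two URL columns into one row per host. The function name `to_first_normal_form` and the `repeated` parameter are my own, and the data is modeled on the small example further down.

```python
# Unnormalized rows: urlA and urlB are two columns holding the same kind of fact.
raw = [
    {"Name": "mah", "Company": "AAA", "CompanyAddress": "Street1",
     "urlA": "AAA.com", "urlB": "BBB.com"},
    {"Name": "iham", "Company": "BBB", "CompanyAddress": "Street2",
     "urlA": "AAA.com", "urlB": "BBB.com"},
]

def to_first_normal_form(rows, repeated=("urlA", "urlB")):
    """Emit one row per (person, host) instead of one column per host."""
    out = []
    for person_id, row in enumerate(rows, start=1):
        # Everything except the repeated columns is copied to each output row.
        base = {k: v for k, v in row.items() if k not in repeated}
        for col in repeated:
            if row.get(col):  # skip empty slots
                out.append({"Id": person_id, **base, "Host": row[col]})
    return out

table1 = to_first_normal_form(raw)

# In the result, the pair (Id, Host) identifies each row.
keys = [(r["Id"], r["Host"]) for r in table1]
print(len(table1), len(set(keys)) == len(keys))
```

Note that Id alone no longer identifies a row here; the key is the combination of Id and Host, which is what makes the next normal form relevant.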

Second normal form (2NF) further addresses the concept of removing duplicative data:

- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.
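The two 2NF steps can be sketched with Python's built-in sqlite3 module. The table and column names (`person`, `person_host`, `person_id`) are assumptions made for this sketch, not names from the example below. The person's details are stored once, and each host row points back to its person through a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    company TEXT NOT NULL,
    company_address TEXT NOT NULL
);
CREATE TABLE person_host (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    host TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, "mah", "AAA", "Street1"), (2, "iham", "BBB", "Street2")],
)
conn.executemany(
    "INSERT INTO person_host (person_id, host) VALUES (?, ?)",
    [(1, "AAA.com"), (1, "BBB.com"), (2, "AAA.com"), (2, "BBB.com")],
)

# A host row that points at a person who does not exist should be rejected.
try:
    conn.execute("INSERT INTO person_host (person_id, host) VALUES (99, 'CCC.com')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

host_count = conn.execute("SELECT COUNT(*) FROM person_host").fetchone()[0]
print(orphan_rejected, host_count)
```

The foreign key is what keeps the two tables in step: the database itself refuses a host row that has no matching person.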

Third normal form (3NF) goes one large step further:

- Remove columns that are not directly dependent upon the primary key (for example, columns that depend on another non-key column).
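One way to spot such columns is to test which columns determine which. The helper `determines` below is a hypothetical name of my own, and the third row ("sam") is an invented extra row so the check has something to work with. A check like this only says what holds in the sample data; whether the dependency is a real business rule is still a design decision.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, val) != val:
            return False
    return True

table2 = [
    {"Id": 1, "Name": "mah", "Company": "AAA", "CompanyAddress": "Street1"},
    {"Id": 2, "Name": "iham", "Company": "BBB", "CompanyAddress": "Street2"},
    {"Id": 3, "Name": "sam", "Company": "AAA", "CompanyAddress": "Street1"},  # hypothetical row
]

# Id determines Company, and Company determines CompanyAddress, so the address
# depends on the key only through Company (a transitive dependency).
transitive = (
    determines(table2, ["Id"], ["Company"])
    and determines(table2, ["Company"], ["CompanyAddress"])
)

# The reverse does not hold: one address is shared by more than one Id.
address_is_key = determines(table2, ["CompanyAddress"], ["Id"])
print(transitive, address_is_key)
```

When a check like this finds a non-key column that determines another column, that pair is a candidate for its own table.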

Finally, fourth normal form (4NF) has one requirement:

- A relation is in 4NF if it has no non-trivial multi-valued dependencies other than those on a candidate key.
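A multi-valued dependency is easier to see with data. In the sketch below, the "language" attribute is invented for illustration: a person has a set of hosts and, independently, a set of languages. Storing both in one table forces every combination to be listed, and splitting into two tables loses nothing.

```python
# Hypothetical relation: (person, host, language), where hosts and languages
# are independent of each other.
person_host_lang = {
    ("mah", "AAA.com", "English"),
    ("mah", "AAA.com", "French"),
    ("mah", "BBB.com", "English"),
    ("mah", "BBB.com", "French"),
}

# Project onto the two independent facts.
person_host = {(p, h) for p, h, _ in person_host_lang}
person_lang = {(p, l) for p, _, l in person_host_lang}

# Joining the two projections on person gives back the original relation.
rejoined = {
    (p, h, l)
    for (p, h) in person_host
    for (p2, l) in person_lang
    if p == p2
}

lossless = rejoined == person_host_lang
print(lossless, len(person_host) + len(person_lang))
```

The two small tables hold four rows between them here, and adding a third host would add one row instead of two.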

Rules of Data Normalization

Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key.

Eliminate Redundant Data - If an attribute depends on only part of a multi-attribute (composite) key, remove it to a separate table.

Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, remove them to a separate table.

Isolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m relationships that are not directly related.

Isolate Semantically Related Multiple Relationships - There may be practical constraints on information that justify separating logically related many-to-many relationships.

Optimal Normal Form - a model limited to only simple (elemental) facts, as expressed in ORM.

Domain-Key Normal Form - a model free from all modification anomalies.

A small example:

DATA

| Name | Company | Company_Address | urlA    | urlB    |
|------|---------|-----------------|---------|---------|
| mah  | AAA     | Street1         | AAA.com | BBB.com |
| iham | BBB     | Street2         | AAA.com | BBB.com |

After First Normal Form

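Table1

| Id | Name | Company | CompanyAddress | Host    |
|----|------|---------|----------------|---------|
| 1  | mah  | AAA     | Street1        | AAA.com |
| 1  | mah  | AAA     | Street1        | BBB.com |
| 2  | iham | BBB     | Street2        | AAA.com |
| 2  | iham | BBB     | Street2        | BBB.com |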

After Second Normal Form

Table2

| Id | Name | Company | CompanyAddress |
|----|------|---------|----------------|
| 1  | mah  | AAA     | Street1        |
| 2  | iham | BBB     | Street2        |

Table3

| P_Id | F_Id | Host    |
|------|------|---------|
| 1    | 1    | AAA.com |
| 2    | 1    | BBB.com |
| 3    | 2    | AAA.com |
| 4    | 2    | BBB.com |

After Third Normal Form

Table4

| Id | Name | RF_Id |
|----|------|-------|
| 1  | mah  | 1     |
| 2  | iham | 2     |

Table5

| PF_Id | Company | CompanyAddress |
|-------|---------|----------------|
| 1     | AAA     | Street1        |
| 2     | BBB     | Street2        |

Table6

| H_Id | F_Id | Host    |
|------|------|---------|
| 1    | 1    | AAA.com |
| 2    | 1    | BBB.com |
| 3    | 2    | AAA.com |
| 4    | 2    | BBB.com |

Final Data Relationships

Table7

| Id | Name | RF_Id |
|----|------|-------|
| 1  | mah  | 1     |
| 2  | iham | 2     |

Table8

| PF_Id | Company | CompanyAddress |
|-------|---------|----------------|
| 1     | AAA     | Street1        |
| 2     | BBB     | Street2        |

Table9

| H_Id | Host    |
|------|---------|
| 1    | AAA.com |
| 2    | BBB.com |

Table10

| HID | FH_Id | F_ID |
|-----|-------|------|
| 1   | 1     | 1    |
| 2   | 1     | 2    |
| 3   | 2     | 1    |
| 4   | 2     | 2    |


The result is four tables: Table7 (people), Table8 (companies), Table9 (hosts) and Table10 (which links people to hosts).
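To check that nothing was lost along the way, here is a sketch of the four tables in SQLite through Python's sqlite3 module. The table and column names (`person`, `company`, `host`, `person_host`, and so on) are my own stand-ins for Table7 through Table10, and the link table uses a composite primary key where Table10 has its own id column. Joining the four tables should give back the same rows as Table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    company_id INTEGER REFERENCES company(id)
);
CREATE TABLE host (id INTEGER PRIMARY KEY, host TEXT UNIQUE);
CREATE TABLE person_host (
    person_id INTEGER REFERENCES person(id),
    host_id INTEGER REFERENCES host(id),
    PRIMARY KEY (person_id, host_id)
);

INSERT INTO company VALUES (1, 'AAA', 'Street1'), (2, 'BBB', 'Street2');
INSERT INTO person VALUES (1, 'mah', 1), (2, 'iham', 2);
INSERT INTO host VALUES (1, 'AAA.com'), (2, 'BBB.com');
INSERT INTO person_host VALUES (1, 1), (1, 2), (2, 1), (2, 2);
""")

# Rebuild the flat view by following the foreign keys.
rows = conn.execute("""
SELECT p.name, c.name, c.address, h.host
FROM person_host AS ph
JOIN person AS p ON p.id = ph.person_id
JOIN company AS c ON c.id = p.company_id
JOIN host AS h ON h.id = ph.host_id
ORDER BY p.id, h.id
""").fetchall()

for row in rows:
    print(row)
```

Each company address and each host name is now stored exactly once, and the join recovers the flat view whenever it is needed.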