Understanding Database Normalization

The process of separating data into multiple tables, creating relationships between these tables, grouping all similar information within the same table, and ensuring no duplication of data is called Normalization.

Unlike spreadsheets, each database table should only contain data that pertains to that table, and depends on the Primary Key field. You can create a table that combines all of the information that you need, similar to a spreadsheet, but then you lose the advantages of a database over a spreadsheet.

  1. The first step in normalizing your data which is called First Normal Form requires that each field in a table contains only a single value and that there are no repeating groups of data (fields). (It is a simple process later to combine the field data for forms or reports using a query).
    • Each table should also have a unique identifier called a Primary Key.
    • This Primary Key must be unique for each record in the table. The Primary Key can be one field or a combination of multiple fields.
    • If there is a value needed for the table that would be repeated into multiple fields, this value should be created in a new lookup table.
    • Lookup tables are helpful to avoid insertion, update, and deletion anomalies. If your table contains Order Detail information and also includes a ProductID and Product_Description, the Product_Description would be duplicated many times leading to insertion, update, and deletion anomalies. How to fix this?
      • Create a new Product table.
      • Move the ProductID and Product_Description to the new Product lookup table.
      • Then the new Product table can be linked to the Orders table with a new Foreign Key ProductID. ProductID will be a combo box that you can customize using hidden fields to show whatever descriptor you wish for that field.
      • This will resolve the duplication error from the Orders table, and makes all of the data in the Product table available in a query with the Orders table.
      • Another example; in a typical Contact table you would have fields titled FirstName, LastName, StreetAddress, City, State, and Zip. The State field could be repeated many times if you have several contacts who live or work in the same state, so the State field data would be pulled from a 2nd lookup table named State. Since the state field data resides in the state table, and the Contact table only points to the State table to find the state data, repeated fields are avoided.
    • See a diagram of related tables.
  2. The second step in normalizing your data which is called Second Normal Form is moving any data not directly dependent on the current tables Primary Key into a new table.
    • Each table should only hold data that is related to one "subject". For example, the Order table should not hold any Client information. The Client table can be linked to the Order table using a Foreign Key, which allows all of the data in the Client table to be accessible in a query to the Order table.
    • Any values that apply to multiple records should be moved to a new table, and the Primary Keys for these tables would become Foreign Keys in the previous table to relate the data.
  3. The third step in normalization is called Third Normal Form, and requires that any fields that depend on a non-key field be removed.
    • In the Order Detail table example above by moving the ProductID and Product_Description to the new Product lookup table the non-key dependent field Product_Description would be removed from the Orders table and the Orders table would achieve third normal form.
    • This means that if you have a value that performs a calculation, or is dependent on a non-key field that these fields can be created in the form or report that you need them in, and created on the fly instead of saving the value into the database.
    • Saving these calculated or dependent values can increase errors because of this dependency.
    • For example, the field InvoiceTotal would be dependent on the non-key fields, InvoiceAmount + InvoiceTax + InvoiceShippingFee, so there would be no useful reason to save this value to the database. It would be very easy to create an unbound field named InvoiceTotal on your forms or reports, so you will have a freshly calculated value each time you need it.
    • There can be exceptions to this rule, but in most instances it can and should be followed. An exception that I can think of is where an outgoing payment amount is based upon an incoming payment amount, so this value would change with each incoming payment, would need to be reported on each value, and would need to be saved within the database. Or if there are calculations based on prices that could change this would also be an exception, and a reason to denormalize the table.

For most databases normalizing to Third Normal Form is sufficient to prevent insertion, update, and deletion anomalies.

For more information please see Microsoft's web site.