Denormalization of relational database design paradigm

1.  Normal Format (NF)

1.1 Definition: A set of relational patterns that conform to a certain level, indicating the degree of rationalization of the connection between various attributes within a relation

(1) Paradigm is a concept in discrete mathematics

(2) The paradigm goal is to minimize data structure redundancy under the premise of satisfying organization and storage

(3) The higher the level of the paradigm, the more standard the level of the table. The paradigm currently applied to the database has the following levels:

① The first normal form: the atomicity of 1NF data fields to enhance the usability of data

② The second normal form: 2NF removes partial dependence on fields, establishes data relevance, and reduces data redundancy

③ The third normal form: 3NF cancels the transitive dependence of fields, divides related entities independently, and reduces data redundancy

④ Denormalization: In order to improve the efficiency of data query, deliberately increase data redundancy ( the contradiction between disk space utilization and access efficiency )

Example:

The number of bytes used by the department is much less than the number of characters, and the optimization effect is achieved (the disk space occupied by N 1 and N armed departments is definitely different)

1.2 Summary:

(1) Paradigm is a mathematical theory used to reduce data redundancy in relational databases

(2) The more paradigms that are satisfied, the more in line with the high-standard table design

(3) The paradigm has 6 layers, but the design of the database usually only requires 3 layers to be satisfied (no matter how high it is, it will reduce the efficiency of the database)

2. First Normal Form 1NF

The first paradigm: minimum requirements for database design

2.1, 1NF concept: data field design must meet atomicity

(1) 1NF requires that the field data can be directly applied without splitting

(2) If the data needs to be split when it is used, it will violate 1NF

2.2. Design steps

(1) Whether the designed field needs to be split again when it is used

(2) Split the data into the smallest unit (used), and then design it into fields

(3) Meet 1NF

2.3. Case:

Special attention: Atomicity is for business. If the business needs to use the month and day separately. Then the start time is not yet atomic, and it must continue to be split into months and dates.

2.4 Summary

  • 1NF is to minimize the granularity of the field data to ensure that the data does not need to be split when it is taken out and used.
  • 1NF is the most basic specification to meet data sheet design

3. Second Normal Form 2NF

3.1, 2NF concept: field design cannot be partially dependent

(1) Partial dependence : First, there is a composite primary key in the table, and secondly, some fields do not rely on the entire primary key, but only a part of the primary key

(2) Partial dependency resolution: make all non-primary attributes depend on a candidate key

(2.1) The simplest way: Cancel the composite primary key (generally choose the logical primary key instead, but the essence is still the composite primary key), all non-primary attributes depend on the primary attribute (logical primary key)

(2.2) The correct way: separate some dependencies into tables

Note: The logical primary key is a new field added by itself to do some additional services, which is not in the data table itself.

3.2 Design steps

(1) Is there a composite primary key in the table?

(2) Are there any other fields that depend on part of the primary key?

(3) If there are partial dependencies, split the partial dependencies into tables independently

(4) Meet 2NF

3.3. Case:

There are two primary keys for students and courses. But only grades are completely dependent on these two primary keys. Other fields such as gender are partially dependent

Although it is a composite primary key, there is no partial dependence, but a complete dependence. problem solved.

3.4 Summary

(1) 2NF is based on the premise of satisfying 1NF

(2) The goal of 2NF is to cancel partial dependencies in the table

① Only when the primary attribute (primary key) is a composite primary key can it exist

② The solution is to make the existing relationships of partial dependencies separate into tables (student table and curriculum schedule), and there is no independent table of partial dependencies (student score tables).

(3) 2NF can achieve a large degree of data redundancy reduction

4. Third Normal Form 3NF

4.1, 3NF concept: field design cannot have transitive dependencies (1) Transitive dependency : a non-primary attribute of the field does not directly depend on the primary attribute, but is passed to the primary attribute by relying on some other non-primary attribute (2) Transitive dependency resolution: Let the fields that rely on non-primary attributes and dependent fields become independent tables 4.2 Design steps (1) Make sure that all fields in the table are dependent on the main attribute (2) If there is no direct dependence on the main attribute, but by relying on other attributes to generate dependence, an independent table is formed (3) Meet 3NF 4.3. Case:

Although it is a composite primary key, there is no partial dependence, but a complete dependence. problem solved.     4.4 Summary (1) 2NF is based on the premise of satisfying 1NF (2) The goal of 2NF is to cancel partial dependencies in the table ① Only when the primary attribute (primary key) is a composite primary key can it exist ② The solution is to make the existing relationships of partial dependencies separate into tables (student table and curriculum schedule), and there is no independent table of partial dependencies (student score tables). (3) 2NF can achieve a large degree of data redundancy reduction

5. Denormalization

5.1. The concept of denormalization : in order to improve the efficiency of data query and deliberately violate the rules of the paradigm

(1) The goal of denormalization is to improve the efficiency of data access

(2) The so-called denormalization is to reduce the associated query between tables (reduced efficiency), and deliberately increase data redundancy steps

5.2, application scenarios

(1) Part of the data in the table comes from other tables (usually only some simple data from other tables is needed)

(2) The current table will be queried frequently

(3) The data sheet has a large amount of data

(4) Consider using denormalization

5.3. Case:

5.4. Summary

(1) Denormalization is only a way of sacrificing disk space in order to improve query efficiency when the amount of data is large and the query efficiency is low."

(2) The design of the data table after de-normalization must not fully meet the requirements of the paradigm (2NF/3NF)