Database system concept notes-Chapter 8: Relational database design

Article Directory

Chapter 8: Relational Database Design

This chapter is more difficult, I did not learn very well, some content needs to be added at the end of the term

8.1 Features of good relationship design

8.1.1 Design choices: a larger model

In the textbook example, we consider combining the instructor table and the department table into a larger table, named inst_dept.

Insert picture description here


After synthesizing this new table, it seems to be beneficial. For example, some queries can be expressed using fewer joins.

However, the synthesized table also has drawbacks. For example, for teachers in each department, the department information (building, budget) must be repeated.
If the original two tables are used, then each budget amount will only be stored once.
This shows that the merged inst_dept table is not good, because it repeatedly stores the budget and has to assume that some users may update the budget amount of one tuple instead of all tuples, which will cause data inconsistency

There are other problems with this merged relationship, such as when a department without a teacher appears, null information will be inserted, etc.

8.1.2 Design choices: smaller models

Assuming from the previous large model inst_dept table, how do we specify that it should be divided into the instructor table and the department table?

We found that in the merged large model, budget will be duplicated, and there will be data redundancy.
In other words, there is a rule. If there is a relationship between dept_name->budget functions all the way, then there is a problem with this table, due to dept_name Not the main code, so the budget will be duplicated, and duplication is redundant

Therefore, due to the existence of this functional dependency, this table needs to be split into two tables, instructor and department

For many databases, it is much more difficult to find the correct decomposition due to a large number of attributes and multiple functional dependencies. Therefore, in order to deal with this situation, the following standard method is required

Note that not all decompositions are beneficial

For example, if employee is decomposed into the following two patterns.

Insert picture description here


Insert picture description here


After decomposing into these two tables, if the employee has the same name, then the result of the natural connection between the two tables will be different from the original data, and the problem of inconsistent information will occur. , So this decomposition is undesirable. The

Insert picture description here


above decomposition is called lossy decomposition . Conversely, if the data after decomposition is still consistent, it is called lossless decomposition

8.2 Atomic domains and first normal form

If a domain is considered an indivisible unit, then the domain is atomic

If all attributes of a relational pattern R are atomic , then this relational pattern R belongs to the first normal form (1NF)

Pay attention to the correct understanding of this atomicity, there are specific examples in the textbook

8.3 Decomposition using functional dependencies

First of all, what is the purpose of learning and using functional dependencies?

Using functional dependencies, existing relationships can be decomposed. Therefore, the essential function of functional dependencies is to decompose existing redundant relationship patterns.

About the introduction of subsequent related symbols

Insert picture description here


Insert picture description here

8.3.1 Code and functional dependencies

In the real world, data usually has various constraints, and instances that satisfy all such constraints are called legal instances

Constraints in the real world can be converted into codes in the database (super codes, candidate codes, master codes)

Reviewing the definition of hypercode. The definition of

Insert picture description here


functional dependence.

Insert picture description here


From another perspective, the dependency reflected by hypercode is a special case of functional dependence.

Insert picture description here


That is, functional dependence can be used to express constraints that cannot be expressed by hypercode.

Example: for the relational model inst_dept

inst_dept(ID,name,salary,dept_name,building,budget)

In this mode, the function depends on dept_name->budget to be established. The
attribute pair (ID, dept_name) constitutes a supercode of inst_dept

Two ways to use functional dependencies

  • Determine whether the instance of the relationship satisfies the given functional dependency set F
  • Explain the constraints on the set of legal relationships. If we want to consider only the relationship satisfying the functional dependency set F on R, we say that F is established on r®

Trivial function depends on

Insert picture description here


closure of F set

Insert picture description here

8.3.2 Boyce-Codd Paradigm

Boyce-Codd paradigm (BCNF) can eliminate redundancy based on functional dependency discovery

The condition for judging that the relational pattern R with functional dependency set F belongs to BCNF is that one of the

Insert picture description here


two conditions is satisfied.

Examples

Insert picture description here


Insert picture description here


of general rules for decomposition of relational schemas that do not belong to the BCNF schema

Insert picture description here

8.3.3 BCNF and stay dependent

In the ER diagram of the university database in Chapter 7, a student can only have one mentor. If we make a small change to the relationship model, so that a student can have multiple tutors, but a department can only have one at most

The modified ER diagram is as follows.

Insert picture description here


For the new ER diagram, the previous relationship mode remains unchanged, but the exported dept_advisor mode becomes

dept_advisor(s_ID,i_ID,dept_name)

Then the following two function dependencies are established on dept_advisor

  • i_ID->dept_name
  • s_ID,dept_name->i_ID

Note that dept_advisor does not belong to the BCNF mode. If you decompose according to the BCNF decomposition rules, you will get the following two tables

(s_ID,i_ID)
(i_ID,dept_name)

But after decomposing in this way, although the new model satisfies BCNF, it does not meet the dependencies we require

The above decomposition is not to maintain dependence , because we often want to maintain dependence, we need to consider a weaker paradigm than BCNF, called 3NF

8.3.4 Third Normal Form

The definition of the third paradigm

Insert picture description here


Note that any relational model that satisfies BCNF satisfies 3NF, so BCNF is a stricter paradigm than 3NF

8.3.5 Higher paradigm

Note that BCNF decomposition is not necessarily perfect, and there may be problems, such as the following example.

Insert picture description here


Therefore, solving these problems requires multi-value dependence and a higher paradigm.

8.4 Functional dependency theory

8.4.1 Closure of functional dependency set

Given the functional dependency F on the pattern, we can prove that some other functional dependencies also hold on the pattern. We call these functional dependencies "logically implied" by F.
When testing the paradigm, you can’t just consider the given set of functional dependencies, you also need to consider all the functional dependencies that are established in the model.

Definition of logical implication

Given a relational pattern r (R ), if every instance of r (R) that satisfies F also satisfies f, then the functional dependency f on R is logically implied by the functional dependency set F on r

Example:

Insert picture description here


Armstrong's axiom

Insert picture description here


calculation algorithm for F+

Insert picture description here

8.4.2 Closure of attribute set

Conceptual example of attribute set closure

Insert picture description here



Insert picture description here

8.4.3 Regular coverage

For the functional dependency set F of a relational model, whenever a user performs an update on the relation, the database system must ensure that this update does not destroy any functional dependencies

Irrelevant attributes

If removing an attribute in the functional dependency does not change the closure of the attribute dependency set, the attribute is said to be irrelevant

Formal definition of irrelevant attributes. Definition of

Insert picture description here


regular

Insert picture description here


coverage. Algorithm

Insert picture description here


for calculating regular coverage. Examples of finding regular coverage.

Insert picture description here


Note that in some cases, there may be more than one regular coverage.

8.4.4 Non-destructive decomposition

Definition of Lossless Decomposition

Insert picture description here


More generally speaking, lossless decomposition means that the result of the natural connection between the two after the decomposition is the same as before the decomposition.

select * from
r1 natural join r2

versus

select * from
r

The same result

For binary decomposition, the method of judging lossless connection

Insert picture description here

8.4.5 Stay Dependent

Remaining dependent after decomposition is called maintaining dependent

Keep the specific definition of dependency

Insert picture description here

8.5 Decomposition algorithm

8.5.1 BCNF decomposition

1. Determine whether a relational pattern belongs to the simplified way of BCNF 2. BCNF

Insert picture description here


decomposition algorithm

If a relational model does not belong to BCNF and needs to be converted to BCNF, it can be converted using the BCNF algorithm. The specific steps of the algorithm are as follows:

Insert picture description here


BCNF decomposition example

For a relational model

Insert picture description here


Insert picture description here


, the final decomposition result is obtained after continuous iteration of the algorithm

8.5.2 3NF decomposition

Convert the model to a 3NF lossless decomposition algorithm and maintain dependence

Insert picture description here

8.5.4 Comparison of BCNF and 3NF

Regarding the two paradigms of database design: 3NF and BCNF, one of the advantages of 3NF is that we can always get 3NF design under the premise of satisfying losslessness and maintaining dependence. But 3NF also has shortcomings. Sometimes it is necessary to use a null value to indicate some potentially meaningful connections between data items.

The design goal for the database on which the application function depends is

  • BCNF
  • Lossless
  • Stay dependent

Since these three requirements cannot always be met, we have to choose between BCNF and 3NF

8.6 Decomposition using multi-valued dependencies

The part of multi-value dependence is not important, and it is not involved in the exam, so it is omitted

Two, PPT supplementary content

1. Candidate code solution theory

Solving the candidate code of a relation is an NP problem, but there is a relatively simple method

For a given relational pattern R(U,F), its attributes can be divided into 4 categories

  • Type L: Functions that only appear in F depend on the attributes on the left
  • Type R: Functions that only appear in F depend on the attributes on the right
  • Type N: Attributes that do not appear on either side of the functional dependency of F
  • LR class: the function of F depends on attributes that appear on both sides


Insert picture description here


Examples of sufficient conditions for quickly solving candidate codes : Set the relational model R(U,F), U = {A, B, C, D}, F = {D->B, B->D, AD->B, AC ->D}, find the candidate code of R

A and C are attributes of type L, AC must be a member of R's candidate code, (AC)F+ = {ACDB}, so AC is the only candidate code of R

Inference

Insert picture description here


Theorem 2

Insert picture description here


Theorem 3

Insert picture description here


Example Questions

Insert picture description here


Inference

Insert picture description here

2. Judgment of lossless connection-table method

Use the following methods to determine whether it is a lossless connection.

Insert picture description here


Insert picture description here


Specific examples are available on the PPT.

3. Second Normal Form (2NF)

What kind of relationship belongs to the second normal form?

Insert picture description here