2021 soft exam database engineer review notes record

Basic knowledge of computer hardware
arithmetic unit: master! ! ☆☆☆
ALU arithmetic logic unit: processing data
AC accumulating register: providing a work area
DR data buffer register: temporarily storing the command
PSW status condition register.
Shorthand: Operation plus slow state (operation plus slow state)

Controller: ☆☆☆
IR instruction register: save the instruction
PC program counter: save the address of the next instruction
AR address register: save the address
ID of the memory unit Instruction decoder: translate mov and other instructions.

2. Memory:
Internal memory: fast speed, small capacity;
external memory.
3. Input devices and output devices

Base conversion ☆☆☆
Binary B, Octal O, Decimal D, Hexadecimal H.
Convert other bases to decimal: expand by right; convert from decimal
to other bases: integer remainder method (from the Internet).
Fast conversion of octal or hexadecimal to binary: each data in octal is converted into a 3-digit binary, and each digit in hexadecimal is treated as a decimal number and converted into a four-digit binary.

Classification of the
bus Data bus db: two-way, data transmission, determines the number of bits of data exchanged between the cpu and other devices each time.
Address bus ab: one-way, transfer address information. The width determines the maximum addressing capability of the cpu.
Control bus cb: transfer control signals, timing signals, and status information. The information transmission of each line is one-way, but the overall cb is two-way.

2. Computer architecture and storage system

Pipeline technology
Pipeline cycle: the slowest among the subtasks.
Pipeline execution completed n time = execution time + (n-1) * pipeline cycle

Throughput rate: the number of instructions executed per unit time. (Actual throughput rate)
Throughput rate (maximum throughput rate) = 1/pipeline cycle

The address mapping method in the high-speed cache:
(1) Direct mapping: The corresponding relationship between the block of the main memory and the block of the cache is fixed.
Main memory address: the address within the block number in the main memory area code.
Excellent: simple address conversion and fast access speed;
lack: high block conflict rate and insufficient use of Cache space.
(2) Fully connected image: Both the main memory and the Cache memory are divided into blocks of the same capacity. (That is, the capacity of each block of the main memory and the Cache is equal)
Excellent: Flexible, low block conflict rate, high Cache utilization, conflicts will only occur after the blocks in the Cache are fully filled;
Lack: complex transformation and high cost , The conversion speed is slow.
Main memory address: Main memory block number. Address in block
(3) Group concatenated image: a compromise between the first two methods. Regroup the blocks in the cache, the group adopts the direct mapping method, and the block adopts the fully connected method.
That is: Group 0 in any area of ​​the main memory can only exist in Group 0 of the cache. The blocks in the group can be stored in any block of the same group in the Cache.
Number of main memory address bits = area code + group number + main memory block number + address in block
Cache address bits = group number + block number in group + address in block

Cache performance analysis:
Cache memory equivalent weighted average access time = cache hit rate * cache access time + (1- hit rate) * main memory access time

Virtual memory (physically non-existent): actually a kind of logical memory;
associative memory is a memory that is accessed according to content.

Related calculations for addressing: 1 byte byte = 8 bit bit
☆☆☆The memory is addressed by byte byte ! !
Example: The memory is 8B (bytes), how many bits are needed for addressing?
Answer: 8=2^3, so it occupies 3 places.

Example: The main memory capacity is 1MB, the Cache capacity is 16KB, and the block size is 512B. Direct image.
Q: How many bits is the main memory address?
Answer: ①Calculate how many areas there are: 1MB/16K=64 areas, 64->2^6, so the main memory area number needs 6 digits when addressing (that is, it needs to be programmed from 000000 to 111111);
②Calculate again How many blocks are there in each area of ​​the main memory, and how many blocks are there in the cache: 16KB/512B=32 ->2^5, so the block number in the area needs to be addressed with 5 bits;
③The address in the block is 512B bytes ->2 ^9, so 9 bits are required for each block addressing.
Therefore, the main memory area number of this example needs 6+5+9=20 bits for addressing.

☆☆☆Security, reliability and system performance evaluation
①Symmetric encryption technology: DES (56-bit to 64-bit encryption), 3DES, RC-5, IDEA, AES and other algorithms;

Asymmetric encryption technology: use 2 keys, 1 public and 1 non-public private key.
② Asymmetric encryption has two different physiques: encryption model and authentication model.
Encryption model: receiver B's public key (encryption model) and private key.
Authentication model (the sender's identity can be confirmed): sender A's private key is encrypted, and A's public key is decrypted.

Asymmetric encryption technology: stands for algorithm RSA. Asymmetric encryption is suitable for the encryption of a small amount of data with good confidentiality. Encryption and decryption takes a long time.
Digital envelope technology (a kind of encryption model): to ensure the security of data during transmission.

Information digest algorithm
Hash function: MD2, MD4, MD5 are widely used hash functions.

The hash function can turn a variable-length input into a fixed-length output!
Example: MD5 hash function can convert input into 128-bit fixed-length output;
SHA-1 hash function can convert input into 256-bit fixed-length output.
The difference between digital signature and digital encryption:
digital signature. Only using asymmetric encryption algorithms can guarantee information integrity, identity authentication and non-repudiation, but cannot guarantee confidentiality. Sign it -> only confirm who sent it.
Digital encryption uses the recipient's key pair. As long as you know the recipient’s public key, you can send data to the recipient, but only the only person who has the recipient’s private key can decrypt it. Digital encryption uses a combination of asymmetric and symmetric encryption. Can guarantee the confidentiality of information.

the reliability of a computer system: refers to the probability R(t) of normal operation during the period from its start of operation (t=0) to a certain time t.
Computer system failure rate: refers to the ratio of the number of failed components per unit time to the total number of components λ.
Mean Time Between Failures (MTBF): The average value of the normal working time between two failures. MTBF=1/λ
The maintainability of a computer system MTRF: refers to the average time required from the occurrence of a failure to the repair of the machine.
The availability of computer systems: the efficiency of computer use A.
A=mean time between failures MTBF/(mean time between failures MTBF+computer system maintainability MTRF)

Calculation of computer reliability: ☆☆☆ ①Reliability of
series system:

R=R1 * R2 * R3 *… * Rn

②Reliability of parallel system : R=1-(1-R1)(1-R2)...(1-Rn) that is 1-probability of not working properly

Logical operation:
logical AND: all true is true.
Logical OR: True is true.
Logical negation: negate.
Logical XOR: Difference is true.

Overview of programming languages
High-level programming languages ​​must be translated before they can be understood by computers.
Common translation methods: compilation, interpretation and translation.
①Writing in assembly: Need to translate the assembler into the target program, and then execute the target program;
②Writing in high-level language: Need to translate the interpreter or compiler, and then run it.

Interpreter: execute while compiling.
Compiler: It is divided into compilation phase (source -> target) and run phase.

Compilation may achieve higher efficiency than interpretation (one compilation, multiple execution);
interpretation is more flexible than compilation ; interpretation is more

The basic components of programming language: data, calculation, control, and transportation . (20 years of
real questions) Data components of programming language:
variables and constants.
Global variables and local variables.
type of data:

Basic types (integer type, character type, real type, Boolean type),
special type (empty type),
user-defined type (enumeration type),
structure type (array, structure, union),
pointer type (type*),
abstract Data type (class type), etc.

The infix, prefix (Polish) and postfix expression (reverse Polish)
operators are in the middle, before, and after.
Three kinds of expression conversion.

Binary Tree
Full Binary Tree
Complete Binary Tree (All but the last level is full)

The storage structure of the binary tree: binary tree, binary linked list, trigeminal linked list.

The traversal of the binary tree: pre-order traversal, middle-order traversal, post-order traversal: the position of the root.

Optimal binary tree (Huffman tree): weight, path length (distance from node to root), weighted path length of node (weight multiplied by path length), weighted path length of tree (the length of all leaf nodes in the tree) The sum of the length of the right path).
The structure of the Huffman tree.

Binary search tree (the left side must be less than the root, and the right side must be greater than the root): So the result of the in-order traversal of the binary search tree is increment.

Definition of graph: It consists of two sets, V (a set of non-empty vertices) and E (a set of edges).
In the figure, the data elements in the data structure are represented by vertices, and the relationships between data elements are represented by edges.
The classification of graphs: directed graphs (directed edges are also called arcs, and the starting point is called arc tails), angle brackets, undirected graphs (indicated by parentheses).

Complete graph: If any two vertices in the graph have edges, it is called a complete graph. If it is undirected, it is an undirected complete graph. If there are n vertices, there will be n(n-1)/2 edges.

Related concepts of graphs: degree, in-degree and out-degree, paths, subgraphs, connected graphs (undirected graphs with paths between any two vertices), strong connected graphs (directed graphs with paths between any two vertices) Figure), net (edge ​​weight).

Graph storage structure: adjacency matrix representation, adjacency linked list representation.

The sorting algorithm is summarized in one sentence:
①Direct insertion sorting: Insert the keywords to be sorted in order, find the position in order when inserting, insert directly, and move backward in turn.
②Bubble sorting: compare two adjacent records in turn, and then exchange positions.
③Simple selection and sorting: select the smallest one each time and exchange with the first record that has not been sorted.
④ Hill sorting: divide a number of empty records into a group, perform direct insertion sort, and reduce the interval to 1 in turn.
⑤Quick sorting: Set two indicators to indicate the head and tail, starting from the tail, alternately comparing the pivot record (the first record) with the pivot record (the first record), and swapping positions.
⑥ Heap sorting: repeatedly build the sequence to be sorted into a pile, and take the pile items.
⑦Merge and sort: merge the two records into one group, then merge the four records into one group, and so on.

Find algorithm.
Discovery algorithm.
Related algorithms of the graph.

Process management:
basic concepts of operating systems,
process state transition
, synchronization and mutual exclusion between processes,
semaphore mechanism (pv operation)
banker algorithm,

P operation: S-1, apply for resources. S>=0, P operation can continue to be executed; S<0 is set to blocking state.
V operation: S+1, release resources. If S>0, it means that there is no process waiting, and the V operation can continue; if S<=0, the blocking state is set.

Producer-consumer problem:
single buffer;
multiple buffers: It can be imagined as a warehouse where the producer can put multiple products in the buffer.
S: The amount of mutual exclusion signal, the initial value is 1;
S1: The amount of synchronization signal, the initial value is n, representing the number of products that can be put in the buffer;
S2: The amount of synchronization signal, the initial value is 0, representing the buffer How many products are there in.

Deadlock: Refers to a phenomenon in which two or more processes require each other's resources and cannot continue to run.
There are a total of n resources sharing the same type of resources in the system. When each process requires K resources, how many resources are required at least to avoid deadlock?

Banker's algorithm.

Storage management:
paging storage;
segment management;
paging segment management.

I/O device management.

↓User process↑ Carry out I/O call, format I/O, Spooling
↓Device-independent software (system software)↑Name, protect, block, buffer, allocate
↓Device driver program↑Set device register, check status
Interrupt handler ↑ Wake up the driver when I/O is over
Hardware ↑ Perform I/O operations
(better memorize it!!)

Spooling technology:
purpose: to ease the contradiction between the high speed of the CPU and the low speed of the I/O device.

Disk scheduling algorithm:
Disk scheduling is divided into arm shift scheduling and rotation scheduling. The arm shift scheduling is performed first, and then the rotation scheduling is performed.
The goal of disk scheduling is to minimize the average seek time of the disk.

Time to read disk data = seek time + rotation delay + data transmission time

1. Arm shift scheduling algorithm:
first come, first serve;
shortest seek time first;
scanning algorithm (similar to elevator scheduling);
one-way scanning scheduling algorithm;

Document management and job management

Computer network overview and network basics
Summary of common network equipment knowledge:

Insert picture description here

Common test points: the difference between TCP and UDP, commonly used ports for application layer protocols, and whether it works on TCP or UDP.

Insert picture description here

☆☆☆☆☆Basic knowledge of relational database design
Function dependency, code, multi-value dependency

Functional dependency: Definition: Let R(U) be the relational pattern on the attribute set U, and X and Y are subsets of U. If for any possible relationship r of R(U), there can be no two tuples in r that have equal attribute values ​​on X but unequal values ​​on Y, then the X function determines Y or Y function dependence In X, denoted as: X->Y
Non-trivial functional dependency: if X->Y, but Y is not a subset of X. Generally speaking, non-trivial functional dependencies are discussed.
Trivial functional dependency: if X->Y, and Y is a subset of X.

Note: ① The definition of functional dependency requires that any possible r of the relational model R meets the above conditions. ②Functional dependency is a concept of semantic category. We can only determine functional dependency based on semantics.

Complete functional dependency: In R(U), if X->Y, and for any proper subset X'of X, X'cannot determine Y, then Y is fully functionally dependent on X, denoted as X-> Y.

Part of the function is dependent.

Transfer function dependence: In R(U, F), if X->Y, Y->Z, Y is not a subset of X, Y cannot determine X. It is said that Z is transitively dependent on X.
Candidate code: Let K be the combination of attributes in R(U, F). If K->U, and for any proper subset K'of K, there is K'that cannot determine U, then K is the candidate code of R.

Features: Candidate codes can determine the full code !
If there are multiple candidate codes, choose one as the main code (primary key).
Primary attribute: The attribute contained in any candidate code is called the primary attribute.
Non-primary attributes: attributes that are not included in any candidate code are called primary attributes.
Foreign code (foreign key).

Multi-valued dependency: In the relational model R(U), X, Y, and Z are subsets of U, and Z=UXY. If and only if any relation r to R(U), given a pair of (x, z) values, there is a set of Y values, this set of values ​​is only determined by the value of x and has nothing to do with the value of z, then it is called " Y multi-value depends on X" or "X multi-value determines Y" is established, denoted as: X->->Y.
Trivial multi-valued dependency: Z is the empty set. Non-trivial multi-valued dependency: Z is not an empty set.

Multi-value dependent attributes:

①Symmetry: X->->Y, Z=UXY, then X->->Z.
②Transitivity: X->->Y, Y->->Z, then X->->ZY.
③Functional dependence can be regarded as a special case of multi-value dependence. ④X->->Y, X->->Z, then X->->YZ, X->->Y∩Z, X->->ZY.

☆☆☆☆☆ Standardization of basic knowledge of relational database design
Full functional dependence:
Example: In the relationship of transcript (student number, course number, grade),
complete functional dependence: (student number, course number) → grade, true subset student number cannot The grade is determined separately, and the true subset course number cannot determine the grade alone, so (student number, course number) → grade is completely functionally dependent.
Partial functional dependence: In the relational model R(U), if X→Y, and there is a proper subset X0 of X, such that X0→Y, then Y is called partial functional dependence on X.

Candidate code: If the value of an attribute or attribute group in the relationship can uniquely identify a tuple, and its proper subset cannot uniquely identify a tuple, then this attribute or attribute group is called a candidate code.
That is: the candidate code can determine the full code (each attribute), and the proper subset cannot determine the full code (each attribute).
Attributes that have appeared in candidate codes are called primary attributes;
non-primary attributes are attributes that are not included in any candidate code.

1NF (First Normal Form): The attributes of each column in the table cannot be divided.
1NF problems: ①Data redundancy; ②Update abnormal (data is inconsistent after modification); ③Insert abnormal; ④Delete abnormal.

2NF (Second Normal Form): Eliminates part of the functional dependence of non-primary attribute pairing codes. In short, the attributes of each column in the table cannot be separated, and the non-primary attributes are completely dependent on the primary attributes.

3NF (Third Normal Form): Eliminates the transfer function dependence of non-primary attributes on the code. In short, the attributes of each column in the table cannot be divided, and the non-primary attributes are completely dependent on the primary attributes, and the transfer function of each non-primary attribute is not dependent on the primary attributes.

BCNF (Bacchus paradigm or BC paradigm): When 3NF eliminates the partial functional dependence and transfer function dependence of the main attribute pair code, it is the BCNF paradigm.
The nature of the BC paradigm: ①All non-primary attributes are fully functionally dependent on every code;
②All primary attributes are fully functionally dependent on every code that does not contain it;
③No attribute is completely functionally dependent on any non-code A set of attributes.

4NF (Fourth Normal Form): 4NF is a restriction that does not allow non-trivial and non-functional multi-valued dependencies between the attributes of the relational model.

Note: If only functional dependencies are considered, the highest degree of normalization of the relational model is BCNF, and if multi-valued dependencies are considered, the highest degree of normalization of the relational model is 4NF.

5NF (Fifth Normal Form): Not tested. Connection dependent.

Summary: 1NF 2NF 3NF BCNF 4NF 5NF
can be decomposed to convert a relational model of a lower-level paradigm into several relational models of a higher-level paradigm. This process is called normalization.

☆☆☆Armstrong Axiom System:

Armstrong axiom system supposes the relational model R<U,F>, where U is the attribute set and F is a set of functional dependencies on U, then there are the following inference rules:
①> A1 self-reflexive law: if Y⊆X⊆U, then X→Y is implied by F (that is: a large set can determine a small set);
②> A2 augmentation law: if X→Y is implied by F, and Z⊆U, then XZ→YZ is implied by F;
③ A3 Transmission law: if X→Y and Y→Z are implied by F, then X→Z is implied by F.
According to the above three inference rules, the following three inference rules can be derived:
④ Combination rule: If X→Y, X→Z, then X→YZ is implied by F;
⑤Pseudo transfer rule: If X→Y, WY→Z, Then XW→Z is implied by F;
⑥Decomposition rule: if X→Y, Z⊆Y, then X→Z is implied by F.

☆☆☆Candidate code solution

① Find out which attribute is L, R, LR, NLR;
② Combine the attributes of L and NLR to find the closure of the attribute (according to the functional dependency in the question);
③ If the attribute closure obtained in step ② If it is the complete set of attributes U, the attribute of ② is the candidate code; if the closure is not the complete set U, then add the attributes of LR to the union, and then continue to find the closure of the attributes of the union.
Lossless connection:
U1∩U2 -> U1-U2 ∈ F+ is established
U1∩U2 -> U2-U1 ∈ F+ is established, F+ is the original functional dependency set,

This means: the calculated functional dependency result is a subset of the original functional dependency, which is a lossless connection.

Minimal functional dependency set:
① The right part of any functional dependency in
F contains only one attribute; ② There is no such functional dependency X→A in F, so that F is equivalent to F-{X→A};
③> F There is no such functional dependency X→A, X has a proper subset Z such that F-{X→A}∪{Z→A} is equivalent to F.

PS: When the candidate code is a combination of attributes, there will be a partial functional dependency problem; when the candidate code is a single attribute, there is no partial functional dependency problem! ! !

Chapter 8 SQL
8.1 SQL Overview and Database Definition
8.2 Data Operations
8.3 Authorization and Triggers
8.4 Embedded SQL and Stored Procedures

Chapter 9 Non-relational database NoSQL

CAP theory: consistency, availability, partition tolerance.
BASE theory: BA: Basically usable, S: Soft state, E: Final consistency.

Classification of common NoSQL databases:

Insert picture description here

Chapter 10 System Development and Operation Knowledge
PERT Diagram and Critical Path (the longest) .

Module independent standards: coupling and cohesion. -> High cohesion and low coupling, the higher the independence of the module.

Six types of coupling. Coupling refers to the closeness of the connection between modules. The higher the coupling, the worse the independence of the modules.
① No direct coupling. There is no direct relationship between the two modules;
②Data coupling. There is a calling relationship between the two modules, and simple data values ​​are transferred;
③Mark coupling. The data structure is transferred between the two modules, in fact the address of the data structure is transferred;
④Control coupling. What is passed is the control variable, and the called module selectively executes a certain function in the block;
⑤Common coupling. Coupling between those modules that interact through a common data environment;
⑥ content coupling. The highest degree of coupling. One module directly uses the internal data of another module. Often appears in assembly language.
Six types of cohesion. Cohesion refers to the closeness of the connections between the elements within the module. The lower the degree of cohesion, the worse the independence of the module.
① Accidental cohesion. There is no connection between the processing elements in a module;
②Logical cohesion. Several logically similar functions are executed in the module, and the parameters are used to determine which function the module completes;
③Time aggregation. The module formed by combining the actions that need to be executed at the same time is the time aggregation module;
④Communication cohesion. Means that all processing elements in the module operate on the same data structure, or that each processing uses the same input data or produces the same output data;
⑤Sequential cohesion. It means that each processing element in a module is closely related to the same function and must be executed sequentially. The output of the previous function element is the input of the next function element;
⑥Functional cohesion. The strongest cohesion. Refers to all elements in the module to complete a function together, one is indispensable.

Software testing methods:
(1) Static testing: ①Manual testing; ②Computer-aided static analysis.
(2) Dynamic test:
1) Black box test method: functional test. Commonly used techniques: ① Equivalence class division; ② Boundary value analysis; ③ Wrong guessing; ④ Causality diagram, etc.;
2) White box testing method: structural testing. Commonly used techniques: ①Logic coverage; ②Circular coverage; ③Basic path testing.

The eleventh chapter database design
connection to the conversion of relational mode.

Chapter Twelve Business (※※※※※)
12.1. Basic Concepts of Business

The four characteristics of transactions: ACID: atomic, consistent, isolated, and permanent.

A transaction is a sequence of operations, and these operations "either do all or none".
Transaction definition:
BEGIN TRANSACTION: transaction start
END TRANSACTION: transaction end
COMMIT: transaction commit
ROLLBACK: transaction rollback.

Lost modification: Both transactions are modified. One of the changes was overwritten.
Non-repeatable read: the value read twice is different;
phantom read: the number of data read twice is different.
Dirty data: ROLLBACK after modification and read again.

Four isolation levels of transactions:

READ UNCOMMITTED: Read uncommitted. Nothing is resolved.
READ COMMITTED: Read has been submitted. Solve the problem of reading dirty data.
REPEATABLE READ: It can be read repeatedly. Solved the problem of reading dirty data and non-repeatable reading.
SERIALIZABLE: Serializable. Solve the problems of reading dirty data, non-repeatable reading, and phantom reading.

Three-level lockout agreement:

The first-level lockout protocol: solves the loss of modifications; the
second-level lockout protocol: solves the loss of modifications and read dirty data; the
third-level lockout protocol: solves the loss of changes, read dirty data, and cannot read repeatedly.

12.2 Concurrency control of the database

After the exclusive lock is added, no lock can be added;
after the shared lock is added, only the shared lock can be added.

If a transaction follows the two-stage lock protocol, it must be serializable.

12.3 Database backup and recovery

Transaction failure: automatic system recovery;
system failure (power failure, cpu broken or something): log, redo+undo
media failure: DBA participation, log + media

Checkpoint: Redo redo for commits after checkpoint, undo undo for uncommitted ones.

Chapter 13 Cloud Computing and Big Data Processing
Chapter 14 Mainstream Database Application Technology
Chapter 15 Standardization and Basic Knowledge of Intellectual Property

☆☆☆☆☆☆☆ afternoon topic☆☆☆☆☆☆☆

1. Data flow diagram (Chinese reading comprehension test)
[ PS: No test in 2020, 2021, failure recovery, check point test ]

Keep the balance between the parent map and the child map.
The input and output data flow of a certain process in the parent graph must be the same in quantity and name as the input and output data flow of its sub graph.
If an input (output) data flow in the parent graph corresponds to several input (or output) data flows in the sub graph, and all the data items that make up these data flows in the sub graph are exactly this one data flow in the parent graph, Then they are still balanced.

Keep data conservation.

The data in all output data streams of a process must be directly obtained from the input data stream of the process. Or it is the data that can be generated by this processing.
Each process must have both an input data stream and an output data stream.

2. ER diagram (test contact type)
Two-party contact: AB BA: the relationship between two entities; for
example: advertisers, advertisements (2013).
Sponsors, teams; sponsors, players (2015)

Three-party connection: AB BC: The relationship between the three entities. :
Three-party contact example: project, supplier, part.
Random check relationship among warehouse management, warehouse and clothing (2011 real question).
Prescription contact of patients, doctors and medicines (2012 real question).
The financial relationship between customers, account managers, and fund managers (2014).

Aggregation: Regarding the connection as an entity, making connections with other entities. That is, the two entities A and B are connected first, and then the connection itself is connected with C. The connection between A and B needs to be framed by a box.
Aggregation example:
①The sales relationship between the salesperson and the product, and then contact the store purchase associated with the order entity (2016. The salesperson sells the product before the store will be delivered to the door)
②Tenant, apartment, property manager (2017 rental The guest must first rent the apartment before signing the contract with the property manager); the
tenant contacts the apartment with the fault registration and then the maintenance worker has an aggregation relationship (2017).
③The order relationship between the dispatcher, the shipping company, the customer and the product, the three have a distribution relationship (2018, aggregation + tripartite contact)

PS: 2019 examines the attribute of connection; 2020 examines simple two-party connection.

Weak entity (double box rectangle): It depends on the existence of another entity. For example: the parent entity (weak entity) depends on the student entity to exist.

Fruit body (add a vertical line to the left and right in the rectangle) (old version: use a straight line or add a circle on the straight line): An entity set can be divided into several child entities according to certain characteristics. Example: Employees can be divided into sub-entities such as field affairs and choreography.

The difference between tripartite connection and aggregation: tripartite connection must involve the participation of three parties at the same time, and one is indispensable. The aggregation is sequential, the two entities are connected first, and then the third entity is connected.

Property: ellipse.
Multi-value attribute: double-line ellipse. Put another ellipse outside the ellipse.
Derived attribute: dotted ellipse.

Conversion of ER diagram to relational model:
(1) An entity can be converted into a relationship, the attribute of the entity is the attribute of the relationship, and the code of the entity is the code of the relationship;
(2) A connection can also be converted into a relationship, the attribute of the connection And the codes of the entities connected by the connection are converted to the attributes of the relationship, but the code of the relationship will change according to the type of connection:
1:1 connection, the codes of the entities at both ends become the candidate codes of the relationship;
1: *connection, n-end The code of the entity becomes the code of the relationship;
*:* connection, the combination of the entity codes at both ends becomes the code of the relationship.
The steps of drawing an ER diagram:
1. Judge according to the text description in the demand analysis;
2. Judge according to the logical structure design (relation mode);
3. Judge according to the actual life experience.

3. Relation standardization (relationship model, functional dependency, and paradigm)
Frequently asked questions:
1: Find candidate codes. The meaning of candidate codes: Candidate codes can determine all attributes in the relational pattern, and any proper subset of candidate codes cannot determine all attributes alone.

①Find the set of functional dependencies
②Judging candidate codes
③Distinguish primary attributes and non-primary attributes, determine the paradigm
④Decompose functional dependencies

See which attributes or combination of attributes can determine all attributes.
If there is a numbered ID, consider first. Generally, names, names, etc. are not used as candidate codes, except when there is no better choice.
Primary attribute: The attribute contained in any candidate code is the primary attribute.
Non-primary attributes: attributes not included in any candidate code.

The code refers to the candidate code or the main code.
The primary key is the attribute or attribute group that can uniquely identify a row in the table. A table can only have one primary key, but there can be multiple candidate indexes. When there are multiple candidate codes, one can be selected as the primary code, and the selected The candidate code is called the primary key.

2: Whether to meet a certain paradigm, or the highest paradigm that can be reached:

(1) Partial function dependencies of non-primary attribute pairing codes do not satisfy 2NF; (generally asked 2NF, candidate codes are attribute groups)
(2) Transfer function dependencies of non-primary attribute pairing codes do not satisfy 3NF;
(3) The part of the main attribute pairing code and the transfer function dependency does not satisfy BCNF;
(4) The multi-value dependency, like X->->Y, X and Y must be in the same relational mode, and only X And Y, there can be no other redundant attributes, if there is, it does not meet 4NF. (Answer when answering the question: Because of the embedded multi-valued dependency x->->y, 4NF is not satisfied)

3: Decompose the relationship model:

(1) There are partial functional dependencies: R(A,B,C,D,E,F) candidate code is a combination of AB, and the functional dependency set F={A->(C,D,E),( A, B)->F}
Decomposition method: Put the functional dependencies together and decompose them together (that is, the decomposition should keep the functional dependencies). Namely: R1 (A, C, D, E) and R2 (A, B, F)
(2) The transfer function is dependent.
(3) There is multi-value dependence.

[Eg: Does not meet 2NF. The candidate code of the relationship pattern is (personnel number, community number), and the functional dependency set is {personnel number -> person name, community number -> property manager name, (personnel number, community number) -> personnel responsibility}, therefore, There is a partial functional dependence of the code (personnel number, community number) of the non-primary attribute personnel's name and the property manager's name. Therefore, 2NF is not satisfied. 】
【Because ID number -> name, there is a partial functional dependence of non-primary attribute names on candidate codes (ID number, room number), so 2NF is not satisfied.】
【Problems that do not meet the standardization: there are insertion exceptions, deletion exceptions, and Modify the exception. 】

Fourth, SQL (tested with triggers and views)

cascade cascade
join ), INTERSECT (cross),
EXCEPT (bad)

After the table is built, add constraints:

alter table XX add constraint 约束名 primary key 主键字段名;
alter table xx add constraint 约束名 foreign key references 表(外键字段名);

Trigger required

CREATE TRIGGER <triggerName>
of <列名> ON <tableName> 

REFERENCING old row as XX,new row as XX

--do something

View required

CREATE VIEW 视图名(字段1,字段2,字段3...)

Five, two-stage lock protocol (together with cursors and stored procedures)

[Back] The two-stage lock protocol means that the data must be locked before any data is read or written; after a blockade is released, the transaction no longer applies for and obtains any other blockades.
[Back] The meaning of the so-called two paragraphs is: the transaction is divided into two stages. The first stage is to obtain a blockade, also known as the expansion stage. The second stage is to release the blockade, also known as the contraction stage.
Following the two-stage lock protocol, the transaction must be serializable; if it does not follow the two-stage lock protocol, it may or may not be serializable.
Using a two-stage lock protocol may also cause deadlock .

The serializability of concurrent scheduling:

① The concurrent execution of multiple transactions is correct. If and only if the result is the same as the result of executing them serially in a certain order, this scheduling strategy is called serializable scheduling.
②Serialization is the criterion for the correctness of concurrent transactions. That is: a given concurrent schedule is considered correct schedule if and only if it is serializable.
How to ensure the serialization of transactions? Answer: Follow the two-stage lock protocol.

Will be tested together with cursors and stored procedures.

Using a two-stage lock protocol may also cause deadlock.

Deadlock measures:
deadlock processing strategies include prevention strategies, avoidance strategies, and detection and removal strategies.

Cursor :

定义游标:EXEC SQL DECLARE 游标名 CURSOR FOR <select 语句>
打开游标:EXEC SQL OPEN 游标名
推进游标:EXEC SQL FETCH 游标名 INTO 变量表

Stored procedure: PROCEDURE (procedure)
trigger: TRIGGER (trigger)
cursor: CURSOR (cursor)

CREATE PROCEDURE 存过名 (IN/OUT 变量名1 变量类型 IN/OUT 变量名2 变量类型)
  DECLARE 游标名  CURSOR FOR <select 语句>//定义游标
  OPEN 游标名//打开游标
  FETCH 游标名  INTO 变量表 //推进游标
  CLOSE 游标名 /关闭游标

Types of data inconsistency:

Lost modification: Both transactions are modified. One of the changes was overwritten.
Non-repeatable read: the value of the two reads is different (the gap data in the two reads is modified by other transactions);
phantom read: the number of data reads is different in the two reads.
Dirty data: ROLLBACK after modification and read again.

SET TRANSACTION ISOLATION LEVEL [Set transaction isolation level]
Four isolation levels of the transaction:

> READ UNCOMMITTED:读未提交。啥也不解决。 
> READ COMMITTED:读已提交。解决了读脏数据问题。 
> REPEATABLE READ:可重复读。解决了读脏数据、不可重复读。
> SERIALIZABLE:可串行化。解决了读脏数据、不可重复读、幻读的问题。

6. Database failure recovery (together with checkpoint mechanism)

The last big questions in the afternoon of the 2020 and 2021 soft exam database engineers have been tested.

Types of database system failures:

1. Transaction failure: A failure
that causes unexpected and abnormal termination of the transaction due to program execution errors. Usually there are the following two types of errors that cause transaction execution failure:
(1) Logic errors. Transaction execution failure caused by illegal input, data not found, overflow, resource limit exceeded, etc.
(2) System error. The system enters a bad state (such as a deadlock), causing the transaction to be unable to continue execution.
2. System failure:
Refers to the impact of hardware failures and software (such as DBMS, OS or application) vulnerabilities, resulting in the loss of information in the memory, affecting the transaction being executed, but not destroying the information stored on the external memory.
3. Media failure:
refers to the failure of the storage media of the database, such as disk damage, instantaneous strong magnetic field interference, etc. This failure directly destroys the database and will affect all transactions that are reading this part of the data

database backup:

Data dump is the process of self-preserving the database on another disk or tape, also known as data backup.
(1) Static dump and dynamic dump. Static dump means that any access or modification operations to the database are not allowed during the dump; dynamic dump is to allow access and modification operations on the database during the dump. Therefore, dump and user transactions can be executed concurrently.
(2) Mass dump and incremental dump. Mass dump refers to dumping all data each time; incremental dump refers to dumping only the data that has been updated since the last dump each time.
(3) Log files. In the process of transaction processing, the DBMS writes each operation of transaction start, transaction end, and insertion, deletion, and modification of the database into the log file.
(4) Database mirroring. In order to avoid disk media failures affecting the availability of the database, many DBMSs provide database mirroring functions for database recovery.

Database recovery:

In order to make the database recoverable after a failure, redundant data must be established. After the failure, these redundant data are used to implement database recovery. Data dump and log files are commonly used.

1. Two operations for fault recovery:

(1) Undo transaction (UNDO): Undo the unfinished transaction to restore the database to the correct state before the transaction was executed.
The process of undoing a transaction : scan the log file backward (scanning from back to forward) to find the update operation of the transaction; perform the reverse operation of the update operation of the transaction, write the value before the update in the log file record to the database, and insert the record Delete from the database, and reinsert the deleted record into the database; continue to scan the log file backwards, find other update operations of the transaction and perform the reverse operation until the transaction start mark.
(2) Redo transaction (REDO): Re-execute the committed transaction.
The process of redoing a transaction : starting from the beginning of the transaction, the log file is scanned forward, and all operations on the database registered by the log file are re-executed until the end of the transaction is identified.

Specific failure recovery strategies:
(1) Recovery of transaction failures:

Transaction failure means that the transaction terminates before it runs to the normal termination point (CUMMIT or ROLLBACK). The log file has only the start identifier of the transaction and no end identifier. Recovery of this type of failure is usually done by undoing (UNDO) the transaction that caused the failure to restore the database to the correct state before the transaction was executed.
Specific methods:
1. Scan the log file in reverse to find the update operation of the transaction.
2. Perform the reverse operation on the update operation of the transaction.
3. Continue to scan the log file in the reverse direction, find other update operations of the transaction, and do the same processing until the beginning of the transaction is marked.
Note: The recovery of transaction failures is done automatically by the system and is transparent to users.

(2) Recovery of system failure:

System failure will cause inconsistent database data: First, the update of the database by the unfinished transaction may have been written to the database; second, the update of the committed transaction may not have time to write to the database in the buffer. Therefore, for system failures, the recovery operation is UNDO+REDO :
1. Undo the transactions that were not completed when the failure occurred (UNDO);
2. Redo the committed transactions (REDO);

(3) Recovery of media failure:

When the media fails, the database is destroyed and the database needs to be reinstalled . Generally , the participation of the DBA is required to load the latest backup before the failure and the copy of the log file before the failure, and then perform the undo (UNDO) and redo according to the recovery process of the system failure ( REDO) to recover.