[Daily check-in] Must-see Java interview questions for new grads and job-hoppers: Database (1)

I. Introduction
  • Autumn recruitment is approaching. I have collected the latest interview questions; keep studying and checking in every day, and the offer will come naturally!
  • The questions will be updated continuously with quality guaranteed. Looking forward to your participation: check in together and make progress together!



II. Interview Questions


One: MySQL storage engine types


InnoDB: supports transactions, row-level locks, and non-locking consistent reads to improve concurrency; supports foreign keys for data integrity.


MyISAM: does not support transactions or foreign keys; uses table-level locks. Its advantage is fast access, so it is generally used for read-only or read-heavy data scenarios.


Memory: stores all data in memory, useful for fast lookups of non-critical data. It uses HASH indexes by default, but all data is lost when the server shuts down.


CSV: its tables are comma-separated text files, which can be imported and exported in CSV format and exchanged with scripts and applications that use the same format. All columns must be NOT NULL, indexes are not supported, and the data files can be edited directly as plain text.

NDB: also known as NDBCLUSTER. This clustered storage engine suits applications that demand the highest possible uptime and availability; it is distributed as part of MySQL Cluster and is not included in the standard MySQL 5.6 server.


Setting the storage engine when creating a table: CREATE TABLE table_name (...) ENGINE = InnoDB (or MyISAM, etc.)
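As a slightly fuller sketch (the table and column names here are made up for illustration):

```sql
-- Specify the engine at creation time
CREATE TABLE articles (
    id    INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(100)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;

-- Switch an existing table to a different engine
ALTER TABLE articles ENGINE = MyISAM;

-- Check which engine a table currently uses
SHOW TABLE STATUS LIKE 'articles';
```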


Two: The difference between InnoDB and MyISAM


(1) Features of InnoDB:


  1. Supports transaction processing and the ACID transaction properties;


2. Implements the four isolation levels of the SQL standard;


3. Supports row-level locks and foreign-key constraints;


4. Uses transaction logs for data recovery;


5. The lock granularity is the row. Row locks suit high-concurrency workloads with frequent table modifications, so InnoDB outperforms MyISAM under high concurrency; the drawback is higher system overhead.


6. Indexes cache not only themselves but also the data, so InnoDB requires more memory than MyISAM.


(2) The characteristics of MyISAM:


  1. The lock granularity is the table. Table locks have low overhead and lock quickly; the drawbacks are coarse granularity, a higher probability of lock contention, and low concurrency. This engine suits query-heavy workloads.


2. This engine supports neither transactions nor foreign keys.


3. INSERT and UPDATE operations need to lock the entire table;


4. It stores the table's total row count, so SELECT COUNT(*) FROM table (with no WHERE clause) simply reads the saved value instead of doing a full table scan.


(3) Applicable scenarios


  MyISAM is suitable when: (1) many COUNT calculations are done; (2) inserts are infrequent but queries are very frequent; (3) no transactions are needed.


InnoDB is suitable when: (1) reliability requirements are relatively high, or transactions are required; (2) tables are updated and queried quite frequently, so the chance of table-lock contention would be high.


Three: How to choose a MySQL storage engine


Choose according to the system's business requirements; first understand the characteristics of each engine.


InnoDB: suitable when data-integrity requirements are relatively high and, besides inserts and queries, there are many update and delete operations. InnoDB is also now MySQL's default storage engine.


MyISAM: suitable for mainly read-only or insert operations, with few updates and deletes, and no high demands on data integrity.


Four: The execution order of database statements


(1): Execution order


  from -> on -> join -> where -> group by -> aggregate functions (count, etc.) -> having -> select -> distinct -> order by -> limit


(2): Explanation of each step:


  (1) from: identifies the source tables of the data.


(2) on: applies the join condition; executing it produces a temporary table t1 for the next step to use.


(3) join: supplements t1 with rows from the joined table; for example, a LEFT JOIN adds the remaining rows of the left table to t1. If three or more tables are joined, the on...join steps repeat.


(4) where: filters t1 by the given conditions, producing temporary table t2.


(5) group by: groups t2 by the given columns, forming temporary table t3. If the statement contains GROUP BY, every column listed after it must appear in the SELECT list or inside an aggregate function; otherwise a SQL syntax error is reported.


(6) Aggregate functions such as COUNT: compute the aggregates of the specified columns over each group, forming temporary table t4.


(7) having: filters the grouped data, producing temporary table t5.


(8) select: picks the columns to return from the temporary table, forming temporary table t6.


(9) distinct: removes duplicate rows from t6, forming temporary table t7.


(10) order by: sorts t7, forming temporary table t8.


(11) limit: restricts the number of rows returned.
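The steps above can be annotated on one query. This sketch assumes hypothetical employee and department tables; the numbers in the comments mark logical evaluation order, not the order the clauses are written in:

```sql
SELECT DISTINCT                        -- (8) select, (9) distinct
       d.name, COUNT(*) AS emp_cnt     -- (6) aggregate functions
FROM   employee e                      -- (1) from
JOIN   department d                    -- (3) join
  ON   e.dept_id = d.id                -- (2) on
WHERE  e.age > 25                      -- (4) where
GROUP  BY d.name                       -- (5) group by
HAVING COUNT(*) > 3                    -- (7) having
ORDER  BY emp_cnt DESC                 -- (10) order by
LIMIT  5;                              -- (11) limit
```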


For a deeper look at the execution process, see the earlier article dedicated to analyzing it: Do you really know how to use GROUP BY?


Five: What is the difference between MySQL and PostgreSQL?


Answer ideas:

The interviewer may ask this because your resume mentions using two different databases. It mainly examines two things.


One is whether you are good at thinking at work. Database selection is usually made by the company's architect or team lead; you may be just a team member responsible only for using it. But if you actively think about why this database was chosen over others, and understand some of the differences between the two, it leaves a good impression on the interviewer and shows that you think about your everyday work.


The second is whether you can explain some pros and cons of the current database in the context of the project or the company's business. This is also a plus: technology selection must ultimately serve the business. Answering from these two angles works very well.


The first aspect:


  1. MySQL's text types come in size-tiered variants (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT), each with its own limit; PostgreSQL's TEXT has no such tiers.


2. MySQL needs the utf8mb4 character set to store emoji correctly (its legacy utf8 stores at most 3 bytes per character); PostgreSQL has no such pitfall.


3. MySQL (before 8.0) does not support the OVER clause / window functions, while PostgreSQL does. With OVER, problems like "take the top 5 of each group" become simple.


4. PostgreSQL supports more data types, such as JSONB and arrays, has better support for geographic-information extensions (PostGIS), and can access more external data sources.


5. Under high concurrent reads and writes, as the load approaches its limit, PostgreSQL's performance curve stays roughly flat (hyperbolic or even logarithmic) after the peak, while MySQL's clearly drops off after its peak.
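Point 3 above is easy to demonstrate. This is a sketch of the "top N per group" pattern using a window function (valid in PostgreSQL and MySQL 8.0+; the employee table and its columns are invented for illustration):

```sql
-- Top 3 highest-paid employees in each department
SELECT dept_id, name, salary
FROM (
    SELECT dept_id, name, salary,
           ROW_NUMBER() OVER (PARTITION BY dept_id
                              ORDER BY salary DESC) AS rn
    FROM employee
) ranked
WHERE rn <= 3;
```

Without OVER, the same result in old MySQL typically needs a correlated subquery or self-join, which is both harder to read and slower.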


Second aspect:


  Answer with some business scenarios from your own projects that show the advantages of the chosen database. For PostgreSQL, an answer might go as follows.


The technology selection for this project was made by our company's architects, but through the project and the company's business I also learned some of the benefits of choosing PostgreSQL. Our company's main project is a public-security system that processes a large amount of geographic-information data, and PostgreSQL has good support for storing and extending geographic information. That is one of the reasons our project chose PostgreSQL, and so on.


Six: Transaction isolation levels and their problems


(1): Read Uncommitted


  1. Definition: a transaction can read the uncommitted changes of other transactions.


2. Concurrency problems: dirty reads, non-repeatable reads, phantom reads.


(2): Read Committed


  1. Definition: a transaction can only read the committed changes of other transactions.


2. Concurrency problems: non-repeatable reads, phantom reads.


(3): Repeatable Read


  1. Definition: repeated reads within the same transaction return the same data.


2. Concurrency problems: phantom reads (InnoDB uses multi-version concurrency control (MVCC) to largely solve the phantom-read problem at this level).


(4): Serializable


  1. Reads are allowed but conflicting writes must wait, like a lock in Java: writing data must wait for the other transaction to end.


2. Concurrency problems: none.
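A sketch of inspecting and changing the isolation level in MySQL (the variable is named tx_isolation in 5.x and renamed transaction_isolation in 8.0):

```sql
-- Check the current isolation level (MySQL 5.x syntax)
SELECT @@tx_isolation;

-- Change it for the current session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or set it for just the next transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
-- ... statements run under SERIALIZABLE ...
COMMIT;
```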


Seven: Problems with concurrent transactions and their solutions


(1): Problems:


1. Lost update: multiple concurrent transactions update the same record at the same time, and a later transaction's update overwrites an earlier one's.


2. Dirty read: one transaction reads data that another transaction has not yet committed.


3. Non-repeatable read: within the same transaction, the same query returns different data at different times (another concurrent transaction has modified the data).


4. Phantom read: within the same transaction, the set of rows read changes between reads (another concurrent transaction has inserted or deleted rows).


(2): Solution:


  1. Lost updates can be handled at the application layer, for example with locking.


2. Dirty reads, non-repeatable reads, and phantom reads are handled by the isolation mechanisms the database provides, implemented with read-write locks and with consistent snapshot reads (MVCC).


Eight: Understanding the database normal forms


1. First normal form: every column is atomic and cannot be split further.


2. Second normal form: on top of the first, every non-key column fully depends on the primary key and cannot depend on only part of it.


  Example: in the relational model (employee number, name, job title, project number, project name), employee number -> name, employee number -> job title, and project number -> project name. Project name depends on the project number, which is only part of the key, so the dependencies violate the second normal form. The usual fix is to split the table, e.g. into an employee information table and a project information table.


3. Third normal form: on top of the second, non-key columns depend only on the primary key, not on other non-key columns (no transitive dependencies).


  Example: a student table (student ID, name, age, gender, institution, institution address, institution phone) contains the transitive dependency student ID -> institution -> (institution address, institution phone). It should be split as follows:
(student ID, name, age, gender, institution) and (institution, institution address, institution phone)


Summary:

First normal form: atomicity.


Second normal form: non-key columns fully functionally depend on the primary key.


Third normal form: no transitive functional dependencies among non-key columns.
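The student/institution split above can be sketched in DDL (table and column names follow the example; sizes are arbitrary):

```sql
-- Institution facts live in their own table
CREATE TABLE institution (
    name    VARCHAR(50) PRIMARY KEY,
    address VARCHAR(100),
    phone   VARCHAR(20)
);

-- The student table keeps only a reference to the institution,
-- removing the transitive dependency student_id -> institution -> address
CREATE TABLE student (
    student_id  INT PRIMARY KEY,
    name        VARCHAR(50),
    age         INT,
    gender      CHAR(1),
    institution VARCHAR(50),
    FOREIGN KEY (institution) REFERENCES institution(name)
);
```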


Nine: Types of constraints in the database


1. NOT NULL: the column value cannot be empty.


2. UNIQUE: the column's values cannot repeat; a table can have multiple unique constraints.


3. PRIMARY KEY: a table can have only one; its values can neither repeat nor be NULL.


4. FOREIGN KEY: links a column to the key of a related table, preventing invalid data from being inserted into the foreign-key column.


5. CHECK: restricts the range of values a column may take.
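All five constraints in one illustrative table (the employee/department schema is invented, and a department table is assumed to exist; note that MySQL parsed but ignored CHECK before 8.0.16):

```sql
CREATE TABLE employee (
    id      INT PRIMARY KEY,                   -- PRIMARY KEY: unique, non-NULL, one per table
    id_card VARCHAR(18) UNIQUE,                -- UNIQUE: duplicates rejected
    name    VARCHAR(50) NOT NULL,              -- NOT NULL: value required
    age     INT CHECK (age BETWEEN 18 AND 65), -- CHECK: value range (enforced from 8.0.16)
    dept_id INT,
    FOREIGN KEY (dept_id)                      -- FOREIGN KEY: must match a department row
        REFERENCES department(id)
);
```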



Ten: Types of multi-table queries


1. Cross join (Cartesian product, CROSS JOIN)


2. Inner join


3. Outer join (left join/right join)


4. Union query (union/union all)


5. Full outer join (FULL JOIN): not supported by MySQL.
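Since MySQL lacks FULL JOIN, it is commonly emulated by combining a left and a right join with UNION. A minimal sketch with two hypothetical tables t1 and t2:

```sql
-- FULL JOIN emulation: all rows from both sides, matched where possible
SELECT a.id AS left_id, b.id AS right_id
FROM   t1 a LEFT JOIN t2 b ON a.id = b.id
UNION                                  -- UNION also removes the duplicated matched rows
SELECT a.id AS left_id, b.id AS right_id
FROM   t1 a RIGHT JOIN t2 b ON a.id = b.id;
```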


Eleven: The difference between exists and in in MySQL


(1): Meaning


  In MySQL, an IN subquery behaves like a hash join between the outer table and the subquery result, while EXISTS loops over the outer table and runs the inner query once for each outer row.


(2): Features


  1. If the two tables being queried are about the same size, there is little difference between in and exists.


2. If one table is much smaller than the other, use exists when the subquery table is the larger one, and in when the subquery table is the smaller one.


3. NOT IN vs NOT EXISTS: a NOT IN query scans both the inner and outer tables in full without using indexes, while a NOT EXISTS subquery can still use the index on its table. So regardless of table size, NOT EXISTS is generally faster than NOT IN.
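The three forms side by side, as a sketch over hypothetical orders and vip_customer tables:

```sql
-- IN: materialize the subquery result, then match the outer table against it
SELECT * FROM orders o
WHERE  o.customer_id IN (SELECT id FROM vip_customer);

-- EXISTS: for each outer row, probe the inner query (can use vip_customer's index)
SELECT * FROM orders o
WHERE  EXISTS (SELECT 1 FROM vip_customer v WHERE v.id = o.customer_id);

-- NOT EXISTS: still index-friendly, unlike NOT IN with a subquery
SELECT * FROM orders o
WHERE  NOT EXISTS (SELECT 1 FROM vip_customer v WHERE v.id = o.customer_id);
```

Note also that NOT IN behaves surprisingly if the subquery can return NULL (it then matches no rows), which is another reason to prefer NOT EXISTS.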


Twelve: Why varchar is commonly given a length of 255


1. MySQL limits the defined length of a row to 65,535 bytes: the sum of the declared sizes of all columns in a table must not exceed 65,535 bytes, excluding large off-page fields such as TEXT and BLOB.


2. The maximum index-prefix length in the InnoDB storage engine is 767 bytes; an indexed column cannot exceed it. Under utf-8 encoding, 255 characters * 3 bytes per character = 765 bytes, the largest size that can be fully indexed. Single-column index length is limited (at most 767 bytes by default in MySQL 5.6, and 3072 bytes in 5.7).


3. Summary: varchar(255) is not an optimal length in itself; the optimum depends on the specific situation, but 255 is a good default that avoids most of these errors.
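The arithmetic in point 2 can be seen directly in DDL. This sketch assumes MySQL 5.6 defaults (utf8 charset, 767-byte index prefix limit); the table is invented for illustration:

```sql
CREATE TABLE demo (
    ok  VARCHAR(255),
    big VARCHAR(300),
    KEY idx_ok (ok)        -- 255 * 3 = 765 bytes <= 767: index allowed
    -- KEY idx_big (big)   -- 300 * 3 = 900 bytes > 767: would fail with
                           -- "Specified key was too long; max key length is 767 bytes"
) CHARSET = utf8;
```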


III. Summary


Finally, thank you all for reading. This must-see interview question series will keep being updated. If you find the article helpful, don't forget to like, favorite, and share; your support is my motivation to create more high-quality articles. Thank you very much.

