Read-write separation, database and table sharding, and distributed transactions

  • MySQL storage engines, table design conventions, transaction isolation levels, SQL optimization, read-write separation approaches, etc.
  • Do you understand read-write separation? You said reads go to the replica. Suppose the User table uses read-write separation, and a thread first writes to User inside a transaction and then reads it back, but the data has not yet been replicated to the replica. How do you ensure the read returns the latest data?
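One common answer is "read your own writes": once the current thread has written inside a transaction, route its later reads in that transaction to the primary. The sketch below is a minimal, hypothetical router under that assumption; `ReadWriteRouter`, `route_read`, `route_write`, and the data source names are illustrative, not any middleware's API.

```python
import random
import threading
from contextlib import contextmanager


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary; reads go to a replica
    unless this thread has already written inside its active transaction."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._local = threading.local()  # per-thread transaction state

    @contextmanager
    def transaction(self):
        self._local.in_tx = True
        self._local.wrote = False
        try:
            yield
        finally:
            self._local.in_tx = False
            self._local.wrote = False

    def route_write(self):
        if getattr(self._local, "in_tx", False):
            self._local.wrote = True  # remember the write for later reads
        return self.primary

    def route_read(self):
        # The replica may lag behind, so after a write in this transaction
        # read from the primary to see our own change.
        if getattr(self._local, "in_tx", False) and getattr(self._local, "wrote", False):
            return self.primary
        return random.choice(self.replicas)
```

Other options include a hint that forces a query to the primary, or routing reads to the primary for a short window after a write; which fits depends on how much extra load the primary can take.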
  • How do you ensure system stability? Answer: distributed call chains are usually very long, so first run a full-link stress test across the entire chain to find the bottleneck node. If the bottleneck is in the data layer, consider adding caching, read-write separation, and so on to reduce database load. If short-term traffic is so heavy that a database could fill up in a single day, consider scaling out the database. If the data layer is fine and the bottleneck is in the application layer, check whether the application code has problems, whether the JVM can be tuned, whether the thread pool can be tuned, and whether RPC timeouts are set correctly; if the application code is fine, add Docker containers to scale horizontally. If the problem is not in your own chain but in a service you depend on, push that team to improve its performance, and ideally prepare a degradation plan. If the system has been optimized as far as it can go and still cannot handle the traffic, the only option left is rate limiting. I then introduced some industry solutions such as traffic isolation.
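The answer ends with rate limiting as the last line of defense. A token bucket is one widely used algorithm for it; the sketch below is a minimal single-process version, and the class name and injectable `clock` parameter are assumptions made so the example can be tested deterministically.

```python
import threading
import time


class TokenBucket:
    """Token-bucket rate limiter: holds at most `capacity` tokens and refills
    `rate` tokens per second; each admitted request consumes tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()

    def try_acquire(self, n=1):
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False  # over the limit: reject, queue, or degrade the request
```

The capacity allows short bursts while the refill rate bounds sustained throughput. A cluster-wide limit would need shared state (for example in Redis), which this single-process sketch does not cover.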
  • How are inserting and deleting a row executed at the lowest level of the database? The project is configured with read-write separation; expect an in-depth discussion of how it is implemented and the underlying logic.
  • How is Redis deployed? Master-slave? Have you studied Redis Cluster deployment? I said I had not studied Redis Cluster deployment, but I had studied MySQL cluster deployment, such as read-write separation, master-slave replication, sharding, and related solutions.
  • How do you implement read-write separation for a database, and what frameworks are available?
  • What should you do if the amount of MySQL data becomes too large? How do you shard databases and tables? Binlog, read-write separation, master-slave replication; do you understand the locks in MySQL?
  • After sharding, how are aggregate queries with LIMIT / top-N implemented? How do you expand without downtime? How do you shard tables to avoid hot spots and separate hot and cold data? How do you expand databases without downtime? Tables without downtime? How do you handle cross-database transactions? Why was the sharding designed this way? How do you handle data growth? How do you expand capacity? What do you do if data is unevenly distributed? How do you separate hot and cold data? How do you do aggregation? How do you run cross-database aggregations and queries? How do you do cross-database paging? What MySQL cluster mode do you use? One master, multiple slaves? Why? How do you ensure strong consistency? Is the setup for read-write separation, or for one primary with multiple backups? What do you do if the primary crashes? What about the replicas? How do you implement distributed transactions? On what principle? How does it work? Have transactions ever become inconsistent? Why? How did you solve it? What do you do when access requests surge? How do you relieve the pressure?
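For cross-shard top-N and paging, a common approach is to send each shard `ORDER BY key LIMIT offset + limit` and merge the partial results. The sketch below simulates this in memory; `cross_shard_page` and the list-per-shard input are illustrative assumptions, not a middleware API.

```python
import heapq
from itertools import islice


def cross_shard_page(shard_rows, key, offset, limit):
    """Merge per-shard results for ORDER BY key LIMIT offset, limit.

    Each shard must return its first offset + limit rows (not just `limit`),
    because all of the globally first offset + limit rows could live on one shard.
    """
    per_shard = [sorted(rows, key=key)[: offset + limit] for rows in shard_rows]
    merged = heapq.merge(*per_shard, key=key)  # k-way merge of sorted lists
    return list(islice(merged, offset, offset + limit))
```

The cost grows with `offset` on every shard, which is why deep paging across shards is expensive and is often replaced by cursor-based paging or a search engine.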
  • How does MySQL implement transactions? What are the isolation levels, and under which level can dirty reads occur versus reading only committed data? Distributed transactions (frequently asked):
    1. Two-phase commit (2PC). Phase one: the transaction coordinator asks every database involved in the transaction to pre-commit the operation and report whether it can commit. Phase two: the coordinator asks every database to commit. Advantages: provides strong consistency as far as possible and suits critical domains with strict consistency requirements (in practice, strong consistency still cannot be guaranteed 100%). Disadvantages: complex to implement, sacrifices availability, and carries a large performance cost, so it is unsuitable for high-concurrency, high-performance scenarios.
    2. Compensating transactions (TCC). Every operation must register a corresponding confirmation and compensation (undo) operation: Try, Confirm, Cancel. Advantages: simpler to implement than 2PC, though its consistency is weaker than 2PC's. Disadvantages: the Confirm and Cancel steps (steps 2 and 3) can themselves fail. Because TCC compensates at the application layer, developers must write a lot of compensation code, and some business processes are hard to define and handle with TCC.
    3. Local message table (asynchronous assurance). The core idea is to split a distributed transaction into local transactions. The message producer keeps an extra message table that records each message's sending status. The message table and the business data must be committed in the same transaction, which means they must live in the same database. The message is then delivered to its consumer through MQ and retried if sending fails. Advantages: a classic approach that avoids distributed transactions and achieves eventual consistency. Disadvantages: the message table is coupled to the business system; without an encapsulated solution there is a lot of housekeeping to handle.
    4. MQ transactional messages. Supported by RocketMQ; RabbitMQ and Kafka offer no equivalent half-message and check-back mechanism. The producer sends a message and later a confirmation, and must implement a check-back interface that tells the broker whether to commit or roll back the message. Advantages: achieves eventual consistency without relying on local database transactions. Disadvantages: hard to implement, unsupported by most mainstream MQs, no .NET client (at the time of writing), and part of RocketMQ's transactional message code was not open source in early versions.
    5. The Saga model for long-running transactions. The core idea is to split a long transaction in a distributed system into multiple short, local transactions coordinated by a Saga workflow engine. If every step completes normally, the business operation succeeds; if any step fails, the engine invokes the compensating operations in reverse order to roll the business back.
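The Saga behavior described in item 5 (run local steps in order, compensate completed steps in reverse on failure) can be sketched as a tiny orchestrator. This is a hypothetical in-process illustration; real Saga engines persist progress and retry compensations until they succeed.

```python
class Saga:
    """Minimal orchestration-style Saga: run steps in order; on failure,
    run the compensations of the already completed steps in reverse order."""

    def __init__(self):
        self.steps = []  # list of (action, compensation) pairs

    def add_step(self, action, compensation):
        self.steps.append((action, compensation))
        return self

    def execute(self):
        done = []
        for action, compensation in self.steps:
            try:
                action()
            except Exception:
                # The failed step is assumed to have rolled back its own local
                # transaction, so only the completed steps are compensated.
                for comp in reversed(done):
                    comp()
                return False
            done.append(compensation)
        return True
```

Compensations should be idempotent, because a real engine may call them more than once after crashes or timeouts.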
  • Distributed transactions; sharding (why shard databases and tables, and which sharding middleware have you used); sharding methods; based on your project, how do you design vertical and horizontal splits that support dynamic expansion and shrinking; how do you generate global IDs after sharding?
  • By what dimensions are the databases and tables sharded? What is the partitioning algorithm, and can it lead to uneven data distribution? What lock granularities do MyISAM and InnoDB support?
  • 5. How do you subscribe to the binlog of sharded databases? 6. If the sharded data sources have primary key conflicts, how do you resolve them? 7. How do you guarantee the order of binlog consumption downstream? 8. How do you guarantee transaction atomicity when consuming downstream?
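One common way to keep downstream binlog consumption ordered is to hash each row event by table and primary key onto a fixed partition, and consume each partition with a single thread. The sketch below assumes hypothetical event dicts with `table` and `pk` fields; it preserves per-row order only, not order across rows.

```python
import zlib
from collections import defaultdict


def partition_for(table, pk, partitions):
    # Stable hash of (table, primary key): every change to the same row lands
    # on the same partition, and one consumer thread reads a partition in order.
    return zlib.crc32(f"{table}:{pk}".encode("utf-8")) % partitions


def dispatch(events, partitions):
    queues = defaultdict(list)
    for ev in events:  # events arrive in binlog order
        queues[partition_for(ev["table"], ev["pk"], partitions)].append(ev)
    return queues
```

For question 8 (atomicity), per-row hashing is not enough; a consumer would typically buffer a transaction's row events until its commit event and apply them together, which this sketch does not show.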
  • If you page by id, how would you design the table? How would you design the index? 5. If the volume is large, do you think sharding is necessary? How would you split it? 6. How do you query and page after sharding? 7. After sharding, how do you ensure primary keys still increase? 8. Deep paging with direct jumps to a page number must now be supported; how would you implement it? 9. The instantaneous write volume is very large and may take down the storage layer; how do you protect it?
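Paging by id usually means keyset ("seek") pagination: continue after the last id seen instead of using a large OFFSET, so the primary key index can seek directly to the start of the page. The sketch uses SQLite in memory for a runnable demo; the `user_order` table is hypothetical.

```python
import sqlite3


def fetch_page_after(conn, last_id, page_size):
    """Keyset pagination: WHERE id > last_id avoids scanning skipped rows."""
    cur = conn.execute(
        "SELECT id, name FROM user_order WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    )
    return cur.fetchall()


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_order (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO user_order VALUES (?, ?)",
        [(i, f"order-{i}") for i in range(1, 11)],
    )
    return conn
```

Keyset paging handles "next page" well but not arbitrary page-number jumps; those typically need precomputed page boundaries or a separate search index.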
  • Problems caused by sharding databases and tables.
  • How to shard databases and tables (a concrete plan is required).
  • How does MySQL sharding work? Have you hit cross-database queries where one shard holds an especially large amount of data? How do you solve slow MySQL queries? How do you use EXPLAIN? What should you watch out for when sharding? How much data is in the production database? How would you design scheduled tasks for the database connection pool, and would the data volume be especially large?
  • What are the differences between Oracle and MySQL? How do you locate slow queries? What are the isolation levels? How do you optimize? What cases have you encountered and how did you solve them? The project uses sharding; does sharding always improve performance? What is hot and cold data? What have you optimized? What if data suddenly surges? Is there a way to expand capacity? How do you expand transparently? How do you achieve real-time data consistency? How is the sharding designed? Have distributed transactions ever become inconsistent? Why? How did you handle it? Is there a way to avoid it? How do you monitor it? How do you respond to alerts? When is manual intervention required?
  • Can a MySQL unique index contain NULL? (Yes: an InnoDB UNIQUE index permits multiple NULL values in a nullable column.) InnoDB features; MySQL lookups that go back to the clustered index ("back to table"); MySQL sharding conventions?
  • How to shard databases and tables, paginated queries, and schemes for querying by non-sharding-key fields; MySQL index structure and why it uses a B+ tree (compared with hash, B+ tree, B-tree, AVL tree, and red-black tree).
  • How do you shard databases and tables? Along which dimension?
  • List the database sharding strategies you can think of. After sharding, how do you handle full-table queries?
  • Talk about your understanding of transactions (what is a transaction, and what are its properties?). What are the differences between InnoDB and MyISAM with respect to isolation levels? How do you optimize slow SQL? With a table of one hundred million rows and many complex query conditions, how do you optimize fetching page ten thousand? What is the query flow after sharding? 2PC, 3PC, TCC, Seata.
  • Design a system that ingests 10 billion records per day, which must be displayed and searched in real time in the back office. My general answer at the time was Nginx load balancing, a message queue for buffering, multi-threaded consumption, batch insertion, and database sharding. The interviewer derived many questions from my answer. What if the message queue fills up (that is, consumption cannot keep up with production)? What is the impact of a failed batch insert, and how do you handle it? How should the databases and tables be split? How do you solve data migration?
  • What is the implementation principle of sharding? How does your business generally shard databases and tables? What is the corresponding logic?
  • What is the principle of MySQL sharding? Why this many databases and this many tables, and based on what considerations? How do you implement dynamic database expansion?
  • Questions about sharding optimization.
  • What is the difference between optimistic and pessimistic locking? How are the two implemented in Java and in MySQL? What database do you use? What storage engine, and why InnoDB? Is the order table split, and how? Describe the query flow after horizontal splitting. What if one shard receives a very large share of the data? Are there problems with hash-modulo sharding? How do you handle read and write pressure after sharding? How do you keep primary keys unique after splitting? Are IDs generated by Snowflake globally increasing and unique? How do you generate globally increasing unique IDs? What is the MySQL index structure, and how does a primary key index differ from a secondary (ordinary) index? Where is your system's current bottleneck? How do you plan to optimize it? Briefly describe your optimization ideas. Do you have any questions for us?
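To the Snowflake question: IDs are unique as long as every generator has a distinct worker id and the clock never moves backwards, and they are roughly time-ordered, but they are only monotonic per generator, not strictly increasing across the whole cluster. The sketch assumes the classic 41-bit timestamp / 10-bit worker / 12-bit sequence layout; the epoch value and class name are hypothetical choices.

```python
import threading
import time

EPOCH_MS = 1577836800000  # assumed custom epoch: 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class Snowflake:
    def __init__(self, worker_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")  # would risk duplicates
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)) \
                | (self.worker_id << SEQUENCE_BITS) | self.sequence
```

A strictly increasing global ID would need a central allocator (for example a database segment or a counter service), trading availability and throughput for ordering.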
  • 1. Database: 1. What are the principles for using MySQL indexes? What data structure is an index? What is the difference between a B+ tree and a B-tree? 2. What storage engines does MySQL have, and how do they differ? 3. How would you design the database layer of a high-concurrency system? What types of database locks are there, and how are they implemented? 4. What are database transactions? 2. Sharding: 1. How would you design a sharding scheme that can expand and shrink capacity dynamically? 2. Which sharding middleware have you used, and what are their advantages and disadvantages? 3. What is the underlying implementation principle of sharding middleware? 4. If I have a system that is not sharded today, how would you design it so it can be sharded in the future? 5. How would you dynamically switch an unsharded system over to a sharded one? 6. Do you know distributed transactions? How do you handle them? TCC? And if the network fails and a service is unreachable, what do you do? 7. Why shard databases and tables? 8. What algorithms are there for distributed addressing? Do you know consistent hashing? 9. Can you write a Java implementation by hand? If you shard by userId, how do you query data for a continuous time range? 10. How do you solve primary key generation after sharding? Are there concrete implementation schemes?
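Question 9 asks for a hand-written consistent hash. The sketch below is in Python for brevity (the logic ports directly to Java with a `TreeMap`); the node names and the 100 virtual nodes per physical node are assumptions. Its key property: adding a node only moves keys onto the new node, and removing it restores the old mapping.

```python
import bisect
import hashlib


def stable_hash(key):
    # md5 gives a hash that is stable across processes (Python's str hash is salted).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes to smooth the distribution."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = {}    # vnode hash -> physical node
        self.hashes = []  # sorted vnode hashes
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            h = stable_hash(f"{node}#{i}")
            self.ring[h] = node
            bisect.insort(self.hashes, h)

    def remove_node(self, node):
        for i in range(self.vnodes):
            h = stable_hash(f"{node}#{i}")
            del self.ring[h]
            self.hashes.remove(h)

    def get_node(self, key):
        if not self.hashes:
            raise LookupError("ring is empty")
        # Walk clockwise: first vnode hash >= key hash, wrapping to the start.
        idx = bisect.bisect_left(self.hashes, stable_hash(key)) % len(self.hashes)
        return self.ring[self.hashes[idx]]
```

For the time-range follow-up: hashing by userId scatters a time range across all shards, so such queries usually fan out to every shard or go to a separate store indexed by time.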
  • 1. Distributed transactions 2. Difference between a primary key index and a unique index 3. Differences between hash indexes and B+ tree indexes, and when to use each 4. When to use single-column versus composite indexes 5. How to troubleshoot application out-of-memory errors 6. How to view a MySQL execution plan, and which fields to focus on 7. After sharding, how to implement multi-table queries 8. How to store billions of rows
  • How do you shard databases and tables? After sharding, how do you ensure data is not duplicated across the different databases and tables?
  • How do you choose a sharding scheme? With sharding, how is sorting usually done at query time?
  • Can you give a business example of sharding? Sharding addresses single-machine bottlenecks in storage capacity, connection count, and processing capacity caused by excessive concurrency. Vertical partitioning comes in two forms: vertical database splitting and vertical table splitting. Vertical database splitting places loosely coupled business data in different databases, for example a customer database and a product database. Vertical table splitting splits a table with too many columns; for example, a customer table might be split into identity attributes, address and contact details, and so on. Horizontal partitioning also comes in two forms: splitting tables within one database, and splitting across databases; in both, rows of the same table are distributed across multiple tables according to some rule, such as odd versus even ID. Splitting tables within one database only solves the problem of a single table being too large; it does not spread the tables across machines, so to avoid competing for the same machine's CPU, memory, and network, the tables can be distributed to different databases. What problems does sharding introduce? Transaction consistency; joins across machine nodes; paging and sorting across nodes; avoiding global primary key collisions; and data migration and capacity expansion.
  • How should the databases and tables be split? How do you solve data migration?
  • Does your database use sharding? How do you do it? How are global IDs generated after sharding?
  • How does your company shard databases and tables to avoid hot spots? This is about solving database bottlenecks, so the data should be split vertically and horizontally according to the project. Horizontal database sharding (you can draw a sketch): when a user requests data by userId, the userId determines which database to operate on (for example, database A holds even userIds and database B holds odd userIds). userId is not the only possible sharding key; every database has the same schema but holds different data. Horizontal table sharding works the same way as database sharding. Vertical database splitting: split into multiple databases by business (a user database, a product database, and so on, each with its own). Vertical table splitting keeps uid as the core and splits the columns (for example, table 1 stores personal identity information such as age, and table 2 stores social information such as contact address). Besides sharding, what other MySQL optimizations do you know?
  • What is the principle of MySQL sharding? Why this many databases and tables, and based on what considerations? How do you implement dynamic database expansion?
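The parity example above is modulo routing. The sketch picks a database and then a table from `user_id`, and measures how many keys move when the database count changes, which is the usual problem with hash-modulo sharding. The names `order_db_N` and `t_order_N` are hypothetical.

```python
def route(user_id, db_count=2, tables_per_db=4):
    """Hypothetical modulo routing: choose the database, then the table."""
    db_index = user_id % db_count
    # Divide first: using user_id % tables_per_db directly would correlate with
    # db_index (with 2 databases and 4 tables, db 1 would only use tables 1 and 3).
    table_index = (user_id // db_count) % tables_per_db
    return f"order_db_{db_index}", f"t_order_{table_index}"


def moved_fraction(n_old, n_new, keys):
    """Share of keys whose database changes when going from n_old to n_new shards."""
    keys = list(keys)
    return sum(1 for k in keys if k % n_old != k % n_new) / len(keys)
```

Growing from 2 to 3 databases relocates two thirds of the keys, while doubling from 2 to 4 relocates half, and each moved key goes only to a new database. This is why modulo schemes usually expand by doubling, or switch to consistent hashing or range sharding.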
  • Do you understand distributed transactions? What are the solutions?
  • MySQL phantom reads and gap locks; how transactions are achieved after sharding; does MySQL natively support distributed transactions?
  • What are the common distributed transaction solutions? (1) Two-phase commit (2) eBay's event queue scheme (3) TCC compensation pattern (4) Eventual consistency via cached data
  • How do you ensure the consistency of distributed transactions in your project?
  • Why did you choose the local message table for distributed transactions? What is TCC and how does it work? What is the difference between TCC and XA? If you had to optimize XA, how would you do it?
  • Do you understand distributed transactions? Which kind of distributed transaction does your project use? What are its advantages and disadvantages?
  • What are the principles of distributed transactions, and how do you use them?
  • A flash-sale (seckill) system updates multiple database tables. How do you handle the distributed transaction? I answered with eventually consistent asynchronous messaging. Is there a better solution? The synchronous TCC pattern; what is its principle? (The concrete implementation of its three phases.)
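The three TCC phases can be sketched as a single participant that freezes funds in Try, consumes them in Confirm, and releases them in Cancel. Besides the basic flow, it handles three issues commonly discussed with TCC: idempotent retries, "empty rollback" (Cancel arrives without a Try), and "suspension" (a late Try after Cancel). The class and the in-memory state are hypothetical; a real participant would persist the state in the same local transaction as the business change.

```python
class TccAccount:
    """Hypothetical TCC participant keyed by global transaction id (xid)."""

    def __init__(self, balance):
        self.balance = balance
        self.frozen = 0
        self.state = {}    # xid -> "TRIED" | "CONFIRMED" | "CANCELLED"
        self.amounts = {}  # xid -> frozen amount

    def try_(self, xid, amount):
        if xid in self.state:  # retried Try, or Cancel already ran (anti-suspension)
            return self.state[xid] != "CANCELLED"
        if self.balance - self.frozen < amount:
            return False       # not enough available funds to reserve
        self.frozen += amount
        self.amounts[xid] = amount
        self.state[xid] = "TRIED"
        return True

    def confirm(self, xid):
        if self.state.get(xid) == "CONFIRMED":
            return True        # idempotent retry
        if self.state.get(xid) != "TRIED":
            return False
        amount = self.amounts[xid]
        self.frozen -= amount
        self.balance -= amount
        self.state[xid] = "CONFIRMED"
        return True

    def cancel(self, xid):
        st = self.state.get(xid)
        if st == "CANCELLED":
            return True        # idempotent retry
        if st == "CONFIRMED":
            return False
        if st == "TRIED":
            self.frozen -= self.amounts[xid]  # release the reservation
        # st is None: empty rollback; record it so a late Try is refused.
        self.state[xid] = "CANCELLED"
        return True
```

The coordinator calls Try on every participant, then Confirm on all if every Try succeeded, otherwise Cancel on all, retrying Confirm and Cancel until they succeed.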
  • The flash-sale system also led to several implementations of distributed transactions, such as 2PC, 3PC, compensation (TCC), and reliable-message-based implementations on top of a message queue. The focus was on how each works and how they differ. I was asked to draw the architecture of a distributed transaction implemented with a reliable message service on a message queue, and to explain how the upstream and downstream services ensure message reliability and consistency.
  • Ultimately, this is a data consistency solution for distributed transactions. How do you roll back data when something fails?
  • How do you guarantee distributed transactions?
  • Distributed transaction theory and solutions.
  • MySQL index principles, composite indexes, index caveats, slow query troubleshooting; the principle of the Snowflake algorithm; how MySQL IN works and how to optimize it; how to operate sharding; the various forms of distributed transactions.
  • Understanding of distributed transactions
  • 6. Solutions for distributed transactions?
    1. Two-phase commit (2PC). 2PC introduces a coordinator that coordinates the participants and ultimately decides whether they actually commit the transaction. Prepare phase: the coordinator asks each participant whether its part of the transaction executed successfully, and each participant replies with the result. Commit phase: if the transaction succeeded on every participant, the coordinator tells the participants to commit; otherwise, it tells them to roll back.
    2. Compensating transactions (TCC). TCC is a compensation mechanism: every operation must register a corresponding confirmation and compensation (undo) operation. It has three phases. Try checks the business state and reserves resources. Confirm confirms and commits the business operation; once Try has succeeded and Confirm starts, Confirm is assumed not to fail, i.e., as long as Try succeeds, Confirm must succeed (in practice, Confirm is retried until it does). Cancel undoes the business operation and releases the reserved resources when execution fails and a rollback is required.
    3. Local message table (asynchronous assurance). The local message table and the business tables are in the same database, so a local transaction can make the operations on both tables transactional, while a message queue provides eventual consistency. After one side of the distributed operation writes its business data, it writes a message to the local message table in the same local transaction, which guarantees the message is recorded. The messages in the local message table are then forwarded to a message queue such as Kafka; if forwarding succeeds, the message is deleted from the local table, otherwise forwarding is retried. The other side of the distributed operation reads the message from the queue and performs the operation it describes.
    4. MQ transactional messages. In the first phase, a prepared message is sent and its address is obtained. The second phase executes the local transaction, and the third phase uses the address obtained in the first phase to access the message and update its state.
    These answers are only personal opinions; please point out anything that is wrong.
  • Various solutions for distributed transactions, and which you consider best.
  • Distributed transactions (frequently asked):
    1. Two-phase commit (2PC). Phase one: the transaction coordinator asks every database involved in the transaction to pre-commit the operation and report whether it can commit. Phase two: the coordinator asks every database to commit. Advantages: provides strong consistency as far as possible and suits critical domains with strict consistency requirements (in practice, strong consistency still cannot be guaranteed 100%). Disadvantages: complex to implement, sacrifices availability, and carries a large performance cost, so it is unsuitable for high-concurrency, high-performance scenarios. For distributed systems that call each other across service interfaces, there was no implementation available in the .NET world at the time of writing.
    2. Compensating transactions (TCC). Every operation must register a corresponding confirmation and compensation (undo) operation: Try, Confirm, Cancel. Advantages: simpler to implement than 2PC, though its consistency is weaker than 2PC's. Disadvantages: the Confirm and Cancel steps (steps 2 and 3) can themselves fail. Because TCC compensates at the application layer, developers must write a lot of compensation code, and some business processes are hard to define and handle with TCC.
    3. Local message table (asynchronous assurance). The core idea is to split a distributed transaction into local transactions. The message producer keeps an extra message table that records each message's sending status. The message table and the business data must be committed in the same transaction, which means they must live in the same database. The message is then delivered to its consumer through MQ and retried if sending fails. Advantages: a classic approach that avoids distributed transactions and achieves eventual consistency.
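The two 2PC phases can be simulated in a few lines. The sketch is an in-memory illustration with a hypothetical `Participant` class; a real coordinator (for example an XA transaction manager) must durably log its decision between the phases.

```python
class Participant:
    """Hypothetical resource manager that votes in phase 1, then commits or rolls back."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Phase 1: execute the work, keep locks and logs, then vote yes or no.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def rollback(self):
        self.state = "ROLLED_BACK"


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # phase 1: prepare / vote
    decision = all(votes)
    # A real coordinator persists the decision here; if it crashes before
    # phase 2, prepared participants stay blocked while holding their locks.
    for p in participants:                        # phase 2: commit or roll back
        if decision:
            p.commit()
        else:
            p.rollback()
    return decision
```

The blocking window in the comment is the main reason 2PC costs availability and throughput in high-concurrency scenarios.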
Search for and follow my official account [Wei Kan Technology]; categorized interview questions are collected at https://github.com/zhendiao/JavaInterview