C++ Programmer's Book

Article Directory

1. C++

1.1. Talk about the role of the static keyword

  1. Global static variables: static storage area, the entire program is running, uninitialized static global variables are automatically initialized to 0, and the scope is defined to the end of the entire file
  2. Local static variable: static storage area, same as above, when the function or statement block ends, the scope ends, but it is not destroyed, and resides in memory. Waiting for the function to be called here, the value does not change
  3. Static functions: function declarations and definitions are extern by default, visible in the declaration file, invisible in other locations, to prevent conflicts with the same name in other files
  4. Static members of the class: data sharing of multiple objects, shared members of all objects
  5. Class static function: same as static member, public use

1.2. Talk about the difference between C++ and C

Idea: Object-oriented, process-oriented structured programming language
Grammar: Overload inherits three characteristics of polymorphism, safer: mandatory type conversion
Support normal form, template class, function template, etc.


1.3. Pointers and references

  1. The pointer has its own space, and the reference is just an alias;
  2. Use sizeof to see that the size of a pointer is 4, and the reference is the size of the referenced object;
  3. The pointer can be initialized to NULL, and the reference must be initialized and must be a reference to an existing object;
  4. When passed as a parameter, the pointer needs to be dereferenced to operate on the object, and direct modification of the reference will change the object pointed to by the reference;
  5. There can be const pointers, but no const references;
  6. The pointer can point to other objects in use, but the reference can only be a reference to an object and cannot be changed;
  7. Pointers can have multiple levels of pointers (**p), ​​while references have only one level;
  8. Pointers and references have different meanings using the ++ operator;
  9. If you return dynamic memory allocated objects or memory, pointers must be used, and references may cause memory leaks.

1.4. Smart pointers in C++? Why use smart pointers

There are four smart pointers in C++: auto_ptr, shared_ptr, weak_ptr, and unique_ptr. The last three are supported by C++11, and the first one has been deprecated by 11.

  1. The function is to manage a pointer; the application space forgot to release at the end of the function, causing memory leaks, and using smart pointers, once the smart pointer exceeds the scope of the class, the class will automatically call the destructor to release resources, so the principle of the smart pointer After the function ends, the memory space is automatically released;
  2. auto_ptr p1 (new string ("I reigned lonely as a cloud.”));
    auto_ptr p2;
    p2 = p1; //auto_ptr will not report an error. At this time, p2 looted the ownership of p1. When using p1, the memory crashes
  3. unique_ptr p3 (new string ("auto"));
    unique_ptr p4;
    p4 = p3;// error, illegal, avoid memory crash
  4. shared_ptr shared ownership, multiple smart pointers can point to the same object, the object and its related resources will be released after the last reference is destroyed
  5. weak_ptr is a smart pointer that does not control the life cycle of the object. It points to an object managed by shared_ptr as a management pointer; in order to solve the memory leak caused by circular references, the constructor will not change the reference count and will not manage the object memory. It is like an ordinary pointer, but it will detect whether the managed object is released, so as to avoid memory leaks;
    **

1.5. Rewriting and overloading

  1. Rewrite: the child class calls the virtual function of the parent class,
  2. Overload: The function name is the same, but the parameter list is different, the return type is not required, in the same scope

1.6. Polymorphism

The realization of polymorphism is mainly divided into static polymorphism and dynamic polymorphism. Static polymorphism is mainly overloaded, which is determined at compile time; dynamic
polymorphism is realized by virtual function mechanism and is dynamically bound during operation.
Such as animal music contest, crows and dogs and cats sign up, but these three objects all point to the animal class (this is a base class), and use animal pointers to call crows, dogs, and cats, which is polymorphism

1.7. Why the destructor is a virtual function:

The parent class is set as a virtual function to ensure that when the new subclass is used, the parent class pointer is used to point to the subclass object. When the parent class pointer is released, the subclass space is automatically released to prevent memory leaks.

1.8. Implementation of map and set

  1. C++ associative container, red-black tree, map is KV pair, K index, V data, K in set is set;
  2. The map modifies V but does not change K, because the bottom layer of the red-black tree is arranged according to K to ensure order. If K can be modified, you first need to delete K to adjust the tree balance. Insert the modified K to adjust the balance, which will destroy the map and set. Structure;
  3. map supports subscript query, there is no default value, so use it with caution, it is recommended to find

1.9. What is the difference between pointers and arrays?

Save the data address; save the data
indirect access to the data, get the pointer content, as the address, get the data in the address; directly access the data
dynamic data structure; fixed number, the same data type
malloc allocates memory and free release; implicit allocation and deletion
point to anonymous Data, operation anonymous function; itself as the data name


1.10. The difference between defining strings

const char * arr = "123"; char * brr = "123"; const char crr[] = "123"; char drr[] = "123"; Difference between
const constant area
* brr address storage

1.11. Type conversion? cast

  1. reinterpret_cast: Conversion between pointers of any type, without any guarantee for the result of the conversion
  2. dynamic_cast: can only be used for coercive type conversion of parent-child relationships with virtual functions
  3. const_cast: Delete the const attribute of the variable to facilitate re-assignment
  4. static_cast: complete basic data types; conversion of types in the same inheritance system; conversion between arbitrary types and void*.
int i = 10;
double d2 = static_cast<double>(i); //相当于创建一个static_cast<double>类型的匿名对象赋值给d2
int* p2 = reinterpret_cast<int*>(i); // 任意类型转换
int *p = const_cast<int*>(&i);

1.12. What is the difference between new/delete and malloc/free

new/delete is a keyword of C++, and malloc/free is a library function of C language. The latter must specify the size of the requested memory space. For objects of class type, the latter will not call the constructor and destructor.


1.13. Allocator memory allocation and release?

  1. STL allocator encapsulation and the low-level details of STL container memory management;
  2. new (call operate new to configure memory, call object constructor to construct object content) delete (call destructor, release memory);
  3. The allocator distinguishes the two phases of operation. The memory configuration is responsible for alloc::allocate(), and the release of alloc::deallocate(); The object construction is responsible for construct, and the memory destruction is responsible for destroy;
  4. In order to improve the efficiency of memory management and reduce the problem of memory fragmentation when applying for small memory, STL adopts a two-level configurator. When the allocated size exceeds 128B, the first-level space configurator is used (malloc, realloc, free for memory management and memory space allocation and release) , Greater than 128B, second level (memory pool technology, manage memory through free linked list)

1.14. The principle of malloc

The malloc function is used to dynamically allocate memory; in order to reduce memory fragmentation and system call overhead, malloc uses a memory pool method. First, apply a large block of memory as a heap, and then divide the heap into multiple memory blocks, using the block as the basic unit of memory management; When the user applies for memory, a suitable free block is allocated directly from the heap area; malloc uses an implicit linked list structure to divide the heap into continuous blocks of different sizes, including allocated and unallocated blocks; at the same time, malloc uses a display linked list structure to manage All free blocks, doubly linked list, each free block records a continuous, unallocated address;
when memory allocation, Malloc will traverse all free blocks through the implicit linked list, select the block that meets the requirements for allocation; When merging, malloc uses the boundary marking method to decide whether to merge the blocks according to whether the blocks before and after each block have been allocated.
When Malloc applies for memory, it usually applies through brk or mmap system calls. When the requested memory is less than 128K, the system function brk will be allocated in the heap area; and when the requested memory is greater than 128K, the system function mmap will be allocated in the mapping area.

1.15. STL iterator deletes elements:

  1. For the sequence container vector, deque, use erase, the latter element is moved forward by one, and erase returns the next valid iterator;
  2. For map, set, use erase, the current element iterator is invalid, but because the structure is a red-black tree, deleting elements will not affect the next element iterator. Before calling erase, record the iterator of the next element.
  3. For list, use discontinuous allocation of memory, erase returns the next valid iterator

1.16. The difference between vector and list

VectorList
Continuous storage containers, dynamic arrays, allocate space on the heap, double the capacity growth, sequential memoryDynamic doubly linked list, space on the heap, each element deleted will release a space
Access: O(1) (random access); Insert: post-insertion fast, memory copy is needed in the middle, memory application and release; Delete: post-delete fast, memory copy is needed in the middleAccess: Random access is poor, only at the beginning and end; insertion and deletion are fast, constant overhead
Applicable scenarios: frequent random access, and infrequent insertion and deletion of non-tail nodesSuitable for frequent insertions and deletions
Here is the difference
ArrayDoubly linked list
Support random accessDoes not support random access
Sequential memoryDiscrete memory
Intermediate node insertion and deletion will cause copywill not
Allocate memory at one time, double expansionList will make a memory request every time it is inserted in a new node
Good random access performance, poor insertion performancein contrast

1.17. The role of STL iterators, why use iterators instead of pointers?

  1. Iterators provide a way to sequentially access the elements of an aggregate object without exposing the internal representation of the object; in other words, using this method, yes, we can directly ask questions according to a certain order rule without knowing the internal structure of the object Aggregate the elements of the object
  2. The difference with pointers: Iterators are not pointers, but class templates, behave like pointers, simulate pointer functions, overload pointer operators such as ->, *, ++, etc., which are equivalent to a smart pointer, based on different types of data Structure to achieve different operations
  3. The access method of the iterator class is to abstract the access logic of different collection classes, yes, without exposing the internal structure of the collection to achieve the effect of loop traversal;

1.18. Access permissions for class members in C++

C++ controls the access rights of member variables and member functions through three keywords: public, protected, and private. They represent public, protected, and private respectively, which are called member access qualifiers
. Inside the class, there is no distinction and no restriction.
Subclasses can access properties and methods
other than the parent class’s private properties and other classes can only access public

1.19. The difference between struct and class

In C++, you can use struct and class to define classes, both of which can be inherited. The difference is that the default inheritance permission and default access permission of structural are public, while the default inheritance permission and default access permission of class are private. In addition, class can also define template class parameters, such as template <class T, int i>

1.20. The process of C++ source text from text to executable file

  1. Preprocessing: header files contained in the source code file, precompiled statements, analysis and replacement, and generation of precompiled files
  2. Compilation stage: specific coding
  3. Assembling stage: Converting into machine code, relocating the target file
  4. Linking stage: multiple object files and required libraries are linked into the final executable file

1.21. The difference between include "" and include <>

  1. The path to find the header file in the preprocessing stage of the compiler is different
  2. Double quote search path: current header file directory, the header file path set by the compiler, the path specified by the system variable path path
  3. <>Search path: header files set by the compiler, system variables

1.22. Fork, wait, exec functions

The parent process generates a child process and uses fork to copy a copy of the parent process. At this time, only the page table of the parent process is copied. Both processes read the same block of memory. When a process writes, the real copy mechanism is used to allocate memory. The exec function You can load an elf file to replace the parent process. From then on, the parent process and the child process can run different programs. Fork returns the pid of the child process from the parent process, and returns 0 from the child process. The parent process that calls wait will be blocked until the state of the child process changes. Successful execution returns 0, and error returns -1. If exec executes successfully, the child process starts to run from the new program, no return value, if execution fails, it returns -1

1.23. What is the difference between map and set?

Implementation of map and set*3:

  1. C++ associative container, red-black tree, map is KV pair, K index, V data, K in set is set;
  2. The map modifies V but does not change K, because the bottom layer of the red-black tree is arranged according to K to ensure order. If K can be modified, you first need to delete K to adjust the tree balance. Insert the modified K to adjust the balance, which will destroy the map and set. Structure;
  3. map supports subscript query, there is no default value, so use it with caution, it is recommended to find

1.24. The difference between resize and reserve in STL

  1. resize(): change the number of elements contained in the current container
    vectorv;
    v.resize(20);
    v.push_back(2); // at this time 2 is the 21 position
  2. reserve(len): Change the maximum capacity of the current container, no elements will be generated; if the reserve is greater than the capacity, the object space of len is reallocated, and the original object is copied

1.25. BSS side and other six paragraphs: C++ memory management?

[External link image transfer failed. The source site may have an anti-leech link mechanism. It is recommended to save the image and upload it directly (img-jhYUtGYU-1623204463133)(07_operating system/code storage structure.jpg)]
In C++, the virtual memory is divided Six parts: code segment, data segment, BSS segment, heap area, file mapping area, and stack area

  1. Code segment: including read-only memory area (string constant) and text area (machine code of the program), read-only
  2. Data segment: initialized global variables and static variables in the storage program; belong to static memory allocation
  3. BSS segment: stores global variables and static variables (local + global) that are not initialized or initialized to 0; belong to static allocation, and static variable resources are automatically released by the system after the program ends.
  4. Heap area: When the new/malloc function is called, memory is dynamically allocated in the heap area, and delete/free needs to be called to manually release the requested memory. Frequent malloc free causes discontinuity in the memory space and generates fragmentation, so the heap is less efficient than the stack
  5. Mapping area: store dynamic link library and file mapping by calling mmap function
  6. Stack area: store the return address, return value, parameters, local variables of the function; automatically released by the compiler,

1.26. Memory leaks

  1. Heap memory leak, if the memory allocated by malloc, new, realloc from the heap, the memory is not released due to a program error, generated
  2. Leakage of system resources: The program uses system resources: bitmap, handle, socket forget to release, which will lead to poor system performance and stability
  3. The base class destructor is not defined as a virtual function. After the base class pointer points to the subclass object, when the base class is released, the subclass resources will not be released correctly

1.27. Determine memory leaks:

  1. Memory leak reason: usually call malloc/new and other memory application operations, lack of corresponding free/delete
  2. To determine whether the memory is leaking, you can use the memory leak detection tool in the Linux environment, or you can add memory application and release statistics when writing the code, and calculate whether the application and release are consistent, so as to determine the memory leak
    varglind, mtrace detection

1.28. How to deal with high concurrency in a single-threaded way?

I/O multiplexing asynchronous callback

1.29. Big endian little endian?

Big-endian means that the low byte is stored at the high address; little-endian storage is that the low byte is stored at the low address. We can judge whether the system is big-endian or little-endian based on the consortium. Because the union variable is always stored from the lower address.

1.30. Design a server to implement multiple client requests

Multithreading, thread pool, IO reuse

1.31. How many kinds of locks do you know about C++?

Locks include mutexes, condition variables, spin locks, and read-write locks.
Producer-consumer problems can be easily solved by using mutexes and condition variables. Condition variables here serve as a substitute for semaphores.

2. Algorithm

2.1. Unordered array of n integers, find a number larger than him after each element, the time complexity is O(N)

Method: Pay attention to the question: Find a number greater than him after each element. For
example, [1,3,2,6,9,3]
then [3,6,6,9,-1,-1]
if the stack Empty, the current element is not greater than the top of the stack, the element is pushed into the stack, the
stack is not empty, the current element is greater than the top of the stack, popped, the element is comparing the current top of the stack, and the element is not greater than the top of the stack

	vector<int> res(nums.size(), 0);
	stack<int> temp;
	for(int i=0; i<nums.size(); i++){
        while(!temp.empty() && nums[temp.top()] < nums[i]){
            res[temp.top()] = i-temp.top();
            temp.pop();
        }
		temp.push(i);
	}

3. Operating System

3.1. Processes and threads?

3.1.1. Definition

The process is the encapsulation of the runtime program. It is the basic unit of the system for resource scheduling and allocation. It realizes the concurrency of the operating system. The thread is the subtask of the process. It is the basic unit of CPU scheduling and dispatching. It is used to ensure the real-time of the program. It can realize the concurrency within the process; the thread is the smallest unit of execution and scheduling recognizable by the operating system. Each thread occupies a virtual processor: its own register set, instruction counter, and processor state. Each thread completes different tasks, but shares the same address space (that is, the same dynamic memory, mapped files, object code, etc.), open file queues and other kernel resources.

3.1.2. Difference

  1. A thread can only belong to one process, and a process can have multiple threads, but at least one thread. Thread depends
    on the process to exist.
  2. A process has an independent memory unit during execution, and multiple threads share the memory of the process. (Resources are allocated to the process, all threads of the same process share all the resources of the process. Multiple threads in the same process share code segments (code and constants), data segments (global variables and static variables), extended segments (heap storage) .But each thread has its own stack segment, the stack segment is also called runtime, used to store all local variables and temporary variables.)
  3. The process is the smallest unit of resource allocation, and the thread is the smallest unit of CPU scheduling;
  4. System overhead: When creating or canceling a process, the system must allocate or reclaim resources for it, such as memory space, IO devices, etc. Therefore, the cost of the operating system will be significantly greater than the cost of creating or canceling threads. Similarly, when performing process switching, it involves the preservation of the entire current process CPU environment and the setting of the CPU environment of the newly scheduled process. However, thread switching only needs to save and set the contents of a small number of registers, and does not involve operations in memory management. It can be seen that the overhead of process switching is much greater than the overhead of thread switching.
  5. Communication: Since multiple threads in the same process have the same address space, the realization of synchronization and communication between them has also become easier. Inter-process communication IPC, threads can directly read and write process data segments (such as global variables) for communication-process synchronization and mutual exclusion means are needed to ensure data consistency. In some systems, thread switching, synchronization and communication do not require the intervention of the operating system kernel.
  6. . Process programming and debugging is simple and reliable, but the creation and destruction overhead is high; the thread is the opposite, the overhead is small, and the switching speed is fast, but the programming and debugging is relatively complicated.
  7. The processes will not affect each other; the hang of one thread will cause the whole process to hang
  8. Process is suitable for multi-core and multi-machine distribution; thread is suitable for multi-core

3.1.3. Why do we need threads when we have a process?

The process can execute multiple programs concurrently, improving resource utilization and system throughput, but the process can only do one thing at the same time. Process blocking, the entire program blocking
thread is the basic unit of concurrent execution, reducing the time and space overhead of the program during concurrent execution. Improve concurrency, at the same time, thread resource consumption is small, shared memory
and CPU is more efficient, when the number of threads is not greater than the number of CPUs, different threads run on different CPUs to
improve the program structure, consider dividing a complex and long process into multiple threads , Run independently or semi-independently, the program is easy to be interpreted by the compiler

3.1.4. The way of inter-process communication

Inter-process communication mainly includes pipes, system IPC (including message queues, semaphores, signals, shared memory, etc.), and sockets

Pipes: Pipes mainly include unnamed pipes and named pipes: pipes can be used for communication between parent and child processes with kinship. In addition to the functions of pipes, famous pipes also allow communication between unrelated processes
a. Ordinary pipes, Half-duplex (data can only flow in one direction), with fixed reading and writing,
only kinship, inter-process communication
can be regarded as a special file, reading and writing use read and write functions, but it is not a normal file, it only exists in memory Read and write operations in
b. Named pipes: FIFO, irrelevant inter-process communication, special forms of data exchange exist in the file system

Message queue: a linked list of messages, stored in the kernel, a message queue and an identifier, processes with read and write permissions can add new messages to the message queue according to the rules, and processes with read and write permissions on the message queue can be from the message queue Read message

Semaphore: Counter, control the access of a process to shared resources, realize mutual exclusion and synchronization between processes, not used to store inter-process communication data, combine shared memory to transfer data, based on the PV operation of the operating system, the program performs the semaphore It is an atomic operation, each PV operation is not limited to adding one or subtracting one to the semaphore, and can add and subtract any positive integer, supporting semaphore groups

Signal: Notify the receiving process that an event has occurred

Shared memory: Multiple processes access the same memory space, and different processes see the data update in the shared memory in the other process in time. This method relies on some kind of synchronization operation, such as mutexes and semaphores

Socket: inter-process communication, but it can be used between different hosts

3.1.5. Communication between threads

  1. Critical section: Multi-threaded serial access to common resources or a piece of code, fast, suitable for controlling data access
  2. Mutex: Mutex object mechanism, only the mutex object has access to public resources, because there is only one mutex object, which can ensure that public resources will not be accessed by multiple threads at the same time
  3. Semaphore: Designed to control user resources with a priority number, allowing multiple threads to access the same resource at the same time, but generally need to limit the maximum number of threads that access this resource at the same time
  4. Event: By means of notification operation to keep multi-thread synchronization, you can also easily implement multi-thread priority comparison operations;

3.1.6. How to synchronize mutexes, semaphores, and condition variables between threads?

  1. Semaphore: PV operation, semaphore SV is greater than 0, P is reduced by one, if SV is equal to 0, the thread is suspended;
    other threads are awakened because they are waiting for the semaphore SV, SV+1
  2. Mutex: Threads are mutually exclusive, sequential access is not guaranteed, and can be synchronized with the conditional lock. The thread enters the critical section, first obtains the mutex, and after leaving, unlocks to wake up other threads waiting for the mutex; guarantee any time , Only one thread accesses the object;
  3. Condition variable: Synchronize the value of shared data between threads, shared data reaches a certain value, wake up another thread waiting for this shared data

3.1.7. Thread lock:

Thread locks are usually used to achieve thread synchronization and communication. On single-core machines, multi-threaded programs still have the problem of thread synchronization; because in preemptive operating systems, each thread is usually allocated time slices, time slices are exhausted, and operations The system will hang and enter another thread. If two threads share some data without thread locking, it may cause conflicts due to shared data modification

3.1.8. Five basic states of the process

  1. Creation status: the process is being created
  2. Ready state: the process is added to the ready queue waiting for the CPU to be scheduled to run
  3. Execution status: the process is being run
  4. Waiting for blocking state: The process is temporarily unable to run for some reason, such as waiting for I/O and waiting for the device.
  5. Termination status: the process has finished running

3.2. Virtual memory

In order to prevent the contention and trampling of physical memory by different processes running in physical memory at the same time, virtual memory is adopted.
The virtual memory technology makes different processes in the running process, what it sees is that it owns the 4G memory of the current system alone. All processes share the same physical memory, and each process only maps and stores the virtual memory space it currently needs to the physical memory. In fact, when each process is created and loaded, the kernel just "creates" the virtual memory layout for the process. Specifically, it initializes the memory-related linked list in the process control table. In fact, it does not immediately put the program data in the corresponding location of the virtual memory. And the code (such as the .text.data section) is copied to the physical memory, just to establish the mapping between the virtual memory and the disk file (called memory mapping), wait until the corresponding program is run, it will pass the page fault exception , To copy data. Also, during the process of running, it is necessary to dynamically allocate memory. For example, when malloc, only virtual memory is allocated, that is, the page table entry corresponding to this virtual memory is set accordingly. When the process actually accesses this data, the defect is caused. The page is abnormal.
The request paging system, request segmentation system and request segment paging system are all aimed at virtual memory, and the information replacement between memory and external memory is realized through request.

3.2.1. Benefits of virtual memory

  1. Expand the address space;
  2. Memory protection: Each process runs in its own virtual memory address space and cannot interfere with each other. Virtual memory also provides write protection to specific memory addresses, which can prevent malicious tampering of code or data.
  3. Fair memory allocation. After using virtual storage, each process is equivalent to having the same size of virtual storage space.
  4. When the process communicates, it can be realized by virtual memory sharing.
  5. When the program needs to allocate contiguous memory space, it only needs to allocate contiguous space in the virtual memory space, instead of the contiguous space of actual physical memory, and fragments can be used

3.2.2. The cost of virtual memory

  1. The management of virtual memory requires the establishment of many data structures, which take up additional memory
  2. The conversion of virtual addresses to physical addresses increases the execution time of instructions
  3. Page swapping in and out requires disk IO, which is time-consuming
  4. If there is only part of the data in a page, memory will be wasted

3.3. The memory structure of the program in the operating system, or the memory management of C++?

[External link image transfer failed. The source site may have an anti-leech link mechanism. It is recommended to save the image and upload it directly (img-yfIG1Gb1-1623204463135)(07_operating system/code storage structure.jpg)]
In C++, the virtual memory is divided Six parts: code segment, data segment, BSS segment, heap area, file mapping area, and stack area

  1. Code segment: including read-only memory area (string constant) and text area (machine code of the program), read-only
  2. Data segment: initialized global variables and static variables in the storage program; belong to static memory allocation
  3. BSS segment: stores global variables and static variables (local + global) that are not initialized or initialized to 0; belong to static allocation, and static variable resources are automatically released by the system after the program ends.
  4. Heap area: When the new/malloc function is called, memory is dynamically allocated in the heap area, and delete/free needs to be called to manually release the requested memory. Frequent malloc free causes discontinuity in the memory space and generates fragmentation, so the heap is less efficient than the stack
  5. Mapping area: store dynamic link library and file mapping by calling mmap function
  6. Stack area: store the return address, return value, parameters, local variables of the function; automatically released by the compiler,

3.4. Page fault interrupt

  1. Reason: memory allocation functions such as malloc and mmap only establish the process virtual address space during allocation, and do not allocate the physical memory corresponding to the virtual memory. When the process accesses these virtual memory that does not have a mapping relationship, the processor automatically triggers a page fault Abnormal, the operating system will find the missing page in the external memory according to the external memory address in the page table and transfer it into the memory;
  2. Steps:
    protect the CPU site; analyze the cause of the interrupt; interrupt the processing program; restore the CPU site and continue execution

3.5. The difference between fork and vfork:

Fork: Create a process that is the same as the current process image through the fork() system call: it is almost exactly the same as the process that called fork( ), and both processes will continue to run

vfork() will suspend the parent process until the child process terminates or a new executable image is run. In this way, vfork() avoids page-by-page copy of the address space. In this process, the parent process and the child process share the same address space and page table entries. In fact, vfork() only accomplishes one thing: copying the internal kernel data structure. Therefore, the child process cannot modify any memory in the address space.

The fork child copies the parent's data segment and code segment, and vfork shares the data segment with the parent

The execution order of the father and son of fork is uncertain, v ensure that the father and the son are the first

v After calling exec or exit, the parent process is scheduled to run

3.6. Concurrency and parallelism?

  1. Concurrency: At the macro level, two programs run at the same time, single-core CPU multitasking, at the micro level, two program instructions are interleaved and run, and only one instruction is run in a single clock cycle. This concurrency cannot improve PC performance, but can only improve efficiency
  2. Parallel: physical simultaneous operation, multi-core CPU, two programs running on two cores at the same time complementary interference, this parallel improves PC efficiency

3.7. Deadlock

3.7.1. The cause of deadlock:

During the execution of two or more processes, the phenomenon of waiting for each other due to competition for resources

3.7.2. Four necessary conditions and deadlock resolution:

  1. Mutually exclusive conditions: The resources allocated by the process are not allowed to be plundered by other processes. If other processes need to access the resource, they can only wait
  2. Request retention: After the process obtains a certain resource, it makes a request for other resources, but the resource is occupied by other processes, the request is blocked, and the process will not release its own resources
  3. Non-lootable: The process acquires resources, unused, non-lootable, and can only be released after use
  4. Loop waiting: When the process is deadlocked, there must be a ring link between the process and the resource (A has a and b and B has b and a)

Break any condition

  1. Allocate resources at one time to ensure that there is no deadlock before allocating resources
  2. Lootable resources: the process is not satisfied, release the occupied resources, destroy the condition 3
  3. Orderly allocation of resources: each process requests in increments of sequence number, breaking the loop waiting

3.8. Memory overflow and memory leak

  1. Memory overflow: When applying for memory, there is not enough memory for the applicant to use
  2. Memory leak: program error, resulting in failure to release resources in time

3.9. System calls:

Reading and writing of files, open, write

3.10. IO model?

  1. Blocking IO: call a function, wait for the function to return, do not do any tasks during the period
  2. Non-blocking IO: Check whether the IO time is ready every period of time, and do other tasks if it is not ready
  3. Signal-driven IO: signal processing function, the process continues to execute, when the IO time is up, the process receives the signal, and then processes the IO time
  4. IO multiplexing: The select/poll function is implemented, and multiple IO operations are blocked at the same time, and multiple read and write operations are detected at the same time. If data is detected to be readable or writable, the IO operation function is called
  5. Asynchronous IO, call the aio_read function to tell the kernel, what is needed, the file, the notification method, when the kernel copies the data to the buffer, the application is notified

3.11. Infinite loop + the method of creating a new thread when connecting is a bit inefficient, how to improve?

Create a thread pool in advance, use the producer-consumer model to create a task queue, the queue is a critical resource, when there is a new connection, it hangs on the task queue, the queue is empty and all threads sleep. Improve infinite loop: use technology like select epoll

4. Computer Network

4.1. The difference between TCP and UDP and their respective application scenarios?

the differenceTCPUDP
connectionConnection-oriented transport layer control protocolnot connected
ObjectPoint-to-point, a TCP connection can only have two endpointsOne-to-many, many-to-one, and many-to-one interactive communication, in short, is broadcast.
Congestion controlCongestion control and flow control to ensure the safety and stability of data transmissionNo, network congestion does not affect the sending efficiency of the source host
reliabilityReliable delivery, no errors, no loss, no duplication, and arrival in orderBest effort delivery
Message lengthDynamic message length, determined according to the receiver's window size and current network congestion8 bytes (source port, destination port, data length, checksum)
Adapt to the sceneBecause TCP is reliable but slow in transmission, if the integrity of communication data is required, it is recommended to transfer files and use it in important states.If real-time communication is important, such as video transmission, real-time communication.

4.2. How does TCP guarantee reliability?

4.2.1. Serial number, confirmation response, timeout retransmission

When the data reaches the receiver, the receiver needs to send a confirmation response, indicating that the segment has been received, and the confirmation sequence number indicates the data sequence number that it needs to receive next. If the sender does not receive the confirmation response, the sent data may be lost, or it may be a confirmation The response is lost. At this time, the sender waits for a certain period of time, and then retransmits after a timeout;

4.2.2. Window control, tell retransmission control, fast retransmission

TCP will use window control to increase the transmission speed, which means that the size of a window does not have to wait until the response is transmitted in the next sequence. The window size is the maximum value that can continue to send data without waiting for confirmation. If window control is not applicable, each is confiscated The received response data needs to be retransmitted, using window control, if the serial number is 1001-2000 lost and the receiver does not receive it, every time the subsequent data is transmitted, the confirmation response will continue to send a response with the serial number 1001, which means I Need to receive 1001-2000 data, the sender receives the same response 3 times, it will immediately resend; another situation is that the receiver receives the data, but the response is lost, this situation will not be resent, because the sender knows , If the data segment is lost, the receiving end will always request data;

4.2.3. Congestion Control

If the window is set too large and the sender continuously sends a large amount of data, it may cause network congestion or even network paralysis. Therefore, TCP sets congestion control to prevent this situation

  1. Slow start: Define the congestion window, set it to 1 at the beginning, and then every time a confirmation response is received, window*2
  2. Congestion avoidance: Set the slow start threshold. When the congestion window reaches this value, the congestion window does not increase in *2, but linearly increases. Each time the congestion window is +1, congestion is avoided
  3. Once the timeout occurs, the sender sets the threshold to half of the current window, the window size is set to 1, and the slow start process is entered
  4. Fast retransmission: When three repeated confirmation responses are encountered, it means that three message segments have been received, but if one is lost before, retransmit immediately, and then the threshold is set to half of the current window, and the window size is set to the slow start threshold +3 the size of
  5. Purpose: During TCP communication, the network throughput gradually increases, and as congestion reduces the throughput, it slowly increases, and the network is not prone to paralysis;

4.3. TCP connection establishment and disconnection process:

Three-way handshake: client syn, server ack, client syn
. Reason for the third handshake: two handshake, after the server sends the ack, it is not known whether the client has received the connection confirmation.
Four-way handshake: can be combined into a three-way handshake , The second and third time can be put in the same message

Data transmission: client write socket, server read
four waves: C->S; S->C; S->C; C->S; (full duplex, stop receiving, the other party can still send)
four times The reason for waving: Because the TCP connection is full-duplex, each direction must be closed separately. The principle is that when one party completes the data sending task, it sends a FIN to terminate the connection in one direction, that is, it will not accept data, but The other party of TCP can send, only when the other party also terminates the connection,

4.4. Tell me about the whole process of the local host accessing Baidu web pages?

Involved: network protocol, software, hardware, security, high concurrency, high availability
URL request target IP, local browser DNS cache, host file, local DNS (telecommunications), root domain name server -> Baidu server, return target IP, above query Based on the transport layer UDP protocol, then http establishes a get request, TCP three-way handshake, IP routing jumps to the target IP, packs the request message, opens up space, transmits data, parses and displays it on the page

4.4.1. url analysis

https://www.baidu.com Uniform Resource Locator

  • https confirm to access the web browser
  • www.baidu.com determines the name of Baidu web server
  • index.html is the resource name. The
    above three parses generate http request information, and after sending it, the server response message is obtained and displayed on the browser web page

4.4.2. Domain name resolution

  1. Browser cache: If the browser has Baidu DNS, it will cache the DNS, and it will not go down.
  2. System cache: Look up the corresponding Baidu domain name and IP from the host file, without forwarding
  3. Router cache: no turn down
  4. ISP DNS cache, telecommunications, China Unicom, mobile
  5. If there is none of the above, check the corresponding IP of the domain name from the root domain name server, and the root domain name request will be transferred to the next level until the IP is found

4.4.3. Server processing

  • The web server accepts the user's request and hands it to the website code

4.4.4. Website handling MVC

4.4.5. Browser processing

4.4.6. Hierarchical analysis

Application layer: http, DNS
Transport layer: TCP / UDP
Network layer: IP
data link: router
Physical layer: cable, optical cable

4.5. What is the difference between http and https?

  1. http plaintext, port 80,
  2. https, encrypted by TLS, port 443 is highly secure, encrypted after the TCP three-way handshake; you need to apply for a certificate with the server, and the browser can install a certificate for normal use to ensure that the data is sent correctly.
  3. Disadvantages of https: high handshake delay, high deployment cost, high resource consumption and high CPU;

4.6. IP/MAC:

The logical address assigned by each host in each network of the Internet, shielding physical differences, network layer control MAC is the hardware, network device location, data link layer control

4.7. Interrupt:

Response to an event: Pause to save the current, to execute the interrupt, and then return to restore, external (IO, clock) internal (illegal, overflow, out of bounds) system call, hardware response, software processing

4.8. The difference between get/post:

The get request sends the http header and data together, the server responds with 200, post request, sends the header first, responds with 100, sends data, responds with 200
get url request, post puts the request in the request body, get parameter length limit in the url, post No
get is insecure, url is plaintext, generally
get browser active cache, post multiple encodings through encryption , get complete save browser records, post request will not be
essentially tcp, get one request package, post two

4.9. Socket programming:

Server: socket-bind-listen-accept
Client: socket-connect

5. Database related

5.1. Could you talk about database indexes?

An index is a structure for sorting the values ​​of one or more columns in a database table. Using an index can quickly access specific information in a database table. If you want to find a specific employee by his or her last name, the index helps to obtain information faster than searching all rows in the table.
One of the main purposes of the index is to speed up the method of retrieving the data in the table, that is, to assist the information searcher to find the auxiliary data structure of the record ID that meets the restricted conditions as soon as possible.

Index classification:
ordinary index: create index index_name on table(column);
unique index: similar to ordinary index, the value of the index column must be unique (it can be empty, which is different from the primary key index) create unique index index_name on table(column) ;
Primary key index: Special unique index, which is not allowed to be empty, there can only be one. Generally, the primary key (column)
composite index is specified when the table is built : Indexes are created on multiple fields, following the principle of the leftmost prefix. alter table t add index index_name(a,b,c);
Full-text index: Mainly used to find keywords in the text, not directly compared with the value in the index, like a search engine, used with match against, now only Full-text indexes can be created on char, varchar, and text.

When to use indexes
MySQL uses only one index per query. Rather than saying that "database queries can only use one index", it is more time-consuming to analyze two index B+ trees than a full table scan. So where A=a and B=b is the best way to use the combined index of (A, B), and the B+ tree is sorted according to (A, B).

When not applicable to index
Table records are too few;
data is repeated and evenly distributed fields (only columns with few data values);
tables that are frequently inserted, deleted, and modified should reduce indexes;
text, image, and other types should not be indexed, these The data volume of the column is large (if the first 10 characters of text are unique, the first 10 characters of text can also be indexed);
MySQL can estimate that the full table scan is faster than the index, and the index is not used;

When does the
index fail? The leftmost prefix is ​​not used for the combined index, for example, the index is not used for the combined index (A, B), where B=b;
the leftmost prefix is ​​not used for like, where A like'%China'; when
searching an index Order by on another index, where A=a order by B, only use the index on A, because the query only uses one index;
or will invalidate the index. If the query fields are the same, the index can also be used. For example, where A=a1 or A=a2 (effective), where A=a or B=b (invalid)
If the column type is a string, use quotation marks. For example, where A='China', otherwise the index will fail (type conversion will be performed);
operations on the index column, functions (upper(), etc.), or,! =(<>), not in, etc.;


5.2. Please talk about database transactions

Database Transaction (Database Transaction) refers to a series of operations performed as a single logical unit of work, either completely executed or not executed at all. Transaction processing can ensure that unless all operations in the transactional unit are successfully completed, data-oriented resources will not be permanently updated. By combining a set of related operations into a unit that either all succeeds or all fails, you can simplify error recovery and make your application more reliable. For a logical unit of work to become a transaction, it must meet the so-called ACID (atomicity, consistency, isolation, and durability) properties. Transaction is the logical unit of work in database operation, and the transaction management subsystem in the DBMS is responsible for transaction processing.


5.3. Database transactions and four major characteristics and isolation levels

  1. Transaction: A series of operations performed by a single logical unit of work, either completely work/not performed at all, to ensure that all operations are completely successful, or data resources will not be updated. The error recovery mechanism makes the application more reliable.

5.3.1. Four characteristics

  1. Atomicity: Atomicity means that all operations included in the transaction are either all successful or all failed rollback (bin-log); transfer failure rollback
  2. Consistency: A transaction must be in a consistent state before and after execution. The money transferred between the two will always remain 5000
  3. Isolation: Isolation means that when multiple users access the database concurrently, for example, when operating the same table, the transaction opened by the database for each user cannot be interfered by the operation of other transactions, and multiple concurrent transactions must be isolated from each other. . B cannot transfer money to this card before the process of A withdrawing money is over
  4. Persistence: Persistence means that once a transaction is submitted, the change to the data in the database is permanent; the data change must be completed when the transfer is successful

5.3.2. Different isolation levels:

  1. Read Uncommitted (read uncommitted content): the lowest isolation level, nothing needs to be done, one transaction can read the uncommitted results of another transaction. All concurrent transaction problems will occur.
  2. Read Committed (read submitted content): Only after the transaction is committed, the update results will be seen by other transactions.
  3. Repeated Read (repeatable read): In a transaction, the result of reading the same piece of data is always the same, regardless of whether other transactions operate on the data, and whether the transaction is committed. Can solve dirty reads, non-repeatable reads.
  4. Serialization: The transaction is executed serially, with the highest isolation level, sacrificing the concurrency of the system. Can solve all the problems of concurrent transactions.

5.4. What is an index? Will it be okay to add more indexes?

1. Index
database index is an identifier attached to the table fields in order to increase the query speed. It is a structure for sorting the values ​​of one or more columns in the database table.

When DB executes a SQL statement, the default method is to scan the entire table according to the search conditions, and join the search result set when it encounters a matching condition. If we add an index to a field, we will first locate the number of rows with a specific value in the index list at a time when querying, greatly reducing the number of matching rows traversed, so the query speed can be significantly increased.

  1. Advantages:
    By creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.
    Can greatly speed up the data retrieval speed, which is also the main reason for creating an index.
    It can speed up the connection between the table and the table, especially in the realization of the referential integrity of the data is particularly meaningful.
    When using grouping and sorting clauses for data retrieval, it can also significantly reduce the time for grouping and sorting in the query.
    By using the index, you can use the optimization hider in the query process to improve the performance of the system.
  2. Disadvantages: It
    takes time to create and maintain indexes, and this time increases as the amount of data increases.
    The index needs to occupy physical space. In addition to the data space occupied by the data table, each index also occupies a certain physical space. If a clustered index is to be established, the space required will be larger.
    When the data in the table is added, deleted, and modified, the index must also be dynamically maintained, which reduces the maintenance speed of the data.
  3. The principle of adding indexes
    The columns that are rarely used or referenced in the query should not be indexed. This is because, since these columns are rarely used, indexing or no indexing does not improve the query speed. On the contrary, due to the increase of the index, the maintenance speed of the system is reduced and the space requirement is increased.
    Indexes should not be added to columns with few data values. This is because, because these columns have very few values, such as the gender column of the personnel table, in the result of the query, the data rows of the result set account for a large proportion of the data rows in the table, that is, the data that needs to be searched in the table The proportion of rows is large. Increasing the index will not significantly speed up the retrieval speed.
    Columns defined as text, image, and bit data types should not be indexed. This is because the amount of data in these columns is either quite large or has few values.
    When the modification performance is far greater than the retrieval performance, an index should not be created. This is because modification performance and retrieval performance are contradictory. When the index is increased, the retrieval performance will be improved, but the modification performance will be reduced. When the index is reduced, the modification performance will be improved and the retrieval performance will be reduced. Therefore, when the modification performance is far greater than the retrieval performance, an index should not be created.

5.5. The three paradigms of databases

First normal form: When all the attributes of the relational pattern R can no longer be decomposed into more basic data units, it is said that R satisfies the first normal form, that is, the attributes are inseparable. The
second normal form: If the relational pattern R satisfies the first normal form, and R is All non-primary attributes are completely dependent on each candidate key attribute of R. It is said that R satisfies the second normal form and the
third normal form: Let R be a relational model that satisfies the conditions of the first normal form, and X is any attribute set of R. If X is not Transmit any candidate key that depends on R, it is said that R satisfies the third normal form, that is, non-primary attributes are not transitively dependent on the key code


5.6. The four isolation states of mysql

Transaction isolation levelDirty readNon-repeatablePhantom reading
Read-uncommittedYesYesYes
Non-repeatablenoYesYes
RepeatablenonoYes
Serializationnonono

5.7. MVCC mechanism of mysql

MVCC is a multi-version concurrency control mechanism. It is a specific way for MySQL's InnoDB storage engine to achieve isolation levels. It is used to achieve the two isolation levels of commit read and repeatable read. MVCC implements this mechanism by saving a snapshot of the data at a certain point in time. It saves two hidden columns after each row of records, and saves the creation version number and the deleted version number of this row respectively, and then Innodb's MVCC uses it. The snapshot is stored in the Undo log, which connects all the snapshots of a data row through a rollback pointer.


5.8. What are the SQL optimization methods?

  • Optimize queries through indexing
  • To optimize the query, full table scans should be avoided as much as possible

5.9. MySQL engine and difference?

5.9.1. MySQL Engine

The database engine is the core service used to store, process and protect data. The database engine can be used to control access rights and quickly process transactions, so as to meet the requirements of most applications that need to process large amounts of data in the enterprise. Use the database engine to create a relational database for online transaction processing or online analysis and processing of data. This includes creating tables for storing data and database objects (such as indexes, views, and stored procedures) for viewing, managing, and protecting data security.
MySQL storage engines mainly include: MyIsam, InnoDB, Memory, Blackhole, CSV, Performance_Schema, Archive, Federated, Mrg_Myisam.
But the most commonly used are InnoDB and Mylsam.

5.9.2. InnoDB Engine

InnoDB is a transactional storage engine with row-level locking and foreign key constraints.

  1. The Innodb engine provides support for database ACID transactions.
    The engine also provides row-level locks and foreign key constraints. Its design goal is to deal with large-capacity database systems.
    It itself is actually a complete database system based on the MySQL background,
    MySQL runtime Innodb will establish a buffer pool in memory for buffering data and indexes.
    But the engine does not support FULLTEXT type indexes, and it does not save the number of rows in the table. When SELECT COUNT(*) FROM TABLE, it needs to scan the entire table.
    When you need to use database transactions, this engine is the first choice.
  2. Applicable scenarios:
    frequently updated tables, suitable for handling multiple concurrent update requests.
    Support affairs.
    Can recover from disasters (via bin-log logs, etc.).
    Foreign key constraints. Only he supports foreign keys.
    Support automatically increase the column attribute auto_increment.
  3. Index structure:
    InnoDB is also a B+Treee index structure. Innodb's index file itself is a data file, that is, the data field of B+Tree stores actual data, and this kind of index is a clustered index. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.
    InnoDB's auxiliary index data field also stores the value of the primary key of the corresponding record instead of the address, so when searching with the auxiliary index, the primary key will be found first according to the auxiliary index, and then the actual data will be found according to the primary key index.

5.9.3. Mylsam engine

MyIASM is the default engine of MySQL, but it does not provide support for database transactions, nor does it support row-level locks and foreign keys. Therefore, when INSERT or UPDATE data is written, the entire table needs to be locked, and the efficiency will be lower. The MyIsam storage engine is independent of the operating system, that is, it can be used on windows, or it can be relatively simple to transfer data to the linux operating system.

Applicable scenarios
Does not support transaction design, and
does not support foreign key table design.
The query speed is very fast, it is more suitable if the database insert and update operations are more.
MyISAM places great emphasis on fast read operations.
Disadvantages: it is not possible to actively restore data after the table is damaged

Index structure
MyISAM index structure: MyISAM index uses B+ tree to store data, the pointer of MyISAM index points to the address of the key value, and the address stores the data. The content stored in the data field of B+Tree is the address of the actual data, that is to say, its index is separated from the actual data, but the actual data is pointed to by the index. This kind of index is the so-called non-clustered index.

5.9.4. The difference between InnoDB and Mylsam:

  1. Transaction: The MyISAM type does not support advanced processing such as transaction processing, while the InnoDB type supports advanced database functions such as transaction support and external keys.
  2. Performance: The MyISAM type table emphasizes performance, and its execution is several times faster than InnoDB type.
  3. Row count storage: InnoDB does not save the specific row count of the table, that is, when executing select count() fromtable, InnoDB scans the entire table to calculate how many rows there are, but MyISAM simply reads the saved row count That's it. Note that when the count() statement contains the where condition, the operation of the two tables is the same.
  4. Index storage: For fields of type AUTO_INCREMENT, InnoDB must contain only the index of the field, but in MyISAM tables, joint indexes can be built together with other fields. MyISAM supports full text index (FULLTEXT), compressed index, InnoDB does not support.

The InnoDB storage engine is fully integrated with the MySQL server. The InnoDB storage engine maintains its own buffer pool for caching data and indexes in main memory. InnoDB stores its tables & indexes in a table space, which can contain several files (or raw disk partitions). This is different from MyISAM tables, for example, each table in a MyISAM table is stored in a separate file. InnoDB tables can be of any size, even on operating systems where the file size is limited to 2GB.

Server data backup: InnoDB must export SQL for backup. The load table from master operation does not work for InnoDB. The solution is to first change the InnoDB table to MyISAM table, and then change it to InnoDB table after importing the data, but for the additional use The InnoDB features (such as foreign keys) are not applicable to tables.

MyISAM responds to the data recovery speed caused by wrong coding. MyISAM data is stored in the form of files, so it will be very convenient in cross-platform data transfer. During backup and recovery, operations can be performed on a table separately.

InnoDB copies data files, backs up binlog, or uses mysqldump, which is relatively painful when the data volume reaches dozens of gigabytes.


5.10. Please talk about inner join and left join

left join (left join) returns including all the records in the left table and the records in the right table with the join field equal right join (right join) returns including all the records in the right table and the records in the left table with the same join field
inner join (etc. Value concatenation) Only return rows with equal join fields in two tables


5.11. The difference between mongodb and redis?

In terms of memory management mechanism: Redis data is all stored in memory and written to disk regularly. When the memory is insufficient, the specified LRU algorithm can be selected to delete the data.
MongoDB data is stored in memory, which is implemented by the Linux system mmap. When the memory is not enough, only the hotspot data is stored in the memory, and other data is stored on the disk.
Supported data structure: Redis supports rich data structures, including hash, set, list, etc.

MongoDB has a relatively simple data structure, but supports rich data expression, indexes, most similar to relational databases, and supports very rich query languages

5.12. Could you please tell me about the mysql engine and its difference?

In Mysql database, the commonly used engines are Innodb and MyIASM. Innodb is a transactional storage engine with row-level locking and foreign key constraints. It provides support for database ACID transactions and implements the four isolation levels of the SQL standard. , That is, read uncommitted, non-repeatable read, repeatable read, and serial. The target is a database system that handles large data capacity. The MyIASM engine is the default engine of Mysql. It does not provide support for database transactions, nor does it support row-level locks and foreign keys. Therefore, the entire table needs to be locked during write operations, which is less efficient. However, it saves the number of rows in the table. When Venus select count(*) form table, you can directly read the saved value without having to perform a full table scan. Therefore, when the table has more read operations than write operations, and transaction support is not required, MyIASM can be preferred


5.13. Could you please talk about how the timing mechanism of Redis is implemented?

Redis server is an event-driven program. The server needs to process the following two types of events: file events (the server's abstraction of socket operations) and time events (the server's abstraction of timing operations). The timing mechanism of Redis is realized by means of time events.
A time event is mainly composed of the following three attributes: id: time event identification number; when: record the arrival time of the time event; timeProc: time event handler, when the time event arrives, the server will call the corresponding processor to process time. A time event judges whether it is a timed event or a periodic event according to the return value of the time event processor.
A time event is mainly composed of the following three attributes: id: time event identification number; when: the arrival time of the recording time event; timeProc: time Event processor, when the time event arrives, the server will call the corresponding processor to process the time. A time event judges whether it is a timed event or a periodic event based on the return value of the time event processor.


5.14. Redis is single-threaded, but why is it so efficient?

Although the Redis file event processor runs in a single-threaded manner, by using an I/O multiplexing program to monitor multiple sockets, the file event processor not only implements a high-performance network communication model, but also can work well. Interfacing with other modules in the Redis server that also run in a single thread, which maintains the simplicity of the single-threaded design inside Redis.


5.15. What are the data types of Redis, and how to implement the bottom layer?

  1. String: integer value, simple dynamic string encoded by embstr, simple dynamic string (SDS)
  2. List: compressed list, double-ended linked list
  3. Hash: compressed list, dictionary
  4. Collection: integer collection, dictionary
  5. Ordered collections: compressed lists, skip lists, and dictionaries

5.16. How does Redis rehash do, why do you need to rehash incrementally, and how does incremental rehash implement it?

Because redis is single-threaded, when there are many K, if all key-value pairs are rehashed at one time, the huge amount of calculation will affect server performance, and may even cause the server to stop serving for a period of time. It is impossible to complete the entire rehash operation in one step, so redis is a gradual rehash that is divided into multiple times. There are two types of progressive hashing:
1) When operating redis, do an extra step of rehash

When reading, inserting, and deleting redis, the linked list at the position of table[dict->rehashidx] will be moved to the new dictht, and then rehashidx will be incremented by one and moved to the next slot.

2) The background timing task calls rehash

The background timing task rehash call chain, and the rehash call frequency can be controlled through server.hz


5.17. What is the difference between Redis and memcached?

1) Data type: redis has rich data types, supporting set liset and other types; memcache supports simple data types, requiring the client to handle complex objects by itself
2) Persistence: redis supports data landing persistent storage; memcache does not support data persistent storage. )

3) Distributed storage: redis supports master-slave replication mode; memcache can use consistent hash for distributed.

4) The value size is different: memcache is a memory cache, the length of the key is less than 250 characters, and the storage of a single item is less than 1M, which is not suitable for virtual machine use

5) Different data consistency: Redis uses a single-threaded model to ensure that data is submitted in order; memcache needs to use cas to ensure data consistency. CAS (Check and Set) is a mechanism to ensure concurrency consistency, which belongs to the category of "optimistic locking"; the principle is simple: take the version number, operate, compare the version number, operate if they are consistent, and give up any operation if they are inconsistent

6) CPU utilization: Redis single-threaded model can only use one CPU, and multiple redis processes can be opened

5.18. Basic sentences

# 用户操作
mysql -h 10.15.10.113 -P 3306 -u root -p 回车后输入密码 # 远程
create user zhao[email protected] identified by 'zhao'; # 创建用户
select user,host from mysql.user; # 显示用户
drop user "zhao"@"localhost" #删除用户zhao, 需要在root用户下
rename user 'zhao'@'localhost' to 'zjq'@'localhost';  # 用户重命名
grant select/insert/delete/update on zhao.* to 'zhao'@'localhost';  # /*给予增删改查权限*/
show grants for 'zhao'@'localhost'; # 查看zhao的权限
revoke delete on zhao.* from 'zhao'@'localhost'; # 删除用户zhao的删除权限

# 数据库操作
create database submission default character set gbk; # 创建数据库 这里默认编码gbk,也可以是utf8
drop database submission ; # 删除数据库
show databases; # 显示当前用户下包含的数据库
use submission; # 进入数据库submission
show variables like 'character_set_database'; # 显示当前数据库中的字符集
mysqldump -u root -p mydatabase > save.sql # 导出数据库mydatabase到save.sql
mysqldump -u root -p mydatabase t1> mydatabase_t1.sql # 导出表t1到
mysql -u用户名 -p密码 数据库名 < 数据库名.sql # 导入数据库, 注意数据库需要提前创建
mysql -u用户名 -p密码 -d 数据库名 表名 < 导出的文件名.sql # 导入库的表, 库需要提前创建爱

# 表操作
create table t1(age int, name varchar(10)) default charset=utf8; # 创建表
drop table t1; # 删除表
desc t1; # 显示表的参数和类型, 字段
select age,name from t1; # 显示数据表的全部年龄和姓名
insert into t1(age,name)  values(10,"丁丁"),(11,"泥泞"); # 插入数据
alter table t1 add column addr varchar(50); # 给上表增加列 
update t1 set age=12 where name="丁丁";  # 修改name为丁丁的年龄

# 查询
select distinct age from boy; # 去重查询
update boy set name='学习' where id=3; # 修改id为3的
SELECT * from boy where name='彪子'; # 查询彪子行
delete from boy [where] id=3]; # 删除

# 查询age在25-30之间的
select * from boy where age between 25 and 30; 

# 查询age 25和30的数据 固定范围
select * from boy where age in (25,30);

# 查询name中, 有四的列出来, %通配符,
select * from boy where name like "%四%";
select * from boy where name like "%四";
select * from boy where name like "彪%";


# 区分大小写的查询
select * from boy where binary name="鬼小四"; 


select * from boy order by age; # 排序输出
select * from boy order by age desc; # 降序输出


select * from boy limit 2; # 显示前两个


select age,count(1) from boy group by age; # 统计age数量,  age,count(1)是表头


# 多表查询
内连接
select a.*,b.* from a[inner] join b on ab链接条件   # 显式查找
select user.*,orders.* from user join orders on user.id=orders.id;

select a.*,b.* from a,b where ab链接条件           # 隐式
select user.*,orders.* from user,orders where user.id=orders.id;

外连接
左
select user.*,orders.* from user left join orders on user.id=orders.id;
select a.*,b.* from a left [outer] join b on 链接条件  
意思:先展示join左边表的全部数据,根据条件关联查询join右边的表,符合条件展示出来,不符合
     条件以null展示。
右
select a.*,b.* from b right join orders on user.id=orders.id;
 意思:先展示join右面边表的全部数据,根据条件关联查询join左边的表,符合条件展示出来,不符合
     条件以null展示。     

子查询
select id from user where username = '张三';
select * from orders where user_id = 3; # 
select * from orders where user_id = (select id  from user where username = '张三');
  

select user.*,orders.* from user,orders where orders.price>300;

select user.*,temp.* from user,(select * from orders where  price>300 )as temp where user.id 
 

6. Multi-table query

create table `user` (                                  
	`id` int auto_increment primary key,                
	`username` varchar(50)  -- 用户姓名                                                
);
create table `orders` (                                                  
	`id` int  auto_increment primary key,                                  
	`price` double,                                           
	`user_id` int                                       
);

insert into user values(3,'张三');
insert into user values(4,'李四');
insert into user values(5,'王五');
insert into user values(6,'赵六');
insert into orders values(1,1314,3);
insert into orders values(2,1314,3);
insert into orders values(3,15,4);
insert into orders values(4,315,5);
insert into orders values(5,1014,null);

desc user;
desc orders;
select * from user WHERE id=3;
SELECT * from orders WHERE price=1314;
select DISTINCT price as "heh" from orders;

select user.username, orders.price from user join orders on user.id=orders.user_id; # 内连接, 显式
select user.*, orders.* from user, orders where user.id=orders.user_id;  # 内连接隐式

select user.*,orders.* from user left join orders on user.id=orders.id; # 展示left的全部数据, 然后在关联右边的
select user.*,orders.* from user right join orders on user.id=orders.id; # 展示right的全部数据, 然后在关联左边的 

# 子查询
select * from user where username = '张三';
select * from orders where user_id = 3; # 
select * from orders where user_id = (select id  from user where username = '张三'); # 选择username为张三对应的id是3, 在找user_id为3的orders的全部
select user.*,orders.* from user,orders where orders.price>300; # 交叉一张表, 全部都保证price大于300

7. Project related

7.1. Please answer the difference between Merge and rebase in git

Merge will automatically perform a three-way merge based on the common ancestor of the two branches and the latest commit of the two branches, and then generate a new commit from the content modified in the merge, that is, merge merges the two branches and generates a new commit, and still Then save the commit record of the original branch


8. Design Patterns

8.1. Singleton mode

The singleton mode mainly solves the problem of frequent creation and destruction of a globally used class. In the singleton mode, you can ensure that there is only one instance of a certain class, and you can instantiate and provide this instance to the entire system. The singleton pattern has three elements: one is that a certain class can only have one instance; the other is that it must create this instance by itself; the third is that it must provide this instance to the entire system by itself.

Multi-thread safety of
singleton mode : In the implementation of singleton mode, if no measures are taken, it is not safe under multi-threading, and multiple instances may be created at the same time. Therefore, in order to ensure the thread safety of the singleton mode in multithreading, the following ways are generally used to implement the singleton mode:
Hungry style: instantiate when the class is loaded, causing garbage objects;
lazy style: double lock mechanism

8.2. Factory Mode

The factory model mainly solves the problem of interface selection. Define an interface for creating objects in this mode, let its subclass decide which factory class to instantiate by itself, and delay the creation process to the subclass.

8.3. Observer Mode

Define a one-to-many dependency relationship between objects. When the state of an object changes, all objects that depend on it are notified and automatically updated.

8.4. Decorator mode

Decorate some existing classes to extend some functions, thereby dynamically adding new functions to an object. The decorator pattern is a technique used to replace inheritance, which can extend the new functions of an object without adding subclasses through inheritance. Use the association relationship of the object instead of the inheritance relationship, which is more flexible and avoids the rapid expansion of the type system.