A comprehensive knowledge review for Java campus-recruitment interviews

Self-introduction

Prepare your self-introduction carefully. Highlight the awards you have won, the projects you have worked on, and the experiences that set you apart from other candidates, and show your passion for learning.

Interview write-ups collected online

Shopee | first-round interview write-up (rejected) | 2021

HR interview preparation

Java basics

Java is both interpreted and compiled.

  • A .java file is compiled into a .class file. The .class file cannot run directly on the machine; it runs on the JVM, so it can be regarded as being interpreted by the JVM.
  • Modern JVMs include a JIT (Just-In-Time) compiler for efficiency. It compiles the bytecode in the .class file into native code that the machine can run directly. From this perspective, Java is compiled.
  • A compiled language produces a binary that runs directly on the machine. Execution is fast, but the binary is not cross-platform: one compiled on Windows cannot run on macOS because the operating systems differ. An interpreted language cannot run directly on the machine; it relies on an intermediate platform that acts as a translator and interprets the program, so execution is slower. For Java, the JVM is that intermediate platform. As long as each platform has its own JVM installed, the source only needs to be compiled once into .class files, which then run on the JVM of every platform. This is how Java achieves cross-platform execution.

StringBuilder is not thread-safe; StringBuffer is thread-safe.
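A minimal sketch of what the thread-safety difference means in practice. The class and method names (AppendDemo, concurrentAppendLength) are made up for illustration. Several threads append to one shared StringBuffer, whose append() is synchronized, so no append is lost.

```java
import java.util.ArrayList;
import java.util.List;

class AppendDemo {
    // Appends one character `perThread` times from each of `threads` threads
    // and returns the final length of the shared buffer.
    static int concurrentAppendLength(int threads, int perThread) {
        StringBuffer shared = new StringBuffer(); // append() is synchronized
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.append('x');
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until every worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(concurrentAppendLength(8, 1000)); // prints 8000
    }
}
```

If the shared object were a StringBuilder instead, the same code could lose appends or throw an exception, because its methods are not synchronized. Within a single thread, StringBuilder is the usual choice since it avoids the locking cost.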

No-argument constructor

When a subclass is instantiated, its parent class is initialized first. If the subclass constructor does not explicitly use the super keyword to choose a parent constructor, the parent's no-argument constructor is called by default. If the parent class declares only constructors with parameters and no no-argument constructor, the parent cannot be initialized this way and the compiler reports an error.
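The rule can be illustrated with a small sketch. The class names (Parent, ImplicitChild, ExplicitChild) are hypothetical; the log records the order in which the constructors run.

```java
class ConstructorChainDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class Parent {
        Parent() { LOG.append("Parent() "); }
        Parent(String name) { LOG.append("Parent(" + name + ") "); }
    }

    static class ImplicitChild extends Parent {
        ImplicitChild() {
            // No explicit super(...) call: the compiler inserts super() here.
            LOG.append("ImplicitChild() ");
        }
    }

    static class ExplicitChild extends Parent {
        ExplicitChild() {
            super("x"); // explicitly selects the one-argument parent constructor
            LOG.append("ExplicitChild() ");
        }
    }

    // Creates one instance of each child and returns the constructor call order.
    static String run() {
        LOG.setLength(0);
        new ImplicitChild();
        new ExplicitChild();
        return LOG.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run()); // Parent() ImplicitChild() Parent(x) ExplicitChild()
    }
}
```

If the no-argument Parent() constructor were deleted, ImplicitChild would no longer compile, because the implicit super() call would have no constructor to resolve to.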

In system performance analysis, CPU, memory and IO are the main concerns

Benchmarks

Use the JMH framework; it can be added as a Maven dependency.

In server mode, the JIT (Just-In-Time) compiler compiles a piece of code into native code after it has been executed 10,000 times; in client mode the threshold is 1,500 times. A benchmark therefore needs a warm-up phase to eliminate the noise of early execution, so that the sampled statistics reflect the code's steady state.

JIT (Just-In-Time) compiler: also known as a dynamic compiler. The JIT compiles hot code into machine code at runtime, so that this hot code is executed as compiled code instead of being interpreted.

IO

BIO (Blocking IO) and NIO (Non-blocking IO): one is blocking and the other is non-blocking, and both terms describe what happens to the thread. Blocking means the thread blocks until the data arrives and can do nothing else in the meantime. Non-blocking means the thread does not wait for the data: it can go about other work, and a notification mechanism tells it when the data is ready so that it can come back and process it.

NIO is synchronous non-blocking IO: a single thread processes events synchronously. After handling one batch of ready channels, it checks again for channels that are ready to be processed. Synchronous means that the ready channels are processed one after another; non-blocking means that the thread never sits waiting on a read and proceeds only when a channel is ready. This raises a problem: if handling each channel is time-consuming, the synchronous processing causes a backlog of channel work. NIO then needs something like load balancing, for example a thread pool that manages reads and writes and hands channels to other threads, so that every thread is used fully and work does not pile up waiting on a single thread.
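The "synchronous + non-blocking" behavior described above can be sketched with an in-process java.nio Pipe and a Selector. The class name NonBlockingReadDemo is made up for illustration, and a real server would register socket channels rather than a pipe; the pipe only keeps the sketch self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class NonBlockingReadDemo {
    static String demo() {
        try {
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                ByteBuffer buf = ByteBuffer.allocate(64);
                // Non-blocking: no data yet, so read() returns 0 immediately.
                int first = source.read(buf);

                sink.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

                // Synchronous: one thread waits for readiness, then handles
                // the ready channels one after another.
                while (buf.position() == 0) {
                    selector.select();
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isReadable()) {
                            ((Pipe.SourceChannel) key.channel()).read(buf);
                        }
                    }
                    selector.selectedKeys().clear();
                }
                buf.flip();
                return "first read=" + first + ", then got=" + StandardCharsets.UTF_8.decode(buf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the work done per ready channel were slow, this single loop would become the bottleneck, which is where handing ready channels to a thread pool comes in.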

NiO

ConcurrentHashMap

Java 1.7

java1.7

Since Java 1.8, segments have been removed and the synchronization granularity is finer: locking happens per bucket instead of per segment.

java1.8
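From the caller's side, the finer-grained synchronization is invisible: atomic operations such as merge() are simply safe to call from many threads. A minimal sketch, with a hypothetical class name (WordCountDemo):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class WordCountDemo {
    // Counts the words from several threads; merge() updates each key atomically.
    static ConcurrentHashMap<String, Integer> count(String[] words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String word : words) {
            pool.execute(() -> counts.merge(word, 1, Integer::sum));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "c", "a", "b"};
        System.out.println(count(words, 4)); // a=3, b=2, c=1 (iteration order may vary)
    }
}
```

A plain HashMap with get-then-put in the same code could lose updates; with ConcurrentHashMap the read-modify-write inside merge() is performed atomically for each key.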

Concurrent programming

What exactly is a thread

From the operating system's perspective, a thread can be regarded simply as the smallest unit of system scheduling. A process can contain multiple threads. As the actual executor of a task, each thread has its own stack, registers, and thread-local storage (Thread Local), but it shares file descriptors, the virtual address space, and so on with the other threads in the process. In terms of implementation, threads are divided into kernel threads and user threads. Java's thread implementation depends on the virtual machine. In the familiar Sun/Oracle JDK, threads have gone through an evolution: from roughly Java 1.2 onward, the JDK abandoned so-called green threads, which were scheduled in user space. The current model maps each Java thread one-to-one to an operating system kernel thread.

Daemon thread: sometimes an application needs a long-running service thread that should not prevent the application from exiting. Such a thread can be marked as a daemon thread; when the JVM finds that only daemon threads remain, it ends the process.

Thread daemonThread = new Thread();
daemonThread.setDaemon(true);
daemonThread.start();

What happens if a thread calls the start() method twice?

Answer: An IllegalThreadStateException is thrown, which is a runtime exception.
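A short sketch that reproduces the answer; the class name StartTwiceDemo is made up for illustration.

```java
class StartTwiceDemo {
    // Returns true if the second start() call is rejected.
    static boolean secondStartFails() {
        Thread thread = new Thread(() -> { });
        thread.start();
        try {
            thread.start(); // a thread can be started only once
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondStartFails()); // true
    }
}
```

The second call fails even if the thread has already finished running; a terminated thread cannot be restarted, so new work needs a new Thread object or an executor.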

Since Java 5, thread states are defined in java.lang.Thread.State. They are: new (NEW), runnable (RUNNABLE), blocked (BLOCKED), waiting (WAITING), timed waiting (TIMED_WAITING), and terminated (TERMINATED).
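The two states at the ends of the life cycle are easy to observe deterministically. The sketch below (ThreadStateDemo is a hypothetical name) reads Thread.getState() before start() and after join(); the intermediate states depend on timing and are not shown.

```java
class ThreadStateDemo {
    // Returns the state before start() and the state after the thread has finished.
    static String lifecycle() {
        Thread thread = new Thread(() -> { });
        Thread.State before = thread.getState(); // not started yet
        thread.start();
        try {
            thread.join(); // wait for run() to complete
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Thread.State after = thread.getState();
        return before + " -> " + after;
    }

    public static void main(String[] args) {
        System.out.println(lifecycle()); // NEW -> TERMINATED
    }
}
```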

ThreadLocal memory leak.

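Each thread has a ThreadLocalMap. The key of each entry is a reference to a ThreadLocal, held as a WeakReference; the value is the object stored for that thread. The value itself is strongly referenced, so once the ThreadLocal key has been collected, the value stays reachable from the thread's map until the entry is cleaned up or the thread ends. In a long-lived thread, such as a pool thread, this is the leak, which is why remove() should be called when the value is no longer needed.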

A a = new A(); // once a = null, the A object becomes eligible for GC
B b = new B(); // once b = null, the B object becomes eligible for GC

Consider this situation:

C c = new C(b); // c holds a strong reference to the B object
b = null;       // the B object is still strongly reachable through c, so it is not collected

Only when c = null as well is the B object guaranteed to become collectable. If the B object were instead held through a weak reference, for example WeakReference<B> w = new WeakReference<>(b);, it could be collected as soon as b = null.
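One common way to keep ThreadLocal values from lingering in long-lived threads is to call remove() in a finally block. The sketch below assumes a hypothetical per-request value (CURRENT_USER) and handler method; it is an illustration of the pattern, not a prescribed API.

```java
class ThreadLocalCleanupDemo {
    // Hypothetical per-thread context value.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static String handle(String user) {
        CURRENT_USER.set(user);
        try {
            return "hello " + CURRENT_USER.get();
        } finally {
            // Drop the entry from this thread's ThreadLocalMap so the value
            // does not stay reachable after the work is done.
            CURRENT_USER.remove();
        }
    }

    // Returns true if nothing is left behind on the calling thread.
    static boolean cleanedUp() {
        handle("alice");
        return CURRENT_USER.get() == null;
    }

    public static void main(String[] args) {
        System.out.println(handle("alice")); // hello alice
        System.out.println(cleanedUp());     // true
    }
}
```

This matters most with thread pools, where the same thread serves many tasks and would otherwise carry values from one task into the next.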

JVM

Monitor GC

Use jstat to view GC activity.

Monitoring heap (heap)

The jmap command generates a heap dump file, which is a memory snapshot of the Java process at a point in time.

Use jmap -heap pid to view the heap usage of the process.

Use jmap -histo pid to view the instances of each class in the Java process.

Monitoring thread stack, deadlock

Use jstack.

Use JMC, JConsole and other monitoring tools

Using JMC directly in a production environment is not recommended. Instead, use JFR (Java Flight Recorder) to generate a .jfr file, then analyze it with JMC in a development environment.

Spring

AOP

Aspect: a logical concept.

Join Point: a point where an Aspect can cut in; the unit is a method.

Advice defines the actions that can be taken in the aspect. If you look at the Spring source code, you will find

Pointcut: defines exactly which Join Points an Aspect applies to. It can be specified with concrete class and method names, or with regular expressions that define matching conditions. In other words, it states which methods of which classes receive the AOP operation.

 <aop:pointcut id="p1" expression="execution(* io.simon.aop.InterfaceSchool.*(..))"/>

Defining a pointcut declares where the aspect cuts in, that is, which methods (Join Points) it applies to. Once that is defined, you specify how to cut in (the Advice): before the method, after it, or wrapped around it, and which method to run as the action when cutting in.
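The "wrap around a method" idea can be sketched without Spring, using a JDK dynamic proxy. This only illustrates the mechanism; it is not Spring's configuration API, and the names School, SchoolImpl, and withLogging are made up for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class AroundAdviceDemo {
    interface School {
        String enroll(String student);
    }

    static class SchoolImpl implements School {
        public String enroll(String student) {
            return "enrolled " + student;
        }
    }

    static final List<String> LOG = new ArrayList<>();

    // Wraps every method of the interface (the join points) with
    // before/after actions (the advice).
    static School withLogging(School target) {
        InvocationHandler around = (proxy, method, args) -> {
            LOG.add("before " + method.getName());
            Object result = method.invoke(target, args); // proceed to the real method
            LOG.add("after " + method.getName());
            return result;
        };
        return (School) Proxy.newProxyInstance(
                School.class.getClassLoader(), new Class<?>[] {School.class}, around);
    }

    public static void main(String[] args) {
        School school = withLogging(new SchoolImpl());
        System.out.println(school.enroll("bob")); // enrolled bob
        System.out.println(LOG);                  // [before enroll, after enroll]
    }
}
```

The business class SchoolImpl contains no logging code at all; the cross-cutting behavior lives in one place and is attached from outside.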

Why do I need AOP?

Answer: Aspect-oriented programming serves software engineering by enabling better modularization, not just by reducing duplicated code. With mechanisms such as AOP, code that cuts across multiple modules can be extracted, which makes each module more cohesive and lets business developers focus on the business logic itself. [High cohesion: as far as possible, each member method of a class does exactly one thing. Low coupling: within a class, minimize calls from one member method to another.]

Bean's life cycle

  • Singleton: the bean exists in the application context for its whole lifetime and is destroyed only when the application ends.
  • Prototype: unlike the other scopes, Spring does not manage the complete life cycle of prototype instances. After instantiating, configuring, and assembling the object and handing it to the application, Spring no longer manages it. As long as the bean does not hold on to other resources (such as a database connection or a Session object), it becomes eligible for garbage collection once all references to it are dropped or it goes out of scope.
  • Request: each client request creates a new bean instance; once the request ends, the instance leaves the scope and can be garbage collected.
  • Session: when the user's session ends, the bean instance can be garbage collected.

The life cycle of a bean is managed entirely by the container. From setting properties to resolving dependencies, the container is responsible for injection and for the work at each stage. The Spring container also defines a clear set of life cycle interfaces for application developers.

Security

Injection attack

SQL injection, operating system command injection, xml injection (xml can contain dynamic content)

Threats caused by program flaws

Using hash collisions to launch a denial-of-service attack: the attacker constructs in advance a large amount of data whose keys share the same hash value and sends it to the server as JSON. When the server builds Java objects from it, the data is usually stored in a Hashtable or HashMap, and the collisions severely degrade the performance of the hash table.
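To see how cheaply colliding keys can be produced for String.hashCode(), note that "Aa" and "BB" have the same hash code, so any concatenation of those two blocks collides as well. A minimal sketch with hypothetical names (HashCollisionDemo, collidingKeys):

```java
import java.util.ArrayList;
import java.util.List;

class HashCollisionDemo {
    // Builds 2^blocks distinct strings of length 2*blocks that all share one hashCode.
    static List<String> collidingKeys(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : keys) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }

    // Number of different hash codes among the given keys.
    static long distinctHashes(List<String> keys) {
        return keys.stream().map(String::hashCode).distinct().count();
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(10);
        System.out.println(keys.size());          // 1024
        System.out.println(distinctHashes(keys)); // 1
    }
}
```

All 1,024 keys land in the same bucket of a hash table. Typical mitigations include limiting request size and the number of parsed keys; since Java 8, HashMap also converts long collision chains into balanced trees, which softens the degradation.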

DoS (Denial of Service) attack: consuming server resources so that the server cannot serve normal users.

DoS is a common type of network attack, also called flooding. In its most common form, a large number of machines send requests that exhaust the bandwidth or other resources of the target website, so that it cannot respond to normal user requests. As with hash collision attacks, the attacker can easily use up the system's limited CPU and thread resources. From this perspective, computationally intensive operations such as encryption, decryption, and graphics processing must be protected from malicious abuse, so that attackers cannot consume system resources by calling them directly or triggering them indirectly.

DDoS (Distributed Denial of Service) is a type of DoS attack. Multiple computers are combined into an attack platform that targets one or more victims, multiplying the power of the denial-of-service attack.

HTTPS protocol

HTTPS = HTTP + SSL (Secure Sockets Layer) / TLS (Transport Layer Security). TLS is the successor to SSL: TLS 1.0 is essentially SSL 3.1, and TLS 1.2 is currently the most widely used version.

SSL/TLS sits at the fifth layer of the OSI model (the session layer).

HTTPS complete process

Complete HTTPS protocol request process

Linux

namespace

Cgroup

The full name of Cgroups is Control Groups, a physical resource isolation mechanism provided by the Linux kernel. It makes it possible to limit, isolate, and account for the resources used by a Linux process or process group. For example, a cgroup can restrict a specific process to a given number of CPU cores and a given amount of memory; a process that exceeds its limit is suspended or killed.

Cgroups were introduced by Google in the 2.6 kernel and are the technical cornerstone of resource virtualization in the Linux kernel. The resource isolation used by LXC (Linux Containers) and Docker containers is built on cgroups.

algorithm

Given 1 billion int values (about 4 GB) and a machine with 1 GB of available memory, find the numbers that appear only once, in O(n) time.

Answer: 1. Bitmap: one bit marks whether an integer is present; to tell "appears once" apart from "appears more than once", each integer needs two bits. 2. Hash buckets. 3. Divide and conquer, then sort: using bucket sort, split the 1 billion ints (about 4 GB) into m buckets, so that each bucket holds K = n/m values with n = 1 billion. Sorting each bucket with quicksort costs O(K log K), so all m buckets together cost O(m · (n/m) · log(n/m)) = O(n log(n/m)). The more buckets there are, the smaller log(n/m) becomes and the closer the total time complexity gets to O(n).
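The bitmap approach can be sketched with two bits of state per value: one bit for "seen at least once" and one for "seen more than once". The sketch below assumes non-negative values below a caller-supplied bound so that java.util.BitSet can be used; covering the full int range at two bits per value would need about 1 GB, so a real solution would likely process the range in chunks. The names SeenOnceDemo and appearingOnce are made up.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class SeenOnceDemo {
    // Returns, in ascending order, the values that occur exactly once.
    // Assumes 0 <= value < maxValueExclusive for every element.
    static List<Integer> appearingOnce(int[] values, int maxValueExclusive) {
        BitSet seen = new BitSet(maxValueExclusive);     // bit set: seen at least once
        BitSet repeated = new BitSet(maxValueExclusive); // bit set: seen more than once
        for (int value : values) {                       // single O(n) pass
            if (seen.get(value)) {
                repeated.set(value);
            } else {
                seen.set(value);
            }
        }
        seen.andNot(repeated); // keep only values seen exactly once
        List<Integer> result = new ArrayList<>();
        for (int v = seen.nextSetBit(0); v >= 0; v = seen.nextSetBit(v + 1)) {
            result.add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(appearingOnce(new int[] {5, 3, 5, 9, 3, 7}, 16)); // [7, 9]
    }
}
```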

MySQL database

index

Hash index: not suitable for range queries.

B+ tree index

Index on field k

mysql> create table T(
id int primary key, 
k int not null, 
name varchar(16),
index (k))engine=InnoDB;

Consistent hashing: when the database is migrated or scaled horizontally, consistent hashing arranges the hash space as a ring, so that each time a machine is added or removed, only the data of two machines is affected.
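A minimal sketch of a hash ring built on a TreeMap. The node names, the use of CRC32 as the hash function, and the virtual-node parameter are all assumptions made for illustration; virtual nodes are a common refinement that is not mentioned in the text above, and with virtualNodes = 1 the class behaves like the plain ring.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Maps a string to a position on the ring (0 .. 2^32 - 1).
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // A key belongs to the first node clockwise from its position, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        ring.addNode("db-a");
        ring.addNode("db-b");
        ring.addNode("db-c");
        System.out.println(ring.nodeFor("order:42"));
    }
}
```

The property worth checking is that after adding a node, every key either stays on its previous node or moves to the new one; no key moves between two old nodes. With a simple hash-modulo scheme, by contrast, most keys would move.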

Read-write separation and high availability

Read-write separation speeds up access and achieves higher concurrency. It also provides load balancing, because read requests can be distributed evenly across servers according to a policy. MySQL master-slave synchronization is based on the binary log (binlog). Despite the name, it actually stores a sequence of events, each corresponding to a database update operation such as INSERT, UPDATE, or DELETE.

The master-slave structure exists for high availability, which means reducing database downtime. When the master node becomes unavailable, the system elects an available node as the new master.

Master-slave synchronization: if the goal is only high database concurrency, first consider SQL optimization, indexing, and a cache such as Redis, and only then consider adopting a master-slave architecture.

Group replication, known as MGR (MySQL Group Replication), synchronizes data using the Paxos distributed consensus algorithm, whose aim is to let multiple nodes reach agreement in a distributed scenario.

Horizontal and vertical sharding of databases and tables

Vertical split (splitting databases and tables): the data in one database is split across two or more databases. The table structures change, which has a large impact on the business, much like splitting a business into microservices. What used to be one complex SQL statement becomes several simple ones.

Horizontal split: the same query may be routed to different databases. The data is sharded across databases and tables. The table structure does not change; only the capacity of each database changes, and the data volume on a single node is reduced. For example, take a single table (the t_order table in the orderDB database) with 1 billion records. We first split the single database into 32 databases, orderDB_00…31, by user id modulo 32; then, by order id modulo 32, we split each database into 32 tables, t_order_00…31. This gives 1,024 sub-tables in total, and each one holds only about 1 million records.
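The routing rule in this example can be written down directly. The class and method names below (OrderShardRouter, route) are made up; the database and table name patterns follow the example in the text.

```java
class OrderShardRouter {
    static final long DB_COUNT = 32;
    static final long TABLES_PER_DB = 32;

    // Picks the database by user id and the table by order id, both modulo 32.
    static String route(long userId, long orderId) {
        long db = Math.floorMod(userId, DB_COUNT);
        long table = Math.floorMod(orderId, TABLES_PER_DB);
        return String.format("orderDB_%02d.t_order_%02d", db, table);
    }

    public static void main(String[] args) {
        System.out.println(route(100L, 77L)); // orderDB_04.t_order_13
        System.out.println(route(31L, 64L));  // orderDB_31.t_order_00
    }
}
```

Because the database is chosen by user id, all orders of one user live in the same database, so per-user queries touch a single database; a lookup by order id alone, however, does not know which database to visit.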

When splitting horizontally, should you split the database or the table? [A database instance can be regarded as a file.] In general, if the read and write pressure on the data is high and disk IO has become the bottleneck, splitting databases is better than splitting tables: it spreads the data across different database instances on different disks, which raises the parallel processing capacity of the whole cluster. In the opposite case, prefer splitting tables, which reduces the amount of data in each table and so the time of single-table operations, while parallel operations on multiple tables within one database can still increase throughput.

index

Create an index when building a table

create table T(
	id int primary key, 
	k int not null, 
	name varchar(16),
	index index_k(k) -- create an index on column k, named index_k
)engine=InnoDB;

Primary key index: the leaf nodes store the entire row. Try to use an auto-increment column as the primary key.

Non-primary-key index: the leaf nodes store the primary key ID.

Rebuilding an index can save space, because the index itself takes up space and a rebuild stores it more compactly.

For non-primary keys, you can write:

alter table T drop index k;
alter table T add index(k);

For the primary key you could write the following, but note that rebuilding the primary key index this way is not reasonable: both dropping and creating the primary key rebuild the entire table.

alter table T drop primary key;
alter table T add primary key(id);

Using indexes:

1. Covering index: the index already contains every column the query needs, so the query can be answered from the index alone without going back to the table.

2. Leftmost prefix principle: given a composite index KEY name_age(name, age), queries that filter on name can use the index, while queries that filter only on age cannot. More generally, for an index on (a, b), MySQL orders the entries by a first, and b is ordered only within each group of equal a values. So across the whole B+ tree, a is always in order but b is not necessarily in order.

3. Index condition pushdown: MySQL 5.6 introduced the index condition pushdown optimization. While traversing the index, MySQL can evaluate conditions on the columns contained in the index and directly filter out records that do not match, reducing the number of lookups back to the table.

Create a unique index

CREATE TABLE `t1` ( 
    `id` int(11) NOT NULL, 
    `a` int(11) DEFAULT NULL, 
    `b` int(11) DEFAULT NULL, 
    PRIMARY KEY (`id`), 
    UNIQUE KEY `a` (`a`)
) ENGINE=InnoDB;

Distributed transaction

Strong consistency uses XA (a distributed transaction protocol), with a TM (Transaction Manager) and RMs (Resource Managers). Mainstream open-source XA distributed transaction solutions: Atomikos, Narayana, Seata (Alibaba).

BASE flexible transactions (they feel very similar to UDP, in that reliability is handed over to the application layer): 1. Basically available (the participants of a distributed transaction may not all be online at the same time). 2. Soft state (system state updates are allowed a certain delay). 3. Eventual consistency. TCC (Try-Confirm-Cancel) is a BASE flexible transaction. [TCC mode divides each service's business operation into two phases. The first phase (Try) checks and reserves the related resources. The second phase acts on the Try results of all services: if every Try succeeded, Confirm is performed; if any Try failed, everything is cancelled.] In ACID terms, a BASE flexible transaction can normally guarantee atomicity; consistency and isolation are not guaranteed at every point in time; durability is guaranteed, as with a local transaction.
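The two TCC phases can be sketched as a small in-memory coordinator. Everything here is hypothetical (the Participant interface, run, recording) and greatly simplified: there is no persistence, retry, or timeout handling, and only participants whose Try succeeded are cancelled.

```java
import java.util.ArrayList;
import java.util.List;

class TccDemo {
    interface Participant {
        boolean tryReserve(); // phase 1: check and reserve resources
        void confirm();       // phase 2: commit the reservation
        void cancel();        // phase 2: release the reservation
    }

    // Runs Try on every participant; confirms all if every Try succeeded,
    // otherwise cancels the reservations made so far.
    static boolean run(List<Participant> participants) {
        List<Participant> reserved = new ArrayList<>();
        for (Participant p : participants) {
            if (!p.tryReserve()) {
                for (Participant r : reserved) {
                    r.cancel();
                }
                return false;
            }
            reserved.add(p);
        }
        for (Participant p : reserved) {
            p.confirm();
        }
        return true;
    }

    // Test helper: a participant that records each call in a shared log.
    static Participant recording(String name, boolean tryOk, List<String> log) {
        return new Participant() {
            public boolean tryReserve() { log.add(name + ".try"); return tryOk; }
            public void confirm() { log.add(name + ".confirm"); }
            public void cancel() { log.add(name + ".cancel"); }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(List.of(recording("stock", true, log), recording("payment", false, log)));
        System.out.println(ok + " " + log); // false [stock.try, payment.try, stock.cancel]
    }
}
```

Between Try and Confirm the resources are only reserved, which is the "soft state" of BASE: other transactions may observe an intermediate state, and consistency is reached once Confirm or Cancel has run everywhere.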

Seata has a TM, RMs, and a TC (Transaction Coordinator). The TC is an independently deployed service. The TM and RMs are deployed together with the business applications as jar packages; they establish long-lived connections to the TC and keep communicating with it throughout the transaction life cycle. The TM is the initiator of the global transaction and is responsible for beginning, committing, and rolling back the global transaction. An RM is a participant in the global transaction; it reports the execution results of branch transactions and commits or rolls back branch transactions under the coordination of the TC.

computer network

  • The role of the ISN (initial sequence number) in the handshake segments: it establishes the starting sequence number, it must not be guessable, and it helps resist replay attacks.
  • Common ports and their protocols. There are 65,536 ports (0~65535).
    HTTP uses port 80. FTP uses port 21. HTTPS/SSL uses port 443. SSH remote login uses port 22. Mail uses port 25 (SMTP, sending) and port 110 (POP3, receiving).

Principles of Computer Organization

  • File system: the file system divides disk space into blocks. Each file occupies several blocks, and a file control block (FCB) records which disk blocks each file occupies. RAID (Redundant Array of Independent Disks) builds on this property: it writes data to blocks on different disks and treats the space on multiple disks as a whole, which improves disk access speed. What RAID does within a single server, the Hadoop Distributed File System (HDFS) does across servers: multiple servers form a file system cluster that provides file services to clients.

Microservice

Paxos algorithm: a distributed consensus algorithm. Consensus means that the nodes of a distributed system reach agreement on a value.

Once more than half of the nodes in the system have completed a state transition, the data change can be considered correctly stored in the system. This tolerates failures in a minority (fewer than half) of the nodes, so adding machines can improve the overall availability of the system. In distributed systems this idea is called the Quorum mechanism.

The Paxos algorithm divides the nodes of a distributed system into three roles: proposal nodes (proposers), decision nodes (acceptors), and recording nodes (learners).

In a distributed scenario the number of nodes should be odd, so that a vote cannot end in an even split and a majority decision can be reached.
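The quorum arithmetic is small enough to write out. The helper names below (majority, toleratedFailures) are made up for illustration.

```java
class QuorumDemo {
    // Smallest number of nodes that is more than half of the cluster.
    static int majority(int nodes) {
        return nodes / 2 + 1;
    }

    // How many nodes may fail while a majority can still be formed.
    static int toleratedFailures(int nodes) {
        return nodes - majority(nodes);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " nodes: majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
        // 3 nodes: majority=2, tolerated failures=1
        // 4 nodes: majority=3, tolerated failures=1
        // 5 nodes: majority=3, tolerated failures=2
        // 6 nodes: majority=4, tolerated failures=2
    }
}
```

Going from 3 to 4 nodes (or from 5 to 6) raises the majority size without tolerating any additional failure, which is one more reason odd cluster sizes are preferred.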

RPC, Remote Procedure Call

Redis

Application scenarios of Redis

Cache avalanche problem

Nginx

The role and application scenarios of Nginx

Advantages of Nginx
Apache's model of handling each connection with its own process is not friendly to multi-core CPUs. Now that Moore's Law no longer holds for single CPUs, hardware is developing toward multi-core CPUs, but Apache was not designed for the multi-core scenario. On a multi-core server, Apache performs far worse than Nginx.

Comprehensive problem

Given a set containing 4 billion unique integers in the interval [0, 2^32-1], how can you quickly determine whether a given number is in the set?

Answer: The data set is large, so use a bitmap in which each bit represents one number; this saves space. Covering the whole interval takes 2^32 bits, which is 512 MB. For details, see https://www.jianshu.com/p/b09bb3e7652e
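A minimal sketch of such a bitmap, backed by a long[] because java.util.BitSet is indexed by int and cannot address 2^32 bits. The class name UIntBitmap and its methods are hypothetical; values are passed as long so that the unsigned 32-bit range fits.

```java
class UIntBitmap {
    private final long[] words;
    private final long capacity;

    // capacity = number of distinct values [0, capacity) the bitmap can hold.
    // For the full range use 1L << 32, which allocates 2^26 longs (512 MB).
    UIntBitmap(long capacity) {
        if (capacity <= 0 || capacity > (1L << 32)) {
            throw new IllegalArgumentException("capacity out of range: " + capacity);
        }
        this.capacity = capacity;
        this.words = new long[(int) ((capacity + 63) >>> 6)];
    }

    private void check(long value) {
        if (value < 0 || value >= capacity) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
    }

    // Sets the bit for this value: word index = value / 64, bit index = value % 64.
    void add(long value) {
        check(value);
        words[(int) (value >>> 6)] |= 1L << (int) (value & 63);
    }

    // O(1) membership test.
    boolean contains(long value) {
        check(value);
        return (words[(int) (value >>> 6)] & (1L << (int) (value & 63))) != 0;
    }

    public static void main(String[] args) {
        UIntBitmap bitmap = new UIntBitmap(1_000);
        bitmap.add(42);
        bitmap.add(999);
        System.out.println(bitmap.contains(42)); // true
        System.out.println(bitmap.contains(43)); // false
    }
}
```

Loading the set is one pass over the 4 billion numbers; after that, each lookup is a single array access and a bit test.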