This is the most detailed Hadoop article series you will find — be sure to bookmark it and follow along!
Future articles will include a directory of earlier posts to help you review the key knowledge points.
Table of Contents
Previous articles in this series
The most detailed big data notes of 2021, taking you easily from beginner to expert. This column is updated daily with summarized knowledge sharing.
Detailed explanation of how MapReduce works
1. MapTask working mechanism
A brief overview: the input file is logically divided into multiple splits, and a RecordReader reads the content line by line and feeds each line to the user-implemented map function. The map output is handed to an OutputCollector, which partitions each record by key (hash partitioning by default) and writes it into a buffer. Each map task has an in-memory buffer that stores its map output; when the buffer is almost full, its contents are spilled to a temporary file on disk. When the entire map task finishes, all the temporary files it produced on disk are merged into a single final output file, which then waits for the reduce tasks to pull the data.
1. First, the input-reading component InputFormat (TextInputFormat by default) uses its getSplits method to logically slice the files in the input directory into splits, and one MapTask is started per split. By default, splits correspond to HDFS blocks one-to-one.
2. After the input file is divided into splits, a RecordReader object (LineRecordReader by default) reads each split line by line, using \n as the separator, and returns a <key, value> pair. The key is the byte offset of the first character of the line, and the value is the text of that line.
3. Each <key, value> pair read from the split is passed into the user's subclass of Mapper, which executes the user-overridden map function. The RecordReader is called once per line here.
4. When the map logic completes, each map result is collected via context.write. During collection the record is partitioned first; HashPartitioner is used by default.
MapReduce provides the Partitioner interface, whose job is to decide which reduce task should ultimately process a given output pair, based on the key (or value) and the number of reducers. The default behavior is to hash the key and take the result modulo the number of reducers, which simply spreads the load evenly across the reducers. If an application needs different routing, a custom Partitioner can be written and set on the Job.
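The default routing logic can be sketched in a few lines. This is an illustrative Python version of what Hadoop's HashPartitioner does in Java ((key.hashCode() & Integer.MAX_VALUE) % numReduceTasks); the function name here is just for the example.

```python
# Illustrative sketch of Hadoop's default HashPartitioner behavior.
def hash_partition(key, num_reduce_tasks):
    # Mask off the sign bit so the result is non-negative,
    # then take the modulus over the number of reducers.
    return (hash(key) & 0x7FFFFFFF) % num_reduce_tasks

# The same key always routes to the same reducer within one job,
# and the result is always a valid reducer index.
assert hash_partition("hadoop", 4) == hash_partition("hadoop", 4)
assert 0 <= hash_partition("spark", 4) < 4
```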
5. Next, the data is written to memory, into a region called the ring (circular) buffer. The buffer collects Mapper results in batches to reduce the impact of disk I/O. Both the key/value pair and its partition result are written to the buffer; before writing, the key and value are serialized into byte arrays.
The ring buffer is actually an array. It stores the serialized key and value data together with metadata for each record: the partition, the start position of the key, the start position of the value, and the length of the value. The "ring" structure is an abstraction over this array.
The buffer has a size limit, 100 MB by default. When the Mapper produces a lot of output, it could overflow memory, so under certain conditions the buffer's data must be temporarily written to disk and the buffer reused. This process of writing data from memory to disk is called a spill (literally, "overflow write"). Spilling is done by a separate thread, so it does not block the thread writing Mapper results into the buffer. To keep the spill from stalling Mapper output, the buffer has a spill ratio, spill.percent, 0.8 by default: when the data in the buffer reaches the threshold (buffer size * spill percent = 100 MB * 0.8 = 80 MB), the spill thread starts, locks that 80 MB of memory, and performs the spill, while Mapper output continues to be written into the remaining 20 MB. The two do not interfere with each other.
6. When the spill thread starts, it sorts the keys in the 80 MB space. Sorting is the default behavior of the MapReduce model, and the sorting here operates on the serialized bytes.
If the Job has a Combiner configured, it is applied now: the values of key/value pairs that share the same key are aggregated (for example, added up), reducing the amount of data spilled to disk. Because the Combiner optimizes MapReduce's intermediate results, it may be applied multiple times throughout the model.
7. Merge the spill files. Each spill produces a temporary file on disk (with the Combiner applied first, if configured), so if the Mapper's output is truly large there will be multiple spills and, correspondingly, multiple temporary files on disk. When all the data has been processed, the temporary files on disk are merged; because only one final file is allowed, it is written to disk along with an index file that records the offset of the data belonging to each reducer.
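Since every spill file is already sorted by key, the final merge is a streaming k-way merge of sorted runs. Here is a tiny Python sketch of that idea using the standard library's heapq.merge (Hadoop's own merger is Java, but the principle is the same):

```python
# Each spill file is already sorted by key, so the final MapTask output
# can be produced with a streaming k-way merge of the sorted runs.
import heapq

spill_1 = [("a", 1), ("c", 1)]
spill_2 = [("a", 1), ("b", 1)]
spill_3 = [("b", 1), ("d", 1)]

merged = list(heapq.merge(spill_1, spill_2, spill_3))
assert merged == [("a", 1), ("a", 1), ("b", 1), ("b", 1), ("c", 1), ("d", 1)]
```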
The related settings (standard Hadoop MapReduce property names):
Set the ring buffer size: mapreduce.task.io.sort.mb (default 100 MB)
Set the spill ratio: mapreduce.map.sort.spill.percent (default 0.80)
Spill data directory: mapreduce.cluster.local.dir
Set how many spill files are merged at once: mapreduce.task.io.sort.factor (default 10)
2. ReduceTask working mechanism
Reduce is roughly divided into three phases: copy, sort, and reduce, with the emphasis on the first two. In the copy phase, an eventFetcher obtains the list of completed maps, and Fetcher threads copy the data. During this process two merge threads are started, inMemoryMerger and onDiskMerger, which merge in-memory data to disk and merge data already on disk, respectively. Once the data copy completes, the copy phase ends and the sort phase begins. The sort phase mainly performs the finalMerge operation and is a pure sorting step; when it completes, the reduce phase calls the user-defined reduce function to process the data.
1. Copy phase: simply pull the data. The Reduce process starts some data-copy threads (Fetchers) that request each MapTask's output files over HTTP.
2. Merge phase: the merge here is like the merge on the map side, except the values being merged were copied from different map tasks. Copied data is first placed in an in-memory buffer, whose sizing is more flexible than on the map side. There are three forms of merge: memory-to-memory, memory-to-disk, and disk-to-disk. The first form is not enabled by default. When the amount of data in memory reaches a threshold, the memory-to-disk merge starts; as on the map side, this is also a spill process, and it produces many spill files on disk. This second form of merge keeps running until no more data is arriving from the map side; then the third, disk-to-disk merge starts and generates the final file.
3. Merge and sort: after the scattered data has been merged into one large dataset, the merged data is sorted again.
4. Call the reduce method on the sorted key-value pairs. The reduce method is called once per distinct key; each call produces zero or more key-value pairs, and finally these output pairs are written to a file on HDFS.
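The "called once per distinct key" pattern can be sketched with grouping over the sorted pairs (an illustrative Python version; the reduce function shown is a hypothetical word-count reducer):

```python
# After the sort, pairs with the same key are grouped, and the user's
# reduce function runs exactly once per distinct key.
from itertools import groupby

def reduce_word_count(key, values):
    # User-defined reduce logic: here, sum the counts for one key.
    return (key, sum(values))

sorted_pairs = [("a", 1), ("a", 2), ("b", 3)]
results = [reduce_word_count(k, (v for _, v in grp))
           for k, grp in groupby(sorted_pairs, key=lambda kv: kv[0])]
assert results == [("a", 3), ("b", 3)]
```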
3. MapReduce shuffle process
How the data processed in the map phase is passed to the reduce phase is the most critical process in the MapReduce framework; this process is called shuffle.
Shuffle: literally "to shuffle (or deal) cards". Its core mechanisms include data partitioning, sorting, the Combiner, and grouping.
Shuffle is the core of MapReduce and spans both the map and reduce phases. In general, the process from the Map output to the point just before Reduce obtains its input is called the shuffle.
1. Collect stage: MapTask results are output to a ring buffer (default size 100 MB), which stores the key/value data, partition information, and so on.
2. Spill stage: when the amount of data in memory reaches a threshold, the data is written to local disk. Before being written, the data is sorted, and if a Combiner is configured, records in the same partition with the same key are aggregated.
3. Merge stage: all spill temporary files are merged so that a MapTask ultimately produces only one intermediate data file.
4. Copy stage: the ReduceTask starts Fetcher threads that copy its share of the data from nodes where MapTasks have completed. This data is held in an in-memory buffer by default, and when the buffer reaches a threshold the data is written to disk.
5. Merge stage: while the ReduceTask is copying data remotely, two background threads merge data files from memory and from local disk.
6. Sort stage: sorting happens alongside the merging. Since the MapTask stage has already sorted its data locally, the ReduceTask only needs to ensure the final overall ordering of the copied data.
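The six stages above can be simulated end to end for a toy word count. This is a hedged, illustrative Python sketch (two map tasks, two reducers, all names chosen for the example), not Hadoop's implementation:

```python
# Toy end-to-end shuffle for word count:
# map -> partition -> sort -> combine, then copy -> merge -> sort -> reduce.
from collections import defaultdict

NUM_REDUCERS = 2

def partition(key):
    return (hash(key) & 0x7FFFFFFF) % NUM_REDUCERS

# Map side: two map tasks emit (word, 1), grouped by partition,
# sorted within each partition, and combined (mini Combiner).
map_outputs = [["b", "a", "a"], ["a", "c", "b"]]
per_map = []
for words in map_outputs:
    parts = defaultdict(list)
    for w in words:
        parts[partition(w)].append((w, 1))
    combined = {}
    for p, pairs in parts.items():
        agg = defaultdict(int)
        for k, v in sorted(pairs):   # sort stage
            agg[k] += v              # combine stage
        combined[p] = sorted(agg.items())
    per_map.append(combined)

# Reduce side: each reducer copies its partition from every map task,
# merges and sorts the pulled runs, then reduces once per key.
final = {}
for r in range(NUM_REDUCERS):
    pulled = [kv for m in per_map for kv in m.get(r, [])]  # copy stage
    agg = defaultdict(int)
    for k, v in sorted(pulled):      # merge + sort stage
        agg[k] += v                  # reduce stage
    final.update(agg)

assert final == {"a": 3, "b": 2, "c": 1}
```

Whichever reducer each word hashes to, every occurrence of that word lands on the same reducer, so the global counts come out right.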
The size of the buffer in shuffle affects the execution efficiency of a MapReduce program: in principle, the larger the buffer, the fewer disk I/O operations and the faster the execution.
The buffer size can be adjusted with the parameter mapreduce.task.io.sort.mb (default 100 MB).
The big data article series on this blog is updated daily; remember to bookmark and follow!