MySQL 8.0.22 executor source code analysis: HashJoin, the detailed steps of its initialization functions


Let us continue from the previous post and analyze the details of the BuildHashTable function; all of the functions discussed here are in the hash_join_iterator.cc file.

InitRowBuffer (line 101~126)

There are two steps worth noting:

1. Initialize the row buffer with a seed. The exact hash algorithm is not the focus here, so we will not dig into it.

kHashTableSeed is an arbitrary constant used to seed the hash function. One seed is used when hashing rows into the in-memory hash table, and a different seed (kChunkPartitioningHashSeed) is used when deciding which chunk file a row should be written to. If the two operations shared the same seed, all rows loaded back from one chunk file would cluster into the same buckets of the hash table, which would be wrong.

2. After initializing the row buffer, reset the iterators so that both the head and the tail point to the end of the row buffer.

if (m_row_buffer.Init(kHashTableSeed)) {
  DBUG_ASSERT(thd()->is_error());  // my_error should have been called.
  return true;
}

m_hash_map_iterator = m_row_buffer.end();
m_hash_map_end = m_row_buffer.end();
return false;

InitProbeIterator (line 142 to line 153)

1. Initialize the m_probe_input iterator (it is a RowIterator)

2. Check whether m_probe_input_batch_mode is true (the default is false, meaning batch mode is off); if it is true, call StartPSIBatchMode

  if (m_probe_input->Init()) {
    return true;
  }

  if (m_probe_input_batch_mode) {
    m_probe_input->StartPSIBatchMode();
  }
  return false;

Init of HashJoinIterator (line 155~240)

1. Prepare to read row data from the build input (the driving table) in order to construct the hash table


Function call stack: (shown as an image in the original post, not reproduced here)


When rows will later be located by their primary key, the read bitmap must be extended with the primary-key fields (via prepare_for_position())

Then initialize the build input's iterator (also a RowIterator)

2. Initialize a few state variables

m_hash_join_type = HashJoinType::IN_MEMORY;
// No need to write probe rows to saving files yet. Probe row saving is only
// enabled when the hash join either spills to disk and a chunk of the build
// input does not fit entirely in memory, or cannot spill to disk at all,
// i.e. when the probe input must be read multiple times.
m_write_to_probe_row_saving = false;
// The build iterator clearly still has rows to offer at this point; once the
// build input is consumed, the hash join iterator stops asking it for more.
m_build_iterator_has_more_rows = true;
// For an outer join that has spilled to disk, a probe row may match build
// rows we have not seen yet, in which case this flag becomes true. We start
// out in memory, so it is initialized to false.
m_probe_row_match_flag = false;

3. Calculate the memory size occupied by build row and probe row

Compute an upper bound on how many bytes a row from the given tables can occupy, so that a packed row is always no longer than this bound.

upper_row_size is the larger of the build-row bound and the probe-row bound.

  size_t upper_row_size = 0;
  if (!m_build_input_tables.has_blob_column()) {
    upper_row_size =
        hash_join_buffer::ComputeRowSizeUpperBound(m_build_input_tables);
  }

  if (!m_probe_input_tables.has_blob_column()) {
    upper_row_size = std::max(
        upper_row_size,
        hash_join_buffer::ComputeRowSizeUpperBound(m_probe_input_tables));
  }

4. Reserve space in the temporary buffer: grow m_temporary_row_and_join_key_buffer (a String) so it can hold the largest possible row, reporting an out-of-memory error on failure

if (m_temporary_row_and_join_key_buffer.reserve(upper_row_size)) {
  my_error(ER_OUTOFMEMORY, MYF(0), upper_row_size);
  return true;  // oom
}

5. If a table contains a geometry column, make sure the data is copied into the row buffer rather than merely storing a pointer to it. This matters because when the hash join spills to disk, a row is read back from the chunk file into a temporary buffer; once that buffer is reused for other purposes, any data the field still points to becomes invalid.


6. Chunk array, flag clear operation

// Both build and probe start at row 0 of their hash join chunks.
m_build_chunk_current_row = 0;
m_probe_chunk_current_row = 0;
// No hash join chunk is in use yet.
m_current_chunk = -1;

7. For each of the given tables, request that the row ID be filled in when needed (equivalent to calling file->position())


8. Build a hash table

// Build the hash table
if (BuildHashTable()) {
  DBUG_ASSERT(thd()->is_error() ||
              thd()->killed);  // my_error should have been called.
  return true;
}

9. If BuildHashTable() decided that the join is already done (the build input was empty and this is an inner or semi join), return.

  if (m_state == State::END_OF_ROWS) {
    // BuildHashTable() decided that the join is done (the build input is
    // empty, and we are in an inner-/semijoin. Anti-/outer join must output
    // NULL-complemented rows from the probe input).
    return false;
  }

10. For a degenerate antijoin, that is, an antijoin with no join conditions and no extra condition, a non-empty row buffer means every probe row would find a match, so nothing will ever be output and we can end right away.

  if (m_join_type == JoinType::ANTI && m_join_conditions.empty() &&
      m_extra_condition == nullptr && !m_row_buffer.empty()) {
    // For degenerate antijoins, we know we will never output anything
    // if there's anything in the hash table, so we can end right away.
    // (We also don't need to read more than one row, but
    // CreateHashJoinAccessPath() has already added a LIMIT 1 for us
    // in this case.)
    m_state = State::END_OF_ROWS;
    return false;
  }

11. Initialize the probe iterator

return InitProbeIterator();

InitializeChunkFiles (line 364 to line 401)

Initialize the HashJoinChunk files for the two inputs.

First, estimate how many chunk files are needed. One source of information is the planner's row estimate. In addition, assuming the rows currently held in the row buffer are representative of the overall row density, dividing the estimated number of remaining rows by the number of rows already read yields the number of chunks.

1. Rows in the reduced hash table = reduction factor * rows in the hash table. This is a defensive measure: we would rather use one or two extra chunk files than risk re-reading the probe input multiple times.

constexpr double kReductionFactor = 0.9;
const size_t reduced_rows_in_hash_table =
    std::max<size_t>(1, rows_in_hash_table * kReductionFactor);

2. Remaining rows = max(rows in the hash table, the planner's estimate of rows produced by the join) - rows in the hash table. The max() guards against underflow when the hash table holds more rows than the planner estimated.

const size_t remaining_rows =
    std::max(rows_in_hash_table, estimated_rows_produced_by_join) -
    rows_in_hash_table;

3. Chunks needed = max(1, ceil(remaining rows / rows in the reduced hash table))

 const size_t chunks_needed = std::max<size_t>(
      1, std::ceil(remaining_rows / reduced_rows_in_hash_table));

4. The actual number of chunks = min(maximum number of chunk files, chunks needed). The number of chunks per input is capped so we do not risk hitting the server's limit on the number of open files.

const size_t num_chunks = std::min(max_chunk_files, chunks_needed);

5. Round the number of chunks up to the next power of two; this allows some optimizations when calculating which chunk a row should be placed in.

  const size_t num_chunks_pow_2 = my_round_up_to_next_power(num_chunks);

6. After rounding, resize the chunk-pair array and initialize each pair of chunks: one build chunk and one probe chunk

  for (ChunkPair &chunk_pair : *chunk_pairs) {
    if (chunk_pair.build_chunk.Init(build_tables, /*uses_match_flags=*/false) ||
        chunk_pair.probe_chunk.Init(probe_tables,
                                    include_match_flag_for_probe)) {
      my_error(ER_TEMP_FILE_WRITE_FAILURE, MYF(0));
      return true;
    }
  }

Explanation of the two parameters of Init:

tables – which input's row data the chunk stores
uses_match_flags – whether each row should be prefixed with a match flag recording whether the row had a matching row.


InitWritingToProbeRowSavingFile

Mark that probe row saving is enabled and prepare the probe row saving file for writing:

m_write_to_probe_row_saving = true;
  return m_probe_row_saving_write_file.Init(m_probe_input_tables,
                                            m_join_type == JoinType::OUTER);

As for HashJoinChunk::Init itself, we will not go into detail; it involves IO_CACHE operations.


InitReadingFromProbeRowSavingFile

Mark that we should read data from the probe row saving file, then rewind the file to its beginning:

m_probe_row_saving_read_file = std::move(m_probe_row_saving_write_file);
m_probe_row_saving_read_file_current_row = 0;
m_read_from_probe_row_saving = true;
return m_probe_row_saving_read_file.Rewind();

1. Move the contents of the write file into the read file; using std::move avoids a copy assignment and only transfers the internal pointers

2. Reset the current-row counter to 0, indicating that reading has not yet started

3. Record that we should now read rows from the probe row saving read file

4. Rewind flushes the write buffer and repositions the file at its beginning, so that the saved rows can be read back from the start