MySQL 8.0.22 executor source code analysis: the HashJoin BuildHashTable function in detail

BuildHashTable function detailed steps

The function is located in hash_join_iterator.cc, lines 403 ~ 560.

step 1: If the build input iterator has no more rows, set m_state to State::END_OF_ROWS and return false. Note the return convention used throughout this file: true means an error occurred, so returning false here simply means "nothing left to build", not failure.

if (!m_build_iterator_has_more_rows) {
    m_state = State::END_OF_ROWS;
    return false;
}

step 2: Restore the last row that was stored into the row buffer. According to the source comment, this is needed when the build input is a nested loop with a filter on the inner side; the exact scenario is subtle and I do not fully understand it.

if (m_row_buffer.Initialized() &&
    m_row_buffer.LastRowStored() != m_row_buffer.end()) {
    hash_join_buffer::LoadIntoTableBuffers(
        m_build_input_tables, m_row_buffer.LastRowStored()->second);
}

step 3: Clear the row buffer and re-attach the table iterators to it via InitRowBuffer. If initialization fails, return true (error).

if (InitRowBuffer()) {
    return true;
}

step 4: Initialize two flags: reject_duplicate_keys and store_rows_with_null_in_join_key

const bool reject_duplicate_keys = RejectDuplicateKeys();
const bool store_rows_with_null_in_join_key = m_join_type == JoinType::OUTER;

If RejectDuplicateKeys() returns true, duplicate keys are rejected from the hash table: for a semijoin or antijoin, a single build row per key value is enough to decide the result, so additional rows with the same key need not be stored.

For antijoin and semijoin, see: semijoin & antijoin

store_rows_with_null_in_join_key indicates that the current join type is an outer join (JoinType::OUTER); in that case, build rows whose join key contains NULL are still stored in the row buffer.

step 5: Clear the null-row flag of the build input with SetNullRowFlag. When a hash join is used for an uncorrelated subquery, Init() may be called multiple times, and without this reset the flag would be polluted by a previously executed hash join operation.

m_build_input->SetNullRowFlag(/*is_null_row=*/false);

step 6: Start looping over the build input, reading one row per iteration from m_build_input:

1. If the thread has been killed (or an error was raised), Read() returns 1 and the function returns true.

2. If the build input turns out to be empty, the result of an inner join or semijoin is also empty, so the function can stop with END_OF_ROWS; for an antijoin, however, the output is all the rows from the probe input, so probing must still take place (the same applies to outer joins).

3. Once the last row of the build iterator has been read, this is the last time the probe iterator will be read, so probe row saving can be disabled again.

PFSBatchMode batch_mode(m_build_input.get());
for (;;) {  // Termination condition within loop.
    int res = m_build_input->Read();
    if (res == 1) {
      DBUG_ASSERT(thd()->is_error() ||
                  thd()->killed);  // my_error should have been called.
      return true;
    }

    if (res == -1) {
      m_build_iterator_has_more_rows = false;
      // If the build input was empty, the result of inner joins and semijoins
      // will also be empty. However, if the build input was empty, the output
      // of antijoins will be all the rows from the probe input.
      if (m_row_buffer.empty() && m_join_type != JoinType::ANTI &&
          m_join_type != JoinType::OUTER) {
        m_state = State::END_OF_ROWS;
        return false;
      }

      // As we managed to read to the end of the build iterator, this is the
      // last time we will read from the probe iterator. Thus, we can disable
      // probe row saving again (it was enabled if the hash table ran out of
      // memory _and_ we were not allowed to spill to disk).
      m_write_to_probe_row_saving = false;
      SetReadingProbeRowState();
      return false;
    }

step 7: Still inside the read loop, handle each row that was read:

1. Request row IDs for all tables that need them

2. Store the row currently sitting in the tables' record buffers into the hash table, putting the outcome in store_row_result

3. Branch on the value of store_row_result

If it is *ROW_STORED*, the row was stored successfully; just break out of the switch and read the next row

 case hash_join_buffer::StoreRowResult::ROW_STORED:
        break;

If it is BUFFER_FULL, the row buffer (the in-memory hash table) is full.

If spilling to disk is allowed, the remaining build rows are written to chunk files on disk. If spilling is not allowed, the function goes on to reading from the probe iterator; for non-inner joins it first enables probe row saving, so that unmatched probe rows are written to a saving file, and the next time the hash table is refilled, probe rows are read back from that file instead of from the probe iterator.

if (!m_allow_spill_to_disk) {
    if (m_join_type != JoinType::INNER) {
        // Enable probe row saving, so that unmatched probe rows are written
        // to the probe row saving file. After the next refill of the hash
        // table, we will read rows from the probe row saving file, ensuring
        // that we only read unmatched probe rows.
        InitWritingToProbeRowSavingFile();
    }
    SetReadingProbeRowState();
    return false;
}

If spilling is allowed, initialize the HashJoinChunk files for both inputs. The planner supplies an estimated number of build rows in advance; InitializeChunkFiles uses that estimate, together with how many rows actually fit in the row buffer, to recalculate an appropriate number of chunk files on disk.

if (InitializeChunkFiles(
    m_estimated_build_rows, m_row_buffer.size(), kMaxChunks,
    m_probe_input_tables, m_build_input_tables,
    /*include_match_flag_for_probe=*/m_join_type == JoinType::OUTER,
    &m_chunk_files_on_disk)) {
    DBUG_ASSERT(thd()->is_error());  // my_error should have been called.
    return true;
}

Then write the build rows remaining in the iterator to the build chunk files on disk. If an I/O error occurs, return true.

if (WriteRowsToChunks(thd(), m_build_input.get(), m_build_input_tables,
                      m_join_conditions, kChunkPartitioningHashSeed,
                      &m_chunk_files_on_disk,
                      true /* write_to_build_chunks */,
                      false /* write_rows_with_null_in_join_key */,
                      m_tables_to_get_rowid_for,
                      &m_temporary_row_and_join_key_buffer)) {
    DBUG_ASSERT(thd()->is_error() ||
                thd()->killed);  // my_error should have been called.
    return true;
}

Finally, flush all build chunk files and rewind them to the beginning, then switch to reading from the probe input.

for (ChunkPair &chunk_pair : m_chunk_files_on_disk) {
    if (chunk_pair.build_chunk.Rewind()) {
        DBUG_ASSERT(thd()->is_error() ||
                    thd()->killed);  // my_error should have been called.
        return true;
    }
}
SetReadingProbeRowState();
return false;
}

If the result is FATAL_ERROR, an unrecoverable error occurred, most likely a failed malloc, so report an out-of-memory error and return true.

case hash_join_buffer::StoreRowResult::FATAL_ERROR:
        // An unrecoverable error. Most likely, malloc failed, so report OOM.
        // Note that we cannot say for sure how much memory we tried to allocate
        // when failing, so just report 'join_buffer_size' as the amount of
        // memory we tried to allocate.
        my_error(ER_OUTOFMEMORY, MYF(ME_FATALERROR),
                 thd()->variables.join_buff_size);
        return true;
    }