View the generation of deadtuple dead data from the perspective of concurrency control (MVCC)

Author: Highgo PG Lab-Chrisx

View the generation of deadtuple dead data from the perspective of concurrency control (MVCC)

Article Directory

tuple structure

The data structure HeapTupleHeaderData is the core data structure for multi-version concurrency control

t_xmint_xmaxt_cidt_ctidt_infomask2t_infomaskt_hoffnull_bitmapuser_data

Although the [HeapTupleHeaderData] structure contains 7 elements, only 4 of them are involved in this article.

  • t_xmin records the transaction ID (txid) inserted into this tuple.
  • t_xmax records the transaction ID (txid) for deleting or updating this tuple. If this tuple has not been deleted or updated, t_xmax is set to 0, which means INVALID.
  • t_cid records the command ID (command id, cid), which increases from 0, indicating how many SQL commands are executed before this command is executed in the current transaction. For example, suppose we execute three INSERT commands in a single transaction: BEGIN; INSERT; INSERT; INSER; COMMIT;. If the first command inserts this tuple, t_cid is set to 0, if the second command inserts this tuple, t_cid is set to 1, and so on.
  • t_ctid records the tuple identifier (tid) that points to itself or a new tuple. tid is used to identify tuples in the table. When this tuple is updated, the t_ctid of this tuple points to the new tuple; otherwise, t_ctid points to itself.

Tuple addition, deletion, modification and dead tuple generation

1. Insert

During the insert operation, the new tuple will be inserted directly into the target table page. The xmin field is stored as the XID of the transaction, and xmax is 0. After the transaction is committed, all transactions whose XID is greater than or equal to the XID stored in xmin can see this record. This fully meets the requirements of the read commit transaction isolation level.

begin;
insert into test_con values (1,'A');
commit;

PostgreSQL provides an extension pageinspect, used to display the content of the page page

CREATE EXTENSION pageinspect;
create table test_con(id int,name text);
insert into test_con values (1,'A');
test=# SELECT lp as tuple, t_xmin, t_xmax, t_field3 as t_cid, t_ctid FROM heap_page_items(get_raw_page('test.test_con', 0));
tuple | t_xmin | t_xmax | t_cid | t_ctid
-------+--------+--------+-------+--------
1 | 594 | 0 | 0 | (0,1)
(1 row)
  • t_xmin is set to 594, indicating that this piece of data was inserted by transaction 594
  • t_xmax is set to 0, the transaction id is reserved, and it is invalid. Indicates that this row of data has not been updated or deleted
  • t_cid is set to 0, indicating that this row of data is the first row of data inserted by transaction 594
  • t_ctid is set to (0, 1), pointing to itself, no new version is generated

⚠️ Note: page structure is not discussed in this article, refer to architecture-physical structure

2. delete

If the record is deleted, in PostgreSQL, this record will not be deleted for the time being, but an identification will be made on this record. PostgreSQL's approach is to set the xmax of the record to the XID of the transaction that deleted this record. In this way, the XID of all transactions after the record is deleted is greater than the value of xmax, so the query initiated after the deletion cannot read this record; and for the query initiated before deleting this record, because XID is less than xmax, Therefore,
this record can still be read. This solves the problem of MVCC transaction isolation and consistent reading.

begin;
delete from test_con where id=1;
commit;

You can view the content of the page by extending pageinspect

test=# SELECT lp as tuple, t_xmin, t_xmax, t_field3 as t_cid, t_ctid FROM heap_page_items(get_raw_page('test.test_con', 0));
tuple | t_xmin | t_xmax | t_cid | t_ctid
-------+--------+--------+-------+--------
1 | 594 | 595 | 0 | (0,1)
(1 row)
  • t_xmax is set to 595, indicating that this row of data is updated or deleted by transaction 595
  • If the transaction operation is committed, then this row of data tuple_1 is no longer needed and will be marked as dead tuple.

3. update

If the record is updated (update), then PostgreSQL will not directly modify the original record, but will generate a new record. The xmin of the new record is the XID of the update operation, and xmax is 0. At the same time, the old record will be changed. xmax is set to the XID of the current operation, which means that the xmin of the new record is the same as the xmax of the old record. In this way, there will be multiple copies of the same record in the same table.

test=> insert into test_con values (1,'A');
INSERT 0 1
test=> update test_con set name='B' where id=1; UPDATE 1
test=> update test_con set name='C' where id=1; UPDATE 1
test=>

You can view the content of the page by extending pageinspect

test=# SELECT lp as tuple, t_xmin, t_xmax, t_field3 as t_cid, t_ctid FROM heap_page_items(get_raw_page('test.test_con', 0));
tuple | t_xmin | t_xmax | t_cid | t_ctid
-------+--------+--------+-------+--------
1 | 599 | 600 | 0 | (0,2)
2 | 600 | 601 | 0 | (0,3)
3 | 601 | 0 | 0 | (0,3)
(3 rows)

tuple_1
t_xmax is set to 600, modified by transaction 600,
t_ctid is set to (0, 2), no longer pointing to itself, pointing to the second version tuple_2

tuple_2
t_xmax is set to 601, modified by transaction 601
t_ctid is set to (0, 3), no longer points to itself, points to the third version tuple_3

tuple_3
t_xmax is set to 0, has not been modified
t_ctid is set to (0, 3), pointing to itself

If the transaction operation is committed, then the data tuple_1 and tuple_2 are no longer needed and will be marked as dead tuple.

Judging from the working principle of the MVCC mechanism, there are not too many problems with the INSERT operation. The INSERT operation of PostgreSQL is very similar to the working principle of other databases, except that the row header size of PostgreSQL is 20 bytes, which is much larger than the 3 bytes of Oracle. The storage overhead of PostgreSQL is slightly larger than that of Oracle. From the perspective of the UPDATE operation, no matter how many fields in the UPDATE, PostgreSQL needs to insert a new record, which will cause the increase of the high water mark of SEGMENT. If the data of a table needs to be updated multiple times after the data is inserted, then the high of this table The water level will skyrocket. To solve this problem, PostgreSQL uses a version recovery mechanism-VACUUM. Through VACUUM, PostgreSQL can recycle the old version, thereby avoiding performance problems caused by multiple versions.