oracle database: remove duplicate record rowid

Introduction to rowid

rowidIs oraclean important concept in. A relatively unique address value used to locate a record in the database. Normally, this value is determined and unique when the row of data is inserted into the database table. rowidIt is a pseudo column, it does not actually exist in the table. It is oraclea pseudo column encoded according to the physical address information of each row of data when reading data rows in the table. Therefore rowid, the physical address information of a row of data can be found according to a row of data , so as to quickly locate the data row. Most of the operations of the database are done through rowid, and it rowidis the fastest to use for single-line record positioning.

  • Sometimes when complex data retrieval, ordinary retrieval conditions can not meet the requirements, can be used rowidto accurately retrieve the results.

oracleIf you want to query multiple fields in a table, and only use the distinct keyword when deduplicating a field, or it wo group byn’t work, [ distinctand group bywill deduplicate the fields to be queried together, that is, when all the fields in the query are If they are the same, oraclethey will be considered as duplicates], then it can be used rowid.

Finding duplicate records

Problem scenario: When there are many duplicate records in our table, we need to find out the duplicate records.
Implementation steps:

  1. Group by duplicate content
  2. Take out a record of each group and keep it [note that it is unique]
  3. Delete data that is not in the reserved range

Prepare data: test table

--创建test表
create table test(
       name varchar(20),
       age number(3),
       sex varchar(10),
       weight number(3)
)

Four fields are defined above, namely name, age, gender, and weight. Next, add data to it.

--添加数据
insert into test values('貂蝉',22,'女',96);
insert into test values('小乔',24,'女',95);
insert into test values('孙尚香',21,'女',92);
insert into test values('露娜',20,'女',94);
insert into test values('女娲',26,'女',97);
insert into test values('甄姬',23,'女',91);
insert into test values('安琪拉',22,'女',98);
insert into test values('貂蝉',22,'女',96);
insert into test values('小乔',24,'女',95);
insert into test values('孙尚香',21,'女',92);
insert into test values('露娜',20,'女',94);
insert into test values('女娲',26,'女',97);
insert into test values('甄姬',23,'女',91);
insert into test values('安琪拉',22,'女',98);

Here deliberately add some duplicate data, which will be used next, now take a look at the content in the table: you can see that the content

Insert picture description here


of the duplicate data in the table is exactly the same, let’s take a lookrowid

--查看rowid
select name,age,sex,weight,rowid from test order by age desc;

In order to facilitate the comparison of whether the same content rowidis the same, sorting is used.

Insert picture description here


It can be seen that although the content is the same, it rowidis different.
Now to complete an operation: keep only one copy of all the same records

  1. Group data into groups according to repeated information
  2. Select a record in each group to save, and retrieve the characteristics:rowid
  3. When deleting, find the data that is not in the reserved range, and delete the data that is no longer in the range.
--将数据进行分组,按照重复信息分组
select name,sex,age,weight from test group by name,sex,age,weight;

Group the data and find out all the information here:

Insert picture description here


There are 7 groups of data. Now we need to take one piece of data for each group of data. Because the contents of other fields are the same, we can only take it here rowid.

--找到每组最小的rowid,也可以取最大的
select min(rowid) from test group by name,sex,age,weight;

Here is one for each group rowid, as shown in the following figure:

Insert picture description here


Next, you only need to rowiddelete the rows that are not here , and first rowidfind out the rows that are not in it:

--找到要删除的数据
select * from test where rowid not in (select min(rowid) from test group by name,sex,age,weight);

Here we take the result set just now as the query target, and use the not infunction to find the one to be deleted rowid.

Insert picture description here


It’s not shown here rowid, all can’t be seen, then just delete the data:

delete from test where rowid not in (select min(rowid) from test group by name,sex,age,weight);
commit;

After executing the deletion, submit the transaction, and then look up all the data in the table:

Insert picture description here


you can see that there is only one duplicate data, which is rowidthe usage of removing duplicate records.