Several SQL writing methods for judging time continuity in Oracle data warehouse

Zero, demand introduction

The existing table data is as follows:

Insert picture description here

This table is a mirror table. The policyno column represents a policy number, and the state column represents the last state of the policy number on snapdate (state may change many times a day, and the mirror table only retains the last state in the early morning of snapdate) ,
Snapdate represents the time for mirroring that day. Now there is a demand. We want to take out the start and end time of the policy number to maintain a certain state, for example: the start and end time of the
policy number sm1 to maintain state 1 is 20210201 20210202, and then change at 20210203 It became state 2, and it became state 3 at 20210204, and finally remained in state 1 during the time period of 20210205 20210209.
Then the mirroring table program may have problems during the period. From 20210210 to 20210215, no mirroring was successful until 20210216 The status of the policy number sm1 from 20210216 to 20210219 remains at 1,
and it may continue to change in the future. Then, for the continuous time of the several statuses of the policy sm1 mentioned above, the result we want is:

POLICYNO	STATE	START_DATE	END_DATE
sm1		1	20210201	20210202
sm1		2	20210203	20210203
sm1		3	20210204	20210204
sm1		1	20210205	20210209
sm1     1      20210216       20210219
.........................

I provide 5 ways of writing here, which can be summarized into two categories:
One category: obtain data continuity by using analysis functions or self-association, construct a grouping field to group to find the maximum and minimum values.
Type 2: Get continuity through tree-level query, and get start and end time.

1. Obtain the before and after time by using the lag analysis function, judge according to the difference between the current time and the before and after time to obtain the time continuity flag, and then use sum()over() to accumulate the continuity flag to generate a new temporary grouping Fields, finally grouped according to policyno, state, and temporary grouping fields to take the maximum and minimum values

For better understanding, each processing step is written separately, and it can be abbreviated in actual use:

with t as--求出来每条数据当天的前一天镜像时间
 (select a.policyno,
         a.state,
         a.snapdate,
         lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim
    from zyd.temp_0430 a
   order by a.policyno, a.snapdate),
t1 as--判断当天镜像时间和前一天的镜像时间+1是否相等,如果相等就置为0否则置为1,新增临时字段lxzt意为:连续状态标志
 (select t.*,
         case
           when t.snapdate = t.lag_tim + 1 then
            0
           else
            1
         end as lxzt
    from t
   order by policyno, snapdate),
t2 as--根据lxzt字段进行sum()over()求和,求出来一个新的用来做分组依据的字段,简称fzyj
 (select t1.*, sum(lxzt) over(order by policyno, snapdate) as fzyj from t1)
select policyno,--最后根据policyno,state,fzyj进行分组求最大最小值即为状态连续的开始结束时间
       state,
       -- fzyj,
       min(snapdate) as start_snap,
       max(snapdate) as end_snap
  from t2
 group by policyno, state, fzyj
 order by fzyj;
Insert picture description here

2. Without using the lag analysis function, you can also determine which days are continuous through auto-correlation, and then the following operation steps are the same as above. This writing method is a write-back to the lag()over() function, getting rid of the dependence on the analysis function

The following writing method needs to read the table twice. The above lag method is an optimization of this writing method:

with t as
 (select a.policyno, a.state, a.snapdate, b.snapdate as snap2
    from zyd.temp_0430 a, zyd.temp_0430 b
   where a.policyno = b.policyno(+)
     and a.state = b.state(+)
     and a.snapdate - 1 = b.snapdate(+)
   order by policyno, snapdate),
t1 as
 (select t.*,
         case
           when snap2 is null then
            1
           else
            0
         end as lxzt
    from t
   order by policyno, snapdate),
t2 as
 (select t1.*, sum(lxzt) over(order by policyno, snapdate) as fzyj
    from t1
   order by policyno, snapdate)
select policyno,
       state,
       fzyj,
       min(snapdate) as start_snap,
       max(snapdate) as end_snap
  from t2
 group by policyno, state, fzyj
 order by fzyj;
Insert picture description here

3. By constructing a tree structure, determine the root node and leaf node to obtain the continuous start and end time of the state

First, display the tree structure of each relationship according to the continuity of the data:

with t as
 (select a.policyno,
         a.state,
         a.snapdate,
         lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim
    from zyd.temp_0430 a --where policyno='sm1'
   order by a.policyno, a.snapdate),
t1 as
 (select t.*,
         case
           when t.snapdate = t.lag_tim + 1 then
            0
           else
            1
         end as lxzt
    from t
   order by policyno, snapdate),
t2 as
 (select t1.*,
         lpad('->', (level - 1) * 2, '->') || snapdate as 树状结构,
         level as 树中层次,
         decode(level, 1, 1) 是否根节点,
         decode(connect_by_isleaf, 1, 1) 是否叶子节点,
         case
           when (connect_by_isleaf = 0 and level > 1) then
            1
         end  是否树杈,
         (prior snapdate) as 根值,
         connect_by_root snapdate 主根值
    from t1
   start with (lxzt = 1)
  connect by (prior snapdate = snapdate - 1 
          and prior state = state and
              prior policyno = policyno)
   order by policyno, snapdate)
select * from t2;
Insert picture description here


It can be clearly seen from the above that the start date of each continuous state is used as the root of each tree, and the relationship between branch nodes, namely tree branches and leaf nodes, is expanded step by step. By analyzing the above data, we can know that if we want to obtain
The continuous time range of each policy state, based on the existing distribution of the above data, is now possible: group by through policyno, state, and main root value to take the maximum and minimum values ​​of snapdate, similar to the final steps of the previous two writing methods;
next, Our third way of writing is to write in this way:

with t as
 (select a.policyno,
         a.state,
         a.snapdate,
         lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim
    from zyd.temp_0430 a --where policyno='sm1'
   order by a.policyno, a.snapdate),
t1 as
 (select t.*,
         case
           when t.snapdate = t.lag_tim + 1 then
            0
           else
            1
         end as lxzt
    from t
   order by policyno, snapdate),
t2 as
 (select t1.*,
         lpad('->', (level - 1) * 2, '->') || snapdate as 树状结构,
         level as 树中层次,
         decode(level, 1, 1) 是否根节点,
         decode(connect_by_isleaf, 1, 1) 是否叶子节点,
         case
           when (connect_by_isleaf = 0 and level > 1) then
            1
         end  是否树杈,
         (prior snapdate) as 根值,
         connect_by_root snapdate 主根值
    from t1
   start with (lxzt = 1)
  connect by (prior snapdate = snapdate - 1 
          and prior state = state and
              prior policyno = policyno)
   order by policyno, snapdate)
select policyno,
       state,
       min(snapdate) as start_date,
       max(snapdate) as end_date
  from t2
 group by policyno, state, 主根值
 order by policyno, state;
Insert picture description here

4. Refer to process three. Now that the main root value and leaf node value of each piece of data have been obtained, this means that we know the continuous start and end time of each policy state, and then directly fetch the leaf node data, leaf node The main root value is the start date, and the leaf node value is the end date, so that we don’t need to group by

with t as
 (select a.policyno,
         a.state,
         a.snapdate,
         lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim
    from zyd.temp_0430 a --where policyno='sm1'
   order by a.policyno, a.snapdate),
t1 as
 (select t.*,
         case
           when t.snapdate = t.lag_tim + 1 then
            0
           else
            1
         end as lxzt
    from t
   order by policyno, snapdate),
t2 as
 (select t1.*,
         lpad('->', (level - 1) * 2, '->') || snapdate as 树状结构,
         level as 树中层次,
         decode(level, 1, 1) 是否根节点,
         decode(connect_by_isleaf, 1, 1) 是否叶子节点,
         case
           when (connect_by_isleaf = 0 and level > 1) then
            1
         end 是否树杈,
         (prior snapdate) as 根值,
         connect_by_root snapdate 主根值
    from t1
   start with (lxzt = 1)
  connect by (prior snapdate = snapdate - 1 and prior state = state and
             prior policyno = policyno)
   order by policyno, snapdate)
select policyno, state, 主根值 as start_date, snapdate as end_date
  from t2
 where 是否叶子节点 = 1
 order by policyno, snapdate
Insert picture description here

5. Before Oracle10g, the key function connect_by_root of the tree query above is not yet supported. If a tree structure is used, it can be achieved through sys_connect_by_path

with t as
 (select a.policyno,
         a.state,
         a.snapdate,
         lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim
  --case when lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) is null then snapdate else lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) end as lag_tim
    from zyd.temp_0430 a
   order by a.policyno, a.snapdate),
t1 as
 (select t.*,
         case
           when t.snapdate = t.lag_tim + 1 then
            0
           else
            1
         end as lxzt
    from t
   order by policyno, snapdate),
t2 as
 (select t1.*,
         sys_connect_by_path(snapdate, ',') as pt,
         level,
         connect_by_isleaf as cb
    from t1
   start with (lxzt = 1)
  connect by (prior snapdate = snapdate - 1 and prior state = state and
             prior policyno = policyno))
select t2.*,
       regexp_substr(pt, '[^,]+', 1, 1) as start_date,
       regexp_substr(pt, '[^,]+', 1, regexp_count(pt, ',')) as end_date
  from t2
 where cb = 1
 order by policyno, state;
Insert picture description here


There are many other ways of writing. I won’t list them all here. If you’re interested, you can comment and write other options below!