I see quite often when developers ask questions about connected components:
|Table “MESSAGES” contains fields “SENDER” and “RECIPIENT”, which store clients id.
How to quickly get all groups of clients who are connected even through other clients if the table has X million rows?
So for this table, there should be 4 groups:
Of course, we can solve this problem using SQL only (model, recursive subquery factoring or connect by with nocycle), but such solutions will be too slow for huge tables.
with t(sender,recipient) as (select level,level*2 from dual connect by level<=10) , v1 as (select rownum id,t.* from t) , v2 as (select id, account from v1 unpivot ( account for x in (sender,recipient) )) , v3 as ( select id ,account ,dense_rank()over(order by account) account_n ,count(*)over() cnt from v2) , v4 as ( select distinct grp,account from v3 model dimension by (id,account_n) measures(id grp,account,cnt) rules iterate(1e6)until(iteration_number>cnt[1,1])( grp[any,any] = min(grp)[any,cv()] ,grp[any,any] = min(grp)[cv(),any] ) ) select listagg(account,',')within group(order by account) s from v4 group by grp
In such situations it’s much better to adopt standard algorithms like Quick-find or Weighted quick-union for PL/SQL.
The first time I wrote such solution about 5 years ago and I even posted here one of the latest solutions, but all of them were not universal, so I’ve created the package today with a couple of functions for most common problems: XT_CONNECTED_COMPONENTS
It contains 2 functions based on Weighted quick-find quick-union algorithm:
- function get_strings(cur in sys_refcursor, delim varchar2:=’,’) return strings_array pipelined;
It accepts a cursor and returns found connected components as table of varchar2(v_size). You can change v_size in the package definition.
Input cursor should contain one Varchar2 column with linked strings, for example: ‘a,b,c’.
You can also specify list delimiter, by default it is comma.
select * from table(xt_connected_components.get_strings( cursor(select ELEM1||','||ELEM2 from TEST)); select * from table( xt_connected_components.get_strings( cursor(select 'a,b,c' from dual union all select 'd,e,f' from dual union all select 'e,c' from dual union all select 'z' from dual union all select 'X,Y' from dual union all select 'Y,Z' from dual))); COLUMN_VALUE ----------------------------------------- STRINGS('X', 'Y', 'Z') STRINGS('a', 'b', 'c', 'd', 'e', 'f') STRINGS('z')
- function get_numbers(cur in sys_refcursor) return numbers_array pipelined;
This function also returns connected components, but for numbers.
Input cursor should contain two columns with linked numbers.
select * from table( xt_connected_components.get_numbers( cursor( select sender_id, recipient_id from messages ))); select * from table( xt_connected_components.get_numbers( cursor( select level account1 , level*2 account2 from dual connect by level<=10 ))); SQL> select * 2 from 3 table( 4 xt_connected_components.get_numbers( 5 cursor( 6 select level account1 7 , level*2 account2 8 from dual 9 connect by level<=10 10* ))) SQL> / COLUMN_VALUE ------------------------ NUMBERS(1, 2, 4, 8, 16) NUMBERS(3, 6, 12) NUMBERS(5, 10, 20) NUMBERS(7, 14) NUMBERS(9, 18)
How to install:
Download all files from Github and execute “@install” in SQL*Plus or execute them in another tool in the following order: