Oracle SQL
  • LICENSE

How to group connected elements (or pairs)

Posted on September 29, 2017 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization, query optimizing, SQL 1,804 Page views 7 Comments

I see quite often when developers ask questions about connected components:

Table “MESSAGES” contains fields “SENDER” and “RECIPIENT”, which store clients id.
How to quickly get all groups of clients who are connected even through other clients if the table has X million rows?
So for this table, there should be 4 groups:

  • (1, 2, 4, 8, 16)
  • (3, 6, 12)
  • (5, 10, 20)
  • (7, 14)
  • (9, 18)
SENDERRECIPIENT
12
24
36
48
510
612
714
816
918
1020

Of course, we can solve this problem using SQL only (model, recursive subquery factoring or connect by with nocycle), but such solutions will be too slow for huge tables.

Example of SQL solution

[sourcecode language=”sql”]
with
t(sender,recipient) as (select level,level*2 from dual connect by level<=10)
, v1 as (select rownum id,t.* from t)
, v2 as (select id, account
from v1
unpivot (
account for x in (sender,recipient)
))
, v3 as (
select
id
,account
,dense_rank()over(order by account) account_n
,count(*)over() cnt
from v2)
, v4 as (
select distinct grp,account
from v3
model
dimension by (id,account_n)
measures(id grp,account,cnt)
rules
iterate(1e6)until(iteration_number>cnt[1,1])(
grp[any,any] = min(grp)[any,cv()]
,grp[any,any] = min(grp)[cv(),any]
)
)
select
listagg(account,’,’)within group(order by account) s
from v4
group by grp
[/sourcecode]

[collapse]

In such situations it’s much better to adopt standard algorithms like Quick-find or Weighted quick-union for PL/SQL.
The first time I wrote such solution about 5 years ago and I even posted here one of the latest solutions, but all of them were not universal, so I’ve created the package today with a couple of functions for most common problems: XT_CONNECTED_COMPONENTS

It contains 2 functions based on Weighted quick-find quick-union algorithm:

  • function get_strings(cur in sys_refcursor, delim varchar2:=’,’) return strings_array pipelined;
    It accepts a cursor and returns found connected components as table of varchar2(v_size). You can change v_size in the package definition.
    Input cursor should contain one Varchar2 column with linked strings, for example: ‘a,b,c’.
    You can also specify list delimiter, by default it is comma.
    Examples:

    select * from table(xt_connected_components.get_strings( cursor(select ELEM1||','||ELEM2 from TEST));
    select * 
    from
     table(
       xt_connected_components.get_strings( 
         cursor(select 'a,b,c' from dual union all
                select 'd,e,f' from dual union all
                select 'e,c'   from dual union all
                select 'z'     from dual union all
                select 'X,Y'   from dual union all
                select 'Y,Z'   from dual)));
    COLUMN_VALUE
    -----------------------------------------
    STRINGS('X', 'Y', 'Z')
    STRINGS('a', 'b', 'c', 'd', 'e', 'f')
    STRINGS('z')
    
    
  • function get_numbers(cur in sys_refcursor) return numbers_array pipelined;
    This function also returns connected components, but for numbers.
    Input cursor should contain two columns with linked numbers.
    Examples:

    select * 
    from table(
            xt_connected_components.get_numbers( 
              cursor(
                select sender_id, recipient_id from messages
            )));
    select * 
    from
      table(
        xt_connected_components.get_numbers( 
           cursor(
              select level   account1
                   , level*2 account2 
              from dual 
              connect by level<=10
        )));
    SQL> select *
      2  from
      3    table(
      4      xt_connected_components.get_numbers(
      5         cursor(
      6            select level   account1
      7                 , level*2 account2
      8            from dual
      9            connect by level<=10
     10*     )))
    SQL> /
    
    COLUMN_VALUE
    ------------------------
    NUMBERS(1, 2, 4, 8, 16)
    NUMBERS(3, 6, 12)
    NUMBERS(5, 10, 20)
    NUMBERS(7, 14)
    NUMBERS(9, 18)
    

How to install:
Download all files from Github and execute “@install” in SQL*Plus or execute them in another tool in the following order:
xt_connected_components_types.sql
xt_connected_components_pkg.sql
xt_connected_components_bdy.sql

Download URL: https://github.com/xtender/xt_scripts/tree/master/extra/xt_connected_components

query optimization
« Ampersand instead of colon for bind variables
PL/SQL functions: Iterate and keys for associative arrays »
Page views: 1,804
photo Sayan Malakshinov

Oracle ACE Pro Oracle ACE Pro Alumni

DEVVYOracle Database Developer Choice Award winner

Oracle performance tuning expert

UK / Cambridge

LinkedIn   Twitter
sayan@orasql.org

Recent Posts

  • Oracle Telegram Bot
  • Partition Pruning and Global Indexes
  • Interval Search: Part 4. Dynamic Range Segmentation – interval quantization
  • Interval Search Series: Simplified, Advanced, and Custom Solutions
  • Interval Search: Part 3. Dynamic Range Segmentation – Custom Domain Index

Popular posts

Recent Comments

  • Oracle SQL | Interval Search: Part 4. Dynamic Range Segmentation – interval quantization on Interval Search: Part 3. Dynamic Range Segmentation – Custom Domain Index
  • Oracle SQL | Interval Search: Part 4. Dynamic Range Segmentation – interval quantization on Interval Search: Part 2. Dynamic Range Segmentation – Simplified
  • Oracle SQL | Interval Search: Part 4. Dynamic Range Segmentation – interval quantization on Interval Search: Optimizing Date Range Queries – Part 1
  • Oracle SQL | Interval Search Series: Simplified, Advanced, and Custom Solutions on Interval Search: Part 2. Dynamic Range Segmentation – Simplified
  • Oracle SQL | Interval Search: Part 2. Dynamic Range Segmentation – Simplified on Interval Search: Part 3. Dynamic Range Segmentation – Custom Domain Index

Blogroll

  • Alex Fatkulin
  • Alexander Anokhin
  • Andrey Nikolaev
  • Charles Hooper
  • Christian Antognini
  • Coskan Gundogar
  • David Fitzjarrell
  • Igor Usoltsev
  • Jonathan Lewis
  • Karl Arao
  • Mark Bobak
  • Martin Bach
  • Martin Berger
  • Neil Chandler
  • Randolf Geist
  • Richard Foote
  • Riyaj Shamsudeen
  • Tanel Poder
  • Timur Akhmadeev
  • Valentin Nikotin
©Sayan Malakshinov. Oracle SQL