Oracle SQL
  • LICENSE

How to group connected elements (or pairs)

Posted on September 29, 2017 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization, query optimizing, SQL 7 Comments

I see quite often when developers ask questions about connected components:

Table “MESSAGES” contains fields “SENDER” and “RECIPIENT”, which store clients id.
How to quickly get all groups of clients who are connected even through other clients if the table has X million rows?
So for this table, there should be 4 groups:

  • (1, 2, 4, 8, 16)
  • (3, 6, 12)
  • (5, 10, 20)
  • (7, 14)
  • (9, 18)
SENDERRECIPIENT
12
24
36
48
510
612
714
816
918
1020

Of course, we can solve this problem using SQL only (model, recursive subquery factoring or connect by with nocycle), but such solutions will be too slow for huge tables.

Example of SQL solution

with 
   t(sender,recipient) as (select level,level*2 from dual connect by level<=10)
,  v1 as (select rownum id,t.* from t)
,  v2 as (select id, account
          from v1
           unpivot (
             account for x in (sender,recipient)
           ))
, v3 as (
           select
              id
             ,account
             ,dense_rank()over(order by account) account_n
             ,count(*)over() cnt
           from v2)
, v4 as (
           select distinct grp,account
           from v3
           model
                dimension by (id,account_n)
                measures(id grp,account,cnt)
                rules
                iterate(1e6)until(iteration_number>cnt[1,1])(
                   grp[any,any] = min(grp)[any,cv()]
                  ,grp[any,any] = min(grp)[cv(),any]
                )
)
select
   listagg(account,',')within group(order by account) s
from v4
group by grp

[collapse]

In such situations it’s much better to adopt standard algorithms like Quick-find or Weighted quick-union for PL/SQL.
The first time I wrote such solution about 5 years ago and I even posted here one of the latest solutions, but all of them were not universal, so I’ve created the package today with a couple of functions for most common problems: XT_CONNECTED_COMPONENTS

It contains 2 functions based on Weighted quick-find quick-union algorithm:

  • function get_strings(cur in sys_refcursor, delim varchar2:=’,’) return strings_array pipelined;
    It accepts a cursor and returns found connected components as table of varchar2(v_size). You can change v_size in the package definition.
    Input cursor should contain one Varchar2 column with linked strings, for example: ‘a,b,c’.
    You can also specify list delimiter, by default it is comma.
    Examples:

    select * from table(xt_connected_components.get_strings( cursor(select ELEM1||','||ELEM2 from TEST));
    select * 
    from
     table(
       xt_connected_components.get_strings( 
         cursor(select 'a,b,c' from dual union all
                select 'd,e,f' from dual union all
                select 'e,c'   from dual union all
                select 'z'     from dual union all
                select 'X,Y'   from dual union all
                select 'Y,Z'   from dual)));
    COLUMN_VALUE
    -----------------------------------------
    STRINGS('X', 'Y', 'Z')
    STRINGS('a', 'b', 'c', 'd', 'e', 'f')
    STRINGS('z')
    
    
  • function get_numbers(cur in sys_refcursor) return numbers_array pipelined;
    This function also returns connected components, but for numbers.
    Input cursor should contain two columns with linked numbers.
    Examples:

    select * 
    from table(
            xt_connected_components.get_numbers( 
              cursor(
                select sender_id, recipient_id from messages
            )));
    select * 
    from
      table(
        xt_connected_components.get_numbers( 
           cursor(
              select level   account1
                   , level*2 account2 
              from dual 
              connect by level<=10
        )));
    SQL> select *
      2  from
      3    table(
      4      xt_connected_components.get_numbers(
      5         cursor(
      6            select level   account1
      7                 , level*2 account2
      8            from dual
      9            connect by level<=10
     10*     )))
    SQL> /
    
    COLUMN_VALUE
    ------------------------
    NUMBERS(1, 2, 4, 8, 16)
    NUMBERS(3, 6, 12)
    NUMBERS(5, 10, 20)
    NUMBERS(7, 14)
    NUMBERS(9, 18)
    

How to install:
Download all files from Github and execute “@install” in SQL*Plus or execute them in another tool in the following order:
xt_connected_components_types.sql
xt_connected_components_pkg.sql
xt_connected_components_bdy.sql

Download URL: https://github.com/xtender/xt_scripts/tree/master/extra/xt_connected_components

query optimization
« Ampersand instead of colon for bind variables
PL/SQL functions: Iterate and keys for associative arrays »
photo Sayan Malakshinov

Oracle ACE Pro Oracle ACE Pro

DEVVYOracle Database Developer Choice Award winner

Oracle performance tuning expert

UK / Cambridge

LinkedIn   Twitter
sayan@orasql.org

Recent Posts

  • CBO and Partial indexing
  • Slow index access “COL=:N” where :N is NULL
  • Where does the commit or rollback happen in PL/SQL code?
  • :1 and SP2-0553: Illegal variable name “1”.
  • ORA exceptions that can’t be caught by exception handler

Recent Comments

  • Oracle SGA 값을 증가 시킬 때 발생 장애 원인 – DBA의 정석 on Example of controlling “direct path reads” decision through SQL profile hints (index_stats/table_stats)
  • Oracle SQL | Oracle diagnostic events — Cheat sheet on Where does the commit or rollback happen in PL/SQL code?
  • Functions & Subqueries | Oracle Scratchpad on Deterministic function vs scalar subquery caching. Part 3
  • Materialized views state turns into compilation_error after refresh - kranar.top - Answering users questions... on Friday prank: select from join join join
  • Exadata Catalogue | Oracle Scratchpad on When bloggers get it wrong – part 1
  • Exadata Catalogue | Oracle Scratchpad on Serial Scans failing to offload
  • lateral join – decorrelation gone wrong – svenweller on Lateral view decorrelation(VW_DCL) causes wrong results with rownum
  • 255 column catalogue | Oracle Scratchpad on Intra-block row chaining optimization in 12.2
  • 255 column catalogue | Oracle Scratchpad on row pieces, 255 columns, intra-block row chaining in details
  • opt_estimate catalogue | Oracle Scratchpad on Correct syntax for the table_stats hint

Blogroll

  • Alex Fatkulin
  • Alexander Anokhin
  • Andrey Nikolaev
  • Charles Hooper
  • Christian Antognini
  • Coskan Gundogar
  • David Fitzjarrell
  • Igor Usoltsev
  • Jonathan Lewis
  • Karl Arao
  • Mark Bobak
  • Martin Bach
  • Martin Berger
  • Neil Chandler
  • Randolf Geist
  • Richard Foote
  • Riyaj Shamsudeen
  • Tanel Poder
  • Timur Akhmadeev
  • Valentin Nikotin

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org
©Sayan Malakshinov. Oracle SQL