Everyone knows Tom Kyte’s mantra:
You should do it in a single SQL statement if at all possible.
But we all know that “Every rule has an exception”
There are many different cases when pl/sql with sql can be more efficient than only sql, and i dont want to catalog them. I just want to show a couple examples of such exceptions:
1. Running totals by several dimensions
Simple example from forum:
select dt, dim1, dim2, val, sum(val) over(partition by dim1 order by dt) dim1_cumulative_sum, sum(val) over(partition by dim2 order by dt) dim2_cumulative_sum, sum(val) over(partition by dim1, dim2 order by dt) dim1_dim2_cumulative_sum from mg_t order by dt;
This query will be very hard for big data sets, so we can do it efficiently with pl/sql:
create or replace function xf_to_drop return xt2_to_drop pipelined is type tt is table of number index by pls_integer; type tt2 is table of tt index by pls_integer; dim1_c tt; dim2_c tt; dim12_c tt2; begin for r in ( select dt, dim1, dim2, val from mg_t order by dt ) loop dim1_c(r.dim1):=case when dim1_c.exists(r.dim1) then dim1_c(r.dim1) else 0 end + r.val; dim2_c(r.dim1):=case when dim2_c.exists(r.dim1) then dim2_c(r.dim1) else 0 end + r.val; dim12_c(r.dim1)(r.dim2):=case when dim12_c.exists(r.dim1) and dim12_c(r.dim1).exists(r.dim2) then dim12_c(r.dim1)(r.dim2) else 0 end + r.val; pipe row (xt1_to_drop( r.dt ,r.dim1 ,r.dim2 ,r.val ,dim1_c(r.dim1) ,dim1_c(r.dim1) ,dim12_c(r.dim1)(r.dim2) )); end loop; end; /
create table mg_t as select trunc(sysdate) + level/1440 dt, trunc(3 * dbms_random.value()) dim1, trunc(3 * dbms_random.value()) dim2, trunc(100 * dbms_random.value()) val from dual connect by level <= 3e6; create type xt1_to_drop is object( dt date ,dim1 number ,dim2 number ,val number ,dim1_cumulative_sum number ,dim2_cumulative_sum number ,dim1_dim2_cumulative_sum number ); create type xt2_to_drop as table of xt1_to_drop; create or replace function xf_to_drop return xt2_to_drop pipelined is type tt is table of number index by pls_integer; type tt2 is table of tt index by pls_integer; dim1_c tt; dim2_c tt; dim12_c tt2; begin for r in ( select dt, dim1, dim2, val from mg_t order by dt ) loop dim1_c(r.dim1):=case when dim1_c.exists(r.dim1) then dim1_c(r.dim1) else 0 end + r.val; dim2_c(r.dim1):=case when dim2_c.exists(r.dim1) then dim2_c(r.dim1) else 0 end + r.val; dim12_c(r.dim1)(r.dim2):=case when dim12_c.exists(r.dim1) and dim12_c(r.dim1).exists(r.dim2) then dim12_c(r.dim1)(r.dim2) else 0 end + r.val; pipe row (xt1_to_drop( r.dt,r.dim1,r.dim2,r.val,dim1_c(r.dim1),dim1_c(r.dim1),dim12_c(r.dim1)(r.dim2))); end loop; end; / exec for r in (select * from table(xf_to_drop)) loop null; end loop;
2. Finding connected components
Assume that we have big table with many-to-many relationship:
create table test (clientid NUMBER(10), accountid NUMBER(10));
How we can find all connected groups?
This example also taken from our russian forum and there was very good and simple sql-only solution, but it’s not efficient on big data sets:
select min(group_member_id) as group_max_id, accountid, clientid from (select clientid as group_member_id , connect_by_root accountid as accountid , connect_by_root clientid as clientid from test connect by nocycle decode(accountid, prior accountid, 1, 0) + decode(clientid, prior clientid, 1, 0) = 1 ) a group by accountid, clientid order by group_max_id, accountid /
This pure SQL solution is for the cases when ClientId and AccountId are different entities. If they are the same entities in your case, you need to use UNION ALL:
select min(group_member_id) as group_max_id, accountid, clientid
from (select clientid as group_member_id
, connect_by_root accountid as accountid
, connect_by_root clientid as clientid
from test
connect by nocycle decode(accountid, prior accountid, 1, 0)
+ decode(clientid, prior clientid, 1, 0)
= 1
) a
group by accountid, clientid
order by group_max_id, accountid
/
select min(group_member_id) as group_max_id, accountid, clientid from (select clientid as group_member_id , connect_by_root accountid as accountid , connect_by_root clientid as clientid from (select accountid, clientid from test union all select clientid,accountid from test) connect by nocycle decode(accountid, prior accountid, 1, 0) + decode(clientid, prior clientid, 1, 0) = 1 ) a group by accountid, clientid order by group_max_id, accountid /
We can try to remember algorithms courses and adopt one of the several algorithms for connected components:
declare type int_array is table of pls_integer index by pls_integer; type arr_elems is table of sys.ku$_objnumset index by pls_integer; root int_array; root_elems arr_elems; n int; clients int_array; accounts int_array; l integer:=dbms_utility.get_time(); procedure print(v in varchar2) is begin dbms_output.put_line(to_char((dbms_utility.get_time-l)/100,'0999.99')||' '||v); l:=dbms_utility.get_time(); end; function get_root(n int) return pls_integer is begin if root.exists(n) then return root(n); else return null; end if; end; procedure update_root(old_root pls_integer,new_root pls_integer) is i pls_integer; elem pls_integer; cnt_old pls_integer; cnt_new pls_integer; begin if old_root!=new_root then --root_elems(new_root):=root_elems(new_root) multiset union all root_elems(old_root); cnt_old:=root_elems(old_root).count; cnt_new:=root_elems(new_root).count; root_elems(new_root).extend(cnt_old); for i in 1..cnt_old loop elem := root_elems(old_root)(i); root(elem):=new_root; root_elems(new_root)(cnt_new+i):=elem; end loop; root_elems(old_root).delete; end if; end; procedure add_elem(p_root pls_integer, p_elem pls_integer) is begin if not root_elems.exists(p_root) then root_elems(p_root):=sys.ku$_objnumset(p_elem); else root_elems(p_root).extend(); root_elems(p_root)(root_elems(p_root).count):=p_elem; end if; end; procedure add_link(clientid pls_integer,accountid pls_integer) is r1 pls_integer; r2 pls_integer; new_root pls_integer; begin r1:=get_root(clientid); r2:=get_root(accountid); if r1 is null or r2 is null then new_root := coalesce(r1,r2,clientid); if r1 is null then add_elem(new_root,clientid ); root(clientid) :=new_root; end if; if r2 is null then add_elem(new_root,accountid); root(accountid):=new_root; end if; else new_root := least(r1,r2); root(clientid) :=new_root; root(accountid):=new_root; update_root(greatest(r1,r2),new_root); end if; end; function str_format(p int) return varchar2 is begin return utl_lms.format_message('(%d, %d) = group #%d' ,clients(p) ,accounts(p) ,get_root(clients(p)) ); end; begin print('start'); select clientid,accountid bulk collect into clients,accounts from test -- where rownum<=1000 ; print('fetched'); n:=clients.count; dbms_output.put_line('count='||n); for i in 1..n loop add_link(clients(i),accounts(i)); end loop; print('processed'); --- /* for i in 1..n loop dbms_output.put_line(str_format(i)); end loop; -- */ end;
We can also try even more interesting special algorithms for parallel processing: CONNECTED COMPONENTS ALGORITHMS
FOR MESH-CONNECTED PARALLEL COMPUTERS