Oracle SQL | Single SQL vs SQL+PL/SQL

Everyone knows Tom Kyte’s mantra:

You should do it in a single SQL statement if at all possible.

But we all know that “Every rule has an exception”
There are many different cases when pl/sql with sql can be more efficient than only sql, and i dont want to catalog them. I just want to show a couple examples of such exceptions:

1. Running totals by several dimensions

Simple example from forum:

select dt,
       dim1,
       dim2,
       val,
       sum(val) over(partition by dim1 order by dt) dim1_cumulative_sum,
       sum(val) over(partition by dim2 order by dt) dim2_cumulative_sum,
       sum(val) over(partition by dim1, dim2 order by dt) dim1_dim2_cumulative_sum
  from mg_t
 order by dt;

This query will be very hard for big data sets, so we can do it efficiently with pl/sql:

create or replace function xf_to_drop return xt2_to_drop pipelined
is
   type tt  is table of number index by pls_integer;
   type tt2 is table of tt index by pls_integer;
   dim1_c tt;
   dim2_c tt;
   dim12_c tt2;
begin
   for r in (
            select dt,
                   dim1,
                   dim2,
                   val
              from mg_t
             order by dt
   )
   loop
      dim1_c(r.dim1):=case when dim1_c.exists(r.dim1) then dim1_c(r.dim1) else 0 end + r.val;
      dim2_c(r.dim1):=case when dim2_c.exists(r.dim1) then dim2_c(r.dim1) else 0 end + r.val;
      dim12_c(r.dim1)(r.dim2):=case
                                  when dim12_c.exists(r.dim1)
                                   and dim12_c(r.dim1).exists(r.dim2)
                                  then dim12_c(r.dim1)(r.dim2)
                                  else 0
                               end + r.val;
      pipe row (xt1_to_drop( r.dt
                            ,r.dim1
                            ,r.dim2
                            ,r.val
                            ,dim1_c(r.dim1)
                            ,dim1_c(r.dim1)
                            ,dim12_c(r.dim1)(r.dim2)
                           ));
   end loop;
end;
/

Full example

create table mg_t as
select trunc(sysdate) + level/1440 dt,
       trunc(3 * dbms_random.value()) dim1,
       trunc(3 * dbms_random.value()) dim2,
       trunc(100 * dbms_random.value()) val
  from dual
connect by level <= 3e6;

create type xt1_to_drop is object(
       dt                  date
      ,dim1                number
      ,dim2                number
      ,val                 number
      ,dim1_cumulative_sum number
      ,dim2_cumulative_sum number
      ,dim1_dim2_cumulative_sum number
);
create type xt2_to_drop as table of xt1_to_drop;

create or replace function xf_to_drop return xt2_to_drop pipelined
is
   type tt is table of number index by pls_integer;
   type tt2 is table of tt index by pls_integer;
   dim1_c tt;
   dim2_c tt;
   dim12_c tt2;
begin
   for r in (
            select dt,
                   dim1,
                   dim2,
                   val
              from mg_t
             order by dt
   )
   loop
      dim1_c(r.dim1):=case when dim1_c.exists(r.dim1) then dim1_c(r.dim1) else 0 end + r.val;
      dim2_c(r.dim1):=case when dim2_c.exists(r.dim1) then dim2_c(r.dim1) else 0 end + r.val;
      dim12_c(r.dim1)(r.dim2):=case
                                  when dim12_c.exists(r.dim1)
                                   and dim12_c(r.dim1).exists(r.dim2)
                                  then dim12_c(r.dim1)(r.dim2)
                                  else 0
                               end + r.val;
      pipe row (xt1_to_drop( r.dt,r.dim1,r.dim2,r.val,dim1_c(r.dim1),dim1_c(r.dim1),dim12_c(r.dim1)(r.dim2)));
   end loop;
end;
/
exec for r in (select * from table(xf_to_drop)) loop null; end loop;

[collapse]

2. Finding connected components

Assume that we have big table with many-to-many relationship:

create table test (clientid NUMBER(10), accountid NUMBER(10));

How we can find all connected groups?

This example also taken from our russian forum and there was very good and simple sql-only solution, but it’s not efficient on big data sets:

select min(group_member_id) as group_max_id, accountid, clientid
  from  (select clientid as group_member_id
         , connect_by_root accountid as accountid
         , connect_by_root clientid  as clientid
      from test
      connect by nocycle decode(accountid, prior accountid, 1, 0)
                       + decode(clientid,  prior clientid,  1, 0)
                       = 1
  ) a
  group by accountid, clientid
  order by group_max_id, accountid
/

This pure SQL solution is for the cases when ClientId and AccountId are different entities. If they are the same entities in your case, you need to use UNION ALL:
select min(group_member_id) as group_max_id, accountid, clientid
from (select clientid as group_member_id
, connect_by_root accountid as accountid
, connect_by_root clientid as clientid
from test
connect by nocycle decode(accountid, prior accountid, 1, 0)
+ decode(clientid, prior clientid, 1, 0)
= 1
) a
group by accountid, clientid
order by group_max_id, accountid
/

select min(group_member_id) as group_max_id, accountid, clientid
  from  (select clientid as group_member_id
         , connect_by_root accountid as accountid
         , connect_by_root clientid  as clientid
      from (select accountid, clientid from test union all select clientid,accountid from test)
      connect by nocycle decode(accountid, prior accountid, 1, 0)
                       + decode(clientid,  prior clientid,  1, 0)
                       = 1
  ) a
  group by accountid, clientid
  order by group_max_id, accountid
/

We can try to remember algorithms courses and adopt one of the several algorithms for connected components:

Based on weighted quick-union algorithm

declare
   type int_array    is table of pls_integer index by pls_integer;
   type arr_elems    is table of sys.ku$_objnumset index by pls_integer;
   root              int_array;
   root_elems        arr_elems;

   n        int;
   clients  int_array;
   accounts int_array;

    l integer:=dbms_utility.get_time();

    procedure print(v in varchar2) is
    begin
      dbms_output.put_line(to_char((dbms_utility.get_time-l)/100,'0999.99')||' '||v);
      l:=dbms_utility.get_time();
    end;

   
   function get_root(n int) return pls_integer is
   begin
      if root.exists(n) then 
         return root(n);
      else 
         return null;
      end if;
   end;
   
   procedure update_root(old_root pls_integer,new_root pls_integer) is
      i       pls_integer;
      elem    pls_integer;
      cnt_old pls_integer;
      cnt_new pls_integer;
   begin
      if old_root!=new_root then 
         --root_elems(new_root):=root_elems(new_root) multiset union all root_elems(old_root);
         cnt_old:=root_elems(old_root).count;
         cnt_new:=root_elems(new_root).count;
         root_elems(new_root).extend(cnt_old);
         for i in 1..cnt_old
         loop
            elem := root_elems(old_root)(i);
            root(elem):=new_root;
            root_elems(new_root)(cnt_new+i):=elem;
         end loop;
         root_elems(old_root).delete;
      end if;
   end;
   
   procedure add_elem(p_root pls_integer, p_elem pls_integer) is
   begin
      if not root_elems.exists(p_root) then
         root_elems(p_root):=sys.ku$_objnumset(p_elem);
      else
         root_elems(p_root).extend();
         root_elems(p_root)(root_elems(p_root).count):=p_elem;
      end if;
   end;
   
   procedure add_link(clientid pls_integer,accountid pls_integer) is
      r1       pls_integer;
      r2       pls_integer;
      new_root pls_integer;
   begin
      r1:=get_root(clientid);
      r2:=get_root(accountid);
      
      if r1 is null or r2 is null then
         new_root := coalesce(r1,r2,clientid);
         if r1 is null then add_elem(new_root,clientid ); root(clientid) :=new_root; end if;
         if r2 is null then add_elem(new_root,accountid); root(accountid):=new_root; end if;
      else
         new_root := least(r1,r2);
         root(clientid) :=new_root;
         root(accountid):=new_root;
         update_root(greatest(r1,r2),new_root);
      end if;
   end;
   
   function str_format(p int) return varchar2 is
   begin
      return utl_lms.format_message('(%d, %d) = group #%d'
                                   ,clients(p)
                                   ,accounts(p)
                                   ,get_root(clients(p))
                                   );
   end;
begin
   print('start');
   select clientid,accountid 
          bulk collect into clients,accounts
   from test 
--   where rownum<=1000
   ;
   print('fetched');
   n:=clients.count;
   dbms_output.put_line('count='||n);
   for i in 1..n loop
      add_link(clients(i),accounts(i));
   end loop;
   print('processed');
---
/*
   for i in 1..n loop
      dbms_output.put_line(str_format(i));
   end loop;
--   */
end;

[collapse]

We can also try even more interesting special algorithms for parallel processing: CONNECTED COMPONENTS ALGORITHMS
FOR MESH-CONNECTED PARALLEL COMPUTERS