Oracle SQL | Category Archives: PL/SQL optimization

Another interesting troubleshooting case

Posted on May 29, 2021 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization, trace, troubleshooting 2,025 Page views Leave a comment

Got an interesting question today in RuOUG:

Some very simple PL/SQL procedures usually are completed within ~50ms, but sometimes sporadically longer than a second. For example, the easiest one from these procedures:

create or replace PROCEDURE XXXX (
  P_ORG_NUM IN number,
  p_result OUT varchar2, 
  p_seq OUT number 
) AS 
BEGIN
  p_seq := P_ORG_NUM; p_result:='';
END;

sql_trace shows that it was executed for 1.001sec and all the time was “ON CPU”:

Continue reading→

PL/SQL functions and statement level consistency

Posted on December 30, 2019 by Sayan Malakshinov Posted in deterministic functions, oracle, PL/SQL, PL/SQL optimization, query optimizing, SQL 1,951 Page views Leave a comment

You may know that whenever you call PL/SQL functions from within SQL query, each query in the function is consistent to the SCN of its start and not to the SCN of parent query.

Simple example:

create table test as 
  select level a, level b from dual connect by level<=10;

create or replace function f1(a int) return int as
  res int;
begin
  select b into res 
  from test t 
  where t.a=f1.a;
  dbms_lock.sleep(1);
  return res;
end;
/

As you can see we created a simple PL/SQL function that returns the result of the query select b from test where a=:input_var

But lets check what does it return if another session changes data in the table:

-- session 2:
begin
    for i in 1..30 loop
      update test set b=b+1;
      commit;
      dbms_lock.sleep(1);
    end loop;
end;
/
-- session 1:
SQL> select t.*, f1(a) func from test t;

         A          B       FUNC
---------- ---------- ----------
         1          1          1
         2          2          3
         3          3          5
         4          4          7
         5          5          9
         6          6         11
         7          7         13
         8          8         15
         9          9         17
        10         10         19

10 rows selected.

As you can see we got inconsistent results in the column FUNC, but we can easily fix it using OPERATORs:

CREATE OPERATOR f1_op
   BINDING (INT) 
   RETURN INT 
   USING F1;

Lets revert changes back and check our query with new operator now:

--session 1:
SQL> update test set b=a;

10 rows updated.

SQL> commit;

Commit complete.

-- session 2:
begin
    for i in 1..30 loop
      update test set b=b+1;
      commit;
      dbms_lock.sleep(1);
    end loop;
end;
/

-- session 1:
SQL> select t.*, f1(a) func, f1_op(a) op from test t;

         A          B       FUNC         OP
---------- ---------- ---------- ----------
         1          2          2          2
         2          3          5          3
         3          4          8          4
         4          5         11          5
         5          6         14          6
         6          7         17          7
         7          8         20          8
         8          9         23          9
         9         10         26         10
        10         11         29         11

10 rows selected.

As you can see, all values in the column OP are equal to the values of B, while, in turn, function F1 returns inconsistent values.

SQL: Fast ways to count unique characters in the string

Posted on August 19, 2019 by Sayan Malakshinov Posted in oracle, PL/SQL optimization, query optimizing, SQL 2,282 Page views Leave a comment

Test data:

create table t_str as
select round(dbms_random.value(1e10,9e10)) str from dual connect by level&lt;=1e5
/

PL/SQL variant:

with
  function ff(s varchar2) return varchar2 
  as
      type avarchars is table of varchar2(100) index by varchar2(1);
      st  avarchars;
      idx varchar2(1);
      res varchar2(10);
      
      function iterate( idx in out nocopy varchar2, arr in out nocopy avarchars) 
         return boolean
      as --pragma inline;
      begin
         if idx is null
            then idx:=arr.first; 
            else idx:=arr.next(idx);
         end if;
         return idx is not null;
      end;  
   begin
     for i in 1..length(s) loop
        st(substr(s,i,1)):=1;
     end loop;
     while iterate(idx,st) loop
        res:=res||idx;
     end loop;
     return res;
   end;

select min(ff(str)) res
from t_str
/

SQL-only variant:

select min(fstr)
from t_str t
     cross apply (
     select listagg(c) within group (order by 1) fstr
     from (select
            distinct substr(t.str, level, 1) c
           from dual
           connect by level <= length(t.str)
          )
     )
/

Timings:

SQL> create table t_str as
  2  select round(dbms_random.value(1e10,9e10)) str from dual connect by level<=1e5
  3  /

Table created.

Elapsed: 00:00:00.55
SQL> with
  2    function ff(s varchar2) return varchar2
  3    as
  4        type avarchars is table of varchar2(100) index by varchar2(1);
  5        st  avarchars;
  6        idx varchar2(1);
  7        res varchar2(10);
  8
  9        function iterate( idx in out nocopy varchar2, arr in out nocopy avarchars)
 10           return boolean
 11        as --pragma inline;
 12        begin
 13           if idx is null
 14              then idx:=arr.first;
 15              else idx:=arr.next(idx);
 16           end if;
 17           return idx is not null;
 18        end;
 19     begin
 20       for i in 1..length(s) loop
 21          st(substr(s,i,1)):=1;
 22       end loop;
 23       while iterate(idx,st) loop
 24          res:=res||idx;
 25       end loop;
 26       return res;
 27     end;
 28
 29  select min(ff(str)) res
 30  from t_str
 31  /

RES
--------------------------------------------------------------
0123

Elapsed: 00:00:00.48
SQL> select min(fstr) res2
  2  from t_str t
  3       cross apply (
  4       select listagg(c) within group (order by 1) fstr
  5       from (select
  6              distinct substr(t.str, level, 1) c
  7             from dual
  8             connect by level <= length(t.str)
  9            )
 10       )
 11  /

RES2
--------------------------------------------------------------
0123

Elapsed: 00:00:01.01

And much easier variant if you need your strings contain digits only:

select min(translate('0123456789', translate('z0123456789','z'||str,'z'), chr(0)))
from t_str

“Collection iterator pickler fetch”: pipelined vs simple table functions

Posted on December 13, 2017 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization, query optimizing, SQL, troubleshooting 2,754 Page views 2 Comments

Alex R recently discovered interesting thing: in SQL pipelined functions work much faster than simple non-pipelined table functions, so if you already have simple non-pipelined table function and want to get its results in sql (select * from table(fff)), it’s much better to create another pipelined function which will get and return its results through PIPE ROW().

A bit more details:

Assume we need to return collection “RESULT” from PL/SQL function into SQL query “select * from table(function_F(…))”.
If we create 2 similar functions: pipelined f_pipe and simple non-pipelined f_non_pipe,

create or replace function f_pipe(n int) return tt_id_value pipelined 
as
  result tt_id_value;
begin
  ...
  for i in 1..n loop
    pipe row (result(i));
  end loop;
end f_pipe;
/
create or replace function f_non_pipe(n int) return tt_id_value 
as
  result tt_id_value;
begin
  ...
  return result;
end f_non_pipe;
/

Full functions definitions

[sourcecode language=”sql”]
create or replace type to_id_value as object (id int, value int)
/
create or replace type tt_id_value as table of to_id_value
/
create or replace function f_pipe(n int) return tt_id_value pipelined
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
for i in 1..n loop
pipe row (result(i));
end loop;
end f_pipe;
/
create or replace function f_non_pipe(n int) return tt_id_value
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
return result;
end f_non_pipe;
/
create or replace function f_pipe_for_nonpipe(n int) return tt_id_value pipelined
as
result tt_id_value;
begin
result:=f_non_pipe(n);
for i in 1..result.count loop
pipe row (result(i));
end loop;
end;
/
create or replace function f_udf_pipe(n int) return tt_id_value pipelined
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
for i in 1..n loop
pipe row (result(i));
end loop;
end;
/
create or replace function f_udf_non_pipe(n int) return tt_id_value
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
return result;
end;
/
[/sourcecode]

[collapse]

Test queries

[sourcecode language=”sql”]
set echo on feed only timing on;
–alter session set optimizer_adaptive_plans=false;
–alter session set "_optimizer_use_feedback"=false;

select sum(id * value) s from table(f_pipe(&1));
select sum(id * value) s from table(f_non_pipe(&1));
select sum(id * value) s from table(f_pipe_for_nonpipe(&1));
select sum(id * value) s from table(f_udf_pipe(&1));
select sum(id * value) s from table(f_udf_non_pipe(&1));
with function f_inline_non_pipe(n int) return tt_id_value
as
result tt_id_value;
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
return result;
end;
select sum(id * value) s from table(f_inline_non_pipe(&1));
/
set timing off echo off feed on;
[/sourcecode]

[collapse]

we’ll find that the function with simple “return result” works at least twice slower than pipelined function:

Function	1 000 000 elements	100 000 elements
F_PIPE	2.46	0.20
F_NON_PIPE	4.39	0.44
F_PIPE_FOR_NONPIPE	2.61	0.26
F_UDF_PIPE	2.06	0.20
F_UDF_NON_PIPE	4.46	0.44

I was really surprised that even “COLLECTION ITERATOR PICKLER FETCH” with F_PIPE_FOR_NONPIPE that gets result of F_NON_PIPE and returns it through PIPE ROW() works almost twice faster than F_NON_PIPE, so I decided to analyze it using stapflame by Frits Hoogland.

I added “dbms_lock.sleep(1)” into both of these function after collection generation, to compare the difference only between “pipe row” in loop and “return result”:

Modified functions

[sourcecode language=”sql”]
create or replace function f_pipe(n int) return tt_id_value pipelined
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
dbms_lock.sleep(1);
for i in 1..n loop
pipe row (result(i));
end loop;
end f_pipe;
/
create or replace function f_non_pipe(n int) return tt_id_value
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
dbms_lock.sleep(1);
return result;
end f_non_pipe;
/
[/sourcecode]

[collapse]

And stapflame showed that almost all overhead was consumed by the function “kgmpoa_Assign_Out_Arguments”:

I don’t know what this function is doing exactly, but we can see that oracle assign collection a bit later.
From other functions in this stack(pmucpkl, kopp2isize, kopp2colsize, kopp2atsize(attribute?), kopuadt) I suspect that is some type of preprocessiong of return arguments.
What do you think about it?

Full stapflame output:
stapflame_nonpipe
stapflame_pipe

Bug with integer literals in PL/SQL

Posted on December 9, 2017 by Sayan Malakshinov Posted in curious, oracle, PL/SQL, PL/SQL optimization, undocumented 1,826 Page views Leave a comment

This interesting question was posted on our russian forum yesterday:

We have a huge PL/SQL package and this simple function returns wrong result when it’s located at the end of package body:
create or replace package body PKGXXX as
  ...
  function ffff return number is
  nRes number;
  begin        
    nRes :=  268435456;
    return nRes;
  end;
end;
/
But it works fine in any of the following cases:
* replace 268435456 with power(2, 28), or
* replace 268435456 with small literal like 268, or
* move this function to the beginning of package body

The one of the interesting findings was that the returned value is equal to the one of literals in another function.
We can reproduce this bug even with an anonymous pl/sql block. The following test case uses 32768 integer literals from 1000001 to 1032768 and prints 5 other integers:

declare n number;
begin
  n:=1000001; -- this part
  n:=1000002; -- creates
  n:=1000003; -- 32768 
   ...        -- integer
  n:=1032768; -- literals
    dbms_output.put_line('100000='||100000); -- it should print: 100000=100000
    dbms_output.put_line('32766 ='||32766);
    dbms_output.put_line('32767 ='||32767);    
    dbms_output.put_line('32768 ='||32768);
    dbms_output.put_line('32769 ='||32769);
end;

Test code

[sourcecode language=”sql”]
declare
c clob:=’declare n number;begin’||chr(10);
f varchar2(100):=’n:=%s;’||chr(10);
v varchar2(32767);
n number:=32768;
begin
for i in 1..n loop
v:=v||utl_lms.format_message(f,to_char(1e7+i));
if length(v)>30000 then
c:=c||v;
v:=”;
end if;
end loop;
v:=v||q'[
dbms_output.put_line(‘100000=’||100000);
dbms_output.put_line(‘32766 =’||32766);
dbms_output.put_line(‘32767 =’||32767);
dbms_output.put_line(‘32768 =’||32768);
dbms_output.put_line(‘32769 =’||32769);
end;
]’;
c:=c||v;
execute immediate c;
end;
/
[/sourcecode]

[collapse]

It produces the following output:

100000=10000001
32766 =32766
32767 =32767
32768 =10000002
32769 =10000003

This test case well demonstrates wrong results:
* instead of 100000 we get 10000001, which is the value from first line after “begin”, ie 1st integer literal in the code,
* for 32766 and 32767 oracle returns right values
* instead of 32768 (==32767+1) it returns 10000002, which is the integer from 2nd line, ie 2nd integer literal in the code,
* instead of 32769 (==32767+2) it returns 10000003, which is the integer from 3rd line, ie 3rd integer literal in the code
After several tests I can make a conclusion:

It doesn’t matter what plsql_optimize_level or plsql_code_type you set, was debug enabled or not, the behaviour is the same.
It seems that this is a kind of PL/SQL optimization: during parsing, oracle leaves integer literal in place if its value is in range -32768..32767 (16bit signed int), but if its value is out of this range, oracle adds this value into array of integers’ constants and replaces the value with the index of this element in this array. But because of index value overflow in cases when a count of such integer literals becomes larger than 32768, instead of Nth element of this array, oracle returns Mth element, where M is mod(N,32767).

So we can describe this behaviour using first test case:

declare n number;
begin
  n:=1000001; -- this part
  n:=1000002; -- creates
  n:=1000003; -- 32768 
   ...        -- integer
  n:=1032768; -- literals
    dbms_output.put_line('100000='||100000); -- it should print 100000, ie 32768th element of array, but prints 10000001
                                             -- where 10000001 is the 1st element of array (1==mod(32768,32767))
    dbms_output.put_line('32766 ='||32766);  -- these 2 lines print right values,
    dbms_output.put_line('32767 ='||32767);  -- because their values are in the range of -32768..32767
    dbms_output.put_line('32768 ='||32768);  -- this line contains 32769th element and prints 2nd element of array (2==mod(32769,32767))
    dbms_output.put_line('32769 ='||32769);  -- this line contains 32770th element and prints 3nd element of array (3==mod(32770,32767))
end;

The following query can help you to find objects which can potentially have this problem:

select
  s.owner,s.name,s.type
 ,sum(regexp_count(text,'(\W|^)3\d{4,}([^.0-9]|$)')) nums_count -- this regexp counts integer literals >= 30000
from dba_source s 
where 
    owner='&owner'
and type in ('FUNCTION','PROCEDURE','PACKAGE','PACKAGE BODY')
group by s.owner,s.name,s.type
having sum(regexp_count(text,'(\W|^)3\d{4,}([^.0-9]|$)'))>32767 -- filter only objects which have >=32767 integer literal

Workaround:
You may noticed that I wrote about INTEGER literals only, so the easiest workaround is to make them FLOAT – just add “.” to the end of each literal:

declare n number;
begin
  n:=1000001.;
  n:=1000002.;
  n:=1000003.;
   ...       
  n:=1032768.;
    dbms_output.put_line('100000='||100000.);
    dbms_output.put_line('32766 ='||32766.);
    dbms_output.put_line('32767 ='||32767.);    
    dbms_output.put_line('32768 ='||32768.);
    dbms_output.put_line('32769 ='||32769.);
end;

Fixed test cases

[sourcecode language=”sql”]
declare
c clob:=’declare n number;begin’||chr(10);
f varchar2(100):=’n:=%s.;’||chr(10); — I’ve added here "."
v varchar2(32767);
n number:=32768;
begin
for i in 1..n loop
v:=v||utl_lms.format_message(f,to_char(1e7+i));
if length(v)>30000 then
c:=c||v;
v:=”;
end if;
end loop;
v:=v||q'[
dbms_output.put_line(‘100000=’||100000.); — .
dbms_output.put_line(‘32766 =’||32766.);
dbms_output.put_line(‘32767 =’||32767.);
dbms_output.put_line(‘32768 =’||32768.);
dbms_output.put_line(‘32769 =’||32769.);
end;
]’;
c:=c||v;
execute immediate c;
end;
/
[/sourcecode]

[collapse]

PL/SQL functions: Iterate and keys for associative arrays

Posted on October 2, 2017 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization 2,675 Page views Leave a comment

Unfortunately associative arrays still require more “coding”:
we still can’t use “indices of” or “values of” in simple FOR(though they are available for FORALL for a long time), don’t have convinient iterators and even function to get all keys…
That’s why I want to show my templates for such things like iterator and keys function. You can adopt these functions and create them on schema level.

declare 
   type numbers  is table of number;
   type anumbers is table of number index by pls_integer;
   a anumbers;
   i pls_integer;

   function iterate( idx in out nocopy pls_integer, arr in out nocopy anumbers) 
      return boolean
   as pragma inline;
   begin
      if idx is null 
         then idx:=arr.first; 
         else idx:=arr.next(idx);
      end if;
      return idx is not null;
   end;

  function keys(a in out nocopy anumbers) return numbers as
     res numbers:=numbers();
     idx number;
     pragma inline;
  begin
     while iterate(idx,a) loop
        res.extend;
        res(res.count):=idx;
     end loop;
     return res;
  end;

begin
   a(1):=10;
   a(3):=30;
   a(5):=50;
   a(8):=80;
   -- iterate:
   while iterate(i,a) loop
      dbms_output.put_line(a(i));
   end loop;

   -- keys:
   for i in 1..keys(a).count loop
      dbms_output.put_line(a(keys(a)(i)));
   end loop;

end;

How to group connected elements (or pairs)

Posted on September 29, 2017 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization, query optimizing, SQL 1,858 Page views 7 Comments

I see quite often when developers ask questions about connected components:

Table “MESSAGES” contains fields “SENDER” and “RECIPIENT”, which store clients id.
How to quickly get all groups of clients who are connected even through other clients if the table has X million rows?
So for this table, there should be 4 groups:

(1, 2, 4, 8, 16)
(3, 6, 12)
(5, 10, 20)
(7, 14)
(9, 18)

SENDER	RECIPIENT
1	2
2	4
3	6
4	8
5	10
6	12
7	14
8	16
9	18
10	20

Of course, we can solve this problem using SQL only (model, recursive subquery factoring or connect by with nocycle), but such solutions will be too slow for huge tables.

Example of SQL solution

[sourcecode language=”sql”]
with
t(sender,recipient) as (select level,level*2 from dual connect by level<=10)
, v1 as (select rownum id,t.* from t)
, v2 as (select id, account
from v1
unpivot (
account for x in (sender,recipient)
))
, v3 as (
select
id
,account
,dense_rank()over(order by account) account_n
,count(*)over() cnt
from v2)
, v4 as (
select distinct grp,account
from v3
model
dimension by (id,account_n)
measures(id grp,account,cnt)
rules
iterate(1e6)until(iteration_number>cnt[1,1])(
grp[any,any] = min(grp)[any,cv()]
,grp[any,any] = min(grp)[cv(),any]
)
)
select
listagg(account,’,’)within group(order by account) s
from v4
group by grp
[/sourcecode]

[collapse]

In such situations it’s much better to adopt standard algorithms like Quick-find or Weighted quick-union for PL/SQL.
The first time I wrote such solution about 5 years ago and I even posted here one of the latest solutions, but all of them were not universal, so I’ve created the package today with a couple of functions for most common problems: XT_CONNECTED_COMPONENTS

It contains 2 functions based on Weighted quick-find quick-union algorithm:

function get_strings(cur in sys_refcursor, delim varchar2:=’,’) return strings_array pipelined;
It accepts a cursor and returns found connected components as table of varchar2(v_size). You can change v_size in the package definition.
Input cursor should contain one Varchar2 column with linked strings, for example: ‘a,b,c’.
You can also specify list delimiter, by default it is comma.
Examples:

select * from table(xt_connected_components.get_strings( cursor(select ELEM1||','||ELEM2 from TEST));
select * 
from
 table(
   xt_connected_components.get_strings( 
     cursor(select 'a,b,c' from dual union all
            select 'd,e,f' from dual union all
            select 'e,c'   from dual union all
            select 'z'     from dual union all
            select 'X,Y'   from dual union all
            select 'Y,Z'   from dual)));
COLUMN_VALUE
-----------------------------------------
STRINGS('X', 'Y', 'Z')
STRINGS('a', 'b', 'c', 'd', 'e', 'f')
STRINGS('z')

function get_numbers(cur in sys_refcursor) return numbers_array pipelined;
This function also returns connected components, but for numbers.
Input cursor should contain two columns with linked numbers.
Examples:

select * 
from table(
        xt_connected_components.get_numbers( 
          cursor(
            select sender_id, recipient_id from messages
        )));
select * 
from
  table(
    xt_connected_components.get_numbers( 
       cursor(
          select level   account1
               , level*2 account2 
          from dual 
          connect by level<=10
    )));
SQL> select *
  2  from
  3    table(
  4      xt_connected_components.get_numbers(
  5         cursor(
  6            select level   account1
  7                 , level*2 account2
  8            from dual
  9            connect by level<=10
 10*     )))
SQL> /

COLUMN_VALUE
------------------------
NUMBERS(1, 2, 4, 8, 16)
NUMBERS(3, 6, 12)
NUMBERS(5, 10, 20)
NUMBERS(7, 14)
NUMBERS(9, 18)

How to install:
Download all files from Github and execute “@install” in SQL*Plus or execute them in another tool in the following order:
xt_connected_components_types.sql
xt_connected_components_pkg.sql
xt_connected_components_bdy.sql

Download URL: https://github.com/xtender/xt_scripts/tree/master/extra/xt_connected_components

Single SQL vs SQL+PL/SQL

Posted on February 28, 2014 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization, query optimizing 3,224 Page views 1 Comment

Everyone knows Tom Kyte’s mantra:

You should do it in a single SQL statement if at all possible.

But we all know that “Every rule has an exception”
There are many different cases when pl/sql with sql can be more efficient than only sql, and i dont want to catalog them. I just want to show a couple examples of such exceptions:

1. Running totals by several dimensions

Simple example from forum:

select dt,
       dim1,
       dim2,
       val,
       sum(val) over(partition by dim1 order by dt) dim1_cumulative_sum,
       sum(val) over(partition by dim2 order by dt) dim2_cumulative_sum,
       sum(val) over(partition by dim1, dim2 order by dt) dim1_dim2_cumulative_sum
  from mg_t
 order by dt;

This query will be very hard for big data sets, so we can do it efficiently with pl/sql:

create or replace function xf_to_drop return xt2_to_drop pipelined
is
   type tt  is table of number index by pls_integer;
   type tt2 is table of tt index by pls_integer;
   dim1_c tt;
   dim2_c tt;
   dim12_c tt2;
begin
   for r in (
            select dt,
                   dim1,
                   dim2,
                   val
              from mg_t
             order by dt
   )
   loop
      dim1_c(r.dim1):=case when dim1_c.exists(r.dim1) then dim1_c(r.dim1) else 0 end + r.val;
      dim2_c(r.dim1):=case when dim2_c.exists(r.dim1) then dim2_c(r.dim1) else 0 end + r.val;
      dim12_c(r.dim1)(r.dim2):=case
                                  when dim12_c.exists(r.dim1)
                                   and dim12_c(r.dim1).exists(r.dim2)
                                  then dim12_c(r.dim1)(r.dim2)
                                  else 0
                               end + r.val;
      pipe row (xt1_to_drop( r.dt
                            ,r.dim1
                            ,r.dim2
                            ,r.val
                            ,dim1_c(r.dim1)
                            ,dim1_c(r.dim1)
                            ,dim12_c(r.dim1)(r.dim2)
                           ));
   end loop;
end;
/

Full example

[sourcecode language=”sql”]
create table mg_t as
select trunc(sysdate) + level/1440 dt,
trunc(3 * dbms_random.value()) dim1,
trunc(3 * dbms_random.value()) dim2,
trunc(100 * dbms_random.value()) val
from dual
connect by level <= 3e6;

create type xt1_to_drop is object(
dt date
,dim1 number
,dim2 number
,val number
,dim1_cumulative_sum number
,dim2_cumulative_sum number
,dim1_dim2_cumulative_sum number
);
create type xt2_to_drop as table of xt1_to_drop;

create or replace function xf_to_drop return xt2_to_drop pipelined
is
type tt is table of number index by pls_integer;
type tt2 is table of tt index by pls_integer;
dim1_c tt;
dim2_c tt;
dim12_c tt2;
begin
for r in (
select dt,
dim1,
dim2,
val
from mg_t
order by dt
)
loop
dim1_c(r.dim1):=case when dim1_c.exists(r.dim1) then dim1_c(r.dim1) else 0 end + r.val;
dim2_c(r.dim1):=case when dim2_c.exists(r.dim1) then dim2_c(r.dim1) else 0 end + r.val;
dim12_c(r.dim1)(r.dim2):=case
when dim12_c.exists(r.dim1)
and dim12_c(r.dim1).exists(r.dim2)
then dim12_c(r.dim1)(r.dim2)
else 0
end + r.val;
pipe row (xt1_to_drop( r.dt,r.dim1,r.dim2,r.val,dim1_c(r.dim1),dim1_c(r.dim1),dim12_c(r.dim1)(r.dim2)));
end loop;
end;
/
exec for r in (select * from table(xf_to_drop)) loop null; end loop;
[/sourcecode]

[collapse]

2. Finding connected components

Assume that we have big table with many-to-many relationship:

create table test (clientid NUMBER(10), accountid NUMBER(10));

How we can find all connected groups?

This example also taken from our russian forum and there was very good and simple sql-only solution, but it’s not efficient on big data sets:

select min(group_member_id) as group_max_id, accountid, clientid
  from  (select clientid as group_member_id
         , connect_by_root accountid as accountid
         , connect_by_root clientid  as clientid
      from test
      connect by nocycle decode(accountid, prior accountid, 1, 0)
                       + decode(clientid,  prior clientid,  1, 0)
                       = 1
  ) a
  group by accountid, clientid
  order by group_max_id, accountid
/

This pure SQL solution is for the cases when ClientId and AccountId are different entities. If they are the same entities in your case, you need to use UNION ALL:
select min(group_member_id) as group_max_id, accountid, clientid
from (select clientid as group_member_id
, connect_by_root accountid as accountid
, connect_by_root clientid as clientid
from test
connect by nocycle decode(accountid, prior accountid, 1, 0)
+ decode(clientid, prior clientid, 1, 0)
= 1
) a
group by accountid, clientid
order by group_max_id, accountid
/

select min(group_member_id) as group_max_id, accountid, clientid
  from  (select clientid as group_member_id
         , connect_by_root accountid as accountid
         , connect_by_root clientid  as clientid
      from (select accountid, clientid from test union all select clientid,accountid from test)
      connect by nocycle decode(accountid, prior accountid, 1, 0)
                       + decode(clientid,  prior clientid,  1, 0)
                       = 1
  ) a
  group by accountid, clientid
  order by group_max_id, accountid
/

We can try to remember algorithms courses and adopt one of the several algorithms for connected components:

Based on weighted quick-union algorithm

[sourcecode language=”sql”]
declare
type int_array is table of pls_integer index by pls_integer;
type arr_elems is table of sys.ku$_objnumset index by pls_integer;
root int_array;
root_elems arr_elems;

n int;
clients int_array;
accounts int_array;

l integer:=dbms_utility.get_time();

procedure print(v in varchar2) is
begin
dbms_output.put_line(to_char((dbms_utility.get_time-l)/100,’0999.99′)||’ ‘||v);
l:=dbms_utility.get_time();
end;

function get_root(n int) return pls_integer is
begin
if root.exists(n) then
return root(n);
else
return null;
end if;
end;

procedure update_root(old_root pls_integer,new_root pls_integer) is
i pls_integer;
elem pls_integer;
cnt_old pls_integer;
cnt_new pls_integer;
begin
if old_root!=new_root then
–root_elems(new_root):=root_elems(new_root) multiset union all root_elems(old_root);
cnt_old:=root_elems(old_root).count;
cnt_new:=root_elems(new_root).count;
root_elems(new_root).extend(cnt_old);
for i in 1..cnt_old
loop
elem := root_elems(old_root)(i);
root(elem):=new_root;
root_elems(new_root)(cnt_new+i):=elem;
end loop;
root_elems(old_root).delete;
end if;
end;

procedure add_elem(p_root pls_integer, p_elem pls_integer) is
begin
if not root_elems.exists(p_root) then
root_elems(p_root):=sys.ku$_objnumset(p_elem);
else
root_elems(p_root).extend();
root_elems(p_root)(root_elems(p_root).count):=p_elem;
end if;
end;

procedure add_link(clientid pls_integer,accountid pls_integer) is
r1 pls_integer;
r2 pls_integer;
new_root pls_integer;
begin
r1:=get_root(clientid);
r2:=get_root(accountid);

if r1 is null or r2 is null then
new_root := coalesce(r1,r2,clientid);
if r1 is null then add_elem(new_root,clientid ); root(clientid) :=new_root; end if;
if r2 is null then add_elem(new_root,accountid); root(accountid):=new_root; end if;
else
new_root := least(r1,r2);
root(clientid) :=new_root;
root(accountid):=new_root;
update_root(greatest(r1,r2),new_root);
end if;
end;

function str_format(p int) return varchar2 is
begin
return utl_lms.format_message(‘(%d, %d) = group #%d’
,clients(p)
,accounts(p)
,get_root(clients(p))
);
end;
begin
print(‘start’);
select clientid,accountid
bulk collect into clients,accounts
from test
— where rownum<=1000
;
print(‘fetched’);
n:=clients.count;
dbms_output.put_line(‘count=’||n);
for i in 1..n loop
add_link(clients(i),accounts(i));
end loop;
print(‘processed’);
—
/*
for i in 1..n loop
dbms_output.put_line(str_format(i));
end loop;
— */
end;
[/sourcecode]

[collapse]

We can also try even more interesting special algorithms for parallel processing: CONNECTED COMPONENTS ALGORITHMS
FOR MESH-CONNECTED PARALLEL COMPUTERS

Too many function executions in simple query

Posted on June 10, 2013 by Sayan Malakshinov Posted in CBO, oracle, PL/SQL optimization, undocumented 2,975 Page views Leave a comment

Suppose we have a table with 10 rows:

SQL> select id from t10;

        ID
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

And we have the query:

select *
from (
     select xf(t10.id) a
     from t10 
     )
where a*a >= 25

At first it may seem that the function should be executed as many times as rows in a table T10, i.e. 10 times.
Lets test it:

SQL> create or replace function xf(p int) return int as
  2  begin
  3    dbms_output.put_line('F fired!');
  4    return p;
  5  end;
  6  /

Function created.

SQL> set serverout on;
SQL> select *
  2  from (
  3       select xf(t10.id) a
  4       from t10
  5       )
  6  where a*a >= 25
  7  /

         A
----------
         5
         6
         7
         8
         9
        10

6 rows selected.

F fired!
F fired!
F fired!
F fired!
F fired! -- 5
F fired!
F fired!
F fired!
F fired!
F fired! -- 10
F fired!
F fired!
F fired!
F fired!
F fired! -- 15
F fired!
F fired!
F fired!
F fired!
F fired! -- 20
F fired!
F fired!
F fired!
F fired!
F fired! -- 25
F fired!

As you see, there are more than 10 executions, so lets see the execution plan:

SQL> @xplan +projection

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 2919944937

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     3 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T10  |     1 |     3 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("XF"("T10"."ID")*"XF"("T10"."ID")>=25)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "T10"."ID"[NUMBER,22]

Now you see that inner view was merged, and the function was executed 20 times in the filter and 6 times on the fetch after filtering(6 rows – 6 calls).
I see that often in such cases “no_merge” hint is suggested, but let’s test it:

SQL> select *
  2  from (
  3       select/*+ no_merge */ xf(t10.id) a
  4       from t10
  5       )
  6  where a*a >= 25
  7  /

         A
----------
         5
         6
         7
         8
         9
        10

6 rows selected.

F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!
F fired!

As you can see, the number of function calls wasn’t changed.
And if we look into the plan, we understood why:

SQL> @xplan +projection

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 2027387203

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |     3   (0)| 00:00:01 |
|   1 |  VIEW              |      |     1 |    13 |     3   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T10  |     1 |     3 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("XF"("T10"."ID")*"XF"("T10"."ID")>=25)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "A"[NUMBER,22]
   2 - "T10"."ID"[NUMBER,22]

20 rows selected.

Now you see, that main problem is the “filter pushdown” transformation. Previously, if we were to disable the “filter pushdown” operation, we had to use a variety of tricks, such as “materialize” hint or adding the “rownum” in expession, etc. But all these solutions require rewriting the query.
But from 11.2.0.3 we can use “_optimizer_filter_pushdown” parameter, for example:

SQL> begin
  2    dbms_sqltune.import_sql_profile(
  3      sql_text    => 'select * from (select xf(t10.id) a from t10) where a*a >= 25'
  4     ,profile     => sys.sqlprof_attr(
  5                             q'[NO_MERGE(@SEL$2)]'
  6                            ,q'[OPT_PARAM('_OPTIMIZER_FILTER_PUSHDOWN' 'FALSE')]'
  7                          )
  8     ,category    => 'DEFAULT'
  9     ,name        => 'TEST_PROFILE'
 10     ,force_match => true
 11     ,replace     => true
 12    );
 13  end;
 14  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.16
SQL> set serverout on
SQL> select * from (select xf(t10.id) a from t10) where a*a >= 25;

         A
----------
         5
         6
         7
         8
         9
        10

6 rows selected.

F fired!
F fired!
F fired!
F fired!
F fired! -- 5
F fired!
F fired!
F fired!
F fired!
F fired! -- 10
F fired! -- extra execution because of sql*plus
Elapsed: 00:00:00.17
-- there are no extra calls when we fetches by 100 rows in pl/sql:
SQL> exec for r in (select * from (select xf(t10.id) a from t10) where a*a >= 25) loop null; end loop;
F fired!
F fired!
F fired!
F fired!
F fired! -- 5
F fired!
F fired!
F fired!
F fired!
F fired! -- 10

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.22

With changing arraysize

[sourcecode language=”sql”]
— with fetching by 1 row:
SQL> declare
2 cursor c is select * from (select xf(t10.id) a from t10) where a*a >= 25;
3 n number;
4 begin
5 open c;
6 loop
7 fetch c into n;
8 exit when c%notfound;
9 end loop;
10 end;
11 /
F fired!
F fired!
F fired!
F fired!
F fired! — 5
F fired!
F fired!
F fired!
F fired!
F fired! — 10
F fired!
F fired!
F fired!
F fired!
F fired! — 15
F fired!
F fired!
F fired!
F fired!
F fired! — 20
F fired!
F fired! — 22

PL/SQL procedure successfully completed.

— with arraysize = 3
SQL> set arraysi 3
SQL> select * from (select xf(t10.id) a from t10) where a*a >= 25;

A
———-
5
6
7
8
9
10

6 rows selected.

F fired!
F fired!
F fired!
F fired!
F fired! — 5
F fired!
F fired!
F fired!
F fired!
F fired! — 10
F fired!
F fired!
F fired!
F fired! — 14
Elapsed: 00:00:00.45
SQL> set arraysi 2
SQL> select * from (select xf(t10.id) a from t10) where a*a >= 25;

A
———-
5
6
7
8
9
10

6 rows selected.

F fired!
F fired!
F fired!
F fired!
F fired! — 5
F fired!
F fired!
F fired!
F fired!
F fired! — 10
F fired!
F fired!
F fired!
F fired!
F fired!
F fired! — 16
Elapsed: 00:00:00.72
[/sourcecode]

[collapse]

PS. I found that about “_optimizer_filter_pushdown” parameter already wrote Randolf Geist

How works optimization of loops in PL/SQL in 11g: Part 1. Deterministic functions

Posted on March 13, 2013 by Sayan Malakshinov Posted in deterministic functions, oracle, PL/SQL optimization 2,928 Page views 4 Comments

As you know, since 10g Oracle can optimize PL/SQL loops with moving code out of loops for reducing number of identical code executions.
If you have not read yet about it, I advise you to first read these two articles:
1. “PLSQL_OPTIMIZE_LEVEL: The optimization strategy of Oracle” by Saurabh K. Gupta
2. “PL/SQL optimisation in 10g” by Adrian Billington

But since 11g Oracle also can optimize code with deterministic functions too. For this to happen, code must meet the following conditions:
1. PLSQL_OPTIMIZE_LEVEL greater or equal 2
2. Parameters should not be changed in the loop body. Strictly speaking, there should not be any assignment of parameters.
3. Should not be any implicit conversions of parameters in function call, i.e. all variables/literals must be the same type as input parameters declared.
4. Should not be any call of non-deterministic functions (except some standard sql functions like to_date, to_char, nvl) or procedures in the loop

Note that this rules concerns only same scope level as this loop and not inner loops or another blocks.

Let me show how it works on simple examples: