Oracle SQL | Category Archives: SQL

v$sql_hint.target_level

Posted on May 28, 2019 by Sayan Malakshinov Posted in CBO, oracle, SQL, undocumented 2,025 Page views Leave a comment

Today I wanted to give a link to the description of v$sql_hint.target_level to show that no_parallel can be specified for statement or object, and though it’s pretty obvious, but surprisingly I haven’t found any articles or posts about it, so this short post describes it.
v$sql_hint.target_level is a bitset, where
1st bit set to 1 means that the hint can be specified on statement level,
2nd – on query block level,
3rd – on object level,
4th – on join level(for multiple objects).
Short example:

   select name,sql_feature
          ,class,inverse
          ,version,version_outline
          ,target_level
         ,decode(bitand(target_level,1),0,'no','yes') Statement_level
         ,decode(bitand(target_level,2),0,'no','yes') Query_block_level
         ,decode(bitand(target_level,4),0,'no','yes') Object_level
         ,decode(bitand(target_level,8),0,'no','yes') Join_level
   from v$sql_hint h;

with hints as (
   select name,sql_feature
          ,class,inverse
          ,version,version_outline
          ,target_level
         ,decode(bitand(target_level,1),0,'no','yes') Statement_level
         ,decode(bitand(target_level,2),0,'no','yes') Query_block_level
         ,decode(bitand(target_level,4),0,'no','yes') Object_level
         ,decode(bitand(target_level,8),0,'no','yes') Join_level
   from v$sql_hint h
)
select *
from hints
where statement_level='yes'
  and to_number(regexp_substr(version,'^\d+')) >= 18
order by version;

Result:

NAME              SQL_FEATURE     CLASS                VERSION  TARGET_LEVEL STATEMENT_LEVEL QUERY_BLOCK_LEVEL OBJECT_LEVEL JOIN_LEVEL
----------------- --------------- -------------------- -------- ------------ --------------- ----------------- ------------ ----------
PDB_LOCAL_ONLY    QKSFM_DML       PDB_LOCAL_ONLY       18.1.0              1 yes             no                no           no
SUPPRESS_LOAD     QKSFM_DDL       SUPPRESS_LOAD        18.1.0              1 yes             no                no           no
SYSTEM_STATS      QKSFM_ALL       SYSTEM_STATS         18.1.0              1 yes             no                no           no
MEMOPTIMIZE_WRITE QKSFM_EXECUTION MEMOPTIMIZE_WRITE    18.1.0              1 yes             no                no           no
SKIP_PROXY        QKSFM_ALL       SKIP_PROXY           18.1.0              1 yes             no                no           no
CURRENT_INSTANCE  QKSFM_ALL       CURRENT_INSTANCE     18.1.0              1 yes             no                no           no
JSON_LENGTH       QKSFM_EXECUTION JSON_LENGTH          19.1.0              1 yes             no                no           no
QUARANTINE        QKSFM_EXECUTION QUARANTINE           19.1.0              1 yes             no                no           no

Top time-consuming predicates from ASH

Posted on May 13, 2019 by Sayan Malakshinov Posted in oracle, query optimizing, SQL, statistics, troubleshooting 1,893 Page views Leave a comment

Sometimes it might be useful to analyze top time-consuming filter and access predicates from ASH, especially in cases when db load is spread evenly enough by different queries and top segments do not show anything interesting, except usual things like “some tables are requested more often than others”.
Of course, we can start from analysis of SYS.COL_USAGE$: col_usage.sql

col_usage.sql

[sourcecode language=”sql”]
col owner format a30
col oname format a30 heading "Object name"
col cname format a30 heading "Column name"
accept owner_mask prompt "Enter owner mask: ";
accept tab_name prompt "Enter tab_name mask: ";
accept col_name prompt "Enter col_name mask: ";

SELECT a.username as owner
,o.name as oname
,c.name as cname
,u.equality_preds as equality_preds
,u.equijoin_preds as equijoin_preds
,u.nonequijoin_preds as nonequijoin_preds
,u.range_preds as range_preds
,u.like_preds as like_preds
,u.null_preds as null_preds
,to_char(u.timestamp, ‘yyyy-mm-dd hh24:mi:ss’) when
FROM
sys.col_usage$ u
, sys.obj$ o
, sys.col$ c
, all_users a
WHERE a.user_id = o.owner#
AND u.obj# = o.obj#
AND u.obj# = c.obj#
AND u.intcol# = c.col#
AND a.username like upper(‘&owner_mask’)
AND o.name like upper(‘&tab_name’)
AND c.name like upper(‘&col_name’)
ORDER BY a.username, o.name, c.name
;
col owner clear;
col oname clear;
col cname clear;
undef tab_name col_name owner_mask;
[/sourcecode]

[collapse]

But it’s not enough, for example it doesn’t show predicates combinations. In this case we can use v$active_session_history and v$sql_plan:

Top 50 predicates

[sourcecode language=”sql”]
with
ash as (
select
sql_id
,plan_hash_value
,table_name
,alias
,ACCESS_PREDICATES
,FILTER_PREDICATES
,count(*) cnt
from (
select
h.sql_id
,h.SQL_PLAN_HASH_VALUE plan_hash_value
,decode(p.OPERATION
,’TABLE ACCESS’,p.OBJECT_OWNER||’.’||p.OBJECT_NAME
,(select i.TABLE_OWNER||’.’||i.TABLE_NAME from dba_indexes i where i.OWNER=p.OBJECT_OWNER and i.index_name=p.OBJECT_NAME)
) table_name
,OBJECT_ALIAS ALIAS
,p.ACCESS_PREDICATES
,p.FILTER_PREDICATES
— поля, которые могут быть полезны для анализа в других разрезах:
— ,h.sql_plan_operation
— ,h.sql_plan_options
— ,decode(h.session_state,’ON CPU’,’ON CPU’,h.event) event
— ,h.current_obj#
from v$active_session_history h
,v$sql_plan p
where h.sql_opname=’SELECT’
and h.IN_SQL_EXECUTION=’Y’
and h.sql_plan_operation in (‘INDEX’,’TABLE ACCESS’)
and p.SQL_ID = h.sql_id
and p.CHILD_NUMBER = h.SQL_CHILD_NUMBER
and p.ID = h.SQL_PLAN_LINE_ID
— если захотим за последние 3 часа:
— and h.sample_time >= systimestamp – interval ‘3’ hour
)
— если захотим анализируем предикаты только одной таблицы:
— where table_name=’&OWNER.&TABNAME’
group by
sql_id
,plan_hash_value
,table_name
,alias
,ACCESS_PREDICATES
,FILTER_PREDICATES
)
,agg_by_alias as (
select
table_name
,regexp_substr(ALIAS,’^[^@]+’) ALIAS
,listagg(ACCESS_PREDICATES,’ ‘) within group(order by ACCESS_PREDICATES) ACCESS_PREDICATES
,listagg(FILTER_PREDICATES,’ ‘) within group(order by FILTER_PREDICATES) FILTER_PREDICATES
,sum(cnt) cnt
from ash
group by
sql_id
,plan_hash_value
,table_name
,alias
)
,agg as (
select
table_name
,’ALIAS’ alias
,replace(access_predicates,’"’||alias||’".’,’"ALIAS".’) access_predicates
,replace(filter_predicates,’"’||alias||’".’,’"ALIAS".’) filter_predicates
,sum(cnt) cnt
from agg_by_alias
group by
table_name
,replace(access_predicates,’"’||alias||’".’,’"ALIAS".’)
,replace(filter_predicates,’"’||alias||’".’,’"ALIAS".’)
)
,cols as (
select
table_name
,cols
,access_predicates
,filter_predicates
,sum(cnt)over(partition by table_name,cols) total_by_cols
,cnt
from agg
,xmltable(
‘string-join(for $c in /ROWSET/ROW/COL order by $c return $c,",")’
passing
xmltype(
cursor(
(select distinct
nvl(
regexp_substr(
access_predicates||’ ‘||filter_predicates
,'("’||alias||’"\.|[^.]|^)"([A-Z0-9#_$]+)"([^.]|$)’
,1
,level
,’i’,2
),’ ‘)
col
from dual
connect by
level<=regexp_count(
access_predicates||’ ‘||filter_predicates
,'("’||alias||’"\.|[^.]|^)"([A-Z0-9#_$]+)"([^.]|$)’
)
)
))
columns cols varchar2(400) path ‘.’
)(+)
order by total_by_cols desc, table_name, cnt desc
)
select
table_name
,cols
,sum(cnt)over(partition by table_name,cols) total_by_cols
,access_predicates
,filter_predicates
,cnt
from cols
where rownum<=50
order by total_by_cols desc, table_name, cnt desc;
[/sourcecode]

[collapse]

As you can see it shows top 50 predicates and their columns for last 3 hours. Despite the fact that ASH stores just sampled data, its results are representative enough for high-load databases.
Just few details:

Column “COLS” shows “search columns”, and total_by_cols – their number of occurrences
I think it’s obvious, that this info is not unambiguous marker of the problem, because for example few full table scans can misrepresent the statistics, so sometimes you will need to analyze such queries deeper (v$sqlstats,dba_hist_sqlstat)
We need to group data by OBJECT_ALIAS within SQL_ID and plan_hash_value, because in case of index access with lookup to table(“table access by rowid”) some predicates are in the row with index access and others are in the row with table access.

Depending on the needs, we can modify this query to analyze ASH data by different dimensions, for example with additional analysis of partitioning or wait events.

Top-N again: fetch first N rows only vs rownum

Posted on December 30, 2018 by Sayan Malakshinov Posted in adaptive serial direct path reads, CBO, oracle, query optimizing, SQL, troubleshooting 7,471 Page views Leave a comment

Three interesting myths about rowlimiting clause vs rownum have recently been posted on our Russian forum:

TopN query with rownum<=N is always faster than "fetch first N rows only" (ie. row_number()over(order by ...)<=N)
“fetch first N rows only” is always faster than rownum<=N
“SORT ORDER BY STOPKEY” stores just N top records during sorting, while “WINDOW SORT PUSHED RANK” sorts all input and stores all records sorted in memory.

Interestingly that after Vyacheslav posted first statement as an axiom and someone posted old tests(from 2009) and few people made own tests which showed that “fetch first N rows” is about 2-3 times faster than the query with rownum, the final decision was that “fetch first” is always faster.

First of all I want to show that statement #3 is wrong and “WINDOW SORT PUSHED RANK” with row_number works similarly as “SORT ORDER BY STOPKEY”:
It’s pretty easy to show using sort trace:
Let’s create simple small table Tests1 with 1000 rows where A is in range 1-1000 (just 1 block):

create table test1(a not null, b) as
  select level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;

And we can see from the trace files that both queries did the same number of comparisons:

rownum:

[sourcecode language=”sql” highlight=”7″]
—– Current SQL Statement for this session (sql_id=bbg66rcbt76zt) —–
select * from (select * from test1 order by a) where rownum<=10

—- Sort Statistics ——————————
Input records 1000
Output records 10
Total number of comparisons performed 999
Comparisons performed by in-memory sort 999
Total amount of memory used 2048
Uses version 1 sort
—- End of Sort Statistics ———————–
[/sourcecode]

[collapse]

row_number

[sourcecode language=”sql” highlight=”7″]
—– Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) —–
select * from test1 order by a fetch first 10 rows only

[collapse]

Ie. each row (except first one) was compared with the biggest value from top 10 values and since they were bigger than top 10 value, oracle doesn’t compare it with other TopN values.

And if we change the order of rows in the table both of these queries will do the same number of comparisons again:

from 999 to 0

[sourcecode language=”sql”]
create table test1(a not null, b) as
select 1000-level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS ‘10032 trace name context forever, level 10’;

ALTER SESSION SET tracefile_identifier = ‘rownum’;
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = ‘rownumber’;
select * from test1 order by a fetch first 10 rows only;
[/sourcecode]

[collapse]

rownum

[sourcecode language=”sql” highlight=”7″]
—– Current SQL Statement for this session (sql_id=bbg66rcbt76zt) —–
select * from (select * from test1 order by a) where rownum<=10

—- Sort Statistics ——————————
Input records 1000
Output records 1000
Total number of comparisons performed 4976
Comparisons performed by in-memory sort 4976
Total amount of memory used 2048
Uses version 1 sort
—- End of Sort Statistics ———————–
[/sourcecode]

[collapse]

row_number

[sourcecode language=”sql” highlight=”7″]
—– Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) —–
select * from test1 order by a fetch first 10 rows only

[collapse]

We can see that both queries required much more comparisons(4976) here, that’s because each new value is smaller than the biggest value from the topN and even smaller than lowest value, so oracle should get right position for it and it requires 5 comparisons for that (it compares with 10th value, then with 6th, 3rd, 2nd and 1st values from top10). Obviously it makes less comparisons for the first 10 rows.

Now let’s talk about statements #1 and #2:
We know that rownum forces optimizer_mode to switch to “first K rows”, because of the parameter “_optimizer_rownum_pred_based_fkr”

SQL> @param_ rownum

NAME                               VALUE  DEFLT  TYPE      DESCRIPTION
---------------------------------- ------ ------ --------- ------------------------------------------------------
_optimizer_rownum_bind_default     10     TRUE   number    Default value to use for rownum bind
_optimizer_rownum_pred_based_fkr   TRUE   TRUE   boolean   enable the use of first K rows due to rownum predicate
_px_rownum_pd                      TRUE   TRUE   boolean   turn off/on parallel rownum pushdown optimization

while fetch first/row_number doesn’t (it will be changed after the patch #22174392) and it leads to the following consequences:
1. first_rows disables serial direct reads optimization(or smartscan on Exadata), that’s why the tests with big tables showed that “fetch first” were much faster than the query with rownum.
So if we set “_serial_direct_read”=always, we get the same performance in both tests (within the margin of error).

2. In cases when index access (index full scan/index range scan) is better, CBO differently calculates the cardinality of underlying INDEX FULL(range) SCAN:
the query with rownum is optimized for first_k_rows and the cardinality of index access is equal to K rows, but CBO doesn’t reduce cardinality for “fetch first”, so the cost of index access is much higher, compare them:

rownum

[sourcecode language=”sql” highlight=”13″]
SQL> explain plan for
2 select *
3 from (select * from test order by a,b)
4 where rownum<=10;

——————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————————–
| 0 | SELECT STATEMENT | | 10 | 390 | 4 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 10 | 390 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| TEST | 1000K| 12M| 4 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | IX_TEST_AB | 10 | | 3 (0)| 00:00:01 |
——————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

1 – filter(ROWNUM<=10)
[/sourcecode]

[collapse]

fetch first

[sourcecode language=”sql” highlight=”13″]
SQL> explain plan for
2 select *
3 from test
4 order by a,b
5 fetch first 10 rows only;

—————————————————————————————–
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
—————————————————————————————–
| 0 | SELECT STATEMENT | | 10 | 780 | | 5438 (1)| 00:00:01 |
|* 1 | VIEW | | 10 | 780 | | 5438 (1)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 1000K| 12M| 22M| 5438 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST | 1000K| 12M| | 690 (1)| 00:00:01 |
—————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

1 – filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
2 – filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
[/sourcecode]

[collapse]

fetch first + first_rows

[sourcecode language=”sql” highlight=”14″]
SQL> explain plan for
2 select/*+ first_rows */ *
3 from test
4 order by a,b
5 fetch first 10 rows only;

——————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————————–
| 0 | SELECT STATEMENT | | 10 | 780 | 27376 (1)| 00:00:02 |
|* 1 | VIEW | | 10 | 780 | 27376 (1)| 00:00:02 |
|* 2 | WINDOW NOSORT STOPKEY | | 1000K| 12M| 27376 (1)| 00:00:02 |
| 3 | TABLE ACCESS BY INDEX ROWID| TEST | 1000K| 12M| 27376 (1)| 00:00:02 |
| 4 | INDEX FULL SCAN | IX_TEST_AB | 1000K| | 2637 (1)| 00:00:01 |
——————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

1 – filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
2 – filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
[/sourcecode]

[collapse]

fetch first + index

[sourcecode language=”sql” highlight=”14″]
SQL> explain plan for
2 select/*+ index(test (a,b)) */ *
3 from test
4 order by a,b
5 fetch first 10 rows only;

Predicate Information (identified by operation id):
—————————————————

1 – filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
2 – filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
[/sourcecode]

[collapse]

So in this case we can add hints “first_rows” or “index”, or install the patch #22174392.

ps. I thought to post this note later, since I hadn’t time enough to add other interesting details about the different TopN variants, including “with tie”, rank(), etc, so I’ll post another note with more details later.

“Collection iterator pickler fetch”: pipelined vs simple table functions

Posted on December 13, 2017 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization, query optimizing, SQL, troubleshooting 2,681 Page views 2 Comments

Alex R recently discovered interesting thing: in SQL pipelined functions work much faster than simple non-pipelined table functions, so if you already have simple non-pipelined table function and want to get its results in sql (select * from table(fff)), it’s much better to create another pipelined function which will get and return its results through PIPE ROW().

A bit more details:

Assume we need to return collection “RESULT” from PL/SQL function into SQL query “select * from table(function_F(…))”.
If we create 2 similar functions: pipelined f_pipe and simple non-pipelined f_non_pipe,

create or replace function f_pipe(n int) return tt_id_value pipelined 
as
  result tt_id_value;
begin
  ...
  for i in 1..n loop
    pipe row (result(i));
  end loop;
end f_pipe;
/
create or replace function f_non_pipe(n int) return tt_id_value 
as
  result tt_id_value;
begin
  ...
  return result;
end f_non_pipe;
/

Full functions definitions

[sourcecode language=”sql”]
create or replace type to_id_value as object (id int, value int)
/
create or replace type tt_id_value as table of to_id_value
/
create or replace function f_pipe(n int) return tt_id_value pipelined
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
for i in 1..n loop
pipe row (result(i));
end loop;
end f_pipe;
/
create or replace function f_non_pipe(n int) return tt_id_value
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
return result;
end f_non_pipe;
/
create or replace function f_pipe_for_nonpipe(n int) return tt_id_value pipelined
as
result tt_id_value;
begin
result:=f_non_pipe(n);
for i in 1..result.count loop
pipe row (result(i));
end loop;
end;
/
create or replace function f_udf_pipe(n int) return tt_id_value pipelined
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
for i in 1..n loop
pipe row (result(i));
end loop;
end;
/
create or replace function f_udf_non_pipe(n int) return tt_id_value
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
return result;
end;
/
[/sourcecode]

[collapse]

Test queries

[sourcecode language=”sql”]
set echo on feed only timing on;
–alter session set optimizer_adaptive_plans=false;
–alter session set "_optimizer_use_feedback"=false;

select sum(id * value) s from table(f_pipe(&1));
select sum(id * value) s from table(f_non_pipe(&1));
select sum(id * value) s from table(f_pipe_for_nonpipe(&1));
select sum(id * value) s from table(f_udf_pipe(&1));
select sum(id * value) s from table(f_udf_non_pipe(&1));
with function f_inline_non_pipe(n int) return tt_id_value
as
result tt_id_value;
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
return result;
end;
select sum(id * value) s from table(f_inline_non_pipe(&1));
/
set timing off echo off feed on;
[/sourcecode]

[collapse]

we’ll find that the function with simple “return result” works at least twice slower than pipelined function:

Function	1 000 000 elements	100 000 elements
F_PIPE	2.46	0.20
F_NON_PIPE	4.39	0.44
F_PIPE_FOR_NONPIPE	2.61	0.26
F_UDF_PIPE	2.06	0.20
F_UDF_NON_PIPE	4.46	0.44

I was really surprised that even “COLLECTION ITERATOR PICKLER FETCH” with F_PIPE_FOR_NONPIPE that gets result of F_NON_PIPE and returns it through PIPE ROW() works almost twice faster than F_NON_PIPE, so I decided to analyze it using stapflame by Frits Hoogland.

I added “dbms_lock.sleep(1)” into both of these function after collection generation, to compare the difference only between “pipe row” in loop and “return result”:

Modified functions

[sourcecode language=”sql”]
create or replace function f_pipe(n int) return tt_id_value pipelined
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
dbms_lock.sleep(1);
for i in 1..n loop
pipe row (result(i));
end loop;
end f_pipe;
/
create or replace function f_non_pipe(n int) return tt_id_value
as
result tt_id_value;

procedure gen is
begin
result:=tt_id_value();
result.extend(n);
for i in 1..n loop
result(i):=to_id_value(i, 1);
end loop;
end;
begin
gen();
dbms_lock.sleep(1);
return result;
end f_non_pipe;
/
[/sourcecode]

[collapse]

And stapflame showed that almost all overhead was consumed by the function “kgmpoa_Assign_Out_Arguments”:

I don’t know what this function is doing exactly, but we can see that oracle assign collection a bit later.
From other functions in this stack(pmucpkl, kopp2isize, kopp2colsize, kopp2atsize(attribute?), kopuadt) I suspect that is some type of preprocessiong of return arguments.
What do you think about it?

Full stapflame output:
stapflame_nonpipe
stapflame_pipe

Easy(lazy) way to check which programs have properly configured FetchSize

Posted on November 16, 2017 by Sayan Malakshinov Posted in oracle, SQL, statistics 1,787 Page views Leave a comment

select 
   s.module
  ,ceil(max(s.rows_processed/s.fetches)) rows_per_fetch
from v$sql s
where 
    s.rows_processed>100
and s.executions    >1
and s.fetches       >1
and s.module is not null
and s.command_type  = 3    -- SELECTs only
and s.program_id    = 0    -- do not account recursive queries from stored procs
and s.parsing_schema_id!=0 -- <> SYS
group by s.module
order by rows_per_fetch desc nulls last
/

How to group connected elements (or pairs)

Posted on September 29, 2017 by Sayan Malakshinov Posted in oracle, PL/SQL, PL/SQL optimization, query optimizing, SQL 1,845 Page views 7 Comments

I see quite often when developers ask questions about connected components:

Table “MESSAGES” contains fields “SENDER” and “RECIPIENT”, which store clients id.
How to quickly get all groups of clients who are connected even through other clients if the table has X million rows?
So for this table, there should be 4 groups:

(1, 2, 4, 8, 16)
(3, 6, 12)
(5, 10, 20)
(7, 14)
(9, 18)

SENDER	RECIPIENT
1	2
2	4
3	6
4	8
5	10
6	12
7	14
8	16
9	18
10	20

Of course, we can solve this problem using SQL only (model, recursive subquery factoring or connect by with nocycle), but such solutions will be too slow for huge tables.

Example of SQL solution

[sourcecode language=”sql”]
with
t(sender,recipient) as (select level,level*2 from dual connect by level<=10)
, v1 as (select rownum id,t.* from t)
, v2 as (select id, account
from v1
unpivot (
account for x in (sender,recipient)
))
, v3 as (
select
id
,account
,dense_rank()over(order by account) account_n
,count(*)over() cnt
from v2)
, v4 as (
select distinct grp,account
from v3
model
dimension by (id,account_n)
measures(id grp,account,cnt)
rules
iterate(1e6)until(iteration_number>cnt[1,1])(
grp[any,any] = min(grp)[any,cv()]
,grp[any,any] = min(grp)[cv(),any]
)
)
select
listagg(account,’,’)within group(order by account) s
from v4
group by grp
[/sourcecode]

[collapse]

In such situations it’s much better to adopt standard algorithms like Quick-find or Weighted quick-union for PL/SQL.
The first time I wrote such solution about 5 years ago and I even posted here one of the latest solutions, but all of them were not universal, so I’ve created the package today with a couple of functions for most common problems: XT_CONNECTED_COMPONENTS

It contains 2 functions based on Weighted quick-find quick-union algorithm:

function get_strings(cur in sys_refcursor, delim varchar2:=’,’) return strings_array pipelined;
It accepts a cursor and returns found connected components as table of varchar2(v_size). You can change v_size in the package definition.
Input cursor should contain one Varchar2 column with linked strings, for example: ‘a,b,c’.
You can also specify list delimiter, by default it is comma.
Examples:

select * from table(xt_connected_components.get_strings( cursor(select ELEM1||','||ELEM2 from TEST));
select * 
from
 table(
   xt_connected_components.get_strings( 
     cursor(select 'a,b,c' from dual union all
            select 'd,e,f' from dual union all
            select 'e,c'   from dual union all
            select 'z'     from dual union all
            select 'X,Y'   from dual union all
            select 'Y,Z'   from dual)));
COLUMN_VALUE
-----------------------------------------
STRINGS('X', 'Y', 'Z')
STRINGS('a', 'b', 'c', 'd', 'e', 'f')
STRINGS('z')

function get_numbers(cur in sys_refcursor) return numbers_array pipelined;
This function also returns connected components, but for numbers.
Input cursor should contain two columns with linked numbers.
Examples:

select * 
from table(
        xt_connected_components.get_numbers( 
          cursor(
            select sender_id, recipient_id from messages
        )));
select * 
from
  table(
    xt_connected_components.get_numbers( 
       cursor(
          select level   account1
               , level*2 account2 
          from dual 
          connect by level<=10
    )));
SQL> select *
  2  from
  3    table(
  4      xt_connected_components.get_numbers(
  5         cursor(
  6            select level   account1
  7                 , level*2 account2
  8            from dual
  9            connect by level<=10
 10*     )))
SQL> /

COLUMN_VALUE
------------------------
NUMBERS(1, 2, 4, 8, 16)
NUMBERS(3, 6, 12)
NUMBERS(5, 10, 20)
NUMBERS(7, 14)
NUMBERS(9, 18)

How to install:
Download all files from Github and execute “@install” in SQL*Plus or execute them in another tool in the following order:
xt_connected_components_types.sql
xt_connected_components_pkg.sql
xt_connected_components_bdy.sql

Download URL: https://github.com/xtender/xt_scripts/tree/master/extra/xt_connected_components

Ampersand instead of colon for bind variables

Posted on September 27, 2017 by Sayan Malakshinov Posted in curious, oracle, SQL, SQL*Plus, undocumented 2,715 Page views 1 Comment

I’ve troubleshooted one query today and I was very surprised that bind variables in this query were specified with &ampersand instead of :colon! I have never seen this before and I couldn’t find anything about this in documentation…
Unfortunately SQL*Plus doesn’t support ampersand yet, even if you disable define (“set define off”),
so I’ve tested such behaviour with this code:

set def off serverout on
exec declare s varchar2(1); begin execute immediate 'select 1 from dual where dummy=&var' into s using 'X'; dbms_output.put_line(s); end;

And it really works! //at least on 11.2.0.2 and 12.2.0.1

SQL> set def off serverout on
SQL> exec declare s varchar2(1); begin execute immediate 'select 1 from dual where dummy=&var' into s using 'X'; dbms_output.put_line(s); end;
1

PL/SQL procedure successfully completed.

SQL> select substr(sql_text,1,40) stext,sql_id,executions,rows_processed from v$sqlarea a where sql_text like '%dual%&var';

STEXT                                 SQL_ID        EXECUTIONS ROWS_PROCESSED
------------------------------------- ------------- ---------- --------------
select 1 from dual where dummy=&var   ckkw4u3atxz02          3              3

SQL> select * from table(dbms_xplan.display_cursor('ckkw4u3atxz02'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
SQL_ID  ckkw4u3atxz02, child number 0
-------------------------------------
select 1 from dual where dummy=&var

Plan hash value: 272002086

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DUMMY"=:VAR)


18 rows selected.

Update: Btw, it works for SQL only, not for PL/SQL:

SQL> var v varchar2(1);
SQL> begin &v = 'Z'; end;
  2  /
begin &v = 'Z'; end;
      *
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00103: Encountered the symbol "&" when expecting one of the following:

SQL> exec &v := 'X';
BEGIN &v := 'X'; END;

      *
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00103: Encountered the symbol "&" when expecting one of the following:
The symbol "&" was ignored.

SQL> exec :v := 'X';

PL/SQL procedure successfully completed.

SQL> select * from dual where dummy=&v
  2  ;

D
-
X

And we can can use mixed placeholders:

SQL> select * from dual where dummy=&v and &v=:v;

D
-
X

Simple regexp to check that string contains word1 and word2 and doesn’t contain word3

Posted on September 14, 2017 by Sayan Malakshinov Posted in oracle, SQL 1,829 Page views 1 Comment

with tst as (
  select 'qwe word1 asd ...............' s from dual
  union all
  select 'qwe word1 asd word2 .........' s from dual
  union all
  select 'qwe word1 asd word2 zxc word3' s from dual
  union all
  select 'qwe word2 asd word1 zxc word4' s from dual
)
select
  s
  ,regexp_replace(s, '(word1)|(word2)|(word3)|(.)','`\3') subst
  ,case when regexp_like(regexp_replace(s, '(word1)|(word2)|(word3)|(.)','`\3') , '^`+$') then 'matched' end tst2
from tst
where 1=1
--and regexp_like(regexp_replace(s, '(word1)|(word2)|(word3)|(.)','`\3') , '^`+$')

Offloading row level security to Smart Scan

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, SQL 1,575 Page views Leave a comment

The question came up this week about whether the predicates for row level security are offloaded to Smart Scan. The simple answer is yes as long as the policy_function contains off-loadable predicates.

Let’s see this in action. Using the public TPC-H schema we can set up the customer table so that customers can only see their own entries in that table. We need a row level security policy to see if each row is one that the current user is permitted to see and a function for the policy to call to make that determination:

Continue reading→

SQL validation during PL/SQL compilation

Posted on May 1, 2017 by Sayan Malakshinov Posted in oracle, PL/SQL, SQL 2,478 Page views Leave a comment

A recent posting on SQL.RU asked why Oracle doesn’t raise such errors like “ORA-00979 not a group by expression” during PL/SQL compilation. Since I couldn’t find a link to the answer (though I read about it many years ago, but I don’t remember where…), I’ve decided to post short answer:

During PL/SQL compilation Oracle checks static SQL using only:

Syntactic analysis – Oracle verifies that keywords, object names, operators, delimiters, and so on are placed correctly in your SQL statement. So such queries like “select * foRm dual” will fail during this validation. For example, we can get here such errors like:
ORA-00900: invalid SQL statement
ORA-00923: FROM keyword not found where expected
ORA-00924: missing BY keyword
ORA-00933: SQL command not properly ended
…
Semantic analysis – it verifies that references to host variables and database objects are valid(including their grants) and that host-variable datatypes are correct. For example, “select * from nonexisting_table” will fail this validation.

And since Oracle doesn’t validate all other types of errors during Syntactic and Semantic analysis, we can detect them only during optimization* or execution*.
For example, Oracle detects “ORA-00979 not a group by expression” during optimization phase.

* Read “Chapter Chapter 7: Parsing and Optimizing” in Oracle Core by Jonathan Lewis

NB. It doesn’t not apply to CREATE or ALTER VIEW, because Oracle executes optimization step for them. You can check it using trace 10053.

Simple example:

-- fORm instead of "from" - syntactic validation fails:
SQL> create table t(a int, b int, c int);
SQL> create or replace procedure p_syntactic is
  2     cursor c is select a,b,sum(c) sum_c fORm t group by a;
  3  begin
  4     null;
  5  end;
  6  /

Warning: Procedure created with compilation errors.

SQL> sho error;
Errors for PROCEDURE P_SYNTACTIC:

LINE/COL ERROR
-------- -----------------------------------------------------------------
2/16     PL/SQL: SQL Statement ignored
2/40     PL/SQL: ORA-00923: FROM keyword not found where expected

-- semantic validation fails:
SQL> create or replace procedure p_semantic is
  2     cursor c is select a,b,sum(blabla) sum_c from t group by a;
  3  begin
  4     null;
  5  end;
  6  /

Warning: Procedure created with compilation errors.

SQL> sho error;
Errors for PROCEDURE P_SEMANTIC:

LINE/COL ERROR
-------- -----------------------------------------------------------------
2/16     PL/SQL: SQL Statement ignored
2/31     PL/SQL: ORA-00904: "BLABLA": invalid identifier

-- As you can see this procedure passes successfully both syntactic and semantic analysis,
-- though query is not valid: it should raise "ORA-00979: not a GROUP BY expression"
SQL> create or replace procedure p_valid is
  2     cursor c is select a,b,sum(c) sum_c from t group by a;
  3  begin
  4     null;
  5  end;
  6  /

SQL> sho error;
No errors.

-- Oracle checks such errors for "CREATE VIEW", because it runs optimization for the query text:
SQL> create view v_cursor as select a,b,sum(c) sum_c from t group by a order by a;
create view v_cursor as select a,b,sum(c) sum_c from t group by a order by a
                                 *
ERROR at line 1:
ORA-00979: not a GROUP BY expression