A couple of days ago someone posted a question on the forum which at the first glance seemed old, boring, beaten up and down:
There is a news feed. All news are divided into 10 categories (Politics, sport, auto, real estate, etc).
I need to get top 4 news sorted by time descending for each category with 1 query.
If you sort the results – you get 4 politics news, then 4 sport news etc.
But the task was to make it optimal, and the standard solution with usual TopN using row_number can not be called optimal in any way, especially in case of big tables, relatively small number of categories and uneven distribution or just overall low selectivity.
So my idea was to start from min() and get next values using “Index range scan(min/max)” recursively. I couldn’t find a good name for this technique, so let’s call it as Jonathan Lewis – “Index bouncy scan”:
1. Getting distinct values from the index
Suppose we have a table with index on the “а” column:
create table xt_test(a not null,b not null,c)
as
select
length(object_name)
,nvl(object_id,0)
,o.OBJECT_NAME
from dba_objects o;
create index ix_test_a on xt_test(a);
SQL> select i.index_name
2 ,i.distinct_keys,i.num_rows
3 ,i.blevel,i.leaf_blocks
4 ,i.avg_leaf_blocks_per_key,i.avg_data_blocks_per_key
5 from user_indexes i where i.table_name='XT_TEST';
INDEX_NAME DISTINCT_KEYS NUM_ROWS BLEVEL LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
----------- ------------- --------- -------- ----------- ----------------------- -----------------------
IX_TEST_A 30 69230 1 135 4 191
1 row selected.
DDL for this test case:
drop table xt_test purge; create table xt_test(a not null,b not null,c) as select length(object_name) ,nvl(object_id,0) ,o.OBJECT_NAME from dba_objects o ; create index ix_test_a on xt_test(a); begin dbms_stats.gather_table_stats( '' ,'XT_TEST' ,estimate_percent=>100 ,cascade=>true ,method_opt => 'for all indexed columns size auto' ); end; / select i.index_name ,i.distinct_keys,i.num_rows ,i.blevel,i.leaf_blocks ,i.avg_leaf_blocks_per_key,i.avg_data_blocks_per_key from user_indexes i where i.table_name='XT_TEST';
This field have very skewed distribution of values:
A | COUNT(*) |
---|---|
1 | 11 |
2 | 20 |
3 | 59 |
4 | 92 |
5 | 178 |
6 | 251 |
7 | 521 |
9 | 570 |
10 | 636 |
8 | 640 |
11 | 962 |
12 | 970 |
13 | 1151 |
15 | 1363 |
14 | 1544 |
16 | 1692 |
18 | 2021 |
17 | 2023 |
19 | 2550 |
20 | 2606 |
21 | 3050 |
22 | 3171 |
23 | 3395 |
24 | 3472 |
29 | 3527 |
27 | 3596 |
26 | 3698 |
28 | 4130 |
25 | 4268 |
30 | 17063 |
ALL | 69230 |
A standard query using distinct is very unsuccessful – there are only 30 distinct keys in the index, while there are 135 blocks to read!
With IFS:
DB11G/XTENDER> select/*+ INDEX(xt_test) */ distinct a from xt_test;
30 rows selected.
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
Plan hash value: 3405466263
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30 | 90 | 140 (3)| 00:00:02 |
| 1 | SORT UNIQUE NOSORT| | 30 | 90 | 140 (3)| 00:00:02 |
| 2 | INDEX FULL SCAN | IX_TEST_A | 69230 | 202K| 137 (1)| 00:00:02 |
--------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
138 consistent gets
0 physical reads
0 redo size
751 bytes sent via SQL*Net to client
431 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
30 rows processed
DB11G/XTENDER> select distinct a from xt_test; 30 rows selected. Elapsed: 00:00:00.05 Execution Plan ---------------------------------------------------------- Plan hash value: 4206828362 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 30 | 90 | 42 (10)| 00:00:01 | | 1 | HASH UNIQUE | | 30 | 90 | 42 (10)| 00:00:01 | | 2 | INDEX FAST FULL SCAN| IX_TEST_A | 69230 | 202K| 38 (0)| 00:00:01 | ----------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 143 consistent gets 0 physical reads 0 redo size 751 bytes sent via SQL*Net to client 431 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 30 rows processed
We also could go along the tree visiting only the required blocks, but not all leaf blocks! However, Oracle can’t manage this on its own so we have to make a certain twist: aside from IFS(min/max) Oracle also has IRS(min/max) which works well with ranges and boundaries. We can use recursive query to make it read only what we need!
DB11G/XTENDER> with t_unique( a ) as (
2 select min(t1.a)
3 from xt_test t1
4 union all
5 select (select min(t1.a) from xt_test t1 where t1.a>t.a)
6 from t_unique t
7 where a is not null
8 )
9 select * from t_unique where a is not null;
30 rows selected.
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2791305641
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 26 | 4 (0)| 00:00:01 |
|* 1 | VIEW | | 2 | 26 | 4 (0)| 00:00:01 |
| 2 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | |
| 3 | SORT AGGREGATE | | 1 | 3 | | |
| 4 | INDEX FULL SCAN (MIN/MAX) | IX_TEST_A | 1 | 3 | 2 (0)| 00:00:01 |
| 5 | SORT AGGREGATE | | 1 | 3 | | |
| 6 | FIRST ROW | | 1 | 3 | 2 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN (MIN/MAX) | IX_TEST_A | 1 | 3 | 2 (0)| 00:00:01 |
|* 8 | RECURSIVE WITH PUMP | | | | | |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A" IS NOT NULL)
7 - access("T1"."A">:B1)
8 - filter("A" IS NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
36 consistent gets
0 physical reads
0 redo size
751 bytes sent via SQL*Net to client
431 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
32 sorts (memory)
0 sorts (disk)
30 rows processed
The difference is obvious: 36 consistent gets for 30 values, instead of 135. Note that this is a very small table, and we will have а much notable difference for millions and billions of entries!
Here is the explanation of the algorithm:
- In the first part of union all (3-4 strings of plan) we specify where to start the recursion, and more specifically we choose a minimal (first) the value from the index.
- After that we choose the first value that is bigger than the one chosen in the previous step, using IRS(min/max) (7-6-5 stings of the plan).
- Repeat the recursion while we find anything
Proceed to the next:
2. TopN entries for every key value
Now as we are armed with an easy tool to get every initial value, we can easily get Top N for each of them. The only problem that remains is that, we can not use inline view with row_number/rownum, as the predicate from higher level won’t be pushed there, and we will have to use simple restriction by count stop key (by rownum) with required access by IRS descending (order by is generally unnecessary there, but it further reduces reading costs of IRS descending, which is necessary for implicit sorting) with the index_desc hint, to nail it dead, otherwise sorting may break. So to make this happen we either have to use an undocumented Lateral() with a corresponding event turned on, or use a simpler and standard table(multiset(…)) or a little harder with xmltable() – but it is not so dangerous. Yet another variant is the use cursor() with pushed predicates:
with t_unique( a ) as ( select min(t1.a) from xt_test t1 union all select (select min(t1.a) from xt_test t1 where t1.a>t.a) from t_unique t where a is not null ) select cursor( select rid from( select/*+ index_desc(tt ix_xt_test_ab) */ tt.a ,tt.rowid rid ,row_number()over(partition by a order by b desc) rn from xt_test tt order by tt.b desc ) where a=v.a and rn<=5 ) from t_unique v
[collapse]
DB11G/XTENDER> with t_unique( a ) as ( 2 select min(t1.a) 3 from xt_test t1 4 union all 5 select (select min(t1.a) from xt_test t1 where t1.a>t.a) 6 from t_unique t 7 where a is not null 8 ) 9 select/*+ use_nl(rids tt) */ * 10 from t_unique v 11 ,table( 12 cast( 13 multiset( 14 select/*+ index_desc(tt ix_xt_test_ab) */ tt.rowid rid 15 from xt_test tt 16 where tt.a=v.a 17 and rownum<=5 18 order by tt.b desc 19 ) 20 as sys.odcivarchar2list 21 ) 22 ) rids 23 ,xt_test tt 24 where tt.rowid=rids.column_value 25 order by tt.a,tt.b desc; 150 rows selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 4085270117 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11M| 506M| | 149K (1)| 00:29:54 | | 1 | SORT ORDER BY | | 11M| 506M| 649M| 149K (1)| 00:29:54 | | 2 | NESTED LOOPS | | 11M| 506M| | 16402 (1)| 00:03:17 | | 3 | NESTED LOOPS | | 16336 | 239K| | 60 (0)| 00:00:01 | | 4 | VIEW | | 2 | 26 | | 4 (0)| 00:00:01 | | 5 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | | | | 6 | SORT AGGREGATE | | 1 | 3 | | | | | 7 | INDEX FULL SCAN (MIN/MAX) | IX_TEST_A | 1 | 3 | | 2 (0)| 00:00:01 | | 8 | SORT AGGREGATE | | 1 | 3 | | | | | 9 | FIRST ROW | | 1 | 3 | | 2 (0)| 00:00:01 | |* 10 | INDEX RANGE SCAN (MIN/MAX) | IX_TEST_A | 1 | 3 | | 2 (0)| 00:00:01 | |* 11 | RECURSIVE WITH PUMP | | | | | | | | 12 | COLLECTION ITERATOR SUBQUERY FETCH | | 8168 | 16336 | | 28 (0)| 00:00:01 | |* 13 | COUNT STOPKEY | | | | | | | |* 14 | INDEX RANGE SCAN DESCENDING | IX_XT_TEST_AB | 2308 | 64624 | | 8 (0)| 00:00:01 | |* 15 | TABLE ACCESS BY USER ROWID | XT_TEST | 692 | 22144 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 10 - access("T1"."A">:B1) 11 - filter("A" IS NOT NULL) 13 - filter(ROWNUM<=5) 14 - access("TT"."A"=:B1) 15 - access(CHARTOROWID(VALUE(KOKBF$))) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 166 consistent gets 0 physical reads 0 redo size 7523 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 11 SQL*Net roundtrips to/from client 33 sorts (memory) 0 sorts (disk) 150 rows processed
[collapse]
It is similarly possible through “lateral”:
alter session set events '22829 trace name context forever'; with t_unique( a ) as ( select min(t1.a) from xt_test t1 union all select (select min(t1.a) from xt_test t1 where t1.a>t.a) from t_unique t where a is not null ) select/*+ use_nl(rids tt) */ * from t_unique v ,lateral( select/*+ index_desc(tt ix_xt_test_ab) */ tt.* from xt_test tt where tt.a=v.a and rownum<=5 order by tt.a, b desc ) r order by r.a,r.b desc
[collapse]
In general, we could do without the dangerous sorting, using “xmltable” and dbms_xmlgen instead of “table” sending a parameter directly to the internal subquery, but this is a bit harder than the regular ”table”
with t_unique( owner ) as ( select min(owner) from ttt union all select (select min(t1.owner) from ttt t1 where t1.owner>t.owner) from t_unique t where owner is not null ) select r.* from t_unique v ,xmltable('/ROWSET/ROW' passing( dbms_xmlgen.getxmltype( q'[select * from ( select/*+ index_asc(tt ix_ttt) */ owner, to_char(created,'yyyy-mm-dd hh24:mi:ss') created from ttt tt where tt.owner=']'||v.owner||q'[' order by tt.created asc ) where rownum<=5 ]' ) ) columns owner varchar2(30) path 'OWNER' ,created varchar2(30) path 'CREATED' ,x xmltype path '.' ) r where v.owner is not null order by r.owner,r.created asc; ----------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 148 |00:00:00.28 | 365 | | | | | 1 | SORT ORDER BY | | 1 | 16336 | 148 |00:00:00.28 | 365 | 20480 | 20480 |18432 (0)| | 2 | NESTED LOOPS | | 1 | 16336 | 148 |00:00:00.10 | 365 | | | | |* 3 | VIEW | | 1 | 2 | 30 |00:00:00.01 | 66 | | | | | 4 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | 1 | | 31 |00:00:00.01 | 66 | | | | | 5 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | | | | | 6 | INDEX FULL SCAN (MIN/MAX) | IX_TTT | 1 | 1 | 1 |00:00:00.01 | 3 | | | | | 7 | SORT AGGREGATE | | 30 | 1 | 30 |00:00:00.01 | 63 | | | | | 8 | FIRST ROW | | 30 | 1 | 29 |00:00:00.01 | 63 | | | | |* 9 | INDEX RANGE SCAN (MIN/MAX) | IX_TTT | 30 | 1 | 29 |00:00:00.01 | 63 | | | | | 10 | RECURSIVE WITH PUMP | | 31 | | 30 |00:00:00.01 | 0 | | | | | 11 | COLLECTION ITERATOR PICKLER FETCH | XMLSEQUENCEFROMXMLTYPE | 30 | 8168 | 148 |00:00:00.10 | 299 | | | | ----------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("V"."OWNER" IS NOT NULL) 9 - access("T1"."OWNER">:B1)
[collapse]
Update: Since Oracle 12c it would be much better to use Laterals