Just a short note: if your imported Oracle Linux image hangs on boot in the Oracle Cloud, just set GRUB_DISABLE_UUID="true" in /etc/default/grub.
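A minimal sketch of the full change, assuming an OL7-style image (regenerating grub.cfg afterwards is usually needed; the path below is the BIOS-boot default, UEFI systems keep it under /boot/efi):

# /etc/default/grub
GRUB_DISABLE_UUID="true"

# then regenerate the config so the change takes effect:
# grub2-mkconfig -o /boot/grub2/grub.cfg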
Top-N again: fetch first N rows only vs rownum
Three interesting myths about the row-limiting clause vs rownum were recently posted on our Russian forum:
- A Top-N query with rownum<=N is always faster than “fetch first N rows only” (i.e. row_number() over (order by ...) <= N).
- “fetch first N rows only” is always faster than rownum<=N.
- “SORT ORDER BY STOPKEY” stores just the top N records during sorting, while “WINDOW SORT PUSHED RANK” sorts all the input and keeps all records sorted in memory.
Interestingly, after Vyacheslav posted the first statement as an axiom, someone dug up old tests (from 2009), and a few people ran their own tests showing that “fetch first N rows” is about 2-3 times faster than the query with rownum, the final conclusion was that “fetch first” is always faster.
First of all, I want to show that statement #3 is wrong: “WINDOW SORT PUSHED RANK” with row_number works in the same way as “SORT ORDER BY STOPKEY”.
It’s pretty easy to show using a sort trace. Let’s create a simple small table test1 with 1000 rows, where A is in the range 1-1000 (just 1 block):
create table test1(a not null, b)
as select level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;
And we can see from the trace files that both queries did the same number of comparisons:
----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                                1000
Output records                               10
Total number of comparisons performed        999
  Comparisons performed by in-memory sort    999
Total amount of memory used                  2048
Uses version 1 sort
---- End of Sort Statistics -----------------------
----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                                1000
Output records                               10
Total number of comparisons performed        999
  Comparisons performed by in-memory sort    999
Total amount of memory used                  2048
Uses version 1 sort
---- End of Sort Statistics -----------------------
I.e. each row (except the first one) was compared with the biggest of the top 10 values, and since it was bigger than that value, Oracle didn’t compare it with the other Top-N values.
And if we reverse the order of rows in the table, both of these queries will again do the same number of comparisons:
create table test1(a not null, b)
as select 1000-level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;
----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                                1000
Output records                               1000
Total number of comparisons performed        4976
  Comparisons performed by in-memory sort    4976
Total amount of memory used                  2048
Uses version 1 sort
---- End of Sort Statistics -----------------------
----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                                1000
Output records                               1000
Total number of comparisons performed        4976
  Comparisons performed by in-memory sort    4976
Total amount of memory used                  2048
Uses version 1 sort
---- End of Sort Statistics -----------------------
We can see that both queries required many more comparisons (4976) here. That’s because each new value is smaller than the biggest value in the Top-N, and even smaller than the lowest value, so Oracle has to find the right position for it, which takes 5 comparisons (it compares with the 10th value, then with the 6th, 3rd, 2nd and 1st values from the top 10). Obviously, it takes fewer comparisons for the first 10 rows.
Now let’s talk about statements #1 and #2:
We know that rownum forces optimizer_mode to switch to “first K rows”, because of the parameter “_optimizer_rownum_pred_based_fkr”:
SQL> @param_ rownum

NAME                               VALUE   DEFLT   TYPE      DESCRIPTION
---------------------------------- ------- ------- --------- -------------------------------------------------------
_optimizer_rownum_bind_default     10      TRUE    number    Default value to use for rownum bind
_optimizer_rownum_pred_based_fkr   TRUE    TRUE    boolean   enable the use of first K rows due to rownum predicate
_px_rownum_pd                      TRUE    TRUE    boolean   turn off/on parallel rownum pushdown optimization
while fetch first/row_number does not (this will change with patch #22174392), which leads to the following consequences:
1. first_rows disables the serial direct reads optimization (or Smart Scan on Exadata), which is why the tests with big tables showed “fetch first” to be much faster than the query with rownum.
So if we set “_serial_direct_read”=always, we get the same performance in both tests (within the margin of error).
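For illustration, a minimal sketch of that setting (note that _serial_direct_read is a hidden parameter, so this belongs on test systems only):

-- force serial direct path reads for this session before re-running both tests:
alter session set "_serial_direct_read" = always;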
2. In cases where index access (index full scan/index range scan) is better, the CBO calculates the cardinality of the underlying INDEX FULL (RANGE) SCAN differently: the query with rownum is optimized for first K rows, so the cardinality of the index access equals K rows, but the CBO doesn’t reduce the cardinality for “fetch first”, so the cost of the index access is much higher. Compare:
SQL> explain plan for
  2  select *
  3  from (select * from test order by a,b)
  4  where rownum<=10;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   390 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |            |       |       |            |          |
|   2 |   VIEW                        |            |    10 |   390 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M|     4   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |    10 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
SQL> explain plan for
  2  select *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  1 |  VIEW                    |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000K|    12M|    22M|  5438   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | TEST |  1000K|    12M|       |   690   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
SQL> explain plan for
  2  select/*+ first_rows */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                         |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY       |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
SQL> explain plan for
  2  select/*+ index(test (a,b)) */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                         |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY       |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
So in this case we can add the first_rows or index hint, or install patch #22174392.
PS. I had planned to post this note later, since I didn’t have enough time to add other interesting details about the different Top-N variants, including “with ties”, rank(), etc., so I’ll post another note with more details later.
Docker with Oracle database: install patches automatically
Recently I had to install the patch fixing a cross-platform PDB transport bug into Docker images with Oracle, so here is an easy way to do it:
1. Create a “patches” directory and create “install_patches.sh” in it:
#!/bin/bash

unzip -u ./*.zip
CURDIR=`pwd`

for D in *; do
    if [ -d "${D}" ]; then
        echo =================================================
        echo " *** Processing patch # ${D}... "
        # your processing here
        cd "${D}"
        opatch apply -silent
    fi
    cd "$CURDIR"
done
2. Add the following commands to the dockerfile:
USER root
# Copy DB install file
COPY --chown=oracle:dba patches $INSTALL_DIR/patches

# Install DB software binaries
USER oracle
RUN chmod ug+x $INSTALL_DIR/patches/*.sh && \
    sync && \
    cd $INSTALL_DIR/patches && \
    ./install_patches.sh
3. Put the downloaded patches into the “patches” directory and build the image.
For example, a dockerfile for 18.3:
FROM oracle/database:18.3.0-ee
MAINTAINER Sayan Malakshinov <sayan@orasql.org>

USER root
# Copy patches:
COPY --chown=oracle:dba patches $INSTALL_DIR/patches

# Install patches:
USER oracle
RUN chmod ug+x $INSTALL_DIR/patches/*.sh && \
    sync && \
    cd $INSTALL_DIR/patches && \
    ./install_patches.sh
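With the patch zips placed in the “patches” directory next to this dockerfile, the build itself is the standard docker build; the tag here is just an example:

docker build -t oracle/database:18.3.0-ee-patched .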
PS. I’ve also created a request for this in the official Docker GitHub repo: https://github.com/oracle/docker-images/issues/1070
Compression in a well-balanced system
Exadata is designed to present a well-balanced system between disk scan rates, cell CPU processing rates, network bandwidth, and RDBMS CPU, such that on average, across a lot of workloads and a lot of applications, there should be no one obvious bottleneck. Good performance engineering means avoiding a design with one wildly mismatched component, because that would be nuts. If you change one component in a car, e.g. the engine, you need to rework the other components, such as the chassis and the brakes. When we rev the Exadata hardware we go through the same exercise to make sure the components match, to avoid bottlenecks.
One component of a well-balanced system that people often leave off this list is the compression algorithm. With compression you are balancing storage costs and disk scan rates against the CPUs’ decompression rate. In a disk-bound system you can alleviate the bottleneck by compressing the data down to the point where it is being scanned at the same rate the CPUs can decompress and process it. Squeeze it too far, say with HCC “Compress For Archive High”, which uses BZ2, and the CPUs become a major bottleneck because the decompression costs unbalance the system (though it’s wonderful if you think you’re unlikely to ever need the data again).
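For reference, the compression level is chosen per segment at DDL time; a minimal sketch of the two ends of the HCC spectrum (table names are made up, and HCC requires Exadata or other supported storage):

-- scan-friendly HCC: good compression, cheap to decompress during Smart Scan
create table sales_query_high
  compress for query high
  as select * from sales;

-- maximum squeeze (BZ2-class): great for cold data, CPU-heavy on every scan
create table sales_archive_high
  compress for archive high
  as select * from sales;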
Create Quarantine
First, if you don’t know what an Exadata Quarantine is, read this.
Someone asked whether you can create your own Exadata Cell quarantine and, if you can, why you might ever want to.
The first step when you don’t know how to do something is to try HELP in CellCLI:
CellCLI> HELP
...
ALTER QUARANTINE
...
CREATE QUARANTINE
...
DROP QUARANTINE
...
LIST QUARANTINE
So we see that we can create a quarantine; let’s use HELP again:
Shining some light on Database In-Memory vs the Exadata Columnar Cache in 12.1.0.2
I posted a while back on how to trace Hybrid Columnar Compression in an offload server, so this is a quick follow-up.
- I have trouble remembering the syntax for setting a regular parameter in an offload server without bouncing it. Since I need to keep this written down somewhere, I thought it might be of use to support folks and DBAs.
- I forgot to show how to specify which offload group to set the trace event in.
So this example should do both:
CellCLI> alter cell offloadGroupEvents = "immediate cellsrv.cellsrv_setparam('my_parameter', 'TRUE')",
         offloadGroupName = "SYS_122110_160621"
This will, of course, set the parameter only temporarily, until the next time the offload server is bounced; adding it to the offload group’s init.ora as well will take care of that.
Create External Table as Select
I was looking through a test script and saw something I didn’t know you could do in Oracle. I mentioned it to an Oracle ACE and he didn’t know it either. I then said to one of the External Table engineers “Oh I see you’ve added this cool new feature” and he replied dryly: “Yes, we added it in Oracle 10.1”. Ouch! So just in case you also didn’t know, you can create an External Table using a CTAS and the ORACLE_DATAPUMP driver.
This feature only works with the ORACLE_DATAPUMP access driver (it does NOT work with the LOADER, HIVE, or HDFS drivers), and we can use it like this:
SQL> create table cet_test organization external
2 (
3 type ORACLE_DATAPUMP
4 default directory T_WORK
5 location ('xt_test01.dmp','xt_test02.dmp')
6 ) parallel 2
7 as select * from lineitem;
Table created.
Checking the results shows us:
-rw-rw---- ... 786554880 Mar 9 10:48 xt_test01.dmp
-rw-rw---- ... 760041472 Mar 9 10:48 xt_test02.dmp
This can be a great way of creating a (redacted) sample of data to give to a developer for testing, a bug repro to give to support, or a dataset to move between systems.
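To move between systems, the generated files can then be read on the target by creating an external table over them; a minimal sketch with a hypothetical two-column layout (the column list must match what was actually written into the dumps, so for the lineitem example above it would be lineitem’s columns):

create table cet_test_copy
(
  id    number,
  name  varchar2(30)
)
organization external
(
  type ORACLE_DATAPUMP
  default directory T_WORK
  location ('xt_test01.dmp','xt_test02.dmp')
);

-- the dump files are read in place; no data is loaded into the database
select count(*) from cet_test_copy;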
Understanding External Table URowids
I finally found time to get back to External Tables and have a list of blog posts I need to write on this topic. Here’s a brief one.
DBMS_ROWID will nicely break down a heap table’s rowid for you into file number, block number, and row number but it doesn’t handle the rowids coming from External Tables. So let’s look at how to make sense of them. They fall under the datatype UROWID which is a nominally opaque rowid defined by the data source. The first byte of a UROWID tells you which data source it came from and consequently how to deconstruct it.
The easiest way to see what is happening is via the SQL Dump function:
SQL> column xtrowid format a55
SQL> select c_custkey, dump(rowid,16) "XTROWID" from c_et
2> where c_custkey < 10;
C_CUSTKEY XTROWID
---------- -------------------------------------------------------
1 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,1
2 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,2
3 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,3
4 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,4
5 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,5
6 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,6
7 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,7
8 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,8
9 Typ=208 Len=17: 4,0,1,27,a2,0,0,0,0,0,0,0,0,0,0,0,9
9 rows selected.
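The exact layout of Typ=208 rowids isn’t documented, but in the dumps above the trailing bytes clearly track the row position; here is a speculative sketch that parses the last byte out of the dump string, purely as an illustration:

-- the hex value after the last comma matches c_custkey for these rows,
-- consistent with the trailing bytes being a row counter within the scan
select c_custkey,
       to_number(substr(d, instr(d, ',', -1) + 1), 'xxxx') last_byte
from (select c_custkey, dump(rowid, 16) d from c_et where c_custkey < 10);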
How to tell if the Exadata column cache is fully loaded
When background activity is happening on the cells, you typically can’t use the RDBMS v$ views to monitor it in the same way. One such question is how to tell whether a segment is fully loaded into the Exadata column cache, since this does not appear in the equivalent In-Memory v$ views.
When a segment is scanned by Smart Scan sufficiently often to be eligible for the AUTOKEEP pool (typically that means at least twice an hour), the eligible 1MB chunks are written to flash in 12.1.0.2-style format and put on a background queue. Lower-priority tasks pick up the queued 1MB 12.1.0.2-format chunks from the flash cache, run them through the In-Memory loader, and rewrite the pure columnar representation in place of the old 12.1.0.2-style column cache chunks.
The easiest way I know of to tell when this background activity completes is to run the following query until it shows zero:
select name, sum(value) value from (
select extractvalue(value(t),'/stat/@name') name,
extractvalue(value(t),'/stat') value
from v$cell_state cs,
table(xmlsequence(extract(xmltype(cs.statistics_value),
'//stats[@type="columnarcache"]/stat'))) t
where statistics_type='CELL')
where name in ('outstanding_imcpop_requests')
group by name;
Top-N biggest tables (with LOBs, indexes and nested tables)
Script for SQL*Plus: https://github.com/xtender/xt_scripts/blob/master/tops/top_seg_by_size.sql
with seg as (
       select owner, segment_name
             ,segment_type
             ,tablespace_name
             ,sum(blocks) blocks
             ,sum(bytes)  bytes
       from dba_segments s
       where segment_type not in ('TYPE2 UNDO'
                                 ,'ROLLBACK'
                                 ,'SYSTEM STATISTICS')
         and segment_name not like 'BIN$%'  -- not in recyclebin
         and owner like '&owner_mask'       -- you can specify schema here
       group by owner, segment_name, segment_type, tablespace_name
)
,segs as (
       select owner, segment_name
             ,case when segment_name like 'DR$%$%' then 'CTX INDEX' else segment_type end segment_type
             ,tablespace_name
             ,case
                when segment_name like 'DR$%$%'
                  then (select table_owner||'.'||table_name
                        from dba_indexes i
                        where i.owner = s.owner
                          and i.index_name = substr(segment_name, 4, length(segment_name)-5))
                when segment_type in ('TABLE','TABLE PARTITION','TABLE SUBPARTITION')
                  then owner||'.'||segment_name
                when segment_type in ('INDEX','INDEX PARTITION','INDEX SUBPARTITION')
                  then (select i.table_owner||'.'||i.table_name
                        from dba_indexes i
                        where i.owner = s.owner and i.index_name = s.segment_name)
                when segment_type in ('LOBSEGMENT','LOB PARTITION','LOB SUBPARTITION')
                  then (select l.owner||'.'||l.TABLE_NAME
                        from dba_lobs l
                        where l.segment_name = s.segment_name and l.owner = s.owner)
                when segment_type = 'LOBINDEX'
                  then (select l.owner||'.'||l.TABLE_NAME
                        from dba_lobs l
                        where l.index_name = s.segment_name and l.owner = s.owner)
                when segment_type = 'NESTED TABLE'
                  then (select nt.owner||'.'||nt.parent_table_name
                        from dba_nested_tables nt
                        where nt.owner = s.owner and nt.table_name = s.segment_name)
                when segment_type = 'CLUSTER'
                  then (select min(owner||'.'||table_name)
                        from dba_tables t
                        where t.owner = s.owner and t.cluster_name = s.segment_name and rownum = 1)
              end table_name
             ,blocks
             ,bytes
       from seg s
)
,so as (
       select segs.owner
             ,substr(segs.table_name, instr(segs.table_name,'.')+1) TABLE_NAME
             ,sum(segs.bytes)  total_bytes
             ,sum(segs.blocks) total_blocks
             ,sum(case when segs.segment_type in ('TABLE','TABLE PARTITION','TABLE SUBPARTITION','NESTED TABLE','CLUSTER')
                       then segs.bytes end) tab_size
             ,sum(case when segs.segment_type in ('INDEX','INDEX PARTITION','INDEX SUBPARTITION','CTX INDEX')
                       then segs.bytes end) ind_size
             ,sum(case when segs.segment_type in ('CTX INDEX')
                       then segs.bytes end) ctx_size
             ,sum(case when segs.segment_type in ('LOBSEGMENT','LOBINDEX','LOB PARTITION','LOB SUBPARTITION')
                       then segs.bytes end) lob_size
       from segs
       group by owner, table_name
)
,tops as (
       select dense_rank() over (order by total_bytes desc) rnk
             ,so.*
       from so
)
select *
from tops
where rnk <= 50 -- top 50
/