
Tag Archives: direct path reads

Triaging Smart Scan

Posted on April 8, 2021 by Roger MacNicol Posted in adaptive serial direct path reads, cell_offload, oracle, SmartScan, trace

This document is my attempt to bring together the available options for determining the root cause of an issue, as a roadmap to help support engineers narrow down the cause of a concern.

It is a living document and will be edited and amended as time goes by. Please do check back again in the future.

Warning: these parameters should only be used in conjunction with an Oracle Support Engineer and are not intended for DBAs to self-triage; they should also not be left set after triage without discussion with an Oracle Support Engineer.


Serial Scans failing to offload

Posted on October 13, 2020 by Roger MacNicol Posted in adaptive serial direct path reads, cell_offload, oracle, SmartScan

Very Large Buffer Cache

We’ve observed databases with very large buffer caches where Serial Scans don’t make use of Smart Scan even when it would have executed faster: improvements to the decision making for Serial Scans have been made under bug 31626438. This fix is back-portable.

A key difference between PQ and Serial is that as part of granule generation PQ sums the sizes of all the partitions that have not been pruned and passes that total size to the buffer cache decision making logic. Because the entire size to be scanned is considered, we make an accurate determination of smart scan benefits and the risk of cache thrashing.

Serial Scans on partitioned tables do not involve the coordinator and so have no opportunity to get the larger picture; instead they start work immediately, so each partition is considered one at a time, and only that one partition’s size is considered by the decision to use Buffer Cache or Direct Read (and hence offload). In the presence of very large buffer caches, any given partition can fail the “Is Medium” test (or even the “Is Small” test) and so not get offloaded.

To avoid this situation, an upper bound of 100 MB for using a buffer cache scan has been implemented for any serially scanned segment that:

  • isn’t using Automatic Big Table Caching (ABTC).
  • hasn’t had the Small Table parameter changed to a non-default value.

Any partitions larger than 100 MB will now automatically use Direct Read and hence offload on Exadata.
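You can confirm which path a given partition actually took with NSMTIO tracing (the same facility used throughout these posts). A minimal sketch, where SALES is a hypothetical partitioned table:

[sourcecode language="sql"]
-- Trace the buffer cache vs direct read decision for this session
ALTER SESSION SET EVENTS 'trace[NSMTIO] disk highest';

-- Run the serial scan in question (SALES is a placeholder name)
SELECT COUNT(*) FROM sales;

-- The trace file then shows one decision per partition, e.g.
-- "NSMTIO: qertbFetch:DirectRead:[OBJECT_SIZE>VLOT]" or
-- "NSMTIO: qertbFetch:NoDirectRead:[- STT < OBJECT_SIZE < MTT]..."
[/sourcecode]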

See also: Part 1

See also: Part 2

NSMTIO: kxfxghwm:[HWM_NOT_FOUND]

Another case to watch out for is when NSMTIO tracing shows HWM_NOT_FOUND and a Buffer Cache scan is chosen when a Direct Read offloaded scan would have been faster. This can happen when a PQ query gets executed serially (NB: this is NOT the downgrade-to-serial case; this is still PQ but on a single thread). In this case the coordinator again does not have the opportunity to process all the partitions, gather the High Water Mark (HWM) for each segment, and checkpoint them, so we fall back on buffer cache scans. A fix for this is currently being investigated.

Mixed Block Sizes

I have consistently advised against mixing block sizes in a database without a compelling reason backed up by empirical evidence, but for those who must, the “Is Medium Table” logic for whether to use buffer cache or direct read has been improved for databases with more than one block size in use. This is tracked by bug 24655250 and fixed in 20.1.

See also: Random thoughts on block sizes


Smart Scan and Recursive queries

Posted on March 5, 2020 by Roger MacNicol Posted in adaptive serial direct path reads, CBO, hints, oracle, parallel, SmartScan, trace, troubleshooting

Since Christmas I have been asked to investigate two different “failures to use Smart Scan”. It turns out they both fell into the same little-known restriction on the use of Direct Read. Smart Scan critically depends on Direct Read in order to read the synthetic output blocks into private buffers in the PGA, so with Direct Read disabled, Smart Scan is also disabled. In these two cases the restriction is on using Direct Read for serial recursive queries.

Case 1: Materialized View Refresh

A customer asked me to investigate why his MView refresh was running slowly and was failing to use Smart Scan. He had used 'trace[NSMTIO] disk=highest' which showed the cause as:

Direct Read for serial qry: disabled(::recursive_call::kctfsage:::)
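For reference, a minimal way to reproduce that trace (a sketch; the MView name is hypothetical):

[sourcecode language="sql"]
-- Enable NSMTIO tracing, then run the refresh that is failing to offload
ALTER SESSION SET EVENTS 'trace[NSMTIO] disk=highest';
EXEC dbms_mview.refresh('SALES_MV', method => 'C');
-- The trace file shows the Direct Read decision for each recursive scan
[/sourcecode]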

Thoughts on Big Data SQL SmartScan

Posted on August 18, 2019 by Roger MacNicol Posted in cell_offload, External tables, oracle, SmartScan

Just a few thoughts I’ve been meaning to blog for a while:

1. Number of columns that can be offloaded

Exadata

Very early on in the Exadata Smart Scan implementation a restriction was added to prevent offloading more than 255 columns. This was done because of performance issues observed by customers. Smart Scan works on 1 MB chunks at a time and needs all the row pieces up to the highest #segcol referenced to be present in the 1 MB chunk that is being processed. When rows with more than 255 columns are first inserted all the row pieces are contiguous and hence likely to be present in the 1 MB chunk.

However, if the DBA has not provided sufficient free space (PCTFREE) in the block when it is first loaded, subsequent DMLs may end up splitting row pieces and moving pieces to blocks on the freelist that have more space available. PCTFREE specifies how much empty space to leave for future updates when a block is first populated.
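As an illustration (a sketch, not from the original post), PCTFREE is set at table creation time:

[sourcecode language="sql"]
-- Reserve 20% of each block for future updates; for tables with more than
-- 255 columns this reduces the chance that later DMLs split row pieces
-- across blocks (the columns here are just placeholders)
CREATE TABLE wide_t (
  c1 NUMBER,
  c2 VARCHAR2(100),
  c3 DATE
) PCTFREE 20;
[/sourcecode]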


Top-N again: fetch first N rows only vs rownum

Posted on December 30, 2018 by Sayan Malakshinov Posted in adaptive serial direct path reads, CBO, oracle, query optimizing, SQL, troubleshooting

Three interesting myths about the row-limiting clause vs rownum have recently been posted on our Russian forum:

  1. A TopN query with rownum<=N is always faster than “fetch first N rows only” (i.e. row_number() over (order by …) <= N)
  2. “fetch first N rows only” is always faster than rownum<=N
  3. “SORT ORDER BY STOPKEY” stores just N top records during sorting, while “WINDOW SORT PUSHED RANK” sorts all input and stores all records sorted in memory.

Interestingly, after Vyacheslav posted the first statement as an axiom, someone posted old tests (from 2009) and a few people ran their own tests showing that “fetch first N rows” is about 2-3 times faster than the query with rownum, so the final verdict was that “fetch first” is always faster.

First of all, I want to show that statement #3 is wrong and that “WINDOW SORT PUSHED RANK” with row_number works similarly to “SORT ORDER BY STOPKEY”.
It’s pretty easy to show using a sort trace.
Let’s create a simple small table test1 with 1000 rows, where A is in the range 1-1000 (just 1 block):

create table test1(a not null, b) as
  select level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;

And we can see from the trace files that both queries did the same number of comparisons:

rownum:

[sourcecode language="sql" highlight="7"]
----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                               1000
Output records                              10
Total number of comparisons performed       999
Comparisons performed by in-memory sort     999
Total amount of memory used                 2048
Uses version 1 sort
---- End of Sort Statistics -----------------------
[/sourcecode]

row_number

[sourcecode language="sql" highlight="7"]
----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                               1000
Output records                              10
Total number of comparisons performed       999
Comparisons performed by in-memory sort     999
Total amount of memory used                 2048
Uses version 1 sort
---- End of Sort Statistics -----------------------
[/sourcecode]


I.e. each row (except the first one) was compared with the biggest value from the top 10 and, since it was bigger, Oracle didn’t compare it with the other TopN values.

And if we change the order of rows in the table, both of these queries will again do the same number of comparisons:

from 999 to 0

[sourcecode language="sql"]
create table test1(a not null, b) as
select 1000-level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;
[/sourcecode]

rownum

[sourcecode language="sql" highlight="7"]
----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                               1000
Output records                              1000
Total number of comparisons performed       4976
Comparisons performed by in-memory sort     4976
Total amount of memory used                 2048
Uses version 1 sort
---- End of Sort Statistics -----------------------
[/sourcecode]

row_number

[sourcecode language="sql" highlight="7"]
----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                               1000
Output records                              1000
Total number of comparisons performed       4976
Comparisons performed by in-memory sort     4976
Total amount of memory used                 2048
Uses version 1 sort
---- End of Sort Statistics -----------------------
[/sourcecode]


We can see that both queries required many more comparisons (4976) here. That’s because each new value is smaller than the biggest value from the top N, and even smaller than the lowest value, so Oracle has to find the right position for it, which requires 5 comparisons (it compares with the 10th value, then with the 6th, 3rd, 2nd and 1st values from the top 10). Obviously it makes fewer comparisons for the first 10 rows.

Now let’s talk about statements #1 and #2:
We know that rownum forces the optimizer mode to switch to “first K rows” because of the parameter “_optimizer_rownum_pred_based_fkr”:

SQL> @param_ rownum

NAME                               VALUE  DEFLT  TYPE      DESCRIPTION
---------------------------------- ------ ------ --------- ------------------------------------------------------
_optimizer_rownum_bind_default     10     TRUE   number    Default value to use for rownum bind
_optimizer_rownum_pred_based_fkr   TRUE   TRUE   boolean   enable the use of first K rows due to rownum predicate
_px_rownum_pd                      TRUE   TRUE   boolean   turn off/on parallel rownum pushdown optimization
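(@param_ is my helper script; a rough equivalent, which requires access to the X$ views as SYS, is:)

[sourcecode language="sql"]
-- Hidden parameters are visible via x$ksppi/x$ksppcv (SYS only)
select i.ksppinm  name,
       c.ksppstvl value,
       c.ksppstdf deflt,
       i.ksppdesc description
from   x$ksppi  i,
       x$ksppcv c
where  i.indx = c.indx
and    i.ksppinm like '%rownum%';
[/sourcecode]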

“fetch first”/row_number, on the other hand, does not force this mode (this will change after patch #22174392), which leads to the following consequences:
1. first_rows disables the serial direct reads optimization (or Smart Scan on Exadata), which is why the tests with big tables showed that “fetch first” was much faster than the query with rownum.
So if we set "_serial_direct_read"=always, we get the same performance in both tests (within the margin of error).

2. In cases when index access (index full scan/index range scan) is better, the CBO calculates the cardinality of the underlying INDEX FULL (range) SCAN differently:
the query with rownum is optimized for first_k_rows and the cardinality of index access is equal to K rows, but the CBO doesn’t reduce the cardinality for “fetch first”, so the cost of index access is much higher. Compare them:

rownum

[sourcecode language="sql" highlight="13"]
SQL> explain plan for
  2  select *
  3  from (select * from test order by a,b)
  4  where rownum<=10;

---------------------------------------------------------------------------------------------
| Id  | Operation                      | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |            |    10 |   390 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                 |            |       |       |            |          |
|   2 |   VIEW                         |            |    10 |   390 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | TEST       |  1000K|    12M|     4   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN            | IX_TEST_AB |    10 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
[/sourcecode]

fetch first

[sourcecode language="sql" highlight="13"]
SQL> explain plan for
  2  select *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  1 |  VIEW                     |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK |      |  1000K|    12M|    22M|  5438   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL      | TEST |  1000K|    12M|       |   690   (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
[/sourcecode]

fetch first + first_rows

[sourcecode language="sql" highlight="14"]
SQL> explain plan for
  2  select/*+ first_rows */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

---------------------------------------------------------------------------------------------
| Id  | Operation                      | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                          |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY        |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID | TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN            | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
[/sourcecode]

fetch first + index

[sourcecode language="sql" highlight="14"]
SQL> explain plan for
  2  select/*+ index(test (a,b)) */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

---------------------------------------------------------------------------------------------
| Id  | Operation                      | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                          |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY        |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID | TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN            | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)
[/sourcecode]


So in this case we can add the hints “first_rows” or “index”, or install patch #22174392.

PS. I had planned to post this note later, since I haven’t had enough time to add other interesting details about the different TopN variants, including “with ties”, rank(), etc., so I’ll post another note with more details later.


Adaptive serial direct path read decision ignores object statistics since 12.1

Posted on November 19, 2017 by Sayan Malakshinov Posted in 12c, adaptive serial direct path reads, oracle

On versions 11.2.0.2 – 11.2.0.4, Oracle uses object statistics to make the direct path read decision (of course, unless “_direct_read_decision_statistics_driven” has been changed to “false”), and we can force serial direct reads at the statement level using SQL profiles with the hints INDEX_STATS/TABLE_STATS, but since at least 12.1.0.2 this decision ignores statistics.

Btw, thanks to Jure Bratina, we now know that we need to repeat the TABLE_STATS hint at least twice to make it work 🙂 And from the following test case we know that it takes its parameters from the second one 🙂

Compare trace files of the following test cases:

table_stats2.sql

[sourcecode language="sql"]
drop table t1;

create table t1 as select * from dual;
----------------------------
pro ######################################;
exec dbms_stats.gather_table_stats('','T1');
exec dbms_stats.set_table_stats(user,'T1',numblks => 33333333);

col value new_val oldval noprint;
select value from v$statname n, v$mystat s
where n.statistic#=s.statistic# and n.name ='physical reads direct';

----------------------------
alter session set tracefile_identifier='table_stats2';
--alter session set events '10053 trace name context forever, level 1';
alter session set events 'trace[nsmtio] disk highest';
select/*+ table_stats(t1, scale, blocks=66666666 rows=2222222222)
          table_stats(t1, scale, blocks=99999999 rows=4444444444)
       */
       *
from t1;
select value-&oldval directreads, value from v$statname n, v$mystat s
where n.statistic#=s.statistic# and n.name ='physical reads direct';
[/sourcecode]


You can see that our hint successfully changed the number of blocks and forced direct path reads on 11.2.0.4:
Oracle 11.2.0.4:

[sourcecode language="sql" highlight="3"]
NSMTIO: qertbFetch:DirectRead:[OBJECT_SIZE>VLOT]
NSMTIO: Additional Info: VLOT=797445
Object# = 78376, Object_Size = 66666666 blocks
SqlId = 7naawntkc57yx, plan_hash_value = 3617692013, Partition# = 0
[/sourcecode]


But on 12.1.0.2 and 12.2.0.1 we can see two lines with “NSMTIO: kcbism”, with different types (2 and 3) and different numbers of blocks, and the direct path read decision was based on the second one, i.e. on the size from the segment header:

Oracle 12.1.0.2:

[sourcecode language="sql" highlight="1,2"]
NSMTIO: kcbism: islarge 1 next 0 nblks 66666666 type 3, bpid 65535, kcbisdbfc 0 kcbnhl 4096 kcbstt 983 keep_nb 0 kcbnbh 59010 kcbnwp 1
NSMTIO: kcbism: islarge 0 next 0 nblks 4 type 2, bpid 3, kcbisdbfc 0 kcbnhl 4096 kcbstt 983 keep_nb 0 kcbnbh 59010 kcbnwp 1
NSMTIO: qertbFetch:NoDirectRead:[- STT < OBJECT_SIZE < MTT]:Obect's size: 4 (blocks), Threshold: MTT(4917 blocks),
_object_statistics: enabled, Sage: enabled,
Direct Read for serial qry: enabled(::::::), Ascending SCN table scan: FALSE
flashback_table_scan: FALSE, Row Versions Query: FALSE
SqlId: 7naawntkc57yx, plan_hash_value: 3617692013, Object#: 302342, Parition#: 0 DW_scan: disabled
[/sourcecode]


Oracle 12.2.0.1:

[sourcecode language="sql" highlight="1,2"]
NSMTIO: kcbism: islarge 1 next 0 nblks 66666666 type 3, bpid 65535, kcbisdbfc 0 kcbnhl 4096 kcbstt 1214 keep_nb 0 kcbnbh 45026 kcbnwp 1
NSMTIO: kcbism: islarge 0 next 0 nblks 4 type 2, bpid 3, kcbisdbfc 0 kcbnhl 4096 kcbstt 1214 keep_nb 0 kcbnbh 45026 kcbnwp 1
NSMTIO: qertbFetch:NoDirectRead:[- STT < OBJECT_SIZE < MTT]:Obect's size: 4 (blocks), Threshold: MTT(6072 blocks),
_object_statistics: enabled, Sage: enabled,
Direct Read for serial qry: enabled(:::::::), Ascending SCN table scan: FALSE
flashback_table_scan: FALSE, Row Versions Query: FALSE
SqlId: 7naawntkc57yx, plan_hash_value: 3617692013, Object#: 174411, Parition#: 0 DW_scan: disabled
[/sourcecode]


And a similar example, but for an IFFS (index fast full scan):
NB: I set the number of index blocks using dbms_stats to 33333000 and hinted the query with 7777700.

index_stats2.sql

[sourcecode language="sql"]
drop table t2 purge;
ALTER SESSION SET optimizer_dynamic_sampling = 0;
ALTER SESSION SET "_optimizer_use_feedback" = FALSE;
ALTER SESSION SET optimizer_adaptive_features = FALSE;
ALTER SESSION SET optimizer_adaptive_plans=FALSE;

create table t2(x) as select level from dual connect by level<=1000;
create index t2_ix on t2(1,x,rpad(x,100));

begin
  dbms_stats.gather_table_stats('','T2',cascade => true);
  dbms_stats.set_table_stats(user,'T2' ,numblks => 33333333);
  dbms_stats.set_index_stats(user,'T2_IX',numlblks => 33333000);
end;
/

col value new_val oldval noprint;
select value from v$statname n, v$mystat s
where n.statistic#=s.statistic# and n.name ='physical reads direct';

alter session set tracefile_identifier='index_stats2';
alter session set events 'trace[nsmtio] disk highest';
select/*+ index_stats(t2, t2_ix, scale, blocks=7777700)
          index_ffs(t2 t2_ix)
          dynamic_sampling(0)
       */
       count(*) cnt2
from t2;
select value-&oldval directreads, value from v$statname n, v$mystat s
where n.statistic#=s.statistic# and n.name ='physical reads direct';

disc;
[/sourcecode]


You can see that on 11.2.0.4 Oracle gets the number of blocks from the hint (7777700):
Oracle 11.2.0.4 - index_stats:

[sourcecode language="sql" highlight="2"]
NSMTIO: qerixFetchFastFullScan:DirectRead[OBJECT_SIZE > VLOT]:
NSMTIO: AdditionalInfo: Object_size: 7777700 (blocks), vlot=797445
SqlId=by2zv0k566hj5, plan_hash_value=3419274230,Object#=78375
[/sourcecode]


From the first line we can see that kcbism takes the hinted number of blocks, but later kcbivlo overwrites it with the number from the segment header:

Oracle 12.1.0.2 - index_stats:

[sourcecode language="sql" highlight="2"]
NSMTIO: kcbism: islarge 1 next 0 nblks 7777700 type 3, bpid 65535, kcbisdbfc 0 kcbnhl 4096 kcbstt 983 keep_nb 0 kcbnbh 59010 kcbnwp 1
NSMTIO: kcbivlo: nblks 22 vlot 500 pnb 49175 kcbisdbfc 0 is_large 0
NSMTIO: qerixFetchFastFullScan:[MTT < OBJECT_SIZE < VLOT]:NSMTIO: AdditionalInfo: Object_size: 22 (blocks), vlot=245875
SqlId=by2zv0k566hj5, plan_hash_value=3419274230,Object#=302347
[/sourcecode]


Oracle 12.2.0.1 - index_stats:

[sourcecode language="sql" highlight="2"]
NSMTIO: kcbism: islarge 1 next 0 nblks 7777700 type 3, bpid 65535, kcbisdbfc 0 kcbnhl 4096 kcbstt 1214 keep_nb 0 kcbnbh 45026 kcbnwp 1
NSMTIO: kcbivlo: nblks 22 vlot 500 pnb 60729 kcbisdbfc 0 is_large 0
NSMTIO: qerixFetchFastFullScan:[MTT < OBJECT_SIZE < VLOT]:NSMTIO: AdditionalInfo: Object_size: 22 (blocks), vlot=303645
SqlId=by2zv0k566hj5, plan_hash_value=3419274230,Object#=174409
[/sourcecode]


So we have two options to force direct reads:
1. execute alter session set "_serial_direct_read"='ALWAYS';
2. or force a parallel plan to get parallel direct path reads (we can do it even with DOP=1).
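Both options as concrete commands (the second is one possible sketch, reusing the t2 table from above):

[sourcecode language="sql"]
-- Option 1: force serial direct path reads for the session
alter session set "_serial_direct_read" = 'ALWAYS';

-- Option 2: force a parallel plan; even with DOP=1 the scan
-- uses parallel direct path reads
select /*+ parallel(t2, 1) */ count(*) from t2;
[/sourcecode]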


Revisiting buffer cache Very Large Object Threshold

Posted on November 15, 2017 by Roger MacNicol Posted in adaptive serial direct path reads, oracle, SmartScan

If you turn on NSMTIO tracing you will see references to VLOT:

qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches (local/remote) and checking storage reduction factors (OLTP/EHCC Comp)

I had said you could ignore VLOT and Frits Hoogland pointed out that tracing showed it had some impact, so let me clarify:

VLOT is the absolute upper bound at which cached reads can even be considered.

This defaults to 500% of the number of buffers in the cache, i.e.

_very_large_object_threshold = 500

While this number is not used in any calculations, it is used in two places as a cutoff for whether those calculations are even attempted:

1) Can we consider using Automatic Big Table Caching (a.k.a. DWSCAN) for this object?

2) Should we do a cost analysis for Buffer Cache scan vs Direct Read scan on tables larger than the MTT? 

The logic for tables above the calculated medium table threshold (MTT) that are NOT part of searched DMLs and are NOT on Exadata with the statistics-based storage reduction factor enabled (_statistics_based_srf_enabled) is:

  • If _serial_direct_read == ALWAYS, use Direct Read
  • If _serial_direct_read == NEVER, use Buffer Cache
  • If _serial_direct_read == AUTO and #blocks in table < VLOT, use cost model
  • Else use Direct Read “qertbFetch:DirectRead:[OBJECT_SIZE>VLOT]”

In practice, 5X the buffer cache is so large that the cost-based decision will come to the same conclusion anyway: the default VLOT simply saves the time spent doing the analysis.

For example, I got a quick count of the number of blocks in the non-partitioned TPC-H Scale 1 lineitem table

select segment_name,sum(blocks),sum(bytes) from user_extents where segment_name='LINEITEM'

and created my buffer cache to be exactly the same size. With this setup, setting _very_large_object_threshold=100 bypassed the cost model and went straight to DR scan, while setting it to 200 forced the use of the cost model. 
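For reference, the settings used in that experiment look like this (a sketch; this is a hidden parameter and should not be changed on production systems without Oracle Support):

[sourcecode language="sql"]
-- VLOT at 100% of the buffer cache: object size > VLOT, so the cost
-- model is bypassed and the scan goes straight to Direct Read
alter session set "_very_large_object_threshold" = 100;

-- VLOT at 200%: object size < VLOT, so the cost model is consulted
alter session set "_very_large_object_threshold" = 200;
[/sourcecode]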

The moral of this is that the default value of VLOT rarely changes the decisions made. Only if you reduce VLOT to a much smaller multiplier of the cache size will you start to see a few more of your larger buffer cache scans move to direct read, because they are no longer eligible for cost analysis. If you wish to stop some of the largest buffer cache scans from happening, you would need to set _very_large_object_threshold to less than 200.


What’s new in 12.2 CELLMEMORY Part 3

Posted on May 4, 2017 by Roger MacNicol Posted in 12.2, inmemory, oracle, SmartScan

The Cellmemory Stats in RDBMS 

The RDBMS stats for Cellmemory are designed to closely follow the pattern used by the In-Memory stats.

Query Stats 

Each column in each 1 MB of disk blocks will be rewritten into one IMC-format Column CU in flash, and a set of Column CUs comprises an overall Compression Unit, so these stats reflect the number of 1 MB rewrites that were processed (not the number of column CUs).

  1. “cellmemory IM scan CUs processed for query”
    – # of 1 MB chunks scanned in MEMCOMPRESS FOR QUERY format
  2. “cellmemory IM scan CUs processed for capacity”
    – # of 1 MB chunks scanned in MEMCOMPRESS FOR CAPACITY format
  3. “cellmemory IM scan CUs processed no memcompress”
    – # of 1 MB chunks scanned in NO CELLMEMORY format (12.1.0.2 format)

Load Stats

  1. “cellmemory IM load CUs for query”
    – #1 MB chunks successfully rewritten from 12.1.0.2 to MEMCOMPRESS FOR QUERY format  
  2. “cellmemory IM load CUs for capacity”
    – #1 MB chunks successfully rewritten from 12.1.0.2 to MEMCOMPRESS FOR CAPACITY format
  3. “cellmemory IM load CUs no memcompress”
    – #1 MB chunks successfully rewritten into 12.1.0.2 format
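All of these counters can be read from v$sysstat (or v$mystat for the current session); a sketch:

[sourcecode language="sql"]
-- System-wide Cellmemory scan and load counters
select name, value
from   v$sysstat
where  name like 'cellmemory%'
order  by name;
[/sourcecode]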

Before a rewrite happens, a routine is called that looks through the blocks in the 1 MB chunk and determines whether it is eligible for rewrite. Reasons it may not be include transactional metadata from the commit cache, the presence of block formats that can’t be rewritten (although this list is getting smaller with each rpm), and the amount of space the rewrite will take up.

A rewrite into 12.1.0.2 format must fit in the original 1 MB of flash cache. An IMC-format rewrite is not permitted to exceed 8 MB. This limit is highly unlikely to be reached by MEMCOMPRESS FOR CAPACITY but could be reached when trying to rewrite HCC blocks with much greater than 8X original compression into MEMCOMPRESS FOR QUERY format. This is one reason that the default is FOR CAPACITY.

  1. “cellmemory IM scan CUs rejected for query”
    – #1 MB chunks that could not be rewritten into MEMCOMPRESS FOR QUERY for whatever reason
  2. “cellmemory IM scan CUs rejected for capacity”
    – #1 MB chunks that could not be rewritten into MEMCOMPRESS FOR CAPACITY for whatever reason
  3. “cellmemory IM scan CUs rejected no memcompress”
    – #1 MB chunks that could not even be rewritten into 12.1.0.2 format for whatever reason

When bloggers get it wrong – part 2

Posted on May 4, 2017 by Roger MacNicol Posted in adaptive serial direct path reads, oracle, SmartScan, trace, troubleshooting

In Part 2 we are going to look at making use of the trace events that show what was discussed in Part 1. 
NB: Oracle no longer adds new numeric trace events; going forward, new trace events use the Unified Tracing Service, whose grammar is much simpler. The elements we need are:

trace[[x.]y] disk = [ lowest | low | medium | high | highest ]

For example, Table Scan tracing is in the DATA hierarchy:

[1] DATA
[2] KDS    “Kernel Data Scan”
[3] KDSFTS  “Full Table Scan”
[3] KDSRID  “RowID”

'trace[KDSFTS] disk low' – only trace full table scans
'trace[KDSRID] disk low' – only trace fetch by rowid
'trace[KDS.*] disk low' – trace both table scans and fetch by rowid
NB: don’t use 'lowest' with KDS – it is used for memory tracing only
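Put together with the syntax above, a full session-level example looks like this (a sketch):

[sourcecode language="sql"]
-- Trace full table scans only, at 'low' detail, to the trace file
alter session set events 'trace[KDSFTS] disk low';

-- ... run the scan of interest ...

-- Turn the tracing off again
alter session set events 'trace[KDSFTS] off';
[/sourcecode]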

Tracing Full Table Scans: KDSFTS

At the beginning of a granule we see if it is possible to use Turbo Scan (which is a prerequisite for using Exadata Smart Scan) and the data object number being scanned:


When bloggers get it wrong – part 1

Posted on May 4, 2017 by Roger MacNicol Posted in adaptive serial direct path reads, oracle, SmartScan

I’ve read a number of blog entries from various people who’ve clearly put great diligence into trying to understand how the decision is made to use the buffer cache for a table scan or whether it is better to use direct read for the scan. Since this is a critical decision from a performance perspective, I’ve decided to write a more definitive account. Part 1 deals with a few principles, and part 2 (which will follow very shortly) will show how this works with tracing and clear up a few misunderstandings. Part 3 will deal with PQ in more detail as soon as I have time.

Serial Scans

Small Table Threshold (STT) 

Serial scans use direct path reads when the object is not a small table and, since 11.2.0.2, the size used to determine whether it is “small” has come from stats rather than from the segment header.

_small_table_threshold defaults to 2% of the buffer cache or to 20 blocks, whichever is bigger (NB: this parameter ignores the effects of compression). An object is small enough for buffer cache reads to be more effective than direct path read if the number of blocks it contains is less than or equal to the value of _small_table_threshold. Such scans will bump the stat “table scans (short tables)”.

Medium Table Threshold (MTT)

So what happens when an object is bigger than STT? We fall into the MTT case: MTT is not directly configurable and defaults to 10% of the buffer cache blocks. But it also depends on a cost-based decision that includes how busy the cache is.
NB: PQ and Exadata have different rules.
NB: MTT also depends on _object_statistics being enabled.

For STT < #blocks < MTT, if the cost-based decision chooses a buffer cache scan, the stat “table scans (short tables)” will still get bumped.

So when is a medium table not worth caching?

First, when MTT < #blocks (ignore VLOT in blogs, it hasn’t been used for many years but persists in the trace and blogs for historic reasons).

Second, when the scan is not a select for update: different rules apply because cache scans are more useful if the segment being updated is first the subject of a query. Tables larger than MTT also factor in the storage reduction factor; the ratios used are 10X for HCC and 3.3X for OLTP.

Third, a cost-based analysis is done comprising the % of the object currently in cache, the % of buffers that are dirty (check-pointing cost), the current I/O bottleneck, and any RAC access costs. If the buffer cache is still used, the stat “table scans (long tables)” will get bumped; but if Direct Read is chosen, the stat “table scans (direct read)” will get bumped.
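These three counters make it easy to see which path your scans actually took; a sketch:

[sourcecode language="sql"]
-- Which scan paths have been used since instance startup
select name, value
from   v$sysstat
where  name in ('table scans (short tables)',
                'table scans (long tables)',
                'table scans (direct read)');
[/sourcecode]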

Automatic Big Table Caching (ABTC)

So does this mean the buffer cache never helps with large tables? Not at all: large tables can still use the buffer cache for a portion of their scan based on the public parameter DB_BIG_TABLE_CACHE_PERCENT_TARGET, which sets aside a percentage of the buffer cache for larger tables.

On a single instance, ABTC works with both Serial and PQ scans, BUT mixing Serial and PQ may lead to fewer cache hits. On RAC, it only works with PQ and requires that PARALLEL_DEGREE_POLICY be set to AUTO or ADAPTIVE.
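A minimal setup sketch (the percentage is illustrative):

[sourcecode language="sql"]
-- Reserve up to 40% of the buffer cache for big-table caching
alter system set db_big_table_cache_percent_target = '40';

-- On RAC, ABTC additionally requires automatic DOP policy
alter system set parallel_degree_policy = auto;
[/sourcecode]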

ABTC works by tracking the heat of medium and large tables and is designed to take advantage of any runs of buffers currently cached. If the decision to use ABTC is made, it periodically checks the cache to see how many buffers in the next section are already in memory, to decide whether Direct Read or a Buffer Cache scan would be better for the next chunk. Let’s see how the heat tracking works by looking at the v$bt_scan_obj_temps view.

SQL> select DATAOBJ#,SIZE_IN_BLKS,TEMPERATURE from V$BT_SCAN_OBJ_TEMPS; 

DATAOBJ#   SIZE_IN_BLKS TEMPERATURE
---------- ------------ -----------    
     79042         1689        3000

Each time we scan a medium or larger table, it will bump that object’s temperature by 1000. Then the object with the highest temperature will get absolute priority for caching and, if needed, will evict any previously cached ABTC blocks of other objects.
If the hottest object only partially fits in the buffer cache (which is quite likely), the first section that fits will be read through the buffer cache and cached, then the rest of the scan will switch to Direct Read once the ABTC allocation in the cache is fully utilized.
NB: The object will still be check-pointed at the beginning so that it is safe to switch to Direct Read at any time.

Part 2 will include tracing to show how to make use of this feature.
