Oracle SQL

Yearly Archives: 2017

Why are HCC stats being bumped when Smart Scanning row major data in 12.2

Posted on May 4, 2017 by Roger MacNicol Posted in oracle, SmartScan

In 12.2, there is a stat “cell blocks pivoted” that points to a new optimization. When Smart Scan processes data, it has to create a new synthetic block that only contains the columns that are needed and the rows that pass the offloaded predicates.

If Smart Scan has to create a new block anyway, why create a new row-major block when it can just as easily create an uncompressed HCC block? It is roughly the same amount of work, but once the synthetic block is returned to the RDBMS for processing, columnar data is 10X cheaper to process when the query uses rowsets. So where does the name of this stat come from? Simple: it works just like a spreadsheet PIVOT operation, swapping rows for columns.

So what is a “rowset”? It simply means that a step in a query plan takes in a batch of rows and processes the entire batch in one go before fetching more rows; this is significantly more efficient than processing one row at a time. You can see when a query uses rowsets via:

select * from table(dbms_xplan.display_cursor('<sql_id>',1,'-note +projection'));

in which case you will see projection metadata with details such as “(rowsets=200)”.
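To see whether a given session is actually benefiting, the new statistic can be compared before and after running the query. A minimal sketch, not from the original post, using the standard v$statname/v$mystat views:

-- how many synthetic blocks were returned in columnar (pivoted) form
select n.name, s.value
  from v$statname n, v$mystat s
 where n.statistic# = s.statistic#
   and n.name = 'cell blocks pivoted';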

When Smart Scan sees that the query on the RDBMS is using rowsets, it tries to create a columnar output block instead of a row-major output block, and when this arrives on the RDBMS the table scan will see an HCC block and bump the HCC stats even though the table being scanned doesn’t use HCC. Let’s see how this works in practice:

Continue reading→
execution plan HCC oracle query optimization Roger MacNicol rowsets SmartScan

Using INMEMORY with External Tables

Posted on May 4, 2017 by Roger MacNicol Posted in 12c, External tables, inmemory, oracle, SmartScan

Several people have asked if there is any way to use INMEMORY on External Tables because the INMEMORY syntax is not supported in Create Table and Alter Table with Organization External (actually Create Table parses it but then ignores it).

While there is no way out of having to load the data into Oracle in 12.1.0.2, there is a shortcut to simplify life when you have an external process recreating flat file data on a regular basis and you want to query the latest incarnation against corporate data in Oracle tablespaces. One example we’ve seen of this is a map-reduce job on Hadoop that highly summarizes large amounts of ephemeral data, where it is the summary data that needs to be joined against Oracle data.

Instead of truncating the Oracle table and doing a flat file load each time, we can use a Materialized View on the external table and simply issue an on-demand refresh and let the materialized view take care of the truncate and load under the covers for us.

Since INMEMORY does support Materialized Views, we can now automatically get INMEMORY query speeds on the data from an external source.
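A minimal sketch of the pattern (the external table and materialized view names here are hypothetical, not from the original post):

-- ext_summary is an existing external table over the flat files
create materialized view mv_summary
  inmemory                        -- allowed here, unlike on the external table itself
  build immediate
  refresh complete on demand
as
  select * from ext_summary;

-- whenever the external process regenerates the flat files, do a complete
-- refresh (the truncate and load happen under the covers):
exec dbms_mview.refresh('MV_SUMMARY', method => 'C')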

Let’s see how this works. For example here’s a loader table I use with TPC-H Scale 1 for local testing:

Continue reading→
external tables inmemory oracle Roger MacNicol SmartScan

More on tracing the offload server

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, trace

I posted a while back on Tracing Hybrid Columnar Compression in an offload server, so this is a quick follow-up for two reasons:

  1. I have trouble remembering the syntax for setting a regular parameter in an offload server without bouncing it. Since I need to keep this written down somewhere, I thought it might be of use to support folks and DBAs.
  2. I forgot to show how to specify which offload group to set the trace event in.

So this example should do both: 

CellCLI > alter cell offloadGroupEvents = "immediate cellsrv.cellsrv_setparam('my_parameter', 'TRUE')", offloadGroupName = "SYS_122110_160621"

This will, of course, only set the parameter temporarily, until the next time the offload server is bounced; adding it to the offload group’s init.ora as well will make it persist across restarts.

Cell Offloading HCC oracle Roger MacNicol SmartScan traces

Tracing Hybrid Columnar Compression in an offload server

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, trace

I had previously commented on how to use the FPLIB facility in a trace event but the question came up today of how to trace HCC in an offload server.  The facility name in this case is ADVCMP (Advanced Compression) and the hierarchy is:

ADVCMP_MAIN
    ADVCMP_COMP
    ADVCMP_DECOMP

No compression occurs on the cell so we are only interested in the decompression tracing in an offload server.

So in this case the correct syntax is trace[facility.event], giving:

cellcli -e 'alter cell offloadgroupEvents = "trace[advcmp.advcmp_decomp.*] disk=lowest"'
oracle Roger MacNicol SmartScan traces troubleshooting

Using trace events with an offload server

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, trace

I’ve noticed that several people who were familiar with using trace events with cellsrv were uncertain about how to use tracing with the new offload server architecture. So whereas in the past you could have added tracing to SmartScan processing with:

> cellcli -e 'alter cell events = "trace[fplib.sage_data] disk=lowest, memory=lowest"'

the new syntax to turn it on is:

> cellcli -e 'alter cell offloadgroupEvents = "trace[fplib.sage_data] disk=lowest, memory=lowest"'

and to turn it off:

> cellcli -e 'alter cell offloadgroupEvents = "trace[fplib.sage_data] off"'

Do not use anything higher than “disk=medium,  memory=medium” since the higher levels of tracing will cause the heartbeat monitor to fail with timeout issues.

The SmartScan trace events available under the fplib (Filter Processing Library) facility are:

        fplib.fplib_main,  fplib.sage_cache,  fplib.sage_txn,  fplib.sage_data,  fplib.sage_index

In the cell trace directory hierarchy, each offload server has its own trace directory:

diag/asm/cell/SYS_122010_150220/trace
diag/asm/cell/SYS_112331_141117/trace

In a future post I will cover how to use named offload groups to isolate the traces for a single session.

Roger MacNicol,

Data Storage Technology Group

oracle Roger MacNicol SmartScan traces troubleshooting

What’s new in 12.2 CELLMEMORY Part 3

Posted on May 4, 2017 by Roger MacNicol Posted in 12.2, inmemory, oracle, SmartScan

The Cellmemory Stats in RDBMS 

The RDBMS stats for Cellmemory are designed to closely follow the pattern used by the Inmemory stats 

Query Stats 

Each column in each 1 MB of disk blocks will be rewritten into one IMC-format Column CU in flash, and a set of Column CUs comprises an overall Compression Unit, so these stats reflect the number of 1 MB rewrites that were processed (not the number of Column CUs).

  1. “cellmemory IM scan CUs processed for query”
    – #1 MB chunks scanned in MEMCOMPRESS FOR QUERY format
  2. “cellmemory IM scan CUs processed for capacity”
    – #1 MB chunks scanned in MEMCOMPRESS FOR CAPACITY format
  3. “cellmemory IM scan CUs processed no memcompress”
    – #1 MB chunks scanned in NO CELLMEMORY format (12.1.0.2 format)

Load Stats

  1. “cellmemory IM load CUs for query”
    – #1 MB chunks successfully rewritten from 12.1.0.2 to MEMCOMPRESS FOR QUERY format  
  2. “cellmemory IM load CUs for capacity”
    – #1 MB chunks successfully rewritten from 12.1.0.2 to MEMCOMPRESS FOR CAPACITY format
  3. “cellmemory IM load CUs no memcompress”
    – #1 MB chunks successfully rewritten into 12.1.0.2 format

Before a rewrite happens, a routine is called that looks through the blocks in the 1 MB chunk and determines whether it is eligible for rewrite. Reasons it may not be include transactional metadata from the commit cache, the presence of block formats that can’t be rewritten (although this list is getting smaller with each rpm), and the amount of space the rewrite will take up.

A rewrite into 12.1.0.2 format must fit in the original 1 MB of flash cache. An IMC-format rewrite is not permitted to exceed 8 MB. This limit is highly unlikely to be reached by MEMCOMPRESS FOR CAPACITY, but could be reached when trying to rewrite HCC blocks with much greater than 8X original compression into MEMCOMPRESS FOR QUERY format. This is one reason that the default is FOR CAPACITY.

  1. “cellmemory IM scan CUs rejected for query”
    – #1 MB chunks that could not be rewritten into MEMCOMPRESS FOR QUERY for whatever reason
  2. “cellmemory IM scan CUs rejected for capacity”
    – #1 MB chunks that could not be rewritten into MEMCOMPRESS FOR CAPACITY for whatever reason
  3. “cellmemory IM scan CUs rejected no memcompress”
    – #1 MB chunks that could not even be rewritten into 12.1.0.2 format for whatever reason
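All of these counters are exposed as ordinary statistics, so they can be watched with a simple query; a minimal sketch (not from the original post), assuming the standard v$sysstat view:

-- all of the cellmemory load / scan / rejection counters at the system level
select name, value
  from v$sysstat
 where name like 'cellmemory IM%'
 order by name;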
Cellmemory direct path reads IMCU inmemory memcompress oracle Roger MacNicol SmartScan

What’s new in 12.2 CELLMEMORY Part 2

Posted on May 4, 2017 by Roger MacNicol Posted in inmemory, oracle, SmartScan

Question: do I need to know anything in this blog post?

Answer: No, it is a true cache and CELLMEMORY works automatically

Question: so why should I read this blog post?

Answer:  because you like to keep a toolkit of useful ways to control the system when needed

The DDL

The Exadata engineering team has done a lot of work to make the flash cache automatically handle a variety of very different workloads happening simultaneously which is why we now typically discourage users from specifying:

SQL> Alter Table T storage (cell_flash_cache keep);

and trust the AUTOKEEP pool to recognize table scans that would most benefit from caching and to cache them in a thrash-resistant way. Our Best Practice for CELLMEMORY is typically not to use the DDL. That said, there will be cases when the DBA wishes to override the system’s default behavior, and we will look at a couple of those reasons. But first, here’s the DDL, which is a very cut-down portion of the INMEMORY syntax:

Alter Table T  [ [ NO ] CELLMEMORY [ MEMCOMPRESS FOR [ QUERY | CAPACITY ] [ LOW | HIGH ] ] ]

So let’s break this down piece by piece.

Examples

SQL> Alter Table T NO CELLMEMORY

The NO CELLMEMORY clause prevents a table being eligible for the rewrite from 12.1.0.2 format to 12.2 In-Memory format. There are a variety of reasons you may wish to do this and we’ll look at those at the end of this post. 

SQL> Alter Table T CELLMEMORY
SQL> Alter Table T CELLMEMORY MEMCOMPRESS FOR CAPACITY

These allow a table to be cached in the default 12.2 In-Memory format (you’d only ever need them to undo a NO CELLMEMORY you issued earlier or to revert a change in the compression level used).

SQL> Alter Table T CELLMEMORY MEMCOMPRESS FOR QUERY

I mentioned above that CELLMEMORY, by default, undergoes two rounds of compression: first a round of semantic compression, and then a round of bitwise compression using LZO. This default tries to keep CELLMEMORY’s flash cache footprint as close as we can to HCC flash space usage while still running queries much faster than HCC. But even though LZO has a relatively low decompression cost, it is not zero. There are some workloads which run faster with MEMCOMPRESS FOR QUERY, but they also typically use twice as much flash space. It would not be appropriate for the system to automatically start using twice as much flash, but if you wish to experiment with this, the option is there. Also, compressing with LZO takes CPU time which is not needed with MEMCOMPRESS FOR QUERY.

What about [ LOW | HIGH ] ?

Currently, unlike INMEMORY, these are throwaway words, but we retained them in the grammar for future expansion and because people are used to specifying them in a MEMCOMPRESS clause. This means, in effect:

  • CELLMEMORY’s MEMCOMPRESS FOR QUERY
    is roughly the equivalent of INMEMORY’s MEMCOMPRESS FOR QUERY HIGH
  • CELLMEMORY’s MEMCOMPRESS FOR CAPACITY
    is roughly the equivalent of INMEMORY’s MEMCOMPRESS FOR CAPACITY LOW

Why might I want to overrule the default?

There are a few reasons:

  1. Turning CELLMEMORY off may be useful if you know an HCC table will only be queried a few times and then dropped or truncated soon and it is not worth the CPU time to create the In-Memory formats
  2. Turning CELLMEMORY off may be useful if your cells are already under a lot of CPU pressure and you wish to avoid the CPU of creating the In-Memory formats
  3. Switching to MEMCOMPRESS FOR QUERY may be useful to get better query performance on some workloads (YMMV) and reduce the CPU cost of creating In-Memory formats

What about INMEMORY with CELLMEMORY?

We allow both INMEMORY and CELLMEMORY on the same table:

SQL> Create Table T (c1 number) INMEMORY NO CELLMEMORY;
SQL> Create Table T (c1 number) INMEMORY CELLMEMORY MEMCOMPRESS FOR QUERY;

Why would you want both INMEMORY and CELLMEMORY?

DBIM implements a priority system where the In-Memory area is first loaded with the critical tables and then works its way down to the PRIORITY LOW tables. If you have a PRIORITY LOW table that is unlikely to get loaded in memory, it is fine to also specify CELLMEMORY so as to still get columnar performance. Note: an HCC-encoded INMEMORY table will still get automatic CELLMEMORY if you don’t use any DDL at all.
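As a minimal sketch of that combination (the table here is hypothetical, not from the original post), combining the PRIORITY LOW attribute with the CELLMEMORY clause shown above:

-- unlikely to be populated in the In-Memory area, but still gets columnar
-- performance on the cells via CELLMEMORY
SQL> Create Table sales_archive (c1 number) INMEMORY PRIORITY LOW CELLMEMORY MEMCOMPRESS FOR CAPACITY;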

—
Roger

Cellmemory HCC inmemory oracle Roger MacNicol SmartScan

What’s new in 12.2 CELLMEMORY Part 1

Posted on May 4, 2017 by Roger MacNicol Posted in 12.2, inmemory, oracle, SmartScan

Many people know that in 12.1.0.2 we introduced a ground-breaking columnar cache that rewrote 1 MB chunks of HCC-format blocks in the flash cache into pure columnar form, in a way that allowed us not only to do the I/O for just the columns needed but also to recreate the original block when that was required.

This showed up in stats as “cell physical IO bytes saved by columnar cache”.

But in 12.1.0.2 we had also introduced Database In-Memory (or DBIM) that rewrote heap blocks into pure columnar form in memory. That project introduced: 

  • new columnar formats optimized for query performance
  • a new way of compiling predicates that supported better columnar execution
  • the ability to run predicates against columns using SIMD instructions which could execute the predicate against multiple rows simultaneously

 so it made perfect sense to rework the columnar cache in 12.2 to take advantage of the new In-Memory optimizations.

Quick reminder of DBIM essentials

In 12.2, tables have to be marked manually for DBIM using the INMEMORY keyword:

SQL> Alter Table T INMEMORY;

When a table tagged as INMEMORY is scanned, the background process is notified to start loading it into the INMEMORY area. This design was adopted so that the user’s scans are not impeded by the loading. Subsequent scans that come along will check what percentage is currently loaded in memory and make a rule-based decision:

For Exadata 

  • Greater than 80% populated, use In-Memory and the Buffer Cache for the rest
  • Less than 80% populated, use Smart Scan
  • The 80% cutoff between BC and DR is configurable with an undocumented underscore parameter
  • Note: if an In-Memory scan is selected, even for a partially populated segment, Smart Scan is not used

For non-Exadata

  • Greater than 80% populated, use In-Memory and the Buffer Cache for the rest
  • Less than 80% populated, use In-Memory and Direct Read for the rest
    Note: this requires the segment to be checkpointed first
  • The 80% cutoff between BC and DR is configurable with an undocumented underscore parameter
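Both sets of rules hinge on how much of the segment is currently populated; a minimal sketch of how to check that (not from the original post), assuming the standard v$im_segments view:

-- how much of each INMEMORY segment is currently populated
select segment_name,
       populate_status,
       round(100 * (1 - bytes_not_populated / bytes), 1) as pct_populated
  from v$im_segments;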

While DBIM can provide dramatic performance improvements, it is limited by the amount of usable SGA on the system that can be set aside for the In-Memory area. After that, performance becomes that of disk access to heap blocks from the flash cache or disk groups. What was needed was a way to increase the In-Memory area so that cooler segments could still benefit from the In-Memory formats without using valuable RAM, which is often a highly constrained resource.

Cellmemory

Cellmemory works in a similar way to the 12.1.0.2 columnar cache in that 1 MB chunks of HCC-formatted blocks are cached in columnar form automatically, without the DBA needing to be involved or do anything. This means columnar cache entries are created after Smart Scan has processed the blocks, rather than before as happens with ineligible scans. The 12.1.0.2 columnar cache format simply takes all the column-1 compression units (CUs) and stores them contiguously, then all the column-2 CUs stored contiguously, etc., so that each column can be read directly from the cache without reading unneeded columns. This happens during Smart Scan so that the reformatted blocks are returned to the cell server along with the query results for that 1 MB chunk.

In 12.2, eligible scans continue to be cached in the 12.1.0.2 columnar cache format after Smart Scan has processed the blocks, so that columnar disk I/O is available immediately. The difference is that, if and only if the In-Memory area is in use on the RDBMS node (i.e. the In-Memory feature is already in use in that database), the address of the beginning of the columnar cache entry is added to a queue so that a background process running at a lower priority can read those cache entries and invoke the DBIM loader to rewrite the cache entry into In-Memory formatted data.

Unlike the columnar cache, which has multiple column-1 CUs, the IMC format creates a single Column CU per column using the new formats, which use semantic compression and support SIMD etc. on the intermediate compressed data. By default a second round of compression using LZO is then applied. When 1 MB of HCC ZLIB-compressed blocks is rewritten in this way it typically takes around 1.2 MB (YMMV obviously).

Coming up

Up-coming blog entries will cover:

  • Overriding the default behaviour with DDL
  • New RDBMS stats for Cellmemory
  • New cellsrv stats for Cellmemory
  • Flash cache pool changes
  • Tracing Cellmemory
Cellmemory Columnar Cache HCC inmemory oracle Roger MacNicol SmartScan

When bloggers get it wrong – part 2

Posted on May 4, 2017 by Roger MacNicol Posted in adaptive serial direct path reads, oracle, SmartScan, trace, troubleshooting

In Part 2 we are going to look at making use of the trace events that show what was discussed in Part 1. 
NB: Oracle no longer adds new numeric trace events; going forward, new trace events use the Unified Tracing Service, whose grammar is much simpler. The elements we need are:

trace[[x.]y] disk = [ lowest | low | medium | high | highest ]

For example, Table Scan tracing is in the DATA hierarchy:

[1] DATA
[2] KDS    “Kernel Data Scan”
[3] KDSFTS  “Full Table Scan”
[3] KDSRID  “RowID”

‘trace[KDSFTS] disk low’ – only trace full table scans
‘trace[KDSRID] disk low’ – only trace fetch by rowid
‘trace[KDS.*] disk low’ – trace both table scans and fetch by rowid
NB: don’t use ‘lowest’ with KDS – it is used for memory tracing only
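For example, to apply these at the session level (my sketch using the same event syntax, not taken from the original post):

alter session set events 'trace[KDS.*] disk low';   -- trace table scans and fetch by rowid
alter session set events 'trace[KDS.*] off';        -- turn it back off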

Tracing Full Table Scans: KDSFTS

At the beginning of a granule we see if it is possible to use Turbo Scan (which is a prerequisite for using Exadata Smart Scan) and the data object number being scanned:

Continue reading→
abtc direct path reads oracle Roger MacNicol SmartScan

When bloggers get it wrong – part 1

Posted on May 4, 2017 by Roger MacNicol Posted in adaptive serial direct path reads, oracle, SmartScan

I’ve read a number of blog entries from various people who’ve clearly put great diligence into trying to understand how the decision is made between using the buffer cache for a table scan and using direct read for the scan. Since this is a critical decision from a performance perspective, I’ve decided to write a more definitive account. Part 1 deals with a few principles and part 2 (which will follow very shortly) will show how this works with tracing and clear up a few misunderstandings. Part 3 will deal with PQ in more detail as soon as I have time.

Serial Scans

Small Table Threshold (STT) 

Serial scans use direct path reads when the object is not a small table and, since 11.2.0.2, the size used to determine whether it is “small” has come from stats rather than from the segment header.

_small_table_threshold defaults to 2% of the buffer cache or to 20 blocks, whichever is bigger (NB: this parameter ignores the effects of compression). An object is small enough for buffer cache reads to be more effective than direct path reads if the number of blocks it contains is less than or equal to the value of _small_table_threshold. Such scans will bump the stat “table scans (short tables)”.
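A minimal sketch of checking the current value (as SYS, since this is a hidden parameter; not from the original post):

-- current setting of _small_table_threshold on this instance
select i.ksppinm  as name,
       v.ksppstvl as value
  from x$ksppi i, x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm = '_small_table_threshold';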

Medium Table Threshold (MTT)

So what happens when an object is bigger than STT? We fall into the MTT case: MTT is not directly configurable and defaults to 10% of the buffer cache blocks. But it also depends on a cost-based decision that includes how busy the cache is.
NB: PQ and Exadata have different rules.
NB: MTT also depends on _object_statistics being enabled.

For STT < #blocks < MTT, if the cost-based decision means a buffer cache scan is chosen, the stat “table scans (short tables)” will still get bumped.

So when is a medium table not worth caching?

First, when MTT < #blocks (ignore VLOT in blogs; it hasn’t been used for many years but persists in the trace and in blogs for historic reasons).

Second, when the scan is not a select for update: different rules apply there, because cache scans are more useful if the segment being updated is first the subject of a query. Tables larger than MTT also factor in the storage reduction factor: the ratios used are 10X for HCC and 3.3X for OLTP.

Third, a cost-based analysis is done comprising the % currently in cache, the % of buffers that are dirty (checkpointing cost), the current I/O bottleneck, and any RAC access costs. If the buffer cache is still used, the stat “table scans (long tables)” will get bumped; but if Direct Read is chosen, the stat “table scans (direct read)” will get bumped.

Automatic Big Table Caching (ABTC)

So does this mean the buffer cache never helps with large tables? Not at all: large tables can still use the buffer cache for a portion of their scan based on the public parameter DB_BIG_TABLE_CACHE_PERCENT_TARGET, which sets aside a percentage of the buffer cache for larger tables.

On a single instance, ABTC works with both Serial and PQ scans, but mixing Serial and PQ may lead to fewer cache hits. On RAC, it only works with PQ and requires that PARALLEL_DEGREE_POLICY be set to AUTO or ADAPTIVE.
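A minimal sketch of enabling it (the 20% figure below is arbitrary, purely for illustration):

-- set aside 20% of the buffer cache for automatic big table caching
alter system set db_big_table_cache_percent_target = 20;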

ABTC works by tracking the heat of medium and large tables and is designed to take advantage of any runs of buffers currently cached. Then, if the decision to use ABTC is made, it periodically checks the cache for how many buffers in the next section are already in memory to see whether Direct Read or a Buffer Cache scan would be better for the next chunk. Let’s see how the heat tracking works by looking at the v$bt_scan_obj_temps view.

SQL> select DATAOBJ#,SIZE_IN_BLKS,TEMPERATURE from V$BT_SCAN_OBJ_TEMPS; 

DATAOBJ#   SIZE_IN_BLKS TEMPERATURE
---------- ------------ -----------    
     79042         1689        3000

Each time we scan a medium or larger table it will bump that object’s temperature by 1000. Then, the object with the highest temperature will get absolute priority for caching and, if needed, will evict any previously cached ABTC blocks of other objects.
If the hottest object only partially fits in the buffer cache (which is quite likely), the first section that fits will be read through the buffer cache and cached, then the rest of the scan will switch to Direct Read once the ABTC allocation in the cache is fully utilized.
NB: The object will still be checkpointed at the beginning so that it is safe to switch to Direct Read at any time.

Part 2 will include tracing to show how to make use of this feature.

abtc direct path reads oracle Roger MacNicol SmartScan vlot