Oracle SQL

Monthly Archives: May 2017

Working around heartbeat issues caused by tracing or by regexp

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, trace

I had noted in my first post that using the highest level of tracing caused timeout issues with the offload server heartbeat monitor. Heartbeat issues can also occur with expensive (and badly formed) regexp expressions. By default the heartbeat monitor is set to 6 seconds, which is the maximum time permitted for the offload server to process 1 MB of data and mark the task completed; this is far more time than that work should reasonably take.

Operations that cause that time period to be exceeded, such as expensive tracing to disk or badly formed regexp expressions, lead to entries like this in the alert log:

State dump signal delivered to CELLOFLSRV<10180> by pid - 9860, uid - 3318
Thu Mar  5 12:26:31 2015 561 msec State dump completed for CELLOFLSRV<10180>
Clean shutdown signal delivered to CELLOFLSRV<10180> by pid - 9860, uid - 3318
CELLOFLSRV <10180> is exiting with code 1

where the restart server bounces the offload server to clear the perceived hang. Increasing the timeout via:

CellCLI> alter cell events = "immediate cellsrv.cellsrv_setparam('_cell_oflsrv_heartbeat_timeout_sec','60')"

enables the tracing to proceed without triggering the restart server.

My point in writing this entry was to provide a work-around when tracing is needed, but also to address a couple of blog posts I’d seen that recommend leaving it set at 60 or 90 seconds. This is not a good idea. The heartbeat exists to catch genuine but rare issues, and leaving this set to an increased value will hinder the offload server from restarting quickly to resume work. This is one parameter that should be reset to the default when the work-around is no longer needed, unless otherwise directed by support.
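
For completeness, resetting it should just be a matter of repeating the command above with the 6-second default mentioned earlier (assuming that default applies to your cell software version):

CellCLI> alter cell events = "immediate cellsrv.cellsrv_setparam('_cell_oflsrv_heartbeat_timeout_sec','6')"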

Roger MacNicol

Cell Offloading oracle Roger MacNicol SmartScan traces

Why you don't want to set _partition_large_extents FALSE

Posted on May 4, 2017 by Roger MacNicol Posted in oracle, SmartScan, undocumented

I’ve seen some blogs recommending that _partition_large_extents be set to FALSE for a variety of space-conserving reasons, without the authors thinking about the negative impact this is going to have on Smart Scan. Large extents cause an INITIAL allocation of 8 MB and a NEXT allocation of 1 MB, and they have been the default for tablespaces on Exadata since 11.2.0.2. You can verify that large extents are in use by a given table or partition with:

Select segment_flags 
From sys_dba_segs 
where segment_name = <table_name> 
and owner = <schema_name>;

The segment flag bit for large extents is 0x40000000.
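
As an illustration, here is a sketch of decoding that bit directly in the query above (SALES and SCOTT are hypothetical names; sys_dba_segs and the 0x40000000 flag are as noted above):

-- 0x40000000 is the large-extent flag described above
Select segment_name,
       decode(bitand(segment_flags, to_number('40000000','XXXXXXXX')), 0, 'NO', 'YES') large_extents
From sys_dba_segs
where segment_name = 'SALES'
and owner = 'SCOTT';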

This pattern of allocation is designed to work optimally with Smart Scan because Smart Scan intrinsically works in 1 MB chunks. Reads of ASM allocation units are split into chunks of at most 1 MB to be passed to the filter processing library, which slices and dices their blocks to create the synthetic blocks that contain only the rows and columns of interest to return to the table scan driver. When less than 1 MB gets allocated at a time to a segment and the next contiguous blocks get allocated to a different segment, each separate run of blocks will be read by a different MBR. Each run is passed separately to Smart Scan, and we get sub-optimal chunks to work on, increasing both the overhead of processing and the number of round trips needed to process the table scan. The design of Smart Scan is predicated on scooping up contiguous runs of data from disk for efficient processing.
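
A quick way to see whether a segment is following the 8 MB / 1 MB pattern is to look at its extent sizes (a sketch with the same hypothetical names; dba_extents is the standard view):

select bytes/1024/1024 as mb, count(*) as extents
from dba_extents
where segment_name = 'SALES'
and owner = 'SCOTT'
group by bytes/1024/1024
order by 1;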

This matters particularly for HCC data and for chained rows.

Continue reading→
HCC oracle Roger MacNicol row chaining SmartScan undocumented oracle

Why are HCC stats being bumped when Smart Scanning row major data in 12.2

Posted on May 4, 2017 by Roger MacNicol Posted in oracle, SmartScan

In 12.2, there is a stat “cell blocks pivoted” that points to a new optimization. When Smart Scan processes data, it has to create a new synthetic block that only contains the columns that are needed and the rows that pass the offloaded predicates.

If Smart Scan has to create a new block anyway, why create a new row-major block when we can just as easily create an uncompressed HCC block? It is roughly the same amount of work, but once the synthetic block is returned to the RDBMS for processing, columnar data is 10X cheaper to process when the query uses rowsets. So where does the name of this stat come from? Simple: it works just like a spreadsheet PIVOT operation, swapping rows for columns.
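
If you want to check whether your own session is benefiting, a simple sketch using the standard session statistics views is:

select n.name, s.value
from v$statname n, v$mystat s
where n.statistic# = s.statistic#
and n.name = 'cell blocks pivoted';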

So what is a “rowset”? It simply means that a step in a query plan will take in a set of rows before processing and then process that entire batch of rows in one go before fetching more rows; this is significantly more efficient than processing one row at a time. You can see when a query uses rowsets via:

select * from table(dbms_xplan.display_cursor('<sql_id>',1,'-note +projection'));

in which case you will see projection metadata with details such as “(rowsets=200)”.

When Smart Scan sees that the query on the RDBMS is using rowsets, it tries to create a columnar output block instead of a row-major output block and when this arrives on the RDBMS, the table scan will see an HCC block and bump the HCC stats even though the table being scanned doesn’t use HCC. Let’s see how this works in practice:

Continue reading→
execution plan HCC oracle query optimization Roger MacNicol rowsets SmartScan

Using INMEMORY with External Tables

Posted on May 4, 2017 by Roger MacNicol Posted in 12c, External tables, inmemory, oracle, SmartScan

Several people have asked if there is any way to use INMEMORY on External Tables because the INMEMORY syntax is not supported in Create Table and Alter Table with Organization External (actually Create Table parses it but then ignores it).

While in 12.1.0.2 there is no way out of having to load the data into Oracle, there is a shortcut to simplify life when you have an external process recreating flat-file data on a regular basis and want to query the latest incarnation against corporate data in Oracle tablespaces. One example we’ve seen of this is where you have a map-reduce job on Hadoop that highly summarizes large amounts of ephemeral data, and it is the summary data that needs to be joined against Oracle data.

Instead of truncating the Oracle table and doing a flat file load each time, we can use a Materialized View on the external table and simply issue an on-demand refresh and let the materialized view take care of the truncate and load under the covers for us.

Since INMEMORY does support Materialized Views, we can now automatically get INMEMORY query speeds on the data from an external source.
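
As a minimal sketch of the pattern (the names are hypothetical, and the external table ext_summary is assumed to already exist over the flat file):

SQL> Create Materialized View mv_ext_summary
       INMEMORY
       Build Immediate
       Refresh Complete On Demand
     As Select * From ext_summary;

SQL> exec dbms_mview.refresh('MV_EXT_SUMMARY', method => 'C');

Each time the external process rewrites the flat file, an on-demand complete refresh reloads the materialized view, and queries against it can then benefit from INMEMORY as described above.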

Let’s see how this works. For example here’s a loader table I use with TPC-H Scale 1 for local testing:

Continue reading→
external tables inmemory oracle Roger MacNicol SmartScan

More on tracing the offload server

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, trace

I posted a while back on how to trace Hybrid Columnar Compression in an offload server, so this is a quick follow-up:

  1. I have trouble remembering the syntax for setting a regular parameter in an offload server without bouncing it. Since I need to keep this written down somewhere, I thought it might be of use to support folks and DBAs.
  2. I forgot to show how to specify which offload group to set the trace event in.

So this example should do both: 

CellCLI> alter cell offloadGroupEvents = "immediate cellsrv.cellsrv_setparam('my_parameter', 'TRUE')", offloadGroupName = "SYS_122110_160621"

This will, of course, only set the parameter temporarily, until the next time the offload server is bounced; adding it to the offload group’s init.ora as well will take care of that.

Cell Offloading HCC oracle Roger MacNicol SmartScan traces

Tracing Hybrid Columnar Compression in an offload server

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, trace

I had previously commented on how to use the FPLIB facility in a trace event but the question came up today of how to trace HCC in an offload server.  The facility name in this case is ADVCMP (Advanced Compression) and the hierarchy is:

ADVCMP_MAIN
    ADVCMP_COMP
    ADVCMP_DECOMP

No compression occurs on the cell so we are only interested in the decompression tracing in an offload server.

The correct syntax is [facility.event], so in this case:

cellcli -e 'alter cell offloadgroupEvents = "trace[advcmp.advcmp_decomp.*] disk=lowest"'
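
To turn it back off again when you are done, the same off syntax used for the fplib events in the next post should apply:

cellcli -e 'alter cell offloadgroupEvents = "trace[advcmp.advcmp_decomp.*] off"'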
oracle Roger MacNicol SmartScan traces troubleshooting

Using trace events with an offload server

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, trace

I’ve noticed that several people who were familiar with using trace events with cellsrv were uncertain about how to use tracing with the new offload server architecture. Whereas in the past you could have added tracing to SmartScan processing with:

> cellcli -e 'alter cell events = "trace[fplib.sage_data] disk=lowest, memory=lowest"'

the new syntax to turn it on is:

> cellcli -e 'alter cell offloadgroupEvents = "trace[fplib.sage_data] disk=lowest, memory=lowest"'

and to turn it off:

> cellcli -e 'alter cell offloadgroupEvents = "trace[fplib.sage_data] off"'

Do not use anything higher than “disk=medium,  memory=medium” since the higher levels of tracing will cause the heartbeat monitor to fail with timeout issues.

The SmartScan trace events available under the fplib (Filter Processing Library) facility are:

        fplib.fplib_main,  fplib.sage_cache,  fplib.sage_txn,  fplib.sage_data,  fplib.sage_index

In the cell trace directory hierarchy, each offload server has its own trace directories:

diag/asm/cell/SYS_122010_150220/trace
diag/asm/cell/SYS_112331_141117/trace
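
If you are not sure which SYS_* directory corresponds to which offload group, listing the offload groups on the cell is a quick cross-check (assuming your cell software exposes the OFFLOADGROUP object in CellCLI):

CellCLI> list offloadgroup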

In a future post I will cover how to use named offload groups to isolate the traces for a single session.

Roger MacNicol,

Data Storage Technology Group

oracle Roger MacNicol SmartScan traces troubleshooting

What’s new in 12.2 CELLMEMORY Part 3

Posted on May 4, 2017 by Roger MacNicol Posted in 12.2, inmemory, oracle, SmartScan

The Cellmemory Stats in the RDBMS

The RDBMS stats for Cellmemory are designed to closely follow the pattern used by the Inmemory stats.

Query Stats 

Each column in each 1 MB of disk blocks will be rewritten into one IMC-format Column CU in flash, and a set of Column CUs comprises an overall Compression Unit, so these stats reflect the number of 1 MB rewrites that were processed (not the number of column CUs).

  1. “cellmemory IM scan CUs processed for query”
    – #1 MB chunks scanned in MEMCOMPRESS FOR QUERY format
  2. “cellmemory IM scan CUs processed for capacity”
    – #1 MB chunks scanned in MEMCOMPRESS FOR CAPACITY format
  3. “cellmemory IM scan CUs processed no memcompress”
    – #1 MB chunks scanned in NO CELLMEMORY format (12.1.0.2 format)

Load Stats

  1. “cellmemory IM load CUs for query”
    – #1 MB chunks successfully rewritten from 12.1.0.2 to MEMCOMPRESS FOR QUERY format  
  2. “cellmemory IM load CUs for capacity”
    – #1 MB chunks successfully rewritten from 12.1.0.2 to MEMCOMPRESS FOR CAPACITY format
  3. “cellmemory IM load CUs no memcompress”
    – #1 MB chunks successfully rewritten into 12.1.0.2 format

Before a rewrite happens, a routine is called that looks through the blocks in the 1 MB chunk and determines whether it is eligible for rewrite. Reasons it may not be include transactional metadata from the commit cache, the presence of block formats that can’t be rewritten (although this list is getting smaller with each rpm), and the amount of space the rewrite would take up.

A rewrite into 12.1.0.2 format must fit in the original 1 MB of flash cache. An IMC-format rewrite is not permitted to exceed 8 MB. This limit is highly unlikely to be reached by MEMCOMPRESS FOR CAPACITY, but could be reached when trying to rewrite HCC blocks with much greater than 8X original compression into MEMCOMPRESS FOR QUERY format. This is one reason that the default is FOR CAPACITY.

  1. “cellmemory IM scan CUs rejected for query”
    – #1 MB chunks that could not be rewritten into MEMCOMPRESS FOR QUERY for whatever reason
  2. “cellmemory IM scan CUs rejected for capacity”
    – #1 MB chunks that could not be rewritten into MEMCOMPRESS FOR CAPACITY for whatever reason
  3. “cellmemory IM scan CUs rejected no memcompress”
    – #1 MB chunks that could not even be rewritten into 12.1.0.2 format for whatever reason
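
A simple way to watch all of these at the system level (a sketch; v$sysstat is the standard view and the stat names are as listed above):

select name, value
from v$sysstat
where name like 'cellmemory%'
order by name;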
Cellmemory direct path reads IMCU inmemory memcompress oracle Roger MacNicol SmartScan

What’s new in 12.2 CELLMEMORY Part 2

Posted on May 4, 2017 by Roger MacNicol Posted in inmemory, oracle, SmartScan

Question: do I need to know anything in this blog post?

Answer: No, it is a true cache and CELLMEMORY works automatically

Question: so why should I read this blog post?

Answer:  because you like to keep a toolkit of useful ways to control the system when needed

The DDL

The Exadata engineering team has done a lot of work to make the flash cache automatically handle a variety of very different workloads happening simultaneously which is why we now typically discourage users from specifying:

SQL> Alter Table T storage (cell_flash_cache keep);

and trust the AUTOKEEP pool to recognize table scans that would most benefit from caching and to cache them in a thrash resistant way. Our Best Practice for CELLMEMORY is typically not to use the DDL. That said, there will be cases when the DBA wishes to override the system’s default behavior and we will look at a couple of those reasons. But first here’s the DDL which is a very cut down portion of the INMEMORY syntax:

Alter Table T  [ [ NO ] CELLMEMORY [ MEMCOMPRESS FOR [ QUERY | CAPACITY ] [ LOW | HIGH ] ] ]

So let’s break this down piece by piece.

Examples

SQL> Alter Table T NO CELLMEMORY

The NO CELLMEMORY clause prevents a table from being eligible for the rewrite from 12.1.0.2 format to 12.2 In-Memory format. There are a variety of reasons you may wish to do this and we’ll look at those at the end of this post.

SQL> Alter Table T CELLMEMORY
SQL> Alter Table T CELLMEMORY MEMCOMPRESS FOR CAPACITY

These allow a table to be cached in the default 12.2 In-Memory format (you’d only ever need this to undo a NO CELLMEMORY you issued earlier, or to revert a change in the compression level used).

SQL> Alter Table T CELLMEMORY MEMCOMPRESS FOR QUERY

I mentioned above that CELLMEMORY, by default, undergoes two rounds of compression: first a round of semantic compression, and then a round of bitwise compression using LZO. This is the default in order to keep CELLMEMORY’s flash cache footprint as close as we can to HCC flash space usage while running queries much faster than HCC. But even though LZO has a relatively low decompression cost, it is not zero. There are some workloads that run faster with MEMCOMPRESS FOR QUERY, but they also typically use twice as much flash space. It would not be appropriate for the system to automatically start using twice as much flash, but if you wish to experiment with this, the option is there. Also, compressing with LZO takes CPU time which is not needed with MEMCOMPRESS FOR QUERY.

What about [ LOW | HIGH ] ?

Currently, unlike INMEMORY, these are throwaway words, but we retained them in the grammar for future expansion and because people are used to specifying them in a MEMCOMPRESS clause. This means, in effect:

  • CELLMEMORY’s MEMCOMPRESS FOR QUERY
    is roughly equivalent to INMEMORY’s MEMCOMPRESS FOR QUERY HIGH
  • CELLMEMORY’s MEMCOMPRESS FOR CAPACITY
    is roughly equivalent to INMEMORY’s MEMCOMPRESS FOR CAPACITY LOW

Why might I want to overrule the default?

There are a few reasons:

  1. Turning CELLMEMORY off may be useful if you know an HCC table will only be queried a few times and then dropped or truncated soon and it is not worth the CPU time to create the In-Memory formats
  2. Turning CELLMEMORY off may be useful if your cells are already under a lot of CPU pressure and you wish to avoid the CPU of creating the In-Memory formats
  3. Switching to MEMCOMPRESS FOR QUERY may be useful to get better query performance on some workloads (YMMV) and reduce the CPU cost of creating In-Memory formats

What about INMEMORY with CELLMEMORY?

We allow both INMEMORY and CELLMEMORY on the same table:

SQL> Create Table T (c1 number) INMEMORY NO CELLMEMORY;
SQL> Create Table T (c1 number) INMEMORY CELLMEMORY MEMCOMPRESS FOR QUERY;

Why would you want both INMEMORY and CELLMEMORY?

DBIM implements a priority system where the In-Memory area is first loaded with the critical tables and then finally down to the PRIORITY LOW tables. If you have a PRIORITY LOW table that is unlikely to get loaded in memory it is fine to also specify CELLMEMORY so as to still get columnar performance.  Note: an HCC encoded INMEMORY table will still get automatic CELLMEMORY if you don’t use any DDL at all.
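
For example, a sketch combining the two (PRIORITY LOW is standard INMEMORY syntax; the CELLMEMORY clause follows the grammar shown earlier):

SQL> Create Table T (c1 number) INMEMORY PRIORITY LOW CELLMEMORY MEMCOMPRESS FOR CAPACITY;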

—
Roger

Cellmemory HCC inmemory oracle Roger MacNicol SmartScan

What’s new in 12.2 CELLMEMORY Part 1

Posted on May 4, 2017 by Roger MacNicol Posted in 12.2, inmemory, oracle, SmartScan

Many people know that in 12.1.0.2 we introduced a ground-breaking columnar cache that rewrote 1 MB chunks of HCC format blocks in the flash cache into pure columnar form in a way that allowed us to only do the I/O for the columns needed but also to recreate the original block when that was required.

This showed up in stats as “cell physical IO bytes saved by columnar cache”.

But in 12.1.0.2 we had also introduced Database In-Memory (or DBIM) that rewrote heap blocks into pure columnar form in memory. That project introduced: 

  • new columnar formats optimized for query performance
  • a new way of compiling predicates that supported better columnar execution
  • the ability to run predicates against columns using SIMD instructions which could execute the predicate against multiple rows simultaneously

 so it made perfect sense to rework the columnar cache in 12.2 to take advantage of the new In-Memory optimizations.

Quick reminder of DBIM essentials

In 12.2, tables have to be marked manually for DBIM using the INMEMORY keyword:

SQL> Alter Table T INMEMORY;

When a table tagged as INMEMORY is first scanned, the background process is notified to start loading it into the In-Memory area. This design was adopted so that the user’s scans are not impeded by loading. Subsequent scans will check what percentage is currently loaded in memory and make a rule-based decision:

For Exadata 

  • Greater than 80% populated, use In-Memory and the Buffer Cache for the rest
  • Less than 80% populated, use Smart Scan
  • The 80% cutoff between BC and DR is configurable with an undocumented underscore parameter
  • Note: if an In-Memory scan is selected, even for a partially populated segment, Smart Scan is not used

For non-Exadata

  • Greater than 80% populated, use In-Memory and the Buffer Cache for the rest
  • Less than 80% populated, use In-Memory and Direct Read for the rest
    Note: this requires the segment to be checkpointed first
  • The 80% cutoff between BC and DR is configurable with an undocumented underscore parameter
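
Related to the population-percentage rule above, here is a sketch of checking how far population has progressed (v$im_segments is the standard DBIM view):

select segment_name, populate_status,
       round(100 * (1 - bytes_not_populated / bytes)) pct_populated
from v$im_segments;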

While DBIM can provide dramatic performance improvements, it is limited by the amount of usable SGA on the system that can be set aside for the In-Memory area. After that, performance becomes that of disk access to heap blocks from the flash cache or disk groups. What was needed was a way to increase the In-Memory area so that cooler segments could still benefit from the In-Memory formats without using valuable RAM, which is often a highly constrained resource.

Cellmemory

Cellmemory works in a similar way to the 12.1.0.2 columnar cache in that 1 MB chunks of HCC-formatted blocks are cached in columnar form automatically, without the DBA needing to be involved or do anything. This means the columnar cache entries are created after Smart Scan has processed the blocks, rather than before as happens with ineligible scans. The 12.1.0.2 columnar cache format simply takes all the column-1 compression units (CUs) and stores them contiguously, then all the column-2 CUs stored contiguously, etc., so that each column can be read directly from the cache without reading unneeded columns. This happens during Smart Scan so that the reformatted blocks are returned to the cell server along with the query results for that 1 MB chunk.

In 12.2, eligible scans continue to be cached in the 12.1.0.2 format columnar cache format after Smart Scan has processed the blocks so that columnar disk I/O is available immediately. The difference is that if and only if the In-Memory area is in use on the RDBMS node (i.e. the In-Memory feature is already in use in that database), the address of the beginning of the columnar cache entry is added to a queue so that a background process running at a lower priority can read those cache entries and invoke the DBIM loader to rewrite the cache entry into In-Memory formatted data.

Unlike the columnar cache, which has multiple column-1 CUs, the IMC format creates a single column using the new formats that use semantic compression and support SIMD etc. on the intermediate compressed data. By default a second round of compression using LZO is then applied. When 1 MB of HCC ZLIB-compressed blocks is rewritten in this way it typically takes around 1.2 MB (YMMV obviously).

Coming up

Up-coming blog entries will cover:

  • Overriding the default behaviour with DDL
  • New RDBMS stats for Cellmemory
  • New cellsrv stats for Cellmemory
  • Flash cache pool changes
  • Tracing Cellmemory
Cellmemory Columnar Cache HCC inmemory oracle Roger MacNicol SmartScan