Oracle SQL

Tag Archives: HCC

DMLs and the Columnar Cache on ADW

Posted on August 23, 2019 by Roger MacNicol Posted in 12.2, inmemory, oracle

One of the main performance technologies underlying ADW is the In-Memory format columnar cache, and a question that has come up several times is: how does the columnar cache handle DMLs and cache invalidations? This blog post attempts to answer that question.

The Two Columnar Cache Formats

The original columnar cache was an idempotent rearrangement of Hybrid Columnar Compressed blocks into pure columnar form still in the original HCC encoding. This is known internally as “CC1” and was not applicable to row format blocks. Because this is an idempotent transformation we are able to reconstitute the original raw blocks from a CC1 cache if needed.

We then introduced the In-Memory format columnar cache where, if you have an In-Memory licence, we run the 1 MB chunks processed by Smart Scan through the In-Memory loader and write new clean columns encoded in In-Memory formats. This means we can then use the new SIMD-based predicate evaluation and other performance improvements developed for Database In-Memory. This format is known internally as “CC2”. If you do not have an In-Memory licence, the original CC1 columnar cache is used. Another advantage of the CC2 format is that we can load row format blocks as well as HCC format blocks into pure columnar In-Memory format.
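
If you want to see which columnar cache counters a scan is driving, the session statistics are the quickest check. A minimal sketch, just to eyeball the cache-related stats (exact stat names vary slightly by version; “cell physical IO bytes saved by columnar cache” is the headline one):

Select n.name, s.value
  From v$statname n, v$mystat s
 Where s.statistic# = n.statistic#
   And n.name Like '%columnar cache%'
   And s.value > 0;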

Continue reading→
HCC IMCU inmemory memcompress oracle Roger MacNicol SmartScan

Compression in a well-balanced system

Posted on August 23, 2018 by Roger MacNicol Posted in oracle, SmartScan

Exadata is designed to present a well-balanced system between disk scan rates, cell CPU processing rates, network bandwidth, and RDBMS CPU such that, on average across a lot of workloads and a lot of applications, there should be no one obvious bottleneck. Good performance engineering means avoiding a design like this:

because that would be nuts. If you change one component in a car, e.g. the engine, you need to rework the other components such as the chassis and the brakes. When we rev the Exadata hardware we go through the same exercise to make sure the components match to avoid bottlenecks.

One of the components in a well-balanced system that people often leave off this list is the compression algorithm. With compression you are balancing storage costs as well as disk scan rates against the CPUs’ decompression rate. In a disk-bound system you can alleviate the bottleneck by compressing the data down to the point that it is being scanned at the same rate that the CPUs can decompress and process it. Squeeze it too far, say with HCC “Compress For Archive High” which uses BZ2, and the CPUs become a major bottleneck because the decompression costs unbalance the system (but it’s wonderful if you think you’re unlikely to ever need the data again).
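
If you want to gauge where a given level would put you before committing to it, DBMS_COMPRESSION can estimate the ratio on a sample of your data. This is a hedged sketch only: MY_FACTS and SCRATCH_TS are placeholders, and the constant names follow the 12.1+ documentation (older releases use the COMP_FOR_* names):

set serveroutput on
declare
  l_blkcnt_cmp   pls_integer;
  l_blkcnt_uncmp pls_integer;
  l_row_cmp      pls_integer;
  l_row_uncmp    pls_integer;
  l_cmp_ratio    number;
  l_comptype_str varchar2(100);
begin
  dbms_compression.get_compression_ratio(
    'SCRATCH_TS',                      -- scratch tablespace used for the sample copy
    user, 'MY_FACTS', null,            -- owner, table, (sub)partition
    dbms_compression.comp_query_high,  -- level being estimated
    l_blkcnt_cmp, l_blkcnt_uncmp, l_row_cmp, l_row_uncmp,
    l_cmp_ratio, l_comptype_str);
  dbms_output.put_line(l_comptype_str || ' estimated ratio: ' || round(l_cmp_ratio, 1) || 'x');
end;
/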

Continue reading→
compression HCC oracle Roger MacNicol SmartScan

Random thoughts on block sizes

Posted on November 9, 2017 by Roger MacNicol Posted in oracle, SmartScan

I heard “Oracle only tests on 8k and doesn’t really test 16k”

I heard someone assert that one reason you should only use 8k block sizes is that, and I quote, “Oracle only tests on 8k and doesn’t really test 16k”. I tried googling that rumour and tracked it back to AskTom. Also, as disks and memory get bigger and CPUs get faster, it is natural to ask whether 8k is now “too small”.

So here are my thoughts:

1. A quick scan of the data layer regression tests showed a very large number running on 16k blocks

2. Oracle typically runs its DW stress tests on 16k blocks

So, clearly, the assertion is untrue but I did spot some areas where 32k testing could be improved. I also added a note to AskTom clarifying testing on 16k.

Does this mean I should look at using 16k blocks for my DW?

Whoa, not so fast. Just because table scans typically run slightly faster on 16k blocks and compression algorithms typically get slightly better compression on 16k blocks does not mean your application will see improvements:

1. Multiple block sizes mean added work for the DBA

2. Databases do a lot more than just scan tables – e.g. row based writes and reads could perform worse

3. Most apps have low hanging fruit that will give you far better ROI on your time than worrying about block sizes (if you don’t believe me, attend one of Jonathan Lewis’s excellent index talks).

Most applications run fine on 8k because it is a good trade-off between different access paths and, in general, 8k is still the right choice.

What about ‘N’ row pieces?

In general Oracle’s block layout tries to ensure row pieces are split on column boundaries but in the case of very wide columns we will split in the middle of a column if too much space would be wasted by aligning with column boundaries. When a column is split in the middle it creates what is known as an ‘N’ row piece.

Rows are split by default at 255-column boundaries, assuming the row piece fits in the block. If you have a table with very wide rows or some very wide inline columns, smaller block sizes will result both in rows being split more often and in columns being split in the middle. At a minimum, the number of block gets to retrieve a single row will likely increase. This is one case where a 16k block size may be worth investigating.
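
If you do decide to investigate, remember that a non-default block size needs its own buffer cache before the tablespace can be created. A minimal sketch with placeholder sizes and names:

SQL> Alter System Set db_16k_cache_size = 256M;
SQL> Create Tablespace TS16K Datafile '+DATA' Size 10G Blocksize 16K;
SQL> Create Table T_WIDE (id number, payload varchar2(4000)) Tablespace TS16K;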

The curious case of 70X compression

We had a case of a customer legitimately pointing out that ZLIB would give much better than 70X compression on his very repetitive data but COMPRESS FOR QUERY HIGH was stuck at 70X. And, as it turns out, this is a factor of block size.

Hybrid Columnar Compression (HCC) takes up to 8 MB of raw data and compresses it down to one CU (Compression Unit), subject to row limits depending on compression type. For example, COMPRESS FOR ARCHIVE LOW (which is also ZLIB) is limited to 8,000 rows in a CU. But there is another limit, which you may deduce from block dumps: HCC will only allow one head piece row per block, because HCC rowids are the RDBA of the head piece plus the row offset into the CU. So if you have incredibly compressible data where 8,000 rows compress down to less than 16k, moving to a larger block size can be bad news because you will end up with wasted space at the end of each block.
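
Here is a rough worked example of why the block size caps the ratio. The 70-bytes-per-row figure is invented to match the 70X observation, and the 8,000-row limit quoted above for ARCHIVE LOW is used for simplicity:

-- 8,000 rows x ~70 bytes of raw data is roughly 560 KB per CU. Each CU occupies at
-- least one whole block, so the best possible ratio is raw size / block size:
SQL> Select round(8000*70/8192,1)  As max_ratio_8k,
            round(8000*70/16384,1) As max_ratio_16k
       From dual;
-- roughly 68x with 8k blocks versus 34x with 16k blocks once the CU compresses
-- below the size of a single block.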

Summary

There are typically better things to spend time on than worrying about block size but if you wish to run your DW on 16k blocks they are thoroughly tested just like 8k blocks are.

HCC oracle Roger MacNicol SmartScan

Why you don’t want to set _partition_large_extents FALSE

Posted on May 4, 2017 by Roger MacNicol Posted in oracle, SmartScan, undocumented

I’ve seen some blogs recommending that _partition_large_extents be set to FALSE for a variety of space-conserving reasons without the authors thinking about the negative impact this is going to have on Smart Scan. Large extents cause an INITIAL allocation of 8 MB and a NEXT allocation of 1 MB, and they have been the default for tablespaces on Exadata since 11.2.0.2. You can verify that large extents are in use by a given table or partition with:

Select segment_flags 
From sys_dba_segs 
where segment_name = <table_name> 
and owner = <schema_name>;

The segment flag bit for large extents is 0x40000000.
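
If you want the answer as a simple yes/no, the same view can decode that bit directly. A sketch using the flag position quoted above (sys_dba_segs needs SYS-level access, as before):

Select segment_name,
       Case When bitand(segment_flags, to_number('40000000','XXXXXXXX')) <> 0
            Then 'YES' Else 'NO' End As large_extents
  From sys_dba_segs
 Where segment_name = <table_name>
   And owner = <schema_name>;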

This pattern of allocation is designed to work optimally with Smart Scan because Smart Scan intrinsically works in 1 MB chunks. Reads of ASM allocation units are split into chunks of at most 1 MB to be passed to the filter processing library, which slices and dices their blocks to create the synthetic blocks that contain only the rows and columns of interest to return to the table scan driver. When less than 1 MB gets allocated at a time to a segment and the next contiguous blocks get allocated to a different segment, each separate run of blocks will be read by a different multi-block read (MBR). Each run will be passed separately to Smart Scan and we get sub-optimal chunks to work on, increasing both the overhead of processing and the number of round trips needed to process the table scan. The design of Smart Scan is predicated on scooping up contiguous runs of data from disk for efficient processing.
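
You can also confirm whether a segment actually received the 8 MB / 1 MB allocation pattern by grouping its extent sizes (a sketch; DBA_EXTENTS can be slow on very large databases):

Select bytes/1024/1024 As extent_mb, count(*) As num_extents
  From dba_extents
 Where owner = <schema_name>
   And segment_name = <table_name>
 Group By bytes
 Order By 1;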

This matters particularly for HCC data and for chained rows.

Continue reading→
HCC oracle Roger MacNicol row chaining SmartScan undocumented oracle

Why are HCC stats being bumped when Smart Scanning row major data in 12.2

Posted on May 4, 2017 by Roger MacNicol Posted in oracle, SmartScan

In 12.2, there is a stat “cell blocks pivoted” that points to a new optimization. When Smart Scan processes data, it has to create a new synthetic block that only contains the columns that are needed and the rows that pass the offloaded predicates.

If Smart Scan has to create a new block, why would we create a new row-major block when we can just as easily create an uncompressed HCC block? It is roughly the same amount of work, but once the synthetic block is returned to the RDBMS for processing, columnar data is 10X cheaper to process when the query uses rowsets. So where does the name of this stat come from? Simple: it works just like a spreadsheet PIVOT operation, swapping rows for columns.

So what is a “rowset”? It simply means that a step in a query plan will take in a set of rows before processing and then process that entire batch of rows in one go before fetching more rows; this is significantly more efficient than processing one row at a time. You can see when a query uses rowsets via:

select * from table(dbms_xplan.display_cursor('<sql_id>',1,'-note +projection'));

in which case you will see projection metadata with details such as “(rowsets=200)”.

When Smart Scan sees that the query on the RDBMS is using rowsets, it tries to create a columnar output block instead of a row-major output block and when this arrives on the RDBMS, the table scan will see an HCC block and bump the HCC stats even though the table being scanned doesn’t use HCC. Let’s see how this works in practice:
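
As a minimal sketch of the check involved, snapshot the session stat, run an offloaded full scan of a row-major (non-HCC) table, then re-run the query and compare the values:

Select n.name, s.value
  From v$statname n, v$mystat s
 Where s.statistic# = n.statistic#
   And n.name = 'cell blocks pivoted';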

Continue reading→
execution plan HCC oracle query optimization Roger MacNicol rowsets SmartScan

More on tracing the offload server

Posted on May 4, 2017 by Roger MacNicol Posted in cell_offload, oracle, SmartScan, trace

I posted a while back on how to trace Hybrid Columnar Compression in an offload server, so this is a quick follow up.

  1. I have trouble remembering the syntax for setting a regular parameter in an offload server without bouncing it. Since I need to keep this written down somewhere, I thought it might be of use to support folks and DBAs.
  2. I forgot to show you how to specify which offload group to set the trace event in.

So this example should do both: 

CellCLI> alter cell offloadGroupEvents = "immediate cellsrv.cellsrv_setparam('my_parameter', 'TRUE')", offloadGroupName = "SYS_122110_160621"

This will, of course, only set the parameter temporarily, until the next time the offload server is bounced; adding it to the offload group’s init.ora as well will make the change persistent.

Cell Offloading HCC oracle Roger MacNicol SmartScan traces

What’s new in 12.2 CELLMEMORY Part 2

Posted on May 4, 2017 by Roger MacNicol Posted in inmemory, oracle, SmartScan

Question: do I need to know anything in this blog post?

Answer: No, it is a true cache and CELLMEMORY works automatically

Question: so why should I read this blog post?

Answer:  because you like to keep a toolkit of useful ways to control the system when needed

The DDL

The Exadata engineering team has done a lot of work to make the flash cache automatically handle a variety of very different workloads happening simultaneously which is why we now typically discourage users from specifying:

SQL> Alter Table T storage (cell_flash_cache keep);

and trust the AUTOKEEP pool to recognize table scans that would most benefit from caching and to cache them in a thrash-resistant way. Our best practice for CELLMEMORY is typically not to use the DDL. That said, there will be cases when the DBA wishes to override the system’s default behavior, and we will look at a couple of those reasons. But first, here’s the DDL, which is a very cut-down portion of the INMEMORY syntax:

Alter Table T  [ [ NO ] CELLMEMORY [ MEMCOMPRESS FOR [ QUERY | CAPACITY ] [ LOW | HIGH ] ] ]

So let’s break this down piece by piece.

Examples

SQL> Alter Table T NO CELLMEMORY

The NO CELLMEMORY clause prevents a table being eligible for the rewrite from 12.1.0.2 format to 12.2 In-Memory format. There are a variety of reasons you may wish to do this and we’ll look at those at the end of this post. 

SQL> Alter Table T CELLMEMORY
SQL> Alter Table T CELLMEMORY MEMCOMPRESS FOR CAPACITY

These allow a table to be cached in the default 12.2 In-Memory format (you’d only ever need them to undo a NO CELLMEMORY you issued earlier or to revert a change in the compression level used).
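
To check what a table is currently set to, the data dictionary shows both attributes side by side (this assumes the CELLMEMORY column that the 12.2 dictionary added to USER_TABLES alongside the INMEMORY one):

SQL> Select table_name, inmemory, cellmemory From user_tables Where table_name = 'T';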

SQL> Alter Table T CELLMEMORY MEMCOMPRESS FOR QUERY

I mentioned above that CELLMEMORY, by default, undergoes two rounds of compression: first a round of semantic compression, and then a round of bitwise compression using LZO. This is the default to try to keep CELLMEMORY’s flash cache footprint as close as we can to HCC flash space usage while running queries much faster than HCC. But even though LZO has a relatively low decompression cost, it is not zero. There are some workloads which run faster with MEMCOMPRESS FOR QUERY, but they also typically use twice as much flash space. It would not be appropriate for the system to automatically start using twice as much flash, but if you wish to experiment with this, the option is there. Also, compressing with LZO takes CPU time which is not needed with MEMCOMPRESS FOR QUERY.

What about [ LOW | HIGH ] ?

Currently, unlike INMEMORY, these are throw-away words, but we retained them in the grammar for future expansion and because people are used to specifying them in a MEMCOMPRESS clause. This means, in effect:

  • CELLMEMORY’s MEMCOMPRESS FOR QUERY
    is roughly the equivalent of INMEMORY’s MEMCOMPRESS FOR QUERY HIGH
  • CELLMEMORY’s MEMCOMPRESS FOR CAPACITY
    is roughly the equivalent of INMEMORY’s MEMCOMPRESS FOR CAPACITY LOW

Why might I want to overrule the default?

There are a few reasons:

  1. Turning CELLMEMORY off may be useful if you know an HCC table will only be queried a few times and then dropped or truncated soon and it is not worth the CPU time to create the In-Memory formats
  2. Turning CELLMEMORY off may be useful if your cells are already under a lot of CPU pressure and you wish to avoid the CPU of creating the In-Memory formats
  3. Switching to MEMCOMPRESS FOR QUERY may be useful to get better query performance on some workloads (YMMV) and reduce the CPU cost of creating In-Memory formats

What about INMEMORY with CELLMEMORY?

We allow both INMEMORY and CELLMEMORY on the same table

SQL> Create Table T (c1 number) INMEMORY NO CELLMEMORY;
SQL> Create Table T (c1 number) INMEMORY CELLMEMORY MEMCOMPRESS FOR QUERY;

Why would you want both INMEMORY and CELLMEMORY?

DBIM implements a priority system where the In-Memory area is first loaded with the critical tables and then finally down to the PRIORITY LOW tables. If you have a PRIORITY LOW table that is unlikely to get loaded in memory it is fine to also specify CELLMEMORY so as to still get columnar performance.  Note: an HCC encoded INMEMORY table will still get automatic CELLMEMORY if you don’t use any DDL at all.
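
As a sketch of that combination, in the same style as the examples above (PRIORITY LOW is the standard INMEMORY priority clause):

SQL> Create Table T3 (c1 number) INMEMORY PRIORITY LOW CELLMEMORY MEMCOMPRESS FOR CAPACITY;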

—
Roger

Cellmemory HCC inmemory oracle Roger MacNicol SmartScan

What’s new in 12.2 CELLMEMORY Part 1

Posted on May 4, 2017 by Roger MacNicol Posted in 12.2, inmemory, oracle, SmartScan

Many people know that in 12.1.0.2 we introduced a ground-breaking columnar cache that rewrote 1 MB chunks of HCC format blocks in the flash cache into pure columnar form in a way that allowed us to only do the I/O for the columns needed but also to recreate the original block when that was required.

This showed up in stats as “cell physical IO bytes saved by columnar cache”.

But in 12.1.0.2 we had also introduced Database In-Memory (or DBIM) that rewrote heap blocks into pure columnar form in memory. That project introduced: 

  • new columnar formats optimized for query performance
  • a new way of compiling predicates that supported better columnar execution
  • the ability to run predicates against columns using SIMD instructions which could execute the predicate against multiple rows simultaneously

So it made perfect sense to rework the columnar cache in 12.2 to take advantage of the new In-Memory optimizations.

Quick reminder of DBIM essentials

In 12.2, tables have to be marked manually for DBIM using the INMEMORY keyword:

SQL> Alter Table T INMEMORY

When a table tagged as INMEMORY is scanned, the background process is notified to start loading it into the In-Memory area. This design was adopted so that the user’s scans are not impeded by loading. Subsequent scans that come along will check what percentage is currently loaded in memory and make a rule-based decision:

For Exadata 

  • Greater than 80% populated, use In-Memory and the Buffer Cache for the rest
  • Less than 80% populated, use Smart Scan
  • The 80% cutoff between the buffer cache and direct read is configurable with an undocumented underscore parameter
  • Note: if an In-Memory scan is selected, even for a partially populated segment, Smart Scan is not used

For non-Exadata

  • Greater than 80% populated, use In-Memory and the Buffer Cache for the rest
  • Less than 80% populated, use In-Memory and Direct Read for the rest
    Note: this requires the segment to be checkpointed first
  • The 80% cutoff between the buffer cache and direct read is configurable with an undocumented underscore parameter

While DBIM can provide dramatic performance improvements, it is limited by the amount of usable SGA on the system that can be set aside for the In-Memory area. After that, performance becomes that of disk access to heap blocks from the flash cache or disk groups. What was needed was a way to, in effect, increase the In-Memory area so that cooler segments could still benefit from the In-Memory formats without using valuable RAM, which is often a highly constrained resource.
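
As an aside, you can see how much of each segment actually made it into the In-Memory area with V$IM_SEGMENTS, for example (a sketch; columns as documented from 12.1 onwards):

Select segment_name, populate_status, inmemory_size,
       round(100 * (bytes - bytes_not_populated) / bytes) As pct_populated
  From v$im_segments;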

Cellmemory

Cellmemory works in a similar way to the 12.1.0.2 columnar cache in that 1 MB of HCC formatted blocks is cached in columnar form automatically without the DBA needing to be involved or do anything. This means columnar cache entries are created after Smart Scan has processed the blocks, rather than before as happens with ineligible scans. The 12.1.0.2 columnar cache format simply takes all the column-1 compression units (CUs) and stores them contiguously, then all the column-2 CUs contiguously, etc., so that each column can be read directly from the cache without reading unneeded columns. This happens during Smart Scan, so the reformatted blocks are returned to the cell server along with the query results for that 1 MB chunk.

In 12.2, eligible scans continue to be cached in the 12.1.0.2 format columnar cache format after Smart Scan has processed the blocks so that columnar disk I/O is available immediately. The difference is that if and only if the In-Memory area is in use on the RDBMS node (i.e. the In-Memory feature is already in use in that database), the address of the beginning of the columnar cache entry is added to a queue so that a background process running at a lower priority can read those cache entries and invoke the DBIM loader to rewrite the cache entry into In-Memory formatted data.

Unlike the columnar cache, which has multiple column-1 CUs, the IMC format creates a single column using the new formats that use semantic compression and support SIMD etc. on the intermediate compressed data. By default a second round of compression using LZO is then applied. When 1 MB of HCC ZLIB-compressed blocks is rewritten in this way, it typically takes around 1.2 MB (YMMV obviously).

Coming up

Up-coming blog entries will cover:

  • Overriding the default behaviour with DDL
  • New RDBMS stats for Cellmemory
  • New cellsrv stats for Cellmemory
  • Flash cache pool changes
  • Tracing Cellmemory

Cellmemory Columnar Cache HCC inmemory oracle Roger MacNicol SmartScan

Improvements to HCC with wide tables in 12.2

Posted on May 4, 2017 by Roger MacNicol Posted in oracle, SmartScan

HCC Compression Unit Sizing 

Since the beginning, Oracle has provided four compression levels to offer a trade-off between the compression ratio and various other factors including table scan performance and the performance of single-row retrieval. I cannot emphasize enough that the various trade-offs mean that YMMV with the different levels and you should always test what works best with your application and hardware when choosing the level. Historically, people have rarely used Query Low: its faster compression comes with a reduced compression ratio, and the resulting extra disk I/O costs more time than the heavier decompression of Query High. The one time that Query Low makes sense on spinning disks is if you still have a significant number of row retrieval operations (including from index access joins).
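
For reference, the four levels are selected with ordinary DDL, and existing data is recompressed with a segment rebuild such as a MOVE (T is a placeholder; 12c also accepts the COLUMN STORE COMPRESS FOR spelling):

SQL> Alter Table T Move Compress For Query Low;
SQL> Alter Table T Move Compress For Query High;
SQL> Alter Table T Move Compress For Archive Low;
SQL> Alter Table T Move Compress For Archive High;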

NVMe Flash

X5 introduced NVMe technology, which means that the extra I/O from Query Low is faster than ZLIB decompression, and that makes Query Low beneficial. So we needed to reassess the sizing of Compression Units. From 11.2.0.1 to 12.1.2.4 the sizing guidelines are as follows:

Continue reading→
HCC Roger MacNicol SmartScan