Oracle SQL | Category Archives: cell

Triaging Smart Scan

Posted on April 8, 2021 by Roger MacNicol. Posted in adaptive serial direct path reads, cell_offload, oracle, SmartScan, trace.

This document is my attempt to bring together the available options for determining the root cause of a Smart Scan issue, creating a roadmap that helps support engineers narrow down the cause of concern.

It is a living document and will be edited and amended as time goes by. Please do check back again in the future.

Warning: these parameters should only be used in conjunction with an Oracle Support Engineer and are not intended for DBAs to self-triage; also they should not be left set after triage without discussion with an Oracle Support Engineer.

Continue reading→

Serial Scans failing to offload

Posted on October 13, 2020 by Roger MacNicol. Posted in adaptive serial direct path reads, cell_offload, oracle, SmartScan.

Very Large Buffer Cache

We’ve observed databases with very large buffer caches where Serial Scans don’t make use of Smart Scan even though it would have executed faster. Improvements to the decision making for Serial Scans have been made under bug 31626438, and the fix is back-portable.

A key difference between PQ and Serial is that as part of granule generation PQ sums the sizes of all the partitions that have not been pruned and passes that total size to the buffer cache decision making logic. Because the entire size to be scanned is considered, we make an accurate determination of smart scan benefits and the risk of cache thrashing.

Serial Scans on partitioned tables do not involve the coordinator and have no opportunity to get the larger picture. Instead they start work immediately, so partitions are considered one at a time and only the current partition’s size is considered when deciding between Buffer Cache and Direct Read (and hence offload). In the presence of very large buffer caches any given partition can fail the “Is Medium” test (or even the “Is Small” test) and so not get offloaded.
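To make the difference concrete, here is a minimal sketch of the two decision paths. It is illustrative only: the single size threshold and all function names are hypothetical stand-ins for the real small/medium table logic, which is internal to the database.

```python
# Illustrative only: the threshold and function names are hypothetical,
# standing in for Oracle's internal small/medium table logic.
GB = 1024 ** 3

def uses_direct_read(size_bytes, threshold_bytes):
    """Direct Read (and hence offload) only when the size seen exceeds the threshold."""
    return size_bytes > threshold_bytes

def pq_decision(partition_sizes, threshold_bytes):
    """PQ: one decision, made on the summed size of all unpruned partitions."""
    return uses_direct_read(sum(partition_sizes), threshold_bytes)

def serial_decisions(partition_sizes, threshold_bytes):
    """Serial: one decision per partition, each seeing only its own size."""
    return [uses_direct_read(size, threshold_bytes) for size in partition_sizes]

partitions = [2 * GB] * 64      # 128 GB in total
threshold = 4 * GB              # assumed to be large because the buffer cache is very large

print(pq_decision(partitions, threshold))            # True
print(any(serial_decisions(partitions, threshold)))  # False
```

With the same 128 GB of data, the PQ path sees the total and chooses Direct Read, while the serial path sees sixty-four 2 GB partitions and never does.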

In order to avoid this situation, an upper bound of 100 MB for using a buffer cache scan has been implemented for any serially scanned segment that:

isn’t using Automatic Big Table Caching (ABTC).
hasn’t had the Small Table parameter changed to a non-default value.

Any partitions larger than 100 MB will now automatically use Direct Read and hence offload on Exadata.
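The rule can be sketched as follows. This is a simplification under stated assumptions: the function, its parameters, and the single pre-existing threshold are hypothetical, and only the 100 MB bound and the two exemptions come from the description above.

```python
# Hypothetical sketch of the 100 MB upper bound; not Oracle's actual code.
MB = 1024 * 1024
BUFFER_CACHE_SCAN_CAP = 100 * MB

def serial_scan_uses_direct_read(segment_bytes, threshold_bytes,
                                 uses_abtc=False, small_table_param_changed=False):
    """Return True when a serially scanned segment should use Direct Read.

    threshold_bytes stands in for the pre-existing size-based decision,
    which can be very high when the buffer cache is very large.
    """
    cap_applies = not uses_abtc and not small_table_param_changed
    if cap_applies and segment_bytes > BUFFER_CACHE_SCAN_CAP:
        return True
    return segment_bytes > threshold_bytes

# A 2 GB partition against a 4 GB threshold: the cap now forces Direct Read.
print(serial_scan_uses_direct_read(2048 * MB, 4096 * MB))                  # True
# The same partition with ABTC in use is exempt and keeps the old behaviour.
print(serial_scan_uses_direct_read(2048 * MB, 4096 * MB, uses_abtc=True))  # False
```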

NSMTIO: kxfxghwm:[HWM_NOT_FOUND]

Another case to watch out for is when NSMTIO tracing shows HWM_NOT_FOUND and a Buffer Cache scan is then chosen when a Direct Read offloaded scan would have been faster. This can happen when a PQ query gets executed serially (NB: this is NOT the downgrade to serial case; this is still PQ but on a single thread). In this case the coordinator again does not have the opportunity to process all the partitions and, as part of that, gather the High Water Mark (HWM) for each segment and checkpoint them, so we fall back on buffer cache scans. A fix for this is currently being investigated.

Mixed Block Sizes

I have consistently advised against mixing block sizes in a database without a compelling reason backed up by empirical evidence. For those who must, the “Is Medium Table” logic for whether to use buffer cache or direct read has been improved when the database has more than one block size in use. This is tracked by bug 24655250 and fixed in 20.1.

Thoughts on Big Data SQL SmartScan

Posted on August 18, 2019 by Roger MacNicol. Posted in cell_offload, External tables, oracle, SmartScan.

Just a few thoughts I’ve been meaning to blog for a while:

1. Number of columns that can be offloaded

Exadata

Very early on in the Exadata Smart Scan implementation a restriction was added to prevent offloading more than 255 columns. This was done because of performance issues observed by customers. Smart Scan works on 1 MB chunks at a time and needs all the row pieces up to the highest #segcol referenced to be present in the 1 MB chunk that is being processed. When rows with more than 255 columns are first inserted all the row pieces are contiguous and hence likely to be present in the 1 MB chunk.

However, if the DBA has not provided sufficient free space (PCTFREE) in the block when it is first loaded, subsequent DMLs may end up splitting row pieces and moving pieces to blocks in the freelist that have more space available. PCTFREE specifies how much empty space to leave for future updates when a block is first populated.

Continue reading→

Create Quarantine

Posted on August 16, 2018 by Roger MacNicol. Posted in cell_offload, oracle, SmartScan.

First, if you don’t know what an Exadata Quarantine is, read this.

Someone asked whether you can create your own Exadata Cell quarantine and, if you can, why you might ever want to do it?

The first step when you don’t know how to do something is to try HELP in cellcli:

CellCLI> HELP
...
ALTER QUARANTINE
...
CREATE QUARANTINE
...
DROP QUARANTINE
...
LIST QUARANTINE

So we see we can create a quarantine, so we use HELP again:

Continue reading→

Shining some light on Database In-Memory vs the Exadata Columnar Cache in 12.1.0.2

Posted on August 3, 2018 by Roger MacNicol. Posted in cell_offload, inmemory, oracle, SmartScan, trace.

The beginners guide to Oracle Table Scans

Posted on August 2, 2017 by Roger MacNicol. Posted in adaptive serial direct path reads, cell_offload, inmemory, oracle, SmartScan, TurboScan.

I was asked a question yesterday that reminded me there are always people completely new to the topic who need an introduction, somewhere to start before the other articles make sense. So, here’s my brief write-up of everything you need to know about the basics of Oracle Table Scans.

Oracle has four main ways of scanning a table: the pre-9ir2 table scan, the 9ir2 TurboScan, the 11.1.0.1 Exadata SmartScan, and the 12.1.0.1 In-Memory Scan. Before we summarize each one, the other fundamental piece of information is the Oracle dictum that all blocks must be self-describing: a table scan routine should be able to inspect a block and understand what object it belongs to, whether it needs an undo applying, and how the data is laid out, without reference to any external structures or secondary storage.

The original table scan routine

Oracle uses a “dataflow” query engine, which means a query plan is built from nodes, like a sausage machine, that have three basic operations: Open, Next, Close. ‘Open’ means you ask the next node in the chain to prepare to do some work, including acquiring any resources it may need; ‘Next’ means you fetch one unit of work from your child, e.g. a row; and ‘Close’ means you tell your child node to shut down and release any resources it may be holding. You build a query by connecting the right kinds of nodes together in the order you want: one node just sorts, another groups, another does hash joins. The end of the sausage machine is the node seen on query plans as “Table Access Full”.
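The Open/Next/Close contract is easy to picture with a toy example. The classes below are a minimal sketch of the general dataflow (iterator) model, not Oracle's row source code; every name in it is hypothetical.

```python
# Toy dataflow engine: every node exposes open(), next() and close().
class TableAccessFull:
    """Leaf node: hands out rows one at a time from an in-memory 'table'."""
    def __init__(self, rows):
        self._rows = rows
        self._pos = None

    def open(self):
        self._pos = 0                      # acquire resources

    def next(self):
        if self._pos >= len(self._rows):
            return None                    # end of data
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self):
        self._pos = None                   # release resources


class Filter:
    """Intermediate node: pulls from its child and passes on matching rows."""
    def __init__(self, child, predicate):
        self._child = child
        self._predicate = predicate

    def open(self):
        self._child.open()

    def next(self):
        while (row := self._child.next()) is not None:
            if self._predicate(row):
                return row
        return None

    def close(self):
        self._child.close()


def run(root):
    """Drive the plan from the top: Open, Next until exhausted, Close."""
    root.open()
    try:
        out = []
        while (row := root.next()) is not None:
            out.append(row)
        return out
    finally:
        root.close()


rows = [{"id": 1, "v": 5}, {"id": 2, "v": 50}, {"id": 3, "v": 70}]
plan = Filter(TableAccessFull(rows), lambda r: r["v"] > 10)
print(run(plan))   # [{'id': 2, 'v': 50}, {'id': 3, 'v': 70}]
```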

This node would ask the data layer to fetch a block from disk, then get rows one at a time from the data layer. This is the workhorse table scan: it can scan any kind of data and do SCN manipulations like row versions, but it is not the fastest way to scan a table.

9ir2 TurboScan

In 9ir2 we introduced a much faster way of scanning tables called TurboScan. The data layer function which had been handing out rows one at a time was replaced by one that stays in a tight loop retrieving rows from disk and pushing them into a callback supplied by “Table Access Full”. An automation tool was used to generate several versions of this routine that optimized out common choices that have to be made: does the user need rowids to be projected? Do they need predicates applying? Is the data compressed or not? Is the data column-major or row-major? And so on. Every time a CPU reaches a branch in the code it tries to guess which side of the branch will be taken, but if it guesses wrong there can be a considerable stall during which no work gets done. By removing most of the branches, the code runs much more quickly.
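The idea of choosing a specialized routine once, rather than re-testing the same options for every row, can be sketched like this. It is only a structural illustration with hypothetical names: Python will not show the branch-prediction benefit, and the real routines were generated by a tool rather than written by hand.

```python
# Sketch: pick one loop variant up front so the per-row loop carries no
# "do we need rowids?" or "is there a predicate?" configuration branches.
def make_scan(need_rowids, predicate):
    if need_rowids and predicate is not None:
        def scan(block, callback):
            for rowid, row in enumerate(block):
                if predicate(row):
                    callback((rowid, row))
    elif need_rowids:
        def scan(block, callback):
            for rowid, row in enumerate(block):
                callback((rowid, row))
    elif predicate is not None:
        def scan(block, callback):
            for row in block:
                if predicate(row):
                    callback(row)
    else:
        def scan(block, callback):
            for row in block:
                callback(row)
    return scan


# "Table Access Full" supplies the callback; the scan pushes rows into it.
received = []
scan = make_scan(need_rowids=True, predicate=lambda value: value > 2)
scan([1, 3, 5], received.append)
print(received)   # [(1, 3), (2, 5)]
```

The only branch left inside the chosen loop is the data-dependent predicate test; the choices that are fixed for the whole query have been resolved before the loop starts.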

TurboScan is used for all queries which do not use RAW datatypes and which do not need special SCN processing.

Both pre-9ir2 scan and TurboScan can use the buffer cache to get blocks (typically small to medium tables) or use Direct Read to get blocks (typically medium to large tables).

See: When bloggers get it wrong – part 1

TurboScan can be disabled for triage purposes by setting:

SQL> alter session set events='12099 trace name context forever, level 1';

or specifically you can disable it only for HCC tables by setting:

SQL> alter session set "_arch_comp_dbg_scan"=1;

Exadata SmartScan

In 11.1.0.1 we introduced Exadata SmartScan intelligent storage. This is where a thin layer of database processing is embedded in the storage cells and the table scan routine offloads simple search criteria and a list of the columns it needs to storage and the storage cells pre-process the blocks to remove rows that fail the search criteria and remove columns which are not needed by the table scan. If all the rows are removed, the block doesn’t have to be sent back at all.
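As a rough mental model of what the storage cell does with the offloaded search criteria and column list, consider the sketch below. The function and data layout are hypothetical and ignore everything that makes the real implementation hard (block formats, transactions, compression).

```python
# Hypothetical model of cell-side pre-processing: filter rows, project
# columns, and skip blocks that end up empty.
def cell_preprocess(blocks, predicate, wanted_columns):
    reduced = []
    for block in blocks:
        kept = [
            {col: row[col] for col in wanted_columns}   # drop unwanted columns
            for row in block
            if predicate(row)                           # drop failing rows
        ]
        if kept:                                        # empty blocks are not returned
            reduced.append(kept)
    return reduced


blocks = [
    [{"id": 1, "v": 5, "pad": "x"}, {"id": 2, "v": 50, "pad": "y"}],
    [{"id": 3, "v": 1, "pad": "z"}],
]
print(cell_preprocess(blocks, lambda r: r["v"] > 10, ["id"]))   # [[{'id': 2}]]
```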

SmartScan can drastically reduce the amount of data returned on the Interconnect and put on the RDBMS memory bus, as well as the space used in SGA by the returned data. An additional significant benefit comes when the CPU fetches the reduced blocks into the CPU cache: since only relevant information exists on the block, no space is wasted by unwanted columns interspersed among the wanted ones, so more relevant data fits in memory and the CPU prefetch can do a better job of predicting which memory cache line to fetch next.

Only TurboScan Direct Read scans can use this offload capability. You can disable SmartScan for triage purposes, either for the session or for a single statement, by setting:

SQL> alter session set cell_offload_processing=FALSE;

SQL> select /*+ opt_param('cell_offload_processing','false') */  <col> from <tab> where <predicate>;

In-Memory Scans

In-Memory scans were introduced in 12.1.0.1 and brought a revolutionary increase in table scan speeds. With In-Memory scans the table or partition is loaded into an in-memory tablespace in SGA known as the inmemory-area. Data is stored in compressed columnar format, typically with up to 500,000 values in each columnar compression unit. This tablespace is kept transactionally consistent with the data on disk by means of an invalidation bitmap.

Just like with SmartScan, only TurboScan can use In-Memory scans with In-Memory objects. Instead of getting a block from disk, the specialized version of the scan routines fetches a column run from each column of interest, processes the search criteria, then returns column runs with the failing rows removed to the “Table Access Full” node.

If any rows have been modified and committed by other users, or the user’s own transaction has modified any rows, the scan will see these rows set in the invalidation bitmap. These rows are removed from the columnar results and the additional rows required are fetched from the buffer cache before moving on to the next set of column runs. This works well because the most recently modified blocks are the ones most likely to still be in the buffer cache.
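The interplay between the column runs and the invalidation bitmap can be sketched as below. This is a simplified model with hypothetical names and data structures: one predicate on one column, a Python list standing in for the bitmap, and a callback standing in for the buffer cache lookup.

```python
# Simplified model of scanning one set of column runs with an invalidation bitmap.
def scan_column_runs(column_runs, invalid, predicate_col, predicate, fetch_current_row):
    """column_runs: {column: [values]}; invalid: one bool per row position.

    fetch_current_row(i) stands in for reading the current version of row i
    from the buffer cache; it returns a dict, or None if the row is gone.
    """
    results = []
    # Pass 1: columnar data, skipping rows marked invalid.
    for i, is_invalid in enumerate(invalid):
        if not is_invalid and predicate(column_runs[predicate_col][i]):
            results.append({col: values[i] for col, values in column_runs.items()})
    # Pass 2: invalid rows are re-read from the buffer cache stand-in.
    for i, is_invalid in enumerate(invalid):
        if is_invalid:
            row = fetch_current_row(i)
            if row is not None and predicate(row[predicate_col]):
                results.append(row)
    return results


runs = {"id": [1, 2, 3, 4], "v": [5, 50, 70, 90]}
bitmap = [False, False, True, False]          # row 3 was modified after population
current = {2: {"id": 3, "v": 7}}              # its current value no longer qualifies
print(scan_column_runs(runs, bitmap, "v", lambda v: v > 10, current.get))
# [{'id': 2, 'v': 50}, {'id': 4, 'v': 90}]
```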

Working around heartbeat issues caused by tracing or by regexp

Posted on May 4, 2017 by Roger MacNicol. Posted in cell_offload, oracle, SmartScan, trace.

I had noted in my first post that using the highest level of tracing caused timeout issues with the offload server heartbeat monitor. Heartbeat issues can also occur with expensive (and badly formed) regexp expressions. By default the heartbeat monitor is set to 6 seconds, which is the maximum time permitted to process 1 MB of data in the offload server and mark the task completed, and is far more time than that is reasonably expected to take.

Operations such as expensive tracing to disk or badly formed regexp expressions that cause that time period to be exceeded lead to this in the alert log:

State dump signal delivered to CELLOFLSRV<10180> by pid - 9860, uid - 3318
Thu Mar  5 12:26:31 2015 561 msec State dump completed for CELLOFLSRV<10180>
Clean shutdown signal delivered to CELLOFLSRV<10180> by pid - 9860, uid - 3318
CELLOFLSRV <10180> is exiting with code 1

where the restart server bounces the offload server to clear the perceived hang. Increasing the timeout via:

CellCLI> alter cell events = "immediate cellsrv.cellsrv_setparam('_cell_oflsrv_heartbeat_timeout_sec','60')"

enables the tracing to proceed without triggering the restart server.

My point in writing this entry was to provide a work-around when tracing is needed, but also to address a couple of blog posts I’d seen that recommend leaving it set at 60 or 90 seconds. This is not a good idea. The heartbeat exists to catch genuine but rare issues, and leaving this set to an increased value will hinder the offload server restarting quickly to resume work. This is one parameter that should be reset to the default when the work-around is no longer needed unless otherwise directed by support.

Roger MacNicol

More on tracing the offload server

Posted on May 4, 2017 by Roger MacNicol. Posted in cell_offload, oracle, SmartScan, trace.

I posted a while back on how to use Tracing Hybrid Columnar Compression in an offload server so this is a quick follow up.

I have trouble remembering the syntax for setting a regular parameter in an offload server without bouncing it. Since I need to keep this written down somewhere, I thought it might be of use to support folks and DBAs.
I also forgot to show you how to specify which offload group to set the trace event on.

So this example should do both:

CellCLI> alter cell offloadGroupEvents = "immediate cellsrv.cellsrv_setparam('my_parameter', 'TRUE')", offloadGroupName = "SYS_122110_160621"

This will, of course, only set the parameter temporarily until the next time the offload server is bounced, but also adding it to the offload group’s init.ora will take care of that.

Tracing Hybrid Columnar Compression in an offload server

Posted on May 4, 2017 by Roger MacNicol. Posted in cell_offload, oracle, SmartScan, trace.

I had previously commented on how to use the FPLIB facility in a trace event but the question came up today of how to trace HCC in an offload server. The facility name in this case is ADVCMP (Advanced Compression) and the hierarchy is:

ADVCMP_MAIN
    ADVCMP_COMP
    ADVCMP_DECOMP

No compression occurs on the cell so we are only interested in the decompression tracing in an offload server.

So in this case the correct syntax is [facility.event]:

cellcli -e 'alter cell offloadgroupEvents = "trace[advcmp.advcmp_decomp.*] disk=lowest"'

Using trace events with an offload server

Posted on May 4, 2017 by Roger MacNicol. Posted in cell_offload, oracle, SmartScan, trace.

I’ve noticed that several people who were familiar with using trace events with cellsrv were uncertain about how to use tracing with the new offload server architecture. So whereas in the past you could have added tracing to SmartScan processing with:

> cellcli -e 'alter cell events = "trace[fplib.sage_data] disk=lowest, memory=lowest"'

the new syntax to turn it on is:

> cellcli -e 'alter cell offloadgroupEvents = "trace[fplib.sage_data] disk=lowest, memory=lowest"'

and to turn it off:

> cellcli -e 'alter cell offloadgroupEvents = "trace[fplib.sage_data] off"'

Do not use anything higher than “disk=medium, memory=medium” since the higher levels of tracing will cause the heartbeat monitor to fail with timeout issues.
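If you script these commands, a small guard against the higher trace levels can save an accidental offload server restart. The helper below is a hypothetical convenience wrapper, not part of any Oracle tooling: it only builds the command strings shown above, and the list of level names and their ordering is an assumption.

```python
# Hypothetical helper that builds the cellcli command lines shown above and
# refuses trace levels above "medium". Level names and order are assumed.
LEVELS = ["lowest", "low", "medium", "high", "highest"]
MAX_SAFE_LEVEL = "medium"

def _check_level(level):
    if level not in LEVELS:
        raise ValueError(f"unknown trace level: {level}")
    if LEVELS.index(level) > LEVELS.index(MAX_SAFE_LEVEL):
        raise ValueError(f"{level} is above {MAX_SAFE_LEVEL}; risks heartbeat timeouts")

def offload_trace_on(event, disk="lowest", memory="lowest"):
    """Return the command string that turns tracing on for one event."""
    _check_level(disk)
    _check_level(memory)
    return (f"cellcli -e 'alter cell offloadgroupEvents = "
            f"\"trace[{event}] disk={disk}, memory={memory}\"'")

def offload_trace_off(event):
    """Return the command string that turns tracing off for one event."""
    return f"cellcli -e 'alter cell offloadgroupEvents = \"trace[{event}] off\"'"

print(offload_trace_on("fplib.sage_data"))
print(offload_trace_off("fplib.sage_data"))
```

The functions only return strings; running them against a cell is left to whatever mechanism you already use, and only under the direction of support.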

The SmartScan trace events available under the fplib (Filter Processing Library) facility are:

        fplib.fplib_main,  fplib.sage_cache,  fplib.sage_txn,  fplib.sage_data,  fplib.sage_index

In the cell trace directory hierarchy, each offload server will have its own trace directories:

diag/asm/cell/SYS_122010_150220/trace
diag/asm/cell/SYS_112331_141117/trace

In a future post I will cover how to use named offload groups to isolate the traces for a single session.

Roger MacNicol,

Data Storage Technology Group