Oracle SQL | Tag Archives: query optimization

Partition Pruning and Global Indexes

Posted on March 17, 2025 by Sayan Malakshinov Posted in oracle, partition pruning, query optimizing, SQL, troubleshooting 1,107 Page views Leave a comment

There is a common misconception that partition pruning does not help in the case of global indexes and only works with local indexes or full table scans (FTS).

It is understandable how this misconception arose: indeed, when operations like PARTITION RANGE ITERATOR, PARTITION RANGE SINGLE, etc., appear in execution plans, partition pruning becomes strongly associated with local indexes and FTS.

It is also clear why this is the most noticeable case: the exclusion of partitions in PARTITION RANGE ITERATOR operations is hard to miss, especially since there is a dedicated line for it in the execution plan.

However, this is not all that partition pruning can do. In fact, this way of thinking is not entirely valid, and I will demonstrate this with some simple examples.

Continue reading→

Interval Search: Part 4. Dynamic Range Segmentation – interval quantization

Posted on December 20, 2024 by Sayan Malakshinov Posted in DB architecture, interval search, oracle, query optimizing, SQL 1,960 Page views Leave a comment

Forums, mailing lists, and StackOverflow are all great resources for Oracle performance discussions, but I’ve long thought it would be useful to have a dedicated online chat/group specifically for Oracle performance specialists. A place to share news, articles, and discuss performance issues. To test the waters, I’ve created a group: https://t.me/ora_perf. If you’re interested, feel free to join! Let’s build a central hub for Oracle performance discussions.

Before diving into the main topic, let me address a frequent question I’ve received regarding the earlier parts of this series:
“You’re focusing on the rare case of date-only indexes (begin_date, end_date), but most real-world scenarios involve composite indexes with an ID field, like (id, begin_date, end_date).“

Yes, it’s true that in practice, composite indexes with an ID field are more common. And exactly such scenarios was the reason of this series. However, I intentionally started with a simplified case to focus on the date filtering mechanics. All the issues, observations, conclusions, and solutions discussed so far are equally applicable to composite indexes.

For example, many production databases have identifiers that reference tens or even hundreds of thousands of intervals. The addition of an ID-based access predicate may reduce the scanned volume for a single query, but the underlying date range filtering issues remain. These inefficiencies often go unnoticed because people don’t realize their simple queries are doing tens of LIOs when they could be doing just 3-5, with response times reduced from 100 microseconds to 2 microseconds.

Even if your queries always use an equality predicate on the ID field, you’ll still encounter challenges with huge queries with joins, such as:

select *
from IDs
    join tab_with_history h
       on IDs.id = h.id
       and :dt between h.beg_date and h.end_date

Here, lookups for each ID against the composite index can become inefficient at scale compared to retrieving a pre-filtered slice for the target date.

To clarify, everything discussed in this series applies to composite indexes as well. The solutions can easily be extended to include ID fields by modifying just a few lines of code. Let’s now move to the main topic.

Dynamic Range Segmentation – Interval Quantization

In the earlier parts, you may have noticed a skew in my test data, with many intervals of 30 days generated for every hour. This naturally leads to the idea of reducing scan volume by splitting long intervals into smaller sub-intervals.

What is Interval Quantization?

Interval quantization is a known solution for this problem, but it often comes with drawbacks. Traditional quantization requires selecting a single fixed unit (e.g., 1 minute), which may not suit all scenarios. Using a small unit to cover all cases can lead to an explosion in the number of rows.

However, since Dynamic Range Segmentation (DRS) already handles short intervals efficiently, we can focus on quantizing only long intervals. For this example, we’ll:

Leave intervals of up to 1 hour as-is, partitioning them into two categories: up to 15 minutes and up to 1 hour.
Split longer intervals into sub-intervals of 1 day.

Creating a SQL Macro for Interval Splitting

To simplify the splitting of long intervals, we’ll write a SQL Macro:

create or replace function split_interval_by_days(beg_date date, end_date date)
  return varchar2 sql_macro
is
begin
  return q'{
     select/*+ no_decorrelate */
        case 
          when n = 1 
             then beg_date 
          else trunc(beg_date)+n-1
        end as sub_beg_date
       ,case
          when n<=trunc(end_date)-trunc(beg_date)
            then trunc(beg_date)+n -1/24/60/60
          else end_date
        end as sub_end_date
     from (select/*+ no_merge */ level n
           from dual
           connect by level<=trunc(end_date)-trunc(beg_date)+1
          )
  }';
end;
/

Source on github: https://github.com/xtender/xt_scripts/blob/master/blog/1.interval_search/drs.v2/split_interval_by_days.sql

This macro returns sub-intervals for any given range:

SQL> select * from split_interval_by_days(sysdate-3, sysdate);

SUB_BEG_DATE        SUB_END_DATE
------------------- -------------------
2024-12-17 02:30:34 2024-12-17 23:59:59
2024-12-18 00:00:00 2024-12-18 23:59:59
2024-12-19 00:00:00 2024-12-19 23:59:59
2024-12-20 00:00:00 2024-12-20 02:30:34

Updating the Domain Index

1. Partitioning Changes in `ODCIIndexCreate_pr`

We’ll modify the partitioning structure:

        partition by range(DURATION_MINUTES)
        (
            partition part_15_min   values less than (15)
           ,partition part_1_hour   values less than (60)
           ,partition part_1_day    values less than (1440)  --40*24*60
        )

2. Initial Data Population

We’ll use the SQL Macro to populate the index table with split intervals:

        -- Now populate the table.
        stmt2 := q'[INSERT INTO {index_tab_name} ( beg_date, end_date, rid )
            SELECT SUB_BEG_DATE as beg_date 
                  ,SUB_END_DATE as end_date 
                  ,P.rowid
            FROM "{owner}"."{tab_name}" P
            , split_interval_by_days(
                to_date(substr(P.{col_name}, 1,19),'YYYY-MM-DD HH24:MI:SS')
               ,to_date(substr(P.{col_name},21,19),'YYYY-MM-DD HH24:MI:SS')
               )
            ]';

3. Updating `ODCIIndexInsert_pr`

  procedure ODCIIndexInsert_pr(
        ia       sys.ODCIIndexInfo,
        rid      VARCHAR2,
        newval   VARCHAR2,
        env      sys.ODCIEnv
    )
  IS
  BEGIN
        -- Insert into auxiliary table
        execute immediate 
           'INSERT INTO '|| get_index_tab_name(ia)||' (rid, beg_date, end_date)'
         ||'select 
             :rid, sub_beg_date, sub_end_date
            from split_interval_by_days(:beg_date, :end_date)'
           using rid,get_beg_date(newval),get_end_date(newval);
  END;

4. Query Generation in `ODCIIndexStart_Pr`

Update the SQL statement to account for the new partitions:

      stmt := q'{
        select rid from {tab_name} partition (part_15_min) p1
        where :cmpval between beg_date and end_date
          and end_date < :cmpval+interval'15'minute
        union all
        select rid from {tab_name} partition (part_1_hour) p2
        where :cmpval between beg_date and end_date
          and end_date < :cmpval+1/24
        union all
        select rid from {tab_name} partition (part_1_day ) p3
        where :cmpval between beg_date and end_date
          and end_date < :cmpval+1
        }';

Full code: https://github.com/xtender/xt_scripts/blob/master/blog/1.interval_search/drs.v2/2.idx_range_date_pkg.v2.pck

Results

SQL> select count(*) from test_table where DATE_IN_RANGE(virt_date_range,date'2012-02-01')=1;

  COUNT(*)
----------
       943
SQL> @last

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------
SQL_ID  17wncu9ftfzf6, child number 0
-------------------------------------
select count(*) from test_table where
DATE_IN_RANGE(virt_date_range,date'2012-02-01')=1

Plan hash value: 2131856123

-------------------------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Starts | E-Rows | Cost  | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |      1 |        |  9218 |      1 |00:00:00.01 |      15 |
|   1 |  SORT AGGREGATE  |                  |      1 |      1 |       |      1 |00:00:00.01 |      15 |
|*  2 |   DOMAIN INDEX   | TEST_RANGE_INDEX |      1 |        |       |    943 |00:00:00.01 |      15 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("XTENDER"."DATE_IN_RANGE"("VIRT_DATE_RANGE",TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))=1)

So, by applying quantization with Dynamic Range Segmentation, we reduced the number of logical reads from 30 (in the simpler version) to 15—a 2x improvement.

Conclusion

In this example, we used partitions for 15 minutes, 1 hour, and 1 day for simplicity. In practice, optimal values will depend on the actual data. While the number of rows in the index increases, the fixed maximum interval length ensures consistently efficient results.

All posts in this series:

Interval Search: Part 3. Dynamic Range Segmentation – Custom Domain Index

Posted on December 19, 2024 by Sayan Malakshinov Posted in CBO, DB architecture, interval search, oracle, query optimizing, SQL, troubleshooting 1,855 Page views 2 Comments

In this part, I’ll show how to implement Dynamic Range Segmentation (DRS) explained in the previous part using a custom Domain Index, allowing you to apply this optimization with minimal changes to your existing tables.

1. Creating the Function and Operator

First, we create a function that will be used to define the operator for the domain index:

CREATE OR REPLACE FUNCTION F_DATE_IN_RANGE(date_range varchar2, cmpval date)
RETURN NUMBER deterministic
AS
BEGIN
  -- simple concatenation: beg_date;end_date
  -- in format YYYY-MM-DD HH24:MI:SS
  if cmpval between to_date(substr(date_range, 1,19),'YYYY-MM-DD HH24:MI:SS')
                and to_date(substr(date_range,21,19),'YYYY-MM-DD HH24:MI:SS')
  then
    return 1;
  else
    return 0;
  end if;
END;
/

Next, we create the operator to use this function:

CREATE OPERATOR DATE_IN_RANGE BINDING(VARCHAR2, DATE)
  RETURN NUMBER USING F_DATE_IN_RANGE;
/

2. Creating the `idx_range_date_pkg` Package

We define a package (idx_range_date_pkg) that contains the necessary procedures to manage the domain index. The full implementation is too lengthy to include here but is available on GitHub.

3. Creating the `idx_range_date_type`

The type idx_range_date_type implements the ODCI extensible indexing interface, which handles operations for the domain index.
The code is available on GitHub.

Key Features of `idx_range_date_type`

Internal Data Segmentation:
The type and package create and maintain an internal table of segmented data. For example, the procedure ODCIIndexCreate_pr creates a partitioned table:

        stmt1 := 'CREATE TABLE ' || get_index_tab_name(ia)
        ||q'[
        (
           beg_date date
          ,end_date date
          ,rid rowid
          ,DURATION_MINUTES number as ((end_date-beg_date)*24*60)
        )
        partition by range(DURATION_MINUTES)
        (
            partition part_15_min   values less than (15)
           ,partition part_2_days   values less than (2880)   --2*24*60
           ,partition part_40_days  values less than (57600)  --40*24*60
           ,partition part_400_days values less than (576000) --400*24*60
           ,partition p_max         values less than (maxvalue)
        )
        ]';

Efficient Query Execution:
The procedure ODCIIndexStart_pr executes range queries against this internal table:

      -- This statement returns the qualifying rows for the TRUE case.
      stmt := q'{
        select rid from {tab_name} partition (part_15_min) p1
        where :cmpval between beg_date and end_date
          and end_date<=:cmpval+interval'15'minute
        union all
        select rid from {tab_name} partition (part_2_days) p1
        where :cmpval between beg_date and end_date
          and end_date<=:cmpval+2
        union all
        select rid from {tab_name} partition (part_40_days) p1
        where :cmpval between beg_date and end_date
          and end_date<=:cmpval+40
        union all
        select rid from {tab_name} partition (part_400_days) p1
        where :cmpval between beg_date and end_date
          and end_date<=:cmpval+400
        union all
        select rid from {tab_name} partition (p_max) p1
        where :cmpval between beg_date and end_date
        }';

Returning Results:
The ODCIIndexFetch_pr procedure retrieves the list of qualifying ROWID values:

    FETCH cur BULK COLLECT INTO rowids limit nrows;

Here is the corresponding function implementation:

    MEMBER FUNCTION ODCIIndexFetch(
        self in out idx_range_date_type,
        nrows NUMBER,
        rids  OUT sys.ODCIRidList,
        env   sys.ODCIEnv
    ) RETURN NUMBER 
    IS
      cnum number;
      cur sys_refcursor;
    BEGIN
      idx_range_date_pkg.p_debug('Fetch: nrows='||nrows);
      cnum:=self.curnum;
      cur:=dbms_sql.to_refcursor(cnum);
      idx_range_date_pkg.p_debug('Fetch: converted to refcursor');

      idx_range_date_pkg.ODCIIndexFetch_pr(nrows,rids,env,cur);
      
      self.curnum:=dbms_sql.to_cursor_number(cur);
      RETURN ODCICONST.SUCCESS;
    END;

4. Creating the `INDEXTYPE`

CREATE OR REPLACE INDEXTYPE idx_range_date_idxtype
FOR
  DATE_IN_RANGE(VARCHAR2,DATE)
USING idx_range_date_type;
/

Now we created all the required objects, so it’s time to create the index.

5. Adding a Virtual Generated Column

Since the ODCI interface only supports indexing a single column, we combine beg_date and end_date into a virtual generated column:

alter table test_table 
  add virt_date_range varchar2(39)
      generated always as
       (to_char(beg_date,'YYYY-MM-DD HH24:MI:SS')||';'||to_char(end_date,'YYYY-MM-DD HH24:MI:SS'))
/

6. Creating the Index

We create the domain index on the virtual column:

CREATE INDEX test_range_index ON test_table (virt_date_range)
  INDEXTYPE IS idx_range_date_idxtype
/

7. Testing the Index

Let’s test the index with a query:

SQL> select count(*) from test_table where DATE_IN_RANGE(virt_date_range,date'2012-02-01')=1;

  COUNT(*)
----------
       943

Execution Plan:

SQL> @last

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
SQL_ID  17wncu9ftfzf6, child number 0
-------------------------------------
select count(*) from test_table where
DATE_IN_RANGE(virt_date_range,date'2012-02-01')=1

Plan hash value: 2131856123

---------------------------------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Starts | E-Rows |E-Bytes| Cost  | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |      1 |        |       |  9218 |      1 |00:00:00.01 |      30 |
|   1 |  SORT AGGREGATE  |                  |      1 |      1 |    40 |       |      1 |00:00:00.01 |      30 |
|*  2 |   DOMAIN INDEX   | TEST_RANGE_INDEX |      1 |        |       |       |    943 |00:00:00.01 |      30 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("XTENDER"."DATE_IN_RANGE"("VIRT_DATE_RANGE",TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))=1)

Results: Only 30 logical reads were needed for the same date 2012-02-01!

Summary

Using a custom domain index allows us to implement this method with minimal changes to existing tables. This method efficiently handles interval queries while requiring significantly fewer logical reads.

In the next part, I will demonstrate how to enhance the Dynamic Range Segmentation method by introducing interval quantization—splitting longer intervals into smaller sub-intervals represented as separate rows.

Interval Search: Part 2. Dynamic Range Segmentation – Simplified

Posted on December 19, 2024 by Sayan Malakshinov Posted in CBO, DB architecture, interval search, oracle, query optimizing, SQL, troubleshooting 1,926 Page views 4 Comments

In the previous part, I discussed the most efficient known methods for optimizing range queries. In this part, I’ll introduce a simple version of my custom approach, which I call Dynamic Range Segmentation (DRS).

As explained earlier, a significant issue with conventional approaches is the lack of both boundaries in the ACCESS predicates. This forces the database to scan all index entries either above or below the target value, depending on the order of the indexed columns.

Dynamic Range Segmentation solves this problem by segmenting data based on the interval length.

Let’s create a table partitioned by interval lengths with the following partitions:

part_15_min: Intervals shorter than 15 minutes.
part_2_days: Intervals between 15 minutes and 2 days.
part_40_days: Intervals between 2 days and 40 days.
part_400_days: Intervals between 40 days and 400 days.
p_max: All remaining intervals

Here’s the DDL for the partitioned table:

create table Dynamic_Range_Segmentation(
   beg_date date
  ,end_date date
  ,rid rowid
  ,DURATION_MINUTES number as ((end_date-beg_date)*24*60)
)
partition by range(DURATION_MINUTES)
(
    partition part_15_min   values less than (15)
   ,partition part_2_days   values less than (2880)   --2*24*60
   ,partition part_40_days  values less than (57600)  --40*24*60
   ,partition part_400_days values less than (576000) --400*24*60
   ,partition p_max         values less than (maxvalue)
);

The DURATION_MINUTES column is a virtual generated column that computes the interval length in minutes as the difference between beg_date and end_date.

We will explore the nuances of selecting specific partition boundaries in future parts. For now, let’s focus on the approach itself.

We populate the partitioned table with the same test data and create a local index on (end_date, beg_date):

insert/*+append parallel(4) */ into Dynamic_Range_Segmentation(beg_date,end_date,rid)
select beg_date,end_date,rowid from test_table;

create index ix_drs on Dynamic_Range_Segmentation(end_date,beg_date) local;
call dbms_stats.gather_table_stats('','Dynamic_Range_Segmentation');

Optimizing the Query

By segmenting the data, we can assert with certainty that if we are searching for records in the part_15_min partition, the qualifying records must satisfy the condition
end_date <= :dt + INTERVAL '15' MINUTE
because no intervals in this partition exceed 15 minutes in length. This additional boundary provides the much-needed second predicate.

Thus, we can optimize our query by addressing each partition individually, adding upper boundaries for all partitions except the last one (p_max):

select count(*),min(beg_date),max(end_date) from (
  select * from Dynamic_Range_Segmentation partition (part_15_min) p1
  where date'2012-02-01' between beg_date and end_date
    and end_date<=date'2012-02-01'+interval'15'minute
  union all
  select * from Dynamic_Range_Segmentation partition (part_2_days) p1
  where date'2012-02-01' between beg_date and end_date
    and end_date<=date'2012-02-01'+2
  union all
  select * from Dynamic_Range_Segmentation partition (part_40_days) p1
  where date'2012-02-01' between beg_date and end_date
    and end_date<=date'2012-02-01'+40  union all
  select * from Dynamic_Range_Segmentation partition (part_400_days) p1
  where date'2012-02-01' between beg_date and end_date
    and end_date<=date'2012-02-01'+400
  union all
  select * from Dynamic_Range_Segmentation partition (p_max) p1
  where date'2012-02-01' between beg_date and end_date
);

Results:

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
       943 2011-01-03 00:00:00 2013-03-03 00:00:00

SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last -alias -projection'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  204zu1xhdqcq3, child number 0
-------------------------------------
select count(*),min(beg_date),max(end_date) from (   select * from
Dynamic_Range_Segmentation partition (part_15_min) p1   where
date'2012-02-01' between beg_date and end_date     and
end_date<=date'2012-02-01'+interval'15'minute   union all   select *
from Dynamic_Range_Segmentation partition (part_2_days) p1   where
date'2012-02-01' between beg_date and end_date     and
end_date<=date'2012-02-01'+2   union all   select * from
Dynamic_Range_Segmentation partition (part_40_days) p1   where
date'2012-02-01' between beg_date and end_date     and
end_date<=date'2012-02-01'+40  union all   select * from
Dynamic_Range_Segmentation partition (part_400_days) p1   where
date'2012-02-01' between beg_date and end_date     and
end_date<=date'2012-02-01'+400   union all   select * from
Dynamic_Range_Segmentation partition (p_max) p1   where
date'2012-02-01' between beg_date and end_date )

Plan hash value: 1181465968

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |        |      1 |        |       |    24 (100)|          |       |       |      1 |00:00:00.01 |      28 |
|   1 |  SORT AGGREGATE           |        |      1 |      1 |    18 |            |          |       |       |      1 |00:00:00.01 |      28 |
|   2 |   VIEW                    |        |      1 |   1582 | 28476 |    24   (0)| 00:00:01 |       |       |    943 |00:00:00.01 |      28 |
|   3 |    UNION-ALL              |        |      1 |        |       |            |          |       |       |    943 |00:00:00.01 |      28 |
|   4 |     PARTITION RANGE SINGLE|        |      1 |      4 |    64 |     3   (0)| 00:00:01 |     1 |     1 |      3 |00:00:00.01 |       3 |
|*  5 |      INDEX RANGE SCAN     | IX_DRS |      1 |      4 |    64 |     3   (0)| 00:00:01 |     1 |     1 |      3 |00:00:00.01 |       3 |
|   6 |     PARTITION RANGE SINGLE|        |      1 |    536 |  8576 |     7   (0)| 00:00:01 |     2 |     2 |     19 |00:00:00.01 |       7 |
|*  7 |      INDEX RANGE SCAN     | IX_DRS |      1 |    536 |  8576 |     7   (0)| 00:00:01 |     2 |     2 |     19 |00:00:00.01 |       7 |
|   8 |     PARTITION RANGE SINGLE|        |      1 |    929 | 14864 |    10   (0)| 00:00:01 |     3 |     3 |    890 |00:00:00.01 |      10 |
|*  9 |      INDEX RANGE SCAN     | IX_DRS |      1 |    929 | 14864 |    10   (0)| 00:00:01 |     3 |     3 |    890 |00:00:00.01 |      10 |
|  10 |     PARTITION RANGE SINGLE|        |      1 |     29 |   464 |     2   (0)| 00:00:01 |     4 |     4 |     17 |00:00:00.01 |       2 |
|* 11 |      INDEX RANGE SCAN     | IX_DRS |      1 |     29 |   464 |     2   (0)| 00:00:01 |     4 |     4 |     17 |00:00:00.01 |       2 |
|  12 |     PARTITION RANGE SINGLE|        |      1 |     84 |  1344 |     2   (0)| 00:00:01 |     5 |     5 |     14 |00:00:00.01 |       6 |
|* 13 |      INDEX FAST FULL SCAN | IX_DRS |      1 |     84 |  1344 |     2   (0)| 00:00:01 |     5 |     5 |     14 |00:00:00.01 |       6 |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE"<=TO_DATE(' 2012-02-01 00:15:00',
              'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE"<=TO_DATE(' 2012-02-03 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   9 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE"<=TO_DATE(' 2012-03-12 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  11 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE"<=TO_DATE(' 2013-03-07 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  13 - filter(("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE">=TO_DATE(' 2012-02-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))

This approach reduces logical reads (LIOs) to 28, compared to the 183 in the best case from the previous parts.

Simplifying with a SQL Macro

To avoid writing such large queries repeatedly, we can create a SQL Macro:

create or replace function DRS_by_date_macro(dt date)
  return varchar2 sql_macro
is
begin
  return q'{
  select * from Dynamic_Range_Segmentation partition (part_15_min) p1
  where dt between beg_date and end_date
    and end_date<=dt+interval'15'minute
  union all
  select * from Dynamic_Range_Segmentation partition (part_2_days) p1
  where dt between beg_date and end_date
    and end_date<=dt+2
  union all
  select * from Dynamic_Range_Segmentation partition (part_40_days) p1
  where dt between beg_date and end_date
    and end_date<=dt+40
  union all
  select * from Dynamic_Range_Segmentation partition (part_400_days) p1
  where dt between beg_date and end_date
    and end_date<=dt+400
  union all
  select * from Dynamic_Range_Segmentation partition (p_max) p1
  where dt between beg_date and end_date
  }';
end;
/

With this macro, queries become concise:

SQL> select count(*),min(beg_date),max(end_date) from DRS_by_date_macro(date'2012-02-01');

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
       943 2011-01-03 00:00:00 2013-03-03 00:00:00

Execution plan:

SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last -alias -projection'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7nmx3cnwrmd0c, child number 0
-------------------------------------
select count(*),min(beg_date),max(end_date) from
DRS_by_date_macro(date'2012-02-01')

Plan hash value: 1181465968

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name   | Starts | E-Rows s| Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |        |      1 |         |    24 (100)|          |       |       |      1 |00:00:00.01 |      28 |
|   1 |  SORT AGGREGATE           |        |      1 |      1  |            |          |       |       |      1 |00:00:00.01 |      28 |
|   2 |   VIEW                    |        |      1 |   1582  |    24   (0)| 00:00:01 |       |       |    943 |00:00:00.01 |      28 |
|   3 |    UNION-ALL              |        |      1 |         |            |          |       |       |    943 |00:00:00.01 |      28 |
|   4 |     PARTITION RANGE SINGLE|        |      1 |      4  |     3   (0)| 00:00:01 |     1 |     1 |      3 |00:00:00.01 |       3 |
|*  5 |      INDEX RANGE SCAN     | IX_DRS |      1 |      4  |     3   (0)| 00:00:01 |     1 |     1 |      3 |00:00:00.01 |       3 |
|   6 |     PARTITION RANGE SINGLE|        |      1 |    536  |     7   (0)| 00:00:01 |     2 |     2 |     19 |00:00:00.01 |       7 |
|*  7 |      INDEX RANGE SCAN     | IX_DRS |      1 |    536  |     7   (0)| 00:00:01 |     2 |     2 |     19 |00:00:00.01 |       7 |
|   8 |     PARTITION RANGE SINGLE|        |      1 |    929  |    10   (0)| 00:00:01 |     3 |     3 |    890 |00:00:00.01 |      10 |
|*  9 |      INDEX RANGE SCAN     | IX_DRS |      1 |    929  |    10   (0)| 00:00:01 |     3 |     3 |    890 |00:00:00.01 |      10 |
|  10 |     PARTITION RANGE SINGLE|        |      1 |     29  |     2   (0)| 00:00:01 |     4 |     4 |     17 |00:00:00.01 |       2 |
|* 11 |      INDEX RANGE SCAN     | IX_DRS |      1 |     29  |     2   (0)| 00:00:01 |     4 |     4 |     17 |00:00:00.01 |       2 |
|  12 |     PARTITION RANGE SINGLE|        |      1 |     84  |     2   (0)| 00:00:01 |     5 |     5 |     14 |00:00:00.01 |       6 |
|* 13 |      INDEX FAST FULL SCAN | IX_DRS |      1 |     84  |     2   (0)| 00:00:01 |     5 |     5 |     14 |00:00:00.01 |       6 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE"<=TO_DATE(' 2012-02-01 00:15:00',
              'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE"<=TO_DATE(' 2012-02-03 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   9 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE"<=TO_DATE(' 2012-03-12 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  11 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE"<=TO_DATE(' 2013-03-07 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  13 - filter(("BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "END_DATE">=TO_DATE(' 2012-02-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))

Conclusion and Next Steps

This approach can also be implemented in various ways, such as using materialized views, globally partitioned indexes, or other methods.

In the next part, I will demonstrate how to create a custom domain index to further optimize this method.

Interval Search: Optimizing Date Range Queries – Part 1

Posted on December 19, 2024 by Sayan Malakshinov Posted in DB architecture, interval search, oracle, query optimizing, SQL, troubleshooting 2,339 Page views 3 Comments

One of the most common and enduring challenges in database management is performing efficient interval searches, particularly for date intervals such as: WHERE :dt BETWEEN beg_date AND end_date.

In this series of articles, I will explore various strategies for optimizing such searches. We’ll delve into well-known standard approaches, analyze their limitations, and introduce my custom method—a method I promised to share several years ago, but I had postponed writing about it because the topic’s complexity seemed daunting, requiring a deep dive into the nuances of the data itself (e.g., open intervals, extreme values, data distribution, and skew). However, after receiving yet another question about it recently, I realized that I could no longer delay. Even if it means addressing some of the finer details in later parts, it’s time to start sharing this method in manageable steps.

Defining the Problem

In many applications involving historical data, a common modeling approach is SCD (Slowly Changing Dimension) Type 2 (reference). This method often uses columns such as begin_date and end_date to represent the validity period of each record.

To find records that are valid at a specific point in time, queries often use predicates like:
WHERE :dt BETWEEN beg_date AND end_date

The challenge lies in finding a universal and efficient method to execute such queries.

Solution Approaches

Let’s begin by creating a test table and generating sample data for evaluation:

create table test_table(
   beg_date date
  ,end_date date
  ,padding  varchar2(10)
);
declare
  procedure p_insert(
    start_date       date
   ,end_date         date
   ,step_minutes     number
   ,duration_minutes number
  ) is
  begin
    insert/*+ append */ into test_table(beg_date,end_date,padding)
    select 
      start_date + n * numtodsinterval(step_minutes,'minute')
     ,start_date + n * numtodsinterval(step_minutes,'minute') + numtodsinterval(duration_minutes,'minute')
     ,'0123456789'
    from xmltable('0 to xs:integer(.)' 
          passing ceil( (end_date - start_date)*24*60/step_minutes)
          columns n int path '.'
         );
    commit;
  end;
begin
  -- 5 min intervals every 5 minutes: 00:00-00:15, 00:05-00:20,etc:
  --p_insert(date'2000-01-01',sysdate, 5, 5);
  -- 5 min intervals every 5 minutes starting from 00:02 : 00:02-00:07, 00:07-00:12,etc
  p_insert(date'2000-01-01'+interval'2'minute,sysdate, 5, 5);
  -- 15 min intervals every 5 minutes: 00:00-00:15, 00:05-00:20,etc:
  p_insert(date'2000-01-01',sysdate, 5, 15);
  -- 30 min intervals every 15 minutes: 00:00-00:30, 00:15-00:45,etc:
  p_insert(date'2000-01-01',sysdate, 15, 30);
  -- 1 hour intervals every 15 minutes: 00:00-01:00, 00:15-01:15,etc:
  p_insert(date'2000-01-01',sysdate, 15, 60);
  -- 2 hour intervals every 20 minutes: 00:00-02:00, 00:20-02:00,etc:
  p_insert(date'2000-01-01',sysdate, 20, 120);
  -- 7 days intervals every 60 minutes:
  p_insert(date'2000-01-01',sysdate, 60, 7*24*60);
  -- 30 days intervals every 1 hour:
  p_insert(date'2000-01-01',sysdate, 60, 30*24*60);
  -- 120 days intervals every 7 days:
  p_insert(date'2000-01-01',sysdate, 7*24*60, 120*24*60);
  -- 400 days intervals every 30 days:
  p_insert(date'2000-01-01',sysdate, 30*24*60, 400*24*60);
end;
/

We’ve got a table with 10mln rows with different date intervals:

SQL> select count(*),min(beg_date),max(end_date) from test_table;

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
  10723261 2000-01-01 00:00:00 2026-01-24 00:00:00

1. Simple Composite Indexes

1.1 Index on `(beg_date, end_date)`

The most straightforward approach is to create a composite index on (beg_date, end_date). However, even at first glance, it’s clear that this method has significant inefficiencies.

When we use a predicate like :dt BETWEEN beg_date AND end_date, it breaks down into two sub-predicates:

Access Predicate: beg_date <= :dt
This is used for index access since beg_date is the leading column in the index. However, the query will need to scan and evaluate all index entries that satisfy this condition.

Filter Predicate: :dt <= end_date
This acts as a filter on the results from the access predicate.

As the dataset grows, both beg_date and end_date values increase over time. Consequently, because the access predicate (beg_date <= :dt) is used to locate potential matches, the query will scan an ever-growing portion of the index.

1.2 Index on `(end_date, beg_date)`

This is one of the most widely adopted approaches. By simply rearranging the order of columns in the index, placing end_date first, we can achieve significantly better performance in most cases.

Why? Queries tend to target data closer to the current time, and much less frequently target records from far in the past. By indexing on end_date first, the query engine can more effectively narrow down the relevant portion of the index.

Let’s create the indexes and assess their performance:

create index ix_beg_end on test_table(beg_date,end_date);
create index ix_end_beg on test_table(end_date,beg_date);

select segment_name,blocks,bytes/1024/1024 as mbytes 
from user_segments 
where segment_name in ('IX_BEG_END','IX_END_BEG','TEST_TABLE');

SEGMENT_NAME             BLOCKS     MBYTES
-------------------- ---------- ----------
IX_BEG_END                40960        320
IX_END_BEG                40832        319
TEST_TABLE                48128        376

Let’s query the records valid 100 days ago using the (beg_date, end_date) index:

SQL> select/*+ index(test_table (beg_date,end_date)) */ count(*),min(beg_date),max(end_date) from test_table where sysdate-100 between beg_date and end_date;

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
       935 2023-08-28 00:00:00 2025-09-26 00:00:00

SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last'));


Plan hash value: 1056805589

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        |       | 40375 (100)|          |      1 |00:00:00.79 |   40200 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |    16 |            |          |      1 |00:00:00.79 |   40200 |
|*  2 |   INDEX RANGE SCAN| IX_BEG_END |      1 |  28472 |   444K| 40375   (1)| 00:00:02 |    935 |00:00:00.79 |   40200 |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("END_DATE">=SYSDATE@!-100 AND "BEG_DATE"<=SYSDATE@!-100)
       filter("END_DATE">=SYSDATE@!-100)

As seen, the query required 40,200 logical reads, almost the entire index, which contains 40,960 blocks.

Now, let’s query the same data using the (end_date, beg_date) index:

SQL> select count(*),min(beg_date),max(end_date) from test_table where sysdate-100 between beg_date and end_date;

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
       935 2023-08-28 00:00:00 2025-09-26 00:00:00

SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last'));

Plan hash value: 416972780

-------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        |   450 (100)|      1 |00:00:00.01 |     453 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |            |      1 |00:00:00.01 |     453 |
|*  2 |   INDEX RANGE SCAN| IX_END_BEG |      1 |  28472 |   450   (1)|    935 |00:00:00.01 |     453 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("END_DATE">=SYSDATE@!-100 AND "END_DATE" IS NOT NULL)
       filter("BEG_DATE"<=SYSDATE@!-100)

Using this index required only 453 logical reads, a dramatic improvement compared to the 40,200 reads with the first index.

Adding an Upper Bound for `end_date`

To illustrate the importance of having both upper and lower bounds for effective range queries, let’s further restrict the query with end_date < SYSDATE - 70:

SQL> select count(*),min(beg_date),max(end_date) from test_table where sysdate-100 between beg_date and end_date and end_date<sysdate-70;

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
       910 2023-08-28 00:00:00 2024-10-08 02:00:00

SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last'));

Plan hash value: 3937277202

-----------------------------------------------------------------------------------------------
| Id  | Operation          | Name       | Starts | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |      1 |   136 (100)|      1 |00:00:00.01 |     137 |
|   1 |  SORT AGGREGATE    |            |      1 |            |      1 |00:00:00.01 |     137 |
|*  2 |   FILTER           |            |      1 |            |    910 |00:00:00.01 |     137 |
|*  3 |    INDEX RANGE SCAN| IX_END_BEG |      1 |   136   (0)|    910 |00:00:00.01 |     137 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYSDATE@!-70>SYSDATE@!-100)
   3 - access("END_DATE">=SYSDATE@!-100 AND "END_DATE"<SYSDATE@!-70 AND "BEG_DATE"<=SYSDATE@!-100)
       filter("BEG_DATE"<=SYSDATE@!-100)

We retrieved nearly all required records (910 out of 935), but the number of logical I/O operations (LIO) dropped by more than threefold.

To illustrate the inherent limitations of our current indexing strategies, let’s simplify the scenario. Suppose we have a table of integer intervals (START, END) containing 10 million records: (0,1), (1,2), (2,3), …, (9999999, 10000000). When searching for a record where 5000000 BETWEEN START AND END, regardless of whether we use an index on (START, END) or (END, START), we would have to scan approximately half of the index. This clearly demonstrates that neither of these indexes can serve as a universal solution; under certain conditions, both indexes become inefficient.

Let’s illustrate this issue using our test table. We’ll select a date roughly in the middle of our dataset – date’2012-02-01′ – and examine the performance of both indexes.

First, we’ll test the query using the (beg_date, end_date) index:

SQL> select/*+ index(test_table (beg_date,end_date)) */ count(*),min(beg_date),max(end_date) from test_table where date'2012-02-01' between beg_date and end_date;

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
       943 2011-01-03 00:00:00 2013-03-03 00:00:00

SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last -alias -projection'));

Plan hash value: 1056805589

--------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | Cost (%CPU)|| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        | 19355 (100)||      1 |00:00:00.45 |   19680 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |            ||      1 |00:00:00.45 |   19680 |
|*  2 |   INDEX RANGE SCAN| IX_BEG_END |      1 |   2783K| 19355   (1)||    943 |00:00:00.45 |   19680 |
--------------------------------------------------------------------------------------------------------

The query required almost 20,000 LIO operations, a significant portion of the total index size. Next, we’ll perform the same query using the (end_date, beg_date) index:

select/*+ index(test_table (end_date,beg_date)) */ count(*),min(beg_date),max(end_date) from test_table where date'2012-02-01' between beg_date and end_date;

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
       943 2011-01-03 00:00:00 2013-03-03 00:00:00

-------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        | 20929 (100)|      1 |00:00:00.38 |   20973 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |            |      1 |00:00:00.38 |   20973 |
|*  2 |   INDEX RANGE SCAN| IX_END_BEG |      1 |    655K| 20929   (1)|    943 |00:00:00.38 |   20973 |
-------------------------------------------------------------------------------------------------------

Similarly, this query also required approximately 20,000 LIO operations, illustrating that both indices suffer from similar inefficiencies for this type of query.

The high number of logical reads in both cases highlights that neither index provides an efficient solution for queries with dates in the middle of the data range. The database engine must scan a large portion of the index to find the matching records, resulting in increased I/O and slower query performance, especially when the search value lies in the middle of the data range.

2. Partitioning + Composite Indexes

This approach is far less common but offers significant advantages. In the previous examples with composite indexes, the predicate on the second column of the index did not help reduce the number of scanned index entries. However, by partitioning the table on this second column, we can leverage partition pruning to exclude irrelevant partitions, significantly reducing the scan scope.

Example: Partitioned Table by `END_DATE`

To demonstrate, let’s create a partitioned table using the same data as in the previous example, partitioned by END_DATE on a yearly interval:

create table test_table_part_1(
   beg_date date
  ,end_date date
  ,rid rowid
)
partition by range(end_date) interval (numtoyminterval(1,'year'))
(
   partition part_01 values less than (date'2000-01-01')
);
insert/*+append parallel(4) */ into test_table_part_1
select beg_date,end_date,rowid from test_table;

create index ix_tt_part_local on test_table_part_1(beg_date,end_date) local;
call dbms_stats.gather_table_stats('','test_table_part_1');

This results in 28 partitions:

SQL> select partition_name,partition_position,blevel,leaf_blocks,num_rows from user_ind_partitions where index_name='IX_TT_PART_LOCAL';

PARTITION_NAME PARTITION_POSITION     BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ------------------ ---------- ----------- ----------
PART_01                         1          0           0          0
SYS_P8333                       2          2        1621     429547
SYS_P8341                       3          2        1621     429304
SYS_P8348                       4          2        1621     429304
SYS_P8353                       5          2        1621     429304
SYS_P8355                       6          2        1625     430480
SYS_P8332                       7          2        1621     429304
SYS_P8335                       8          2        1621     429305
SYS_P8331                       9          2        1621     429305
SYS_P8336                      10          2        1625     430480
SYS_P8338                      11          2        1621     429304
SYS_P8340                      12          2        1621     429304
SYS_P8343                      13          2        1621     429304
SYS_P8345                      14          2        1625     430481
SYS_P8347                      15          2        1621     429305
SYS_P8352                      16          2        1621     429304
SYS_P8350                      17          2        1621     429304
SYS_P8351                      18          2        1625     430480
SYS_P8334                      19          2        1621     429305
SYS_P8337                      20          2        1621     429304
SYS_P8339                      21          2        1621     429305
SYS_P8342                      22          2        1625     430480
SYS_P8344                      23          2        1621     429304
SYS_P8346                      24          2        1621     429304
SYS_P8349                      25          2        1621     429305
SYS_P8354                      26          2        1561     413443
SYS_P8356                      27          1           2        391
SYS_P8357                      28          0           1          1

Let’s test the same query for the same DATE '2012-02-01' using the partitioned table:

SQL> select/*+ index(t (beg_date,end_date)) */ count(*),min(beg_date),max(end_date) from test_table_part_1 t where date'2012-02-01' between beg_date and end_date;

  COUNT(*) MIN(BEG_DATE)       MAX(END_DATE)
---------- ------------------- -------------------
       943 2011-01-03 00:00:00 2013-03-03 00:00:00

SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last -alias -projection'));

Plan hash value: 1651658810

-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name             | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                  |      1 |        | 10259 (100)|       |       |      1 |00:00:00.01 |     183 |
|   1 |  SORT AGGREGATE           |                  |      1 |      1 |            |       |       |      1 |00:00:00.01 |     183 |
|   2 |   PARTITION RANGE ITERATOR|                  |      1 |   2783K| 10259   (1)|    14 |1048575|    943 |00:00:00.01 |     183 |
|*  3 |    INDEX RANGE SCAN       | IX_TT_PART_LOCAL |     15 |   2783K| 10259   (1)|    14 |1048575|    943 |00:00:00.01 |     183 |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
       filter("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

As shown, this approach reduced the number of logical reads (LIO) to just 183, compared to 20,000 in the earlier examples. Partitioning the table on END_DATE combined with a composite local index dramatically improves query performance by limiting the scan scope through partition pruning. Even in the worst-case scenario, the number of logical reads is orders of magnitude lower than with global composite indexes. This makes it a highly effective strategy for interval searches.

Next part: Interval Search: Part 2. Dynamic Range Segmentation – Simplified

CBO and Partial indexing

Posted on November 2, 2022 by Sayan Malakshinov Posted in bug, CBO, oracle, query optimizing, SQL, trace, troubleshooting 2,653 Page views Leave a comment

Oracle 12c introduced Partial indexing, which works well for simple partitioned tables with literals. However, it has several significant issues:

Continue reading→

Slow index access “COL=:N” where :N is NULL

Posted on October 31, 2022 by Sayan Malakshinov Posted in CBO, curious, Funny, oracle, query optimizing, SQL, troubleshooting 2,433 Page views Leave a comment

All Oracle specialists know that a predicate X=NULL can never be true and we should use “X is NULL” in such cases. The Oracle optimizer knows about that, so if we create a table like this:

Continue reading→

Workarounds for JPPD with view and table(kokbf$), xmltable or json_table functions

Posted on May 30, 2019 by Sayan Malakshinov Posted in CBO, oracle, query optimizing, SQL, troubleshooting 2,264 Page views Leave a comment

You may know that table() (kokbf$ collection functions), xmltable and json_table functions block Join-Predicate PushDown(JPPD).

Simple example:

DDL

[sourcecode language=”sql”]
create table xtest(a, b, c) as
select mod(level,1000),level,rpad(‘x’,100,’x’)
from dual
connect by level<=1e4
/
create index itest on xtest(a)
/
create or replace view vtest as
select a,count(b) cnt
from xtest
group by a
/
call dbms_stats.gather_table_stats(user,’xtest’);
/
[/sourcecode]

[collapse]

select distinct v.* 
from table(sys.odcinumberlist(1,2,3)) c, vtest v
where v.a = c.column_value;

Plan hash value: 699667151

-------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |       |     1 |    19 |    80   (4)| 00:00:01 |
|   1 |  HASH UNIQUE                            |       |     1 |    19 |    80   (4)| 00:00:01 |
|*  2 |   HASH JOIN                             |       |     1 |    19 |    79   (3)| 00:00:01 |
|   3 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|       |     1 |     2 |    29   (0)| 00:00:01 |
|   4 |    VIEW                                 | VTEST |  1000 | 17000 |    49   (3)| 00:00:01 |
|   5 |     HASH GROUP BY                       |       |  1000 |  8000 |    49   (3)| 00:00:01 |
|   6 |      TABLE ACCESS FULL                  | XTEST | 10000 | 80000 |    48   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("V"."A"=VALUE(KOKBF$))

same for json_table

[sourcecode language=”sql”]
select/*+ cardinality(c 1) use_nl(v) push_pred(v) */ *
from json_table(‘{"a":[1,2,3]}’, ‘$.a[*]’ COLUMNS (a int PATH ‘$’)) c
,vtest v
where c.a = v.a;

Plan hash value: 664523328

——————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 28 | 78 (2)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 28 | 78 (2)| 00:00:01 |
| 2 | JSONTABLE EVALUATION | | | | | |
|* 3 | VIEW | VTEST | 1 | 26 | 49 (3)| 00:00:01 |
| 4 | SORT GROUP BY | | 1000 | 8000 | 49 (3)| 00:00:01 |
| 5 | TABLE ACCESS FULL | XTEST | 10000 | 80000 | 48 (0)| 00:00:01 |
——————————————————————————–

Predicate Information (identified by operation id):
—————————————————

3 – filter("V"."A"="P"."A")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U – Unused (1))
—————————————————————————

1 – SEL$F534CA49 / V@SEL$1
U – push_pred(v)
[/sourcecode]

[collapse]

same for xmltable

[sourcecode language=”sql”]
select/*+ leading(c v) cardinality(c 1) use_nl(v) push_pred(v) */ v.*
from xmltable(‘(1,3)’ columns a int path ‘.’) c,vtest v
where c.a = v.a(+);

Plan hash value: 564839666

————————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 28 | 78 (2)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 1 | 28 | 78 (2)| 00:00:01 |
| 2 | COLLECTION ITERATOR PICKLER FETCH| XQSEQUENCEFROMXMLTYPE | 1 | 2 | 29 (0)| 00:00:01 |
|* 3 | VIEW | VTEST | 1 | 26 | 49 (3)| 00:00:01 |
| 4 | SORT GROUP BY | | 1000 | 8000 | 49 (3)| 00:00:01 |
| 5 | TABLE ACCESS FULL | XTEST | 10000 | 80000 | 48 (0)| 00:00:01 |
————————————————————————————————————

Predicate Information (identified by operation id):
—————————————————

3 – filter("V"."A"(+)=CAST(TO_NUMBER(SYS_XQ_UPKXML2SQL(SYS_XQEXVAL(VALUE(KOKBF$),0,0,54525952,0),
50,1,2)) AS int ))

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U – Unused (1))
—————————————————————————

1 – SEL$6722A2F6 / V@SEL$1
U – push_pred(v)
[/sourcecode]

[collapse]

And compare with this:

create global temporary table temp_collection(a number);

insert into temp_collection select * from table(sys.odcinumberlist(1,2,3));

select/*+ cardinality(c 1) no_merge(v) */
   distinct v.* 
from temp_collection c, vtest v
where v.a = c.a;

Plan hash value: 3561835411

------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                 |     1 |    26 |    41   (3)| 00:00:01 |
|   1 |  HASH UNIQUE                             |                 |     1 |    26 |    41   (3)| 00:00:01 |
|   2 |   NESTED LOOPS                           |                 |     1 |    26 |    40   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL                     | TEMP_COLLECTION |     1 |    13 |    29   (0)| 00:00:01 |
|   4 |    VIEW PUSHED PREDICATE                 | VTEST           |     1 |    13 |    11   (0)| 00:00:01 |
|*  5 |     FILTER                               |                 |       |       |            |          |
|   6 |      SORT AGGREGATE                      |                 |     1 |     8 |            |          |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| XTEST           |    10 |    80 |    11   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | ITEST           |    10 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter(COUNT(*)>0)
   8 - access("A"="C"."A")

You can see that JPPD works fine in case of global temporary tables and, obviously, the first workaround is to avoid such functions with complex views.
But in such simple queries you have 2 other simple options:
1. you can avoid JPPD and get optimal plans using CVM(complex view merge) by just simply rewriting the query using IN or EXISTS:

select * 
from vtest v
where v.a in (select/*+ cardinality(c 1) */ c.* from table(sys.odcinumberlist(1,2,3)) c);

Plan hash value: 1474391442

---------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |       |    10 |   100 |    42   (5)| 00:00:01 |
|   1 |  SORT GROUP BY NOSORT                     |       |    10 |   100 |    42   (5)| 00:00:01 |
|   2 |   NESTED LOOPS                            |       |    10 |   100 |    41   (3)| 00:00:01 |
|   3 |    NESTED LOOPS                           |       |    10 |   100 |    41   (3)| 00:00:01 |
|   4 |     SORT UNIQUE                           |       |     1 |     2 |    29   (0)| 00:00:01 |
|   5 |      COLLECTION ITERATOR CONSTRUCTOR FETCH|       |     1 |     2 |    29   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                      | ITEST |    10 |       |     1   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID            | XTEST |    10 |    80 |    11   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("A"=VALUE(KOKBF$))

the same with json_table and xmltable

[sourcecode language=”sql”]
select *
from vtest t
where t.a in (select/*+ cardinality(v 1) */ v.a from json_table(‘{"a":[1,2,3]}’, ‘$.a[*]’ COLUMNS (a int PATH ‘$’)) v);

Plan hash value: 2910004067

—————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————
| 0 | SELECT STATEMENT | | 10 | 100 | 42 (5)| 00:00:01 |
| 1 | SORT GROUP BY NOSORT | | 10 | 100 | 42 (5)| 00:00:01 |
| 2 | NESTED LOOPS | | 10 | 100 | 41 (3)| 00:00:01 |
| 3 | NESTED LOOPS | | 10 | 100 | 41 (3)| 00:00:01 |
| 4 | SORT UNIQUE | | | | | |
| 5 | JSONTABLE EVALUATION | | | | | |
|* 6 | INDEX RANGE SCAN | ITEST | 10 | | 1 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| XTEST | 10 | 80 | 11 (0)| 00:00:01 |
—————————————————————————————

Predicate Information (identified by operation id):
—————————————————

6 – access("A"="P"."A")

select v.*
from vtest v
where exists(select/*+ cardinality(c 1) */ 1 from xmltable(‘(1,3)’ columns a int path ‘.’) c where c.a = v.a);

Plan hash value: 1646016183

—————————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————————
| 0 | SELECT STATEMENT | | 10 | 100 | 42 (5)| 00:00:01 |
| 1 | SORT GROUP BY NOSORT | | 10 | 100 | 42 (5)| 00:00:01 |
| 2 | NESTED LOOPS | | 10 | 100 | 41 (3)| 00:00:01 |
| 3 | NESTED LOOPS | | 10 | 100 | 41 (3)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 2 | 29 (0)| 00:00:01 |
| 5 | COLLECTION ITERATOR PICKLER FETCH| XQSEQUENCEFROMXMLTYPE | 1 | 2 | 29 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | ITEST | 10 | | 1 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | XTEST | 10 | 80 | 11 (0)| 00:00:01 |
—————————————————————————————————————

Predicate Information (identified by operation id):
—————————————————

6 – access("A"=CAST(TO_NUMBER(SYS_XQ_UPKXML2SQL(SYS_XQEXVAL(VALUE(KOKBF$),0,0,54525952,0),50,1,2)) AS int ))
[/sourcecode]

[collapse]

2. Avoid JPPD using lateral():

select/*+ cardinality(c 1) no_merge(lat) */
   distinct lat.* 
from table(sys.odcinumberlist(1,2,3)) c, 
     lateral(select * from vtest v where v.a = c.column_value) lat;

Plan hash value: 18036714

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |    10 |   190 |    41   (3)| 00:00:01 |
|   1 |  HASH UNIQUE                            |                 |    10 |   190 |    41   (3)| 00:00:01 |
|   2 |   NESTED LOOPS                          |                 |    10 |   190 |    40   (0)| 00:00:01 |
|   3 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|                 |     1 |     2 |    29   (0)| 00:00:01 |
|   4 |    VIEW                                 | VW_LAT_4DB60E85 |    10 |   170 |    11   (0)| 00:00:01 |
|   5 |     SORT GROUP BY                       |                 |    10 |    80 |    11   (0)| 00:00:01 |
|   6 |      TABLE ACCESS BY INDEX ROWID BATCHED| XTEST           |    10 |    80 |    11   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN                  | ITEST           |    10 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("A"=VALUE(KOKBF$))

Let’s see a bit more complex query:

Test tables 2

[sourcecode language=”sql”]
create table xtest1(id primary key, a) as
select level,level from dual connect by level<=1000;

create table xtest2(a, b, c) as
select mod(level,1000),level,rpad(‘x’,100,’x’)
from dual
connect by level<=1e4
/
create index itest2 on xtest2(a)
/
create or replace view vtest2 as
select a,count(b) cnt
from xtest2
group by a
/
[/sourcecode]

[collapse]

select v.* 
from xtest1 t1,
     vtest2 v
where t1.id in (select/*+ cardinality(c 1) */ * from table(sys.odcinumberlist(1,2,3)) c)
  and v.a = t1.a;

Plan hash value: 4293766070

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |              |     1 |    36 |    80   (3)| 00:00:01 |
|*  1 |  HASH JOIN                                 |              |     1 |    36 |    80   (3)| 00:00:01 |
|   2 |   JOIN FILTER CREATE                       | :BF0000      |     1 |    10 |    31   (4)| 00:00:01 |
|   3 |    NESTED LOOPS                            |              |     1 |    10 |    31   (4)| 00:00:01 |
|   4 |     NESTED LOOPS                           |              |     1 |    10 |    31   (4)| 00:00:01 |
|   5 |      SORT UNIQUE                           |              |     1 |     2 |    29   (0)| 00:00:01 |
|   6 |       COLLECTION ITERATOR CONSTRUCTOR FETCH|              |     1 |     2 |    29   (0)| 00:00:01 |
|*  7 |      INDEX UNIQUE SCAN                     | SYS_C0026365 |     1 |       |     0   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID            | XTEST1       |     1 |     8 |     1   (0)| 00:00:01 |
|   9 |   VIEW                                     | VTEST2       |  1000 | 26000 |    49   (3)| 00:00:01 |
|  10 |    HASH GROUP BY                           |              |  1000 |  8000 |    49   (3)| 00:00:01 |
|  11 |     JOIN FILTER USE                        | :BF0000      | 10000 | 80000 |    48   (0)| 00:00:01 |
|* 12 |      TABLE ACCESS FULL                     | XTEST2       | 10000 | 80000 |    48   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("V"."A"="T1"."A")
   7 - access("T1"."ID"=VALUE(KOKBF$))
  12 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"A"))

As you see, CVM can’t help in this case, but we can use lateral():

select/*+ no_merge(lat) */ lat.* 
from xtest1 t1,
     lateral(select * from vtest2 v where v.a = t1.a) lat
where t1.id in (select/*+ cardinality(c 1) */ * from table(sys.odcinumberlist(1,2,3)) c);

Plan hash value: 1798023704

------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                 |    10 |   360 |    42   (3)| 00:00:01 |
|   1 |  NESTED LOOPS                            |                 |    10 |   360 |    42   (3)| 00:00:01 |
|   2 |   NESTED LOOPS                           |                 |     1 |    10 |    31   (4)| 00:00:01 |
|   3 |    SORT UNIQUE                           |                 |     1 |     2 |    29   (0)| 00:00:01 |
|   4 |     COLLECTION ITERATOR CONSTRUCTOR FETCH|                 |     1 |     2 |    29   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID           | XTEST1          |     1 |     8 |     1   (0)| 00:00:01 |
|*  6 |     INDEX UNIQUE SCAN                    | SYS_C0026365    |     1 |       |     0   (0)| 00:00:01 |
|   7 |   VIEW                                   | VW_LAT_A18161FF |    10 |   260 |    11   (0)| 00:00:01 |
|   8 |    SORT GROUP BY                         |                 |    10 |    80 |    11   (0)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID BATCHED  | XTEST2          |    10 |    80 |    11   (0)| 00:00:01 |
|* 10 |      INDEX RANGE SCAN                    | ITEST2          |    10 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("T1"."ID"=VALUE(KOKBF$))
  10 - access("A"="T1"."A")

There is also another workaround with non-documented “precompute_subquery” hint:

select v.* 
from xtest1 t1,
     vtest2 v 
where t1.id in (select/*+ precompute_subquery */ * from table(sys.odcinumberlist(1,2,3)) c)
and v.a = t1.a;

Plan hash value: 1964829099

------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |              |    30 |   480 |    37   (3)| 00:00:01 |
|   1 |  HASH GROUP BY                  |              |    30 |   480 |    37   (3)| 00:00:01 |
|   2 |   NESTED LOOPS                  |              |    30 |   480 |    36   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                 |              |    30 |   480 |    36   (0)| 00:00:01 |
|   4 |     INLIST ITERATOR             |              |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| XTEST1       |     3 |    24 |     3   (0)| 00:00:01 |
|*  6 |       INDEX UNIQUE SCAN         | SYS_C0026365 |     3 |       |     2   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN            | ITEST2       |    10 |       |     1   (0)| 00:00:01 |
|   8 |    TABLE ACCESS BY INDEX ROWID  | XTEST2       |    10 |    80 |    11   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("T1"."ID"=1 OR "T1"."ID"=2 OR "T1"."ID"=3)
   7 - access("A"="T1"."A")

It can help even in most difficult cases, for example if you can’t rewrite query (in this case you can create sql patch or sql profile with “precompute_subquery”), but I wouldn’t suggest it since “precompute_subquery” is non-documented, it can be used only with simple collections and has limitation in 1000 values.
I’d suggest to use the workaround with lateral, since it’s most reliable and very simple.

v$sql_hint.target_level

Posted on May 28, 2019 by Sayan Malakshinov Posted in CBO, oracle, SQL, undocumented 2,071 Page views Leave a comment

Today I wanted to give a link to the description of v$sql_hint.target_level to show that no_parallel can be specified for statement or object, and though it’s pretty obvious, but surprisingly I haven’t found any articles or posts about it, so this short post describes it.
v$sql_hint.target_level is a bitset, where
1st bit set to 1 means that the hint can be specified on statement level,
2nd – on query block level,
3rd – on object level,
4th – on join level(for multiple objects).
Short example:

   select name,sql_feature
          ,class,inverse
          ,version,version_outline
          ,target_level
         ,decode(bitand(target_level,1),0,'no','yes') Statement_level
         ,decode(bitand(target_level,2),0,'no','yes') Query_block_level
         ,decode(bitand(target_level,4),0,'no','yes') Object_level
         ,decode(bitand(target_level,8),0,'no','yes') Join_level
   from v$sql_hint h;

with hints as (
   select name,sql_feature
          ,class,inverse
          ,version,version_outline
          ,target_level
         ,decode(bitand(target_level,1),0,'no','yes') Statement_level
         ,decode(bitand(target_level,2),0,'no','yes') Query_block_level
         ,decode(bitand(target_level,4),0,'no','yes') Object_level
         ,decode(bitand(target_level,8),0,'no','yes') Join_level
   from v$sql_hint h
)
select *
from hints
where statement_level='yes'
  and to_number(regexp_substr(version,'^\d+')) >= 18
order by version;

Result:

NAME              SQL_FEATURE     CLASS                VERSION  TARGET_LEVEL STATEMENT_LEVEL QUERY_BLOCK_LEVEL OBJECT_LEVEL JOIN_LEVEL
----------------- --------------- -------------------- -------- ------------ --------------- ----------------- ------------ ----------
PDB_LOCAL_ONLY    QKSFM_DML       PDB_LOCAL_ONLY       18.1.0              1 yes             no                no           no
SUPPRESS_LOAD     QKSFM_DDL       SUPPRESS_LOAD        18.1.0              1 yes             no                no           no
SYSTEM_STATS      QKSFM_ALL       SYSTEM_STATS         18.1.0              1 yes             no                no           no
MEMOPTIMIZE_WRITE QKSFM_EXECUTION MEMOPTIMIZE_WRITE    18.1.0              1 yes             no                no           no
SKIP_PROXY        QKSFM_ALL       SKIP_PROXY           18.1.0              1 yes             no                no           no
CURRENT_INSTANCE  QKSFM_ALL       CURRENT_INSTANCE     18.1.0              1 yes             no                no           no
JSON_LENGTH       QKSFM_EXECUTION JSON_LENGTH          19.1.0              1 yes             no                no           no
QUARANTINE        QKSFM_EXECUTION QUARANTINE           19.1.0              1 yes             no                no           no

Top time-consuming predicates from ASH

Posted on May 13, 2019 by Sayan Malakshinov Posted in oracle, query optimizing, SQL, statistics, troubleshooting 1,918 Page views Leave a comment

Sometimes it might be useful to analyze top time-consuming filter and access predicates from ASH, especially in cases when db load is spread evenly enough by different queries and top segments do not show anything interesting, except usual things like “some tables are requested more often than others”.
Of course, we can start from analysis of SYS.COL_USAGE$: col_usage.sql

col_usage.sql

[sourcecode language=”sql”]
col owner format a30
col oname format a30 heading "Object name"
col cname format a30 heading "Column name"
accept owner_mask prompt "Enter owner mask: ";
accept tab_name prompt "Enter tab_name mask: ";
accept col_name prompt "Enter col_name mask: ";

SELECT a.username as owner
,o.name as oname
,c.name as cname
,u.equality_preds as equality_preds
,u.equijoin_preds as equijoin_preds
,u.nonequijoin_preds as nonequijoin_preds
,u.range_preds as range_preds
,u.like_preds as like_preds
,u.null_preds as null_preds
,to_char(u.timestamp, ‘yyyy-mm-dd hh24:mi:ss’) when
FROM
sys.col_usage$ u
, sys.obj$ o
, sys.col$ c
, all_users a
WHERE a.user_id = o.owner#
AND u.obj# = o.obj#
AND u.obj# = c.obj#
AND u.intcol# = c.col#
AND a.username like upper(‘&owner_mask’)
AND o.name like upper(‘&tab_name’)
AND c.name like upper(‘&col_name’)
ORDER BY a.username, o.name, c.name
;
col owner clear;
col oname clear;
col cname clear;
undef tab_name col_name owner_mask;
[/sourcecode]

[collapse]

But it’s not enough, for example it doesn’t show predicates combinations. In this case we can use v$active_session_history and v$sql_plan:

Top 50 predicates

[sourcecode language=”sql”]
with
ash as (
select
sql_id
,plan_hash_value
,table_name
,alias
,ACCESS_PREDICATES
,FILTER_PREDICATES
,count(*) cnt
from (
select
h.sql_id
,h.SQL_PLAN_HASH_VALUE plan_hash_value
,decode(p.OPERATION
,’TABLE ACCESS’,p.OBJECT_OWNER||’.’||p.OBJECT_NAME
,(select i.TABLE_OWNER||’.’||i.TABLE_NAME from dba_indexes i where i.OWNER=p.OBJECT_OWNER and i.index_name=p.OBJECT_NAME)
) table_name
,OBJECT_ALIAS ALIAS
,p.ACCESS_PREDICATES
,p.FILTER_PREDICATES
— поля, которые могут быть полезны для анализа в других разрезах:
— ,h.sql_plan_operation
— ,h.sql_plan_options
— ,decode(h.session_state,’ON CPU’,’ON CPU’,h.event) event
— ,h.current_obj#
from v$active_session_history h
,v$sql_plan p
where h.sql_opname=’SELECT’
and h.IN_SQL_EXECUTION=’Y’
and h.sql_plan_operation in (‘INDEX’,’TABLE ACCESS’)
and p.SQL_ID = h.sql_id
and p.CHILD_NUMBER = h.SQL_CHILD_NUMBER
and p.ID = h.SQL_PLAN_LINE_ID
— если захотим за последние 3 часа:
— and h.sample_time >= systimestamp – interval ‘3’ hour
)
— если захотим анализируем предикаты только одной таблицы:
— where table_name=’&OWNER.&TABNAME’
group by
sql_id
,plan_hash_value
,table_name
,alias
,ACCESS_PREDICATES
,FILTER_PREDICATES
)
,agg_by_alias as (
select
table_name
,regexp_substr(ALIAS,’^[^@]+’) ALIAS
,listagg(ACCESS_PREDICATES,’ ‘) within group(order by ACCESS_PREDICATES) ACCESS_PREDICATES
,listagg(FILTER_PREDICATES,’ ‘) within group(order by FILTER_PREDICATES) FILTER_PREDICATES
,sum(cnt) cnt
from ash
group by
sql_id
,plan_hash_value
,table_name
,alias
)
,agg as (
select
table_name
,’ALIAS’ alias
,replace(access_predicates,’"’||alias||’".’,’"ALIAS".’) access_predicates
,replace(filter_predicates,’"’||alias||’".’,’"ALIAS".’) filter_predicates
,sum(cnt) cnt
from agg_by_alias
group by
table_name
,replace(access_predicates,’"’||alias||’".’,’"ALIAS".’)
,replace(filter_predicates,’"’||alias||’".’,’"ALIAS".’)
)
,cols as (
select
table_name
,cols
,access_predicates
,filter_predicates
,sum(cnt)over(partition by table_name,cols) total_by_cols
,cnt
from agg
,xmltable(
‘string-join(for $c in /ROWSET/ROW/COL order by $c return $c,",")’
passing
xmltype(
cursor(
(select distinct
nvl(
regexp_substr(
access_predicates||’ ‘||filter_predicates
,'("’||alias||’"\.|[^.]|^)"([A-Z0-9#_$]+)"([^.]|$)’
,1
,level
,’i’,2
),’ ‘)
col
from dual
connect by
level<=regexp_count(
access_predicates||’ ‘||filter_predicates
,'("’||alias||’"\.|[^.]|^)"([A-Z0-9#_$]+)"([^.]|$)’
)
)
))
columns cols varchar2(400) path ‘.’
)(+)
order by total_by_cols desc, table_name, cnt desc
)
select
table_name
,cols
,sum(cnt)over(partition by table_name,cols) total_by_cols
,access_predicates
,filter_predicates
,cnt
from cols
where rownum<=50
order by total_by_cols desc, table_name, cnt desc;
[/sourcecode]

[collapse]

As you can see it shows top 50 predicates and their columns for last 3 hours. Despite the fact that ASH stores just sampled data, its results are representative enough for high-load databases.
Just few details:

Column “COLS” shows “search columns”, and total_by_cols – their number of occurrences
I think it’s obvious, that this info is not unambiguous marker of the problem, because for example few full table scans can misrepresent the statistics, so sometimes you will need to analyze such queries deeper (v$sqlstats,dba_hist_sqlstat)
We need to group data by OBJECT_ALIAS within SQL_ID and plan_hash_value, because in case of index access with lookup to table(“table access by rowid”) some predicates are in the row with index access and others are in the row with table access.

Depending on the needs, we can modify this query to analyze ASH data by different dimensions, for example with additional analysis of partitioning or wait events.

Dynamic Range Segmentation – Interval Quantization

What is Interval Quantization?

Creating a SQL Macro for Interval Splitting

Updating the Domain Index

1. Partitioning Changes in ODCIIndexCreate_pr

2. Initial Data Population

3. Updating ODCIIndexInsert_pr

4. Query Generation in ODCIIndexStart_Pr

Results

Conclusion

1. Creating the Function and Operator

2. Creating the idx_range_date_pkg Package

3. Creating the idx_range_date_type

Key Features of idx_range_date_type

4. Creating the INDEXTYPE

5. Adding a Virtual Generated Column

6. Creating the Index

7. Testing the Index

Summary

Optimizing the Query

Simplifying with a SQL Macro

Conclusion and Next Steps

Defining the Problem

Solution Approaches

1. Simple Composite Indexes

1.1 Index on (beg_date, end_date)

1.2 Index on (end_date, beg_date)

Adding an Upper Bound for end_date

2. Partitioning + Composite Indexes

Example: Partitioned Table by END_DATE

1. Partitioning Changes in `ODCIIndexCreate_pr`

3. Updating `ODCIIndexInsert_pr`

4. Query Generation in `ODCIIndexStart_Pr`

2. Creating the `idx_range_date_pkg` Package

3. Creating the `idx_range_date_type`

Key Features of `idx_range_date_type`

4. Creating the `INDEXTYPE`

1.1 Index on `(beg_date, end_date)`

1.2 Index on `(end_date, beg_date)`

Adding an Upper Bound for `end_date`

Example: Partitioned Table by `END_DATE`