One of the most common and enduring challenges in database management is performing efficient interval searches, particularly for date intervals such as WHERE :dt BETWEEN beg_date AND end_date.
In this series of articles, I will explore various strategies for optimizing such searches. We'll delve into well-known standard approaches, analyze their limitations, and introduce my custom method, which I promised to share several years ago. I had postponed writing about it because the topic's complexity seemed daunting: it requires a deep dive into the nuances of the data itself (e.g., open intervals, extreme values, data distribution, and skew). However, after receiving yet another question about it recently, I realized that I could no longer delay. Even if it means addressing some of the finer details in later parts, it's time to start sharing this method in manageable steps.
Defining the Problem
In many applications involving historical data, a common modeling approach is SCD (Slowly Changing Dimension) Type 2. This method typically uses columns such as beg_date and end_date to represent the validity period of each record.
To find records that are valid at a specific point in time, queries often use predicates like WHERE :dt BETWEEN beg_date AND end_date.
The challenge lies in finding a universal and efficient method to execute such queries.
Solution Approaches
Let’s begin by creating a test table and generating sample data for evaluation:
create table test_table(
   beg_date date
  ,end_date date
  ,padding varchar2(10)
);
declare
  procedure p_insert(
     start_date date
    ,end_date date
    ,step_minutes number
    ,duration_minutes number
  ) is
  begin
    insert/*+ append */ into test_table(beg_date,end_date,padding)
    select
       start_date + n * numtodsinterval(step_minutes,'minute')
      ,start_date + n * numtodsinterval(step_minutes,'minute') + numtodsinterval(duration_minutes,'minute')
      ,'0123456789'
    from xmltable('0 to xs:integer(.)'
           passing ceil( (end_date - start_date)*24*60/step_minutes )
           columns n int path '.'
         );
    commit;
  end;
begin
  -- 5 min intervals every 5 minutes: 00:00-00:05, 00:05-00:10, etc:
  --p_insert(date'2000-01-01',sysdate, 5, 5);
  -- 5 min intervals every 5 minutes starting from 00:02: 00:02-00:07, 00:07-00:12, etc:
  p_insert(date'2000-01-01'+interval'2'minute,sysdate, 5, 5);
  -- 15 min intervals every 5 minutes: 00:00-00:15, 00:05-00:20, etc:
  p_insert(date'2000-01-01',sysdate, 5, 15);
  -- 30 min intervals every 15 minutes: 00:00-00:30, 00:15-00:45, etc:
  p_insert(date'2000-01-01',sysdate, 15, 30);
  -- 1 hour intervals every 15 minutes: 00:00-01:00, 00:15-01:15, etc:
  p_insert(date'2000-01-01',sysdate, 15, 60);
  -- 2 hour intervals every 20 minutes: 00:00-02:00, 00:20-02:20, etc:
  p_insert(date'2000-01-01',sysdate, 20, 120);
  -- 7-day intervals every 60 minutes:
  p_insert(date'2000-01-01',sysdate, 60, 7*24*60);
  -- 30-day intervals every 60 minutes:
  p_insert(date'2000-01-01',sysdate, 60, 30*24*60);
  -- 120-day intervals every 7 days:
  p_insert(date'2000-01-01',sysdate, 7*24*60, 120*24*60);
  -- 400-day intervals every 30 days:
  p_insert(date'2000-01-01',sysdate, 30*24*60, 400*24*60);
end;
/
We now have a table with over 10 million rows containing date intervals of varying lengths:
SQL> select count(*),min(beg_date),max(end_date) from test_table;
COUNT(*) MIN(BEG_DATE) MAX(END_DATE)
---------- ------------------- -------------------
10723261 2000-01-01 00:00:00 2026-01-24 00:00:00
1. Simple Composite Indexes
1.1 Index on (beg_date, end_date)
The most straightforward approach is to create a composite index on (beg_date, end_date). However, even at first glance, it's clear that this method has significant inefficiencies.
When we use a predicate like :dt BETWEEN beg_date AND end_date, it breaks down into two sub-predicates:
Access predicate: beg_date <= :dt. Since beg_date is the leading column of the index, this condition defines the range of index entries to scan, and the query must read and evaluate every entry that satisfies it.
Filter predicate: :dt <= end_date. This merely filters the entries returned by the access predicate.
As the dataset grows, both beg_date and end_date values increase over time. Consequently, the access predicate (beg_date <= :dt) matches an ever-growing portion of the index, and queries for recent dates end up scanning almost all of it.
1.2 Index on (end_date, beg_date)
This is one of the most widely adopted approaches. By simply rearranging the order of columns in the index, placing end_date first, we can achieve significantly better performance in most cases.
Why? Queries tend to target data closer to the current time, and much less frequently target records from far in the past. With end_date as the leading column, the access predicate becomes end_date >= :dt, which for recent dates matches only a small, recent portion of the index.
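The asymmetry between the two column orders can be illustrated outside the database. The following Python sketch (an illustration of the access-predicate arithmetic, not Oracle code; the data shape and all names are made up) spreads short intervals over about 26 years and counts how many sorted index entries each column order would have to scan for a "recent" search date:

```python
# Illustration only: counts how many sorted index entries each access
# predicate selects; a real index range scan visits a proportional amount.
import bisect
import random

random.seed(42)
N = 100_000
SPAN = 26 * 365 * 24 * 60                       # ~26 years, in minutes
# short intervals (5..120 minutes) with start times spread over the span
rows = [(beg, beg + random.choice([5, 15, 30, 60, 120]))
        for beg in random.sample(range(SPAN), N)]

begs = sorted(b for b, _ in rows)               # index leading on beg_date
ends = sorted(e for _, e in rows)               # index leading on end_date

dt = SPAN - 100 * 24 * 60                       # "100 days ago"
scan_beg_end = bisect.bisect_right(begs, dt)    # entries with beg_date <= dt
scan_end_beg = len(ends) - bisect.bisect_left(ends, dt)  # entries with end_date >= dt
matches = sum(1 for b, e in rows if b <= dt <= e)

print(scan_beg_end, scan_end_beg, matches)
```

With this data the beg_date-first order selects roughly 99% of the entries, while the end_date-first order selects about 1%, even though only a handful of rows actually match.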
Let’s create the indexes and assess their performance:
create index ix_beg_end on test_table(beg_date,end_date);
create index ix_end_beg on test_table(end_date,beg_date);
select segment_name,blocks,bytes/1024/1024 as mbytes
from user_segments
where segment_name in ('IX_BEG_END','IX_END_BEG','TEST_TABLE');
SEGMENT_NAME BLOCKS MBYTES
-------------------- ---------- ----------
IX_BEG_END 40960 320
IX_END_BEG 40832 319
TEST_TABLE 48128 376
Let’s query the records valid 100 days ago using the (beg_date, end_date) index:
SQL> select/*+ index(test_table (beg_date,end_date)) */ count(*),min(beg_date),max(end_date) from test_table where sysdate-100 between beg_date and end_date;
COUNT(*) MIN(BEG_DATE) MAX(END_DATE)
---------- ------------------- -------------------
935 2023-08-28 00:00:00 2025-09-26 00:00:00
SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last'));
Plan hash value: 1056805589
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 40375 (100)| | 1 |00:00:00.79 | 40200 |
| 1 | SORT AGGREGATE | | 1 | 1 | 16 | | | 1 |00:00:00.79 | 40200 |
|* 2 | INDEX RANGE SCAN| IX_BEG_END | 1 | 28472 | 444K| 40375 (1)| 00:00:02 | 935 |00:00:00.79 | 40200 |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("END_DATE">=SYSDATE@!-100 AND "BEG_DATE"<=SYSDATE@!-100)
filter("END_DATE">=SYSDATE@!-100)
As seen, the query required 40,200 logical reads, almost the entire index, which contains 40,960 blocks.
Now, let’s query the same data using the (end_date, beg_date) index:
SQL> select count(*),min(beg_date),max(end_date) from test_table where sysdate-100 between beg_date and end_date;
COUNT(*) MIN(BEG_DATE) MAX(END_DATE)
---------- ------------------- -------------------
935 2023-08-28 00:00:00 2025-09-26 00:00:00
SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last'));
Plan hash value: 416972780
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 450 (100)| 1 |00:00:00.01 | 453 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 453 |
|* 2 | INDEX RANGE SCAN| IX_END_BEG | 1 | 28472 | 450 (1)| 935 |00:00:00.01 | 453 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("END_DATE">=SYSDATE@!-100 AND "END_DATE" IS NOT NULL)
filter("BEG_DATE"<=SYSDATE@!-100)
Using this index required only 453 logical reads, a dramatic improvement compared to the 40,200 reads with the first index.
Adding an Upper Bound for end_date
To illustrate the importance of having both upper and lower bounds for effective range queries, let’s further restrict the query with end_date < SYSDATE - 70:
SQL> select count(*),min(beg_date),max(end_date) from test_table where sysdate-100 between beg_date and end_date and end_date<sysdate-70;
COUNT(*) MIN(BEG_DATE) MAX(END_DATE)
---------- ------------------- -------------------
910 2023-08-28 00:00:00 2024-10-08 02:00:00
SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last'));
Plan hash value: 3937277202
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 136 (100)| 1 |00:00:00.01 | 137 |
| 1 | SORT AGGREGATE | | 1 | | 1 |00:00:00.01 | 137 |
|* 2 | FILTER | | 1 | | 910 |00:00:00.01 | 137 |
|* 3 | INDEX RANGE SCAN| IX_END_BEG | 1 | 136 (0)| 910 |00:00:00.01 | 137 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYSDATE@!-70>SYSDATE@!-100)
3 - access("END_DATE">=SYSDATE@!-100 AND "END_DATE"<SYSDATE@!-70 AND "BEG_DATE"<=SYSDATE@!-100)
filter("BEG_DATE"<=SYSDATE@!-100)
We retrieved nearly all of the required records (910 out of 935), while the number of logical I/O operations (LIO) dropped more than threefold, from 453 to 137.
To illustrate the inherent limitations of our current indexing strategies, let’s simplify the scenario. Suppose we have a table of integer intervals (START, END) containing 10 million records: (0,1), (1,2), (2,3), …, (9999999, 10000000). When searching for a record where 5000000 BETWEEN START AND END, regardless of whether we use an index on (START, END) or (END, START), we would have to scan approximately half of the index. This clearly demonstrates that neither of these indexes can serve as a universal solution; under certain conditions, both become inefficient.
Let’s illustrate this issue using our test table. We’ll select a date roughly in the middle of our dataset, date'2012-02-01', and examine the performance of both indexes.
First, we’ll test the query using the (beg_date, end_date) index:
SQL> select/*+ index(test_table (beg_date,end_date)) */ count(*),min(beg_date),max(end_date) from test_table where date'2012-02-01' between beg_date and end_date;
COUNT(*) MIN(BEG_DATE) MAX(END_DATE)
---------- ------------------- -------------------
943 2011-01-03 00:00:00 2013-03-03 00:00:00
SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last -alias -projection'));
Plan hash value: 1056805589
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 19355 (100)| 1 |00:00:00.45 | 19680 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.45 | 19680 |
|* 2 | INDEX RANGE SCAN| IX_BEG_END | 1 | 2783K| 19355 (1)| 943 |00:00:00.45 | 19680 |
--------------------------------------------------------------------------------------------------------
The query required almost 20,000 LIO operations, a significant portion of the total index size. Next, we’ll perform the same query using the (end_date, beg_date) index:
select/*+ index(test_table (end_date,beg_date)) */ count(*),min(beg_date),max(end_date) from test_table where date'2012-02-01' between beg_date and end_date;
COUNT(*) MIN(BEG_DATE) MAX(END_DATE)
---------- ------------------- -------------------
943 2011-01-03 00:00:00 2013-03-03 00:00:00
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 20929 (100)| 1 |00:00:00.38 | 20973 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.38 | 20973 |
|* 2 | INDEX RANGE SCAN| IX_END_BEG | 1 | 655K| 20929 (1)| 943 |00:00:00.38 | 20973 |
-------------------------------------------------------------------------------------------------------
Similarly, this query also required approximately 20,000 LIO operations, showing that both indexes suffer from the same inefficiency for this type of query. Neither provides an efficient access path when the search date lies in the middle of the data range: the database engine must scan a large portion of the index to find the matching records, resulting in increased I/O and slower query performance.
2. Partitioning + Composite Indexes
This approach is far less common but offers significant advantages. In the previous examples with composite indexes, the predicate on the second column of the index did not help reduce the number of scanned index entries. However, by partitioning the table on this second column, we can leverage partition pruning to exclude irrelevant partitions, significantly reducing the scan scope.
Example: Partitioned Table by END_DATE
To demonstrate, let’s create a partitioned table using the same data as in the previous example, partitioned by END_DATE on a yearly interval:
create table test_table_part_1(
   beg_date date
  ,end_date date
  ,rid rowid
)
partition by range(end_date) interval (numtoyminterval(1,'year'))
(
  partition part_01 values less than (date'2000-01-01')
);

insert/*+ append parallel(4) */ into test_table_part_1
select beg_date,end_date,rowid from test_table;

create index ix_tt_part_local on test_table_part_1(beg_date,end_date) local;

call dbms_stats.gather_table_stats('','test_table_part_1');
This results in 28 partitions:
SQL> select partition_name,partition_position,blevel,leaf_blocks,num_rows from user_ind_partitions where index_name='IX_TT_PART_LOCAL';
PARTITION_NAME PARTITION_POSITION BLEVEL LEAF_BLOCKS NUM_ROWS
-------------- ------------------ ---------- ----------- ----------
PART_01 1 0 0 0
SYS_P8333 2 2 1621 429547
SYS_P8341 3 2 1621 429304
SYS_P8348 4 2 1621 429304
SYS_P8353 5 2 1621 429304
SYS_P8355 6 2 1625 430480
SYS_P8332 7 2 1621 429304
SYS_P8335 8 2 1621 429305
SYS_P8331 9 2 1621 429305
SYS_P8336 10 2 1625 430480
SYS_P8338 11 2 1621 429304
SYS_P8340 12 2 1621 429304
SYS_P8343 13 2 1621 429304
SYS_P8345 14 2 1625 430481
SYS_P8347 15 2 1621 429305
SYS_P8352 16 2 1621 429304
SYS_P8350 17 2 1621 429304
SYS_P8351 18 2 1625 430480
SYS_P8334 19 2 1621 429305
SYS_P8337 20 2 1621 429304
SYS_P8339 21 2 1621 429305
SYS_P8342 22 2 1625 430480
SYS_P8344 23 2 1621 429304
SYS_P8346 24 2 1621 429304
SYS_P8349 25 2 1621 429305
SYS_P8354 26 2 1561 413443
SYS_P8356 27 1 2 391
SYS_P8357 28 0 1 1
Let’s test the same query for the same DATE '2012-02-01' using the partitioned table:
SQL> select/*+ index(t (beg_date,end_date)) */ count(*),min(beg_date),max(end_date) from test_table_part_1 t where date'2012-02-01' between beg_date and end_date;
COUNT(*) MIN(BEG_DATE) MAX(END_DATE)
---------- ------------------- -------------------
943 2011-01-03 00:00:00 2013-03-03 00:00:00
SQL> select * from table(dbms_xplan.display_cursor('','','all allstats last -alias -projection'));
Plan hash value: 1651658810
-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10259 (100)| | | 1 |00:00:00.01 | 183 |
| 1 | SORT AGGREGATE | | 1 | 1 | | | | 1 |00:00:00.01 | 183 |
| 2 | PARTITION RANGE ITERATOR| | 1 | 2783K| 10259 (1)| 14 |1048575| 943 |00:00:00.01 | 183 |
|* 3 | INDEX RANGE SCAN | IX_TT_PART_LOCAL | 15 | 2783K| 10259 (1)| 14 |1048575| 943 |00:00:00.01 | 183 |
-------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "BEG_DATE"<=TO_DATE(' 2012-02-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
filter("END_DATE">=TO_DATE(' 2012-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
As shown, this approach reduced the number of logical reads (LIO) to just 183, compared to roughly 20,000 in the earlier examples. Partitioning the table on END_DATE combined with a composite local index dramatically improves query performance by limiting the scan scope through partition pruning. Even in the worst-case scenario, the number of logical reads is orders of magnitude lower than with global composite indexes, making this a highly effective strategy for interval searches.
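To see why pruning pays off, here is a back-of-the-envelope Python sketch (a model of the idea, not of Oracle internals; the bucketing by end-date "year" stands in for interval partitions). It compares how many entries a global (beg_date, end_date) index would scan against the total scanned across only the partitions that survive pruning on end_date:

```python
# Model: each row lands in the partition of its end_date's "year"; a
# search for :dt skips every partition whose end_date range lies entirely
# below :dt, and scans beg_date <= :dt only in the surviving partitions.
from collections import defaultdict
import random

random.seed(42)
MIN_PER_YEAR = 365 * 24 * 60
YEARS, N = 26, 100_000
# interval durations from 5 minutes up to 7 days, spread over ~26 years
rows = [(beg, beg + random.choice([5, 60, 24 * 60, 7 * 24 * 60]))
        for beg in random.sample(range(YEARS * MIN_PER_YEAR), N)]

parts = defaultdict(list)            # "range(end_date) interval" partitions
for beg, end in rows:
    parts[end // MIN_PER_YEAR].append((beg, end))

dt = 13 * MIN_PER_YEAR + 100_000     # a date in the middle of the data
global_scan = sum(1 for beg, end in rows if beg <= dt)
pruned_scan = sum(1 for year, part in parts.items() if year >= dt // MIN_PER_YEAR
                    for beg, end in part if beg <= dt)
print(global_scan, pruned_scan)      # pruning skips most of the work
```

Because interval durations are bounded, partitions far in the future contain no rows with beg_date <= :dt, so almost all of the beg_date range scan disappears along with the pruned partitions.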
Next part: Interval Search: Part 2. Dynamic Range Segmentation – Simplified