Oracle SQL
  • LICENSE

External Tables Part 1 – Project Columns All vs Referenced

Posted on October 25, 2017 by Roger MacNicol Posted in External tables, oracle, SmartScan 2,032 Page views

I normally blog about table scans on Oracle native data but Oracle also supports a wide variety of features for scanning external tables and I need to cover these too. One of the ways I learn new things is being sent a bug report and saying to myself “Oh! I didn’t know you could do that”. So today I’m going to start with the grammar:

Alter Table <xt> Project Columns [ All | Referenced ]  

This DDL changes the value in the Property column displayed in user_external_tables:

SQL> select property
2    from user_external_tables
3    where table_name='C_ET';

PROPERTY
----------
ALL       

Here we have an external table defined using the legacy driver ORACLE_LOADER. This driver defaults to projecting all the columns in the base table rather than just those needed to satisfy the query (i.e. the referenced columns) and discarding rows that have data conversion issues up to the reject limit. 

So for example we have a DATE column in our external table that contains dirty data that won’t convert cleanly to Oracle internal dates using the supplied mask we can either import it as a VARCHAR2 to not lose values or import it as a date but lose rows even on queries that don’t need that date column. We can change the behaviour to only project the referenced columns by 

SQL> alter table c_et project column referenced;
 
Table altered.
 
SQL> 
SQL> select property 
2    from user_external_tables
3    where table_name='C_ET';
 
PROPERTY   
----------
REFERENCED 

The driver will now ignore unreferenced columns and if the date column is not used we will get all the rows in the external data and the query will run faster since datatype conversion is expensive but we may get an inconsistent number of rows depending on which columns are used in any given query. This is OK if the DBA knows a priori that there are no conversion errors or if they are willing to live with inconsistency in the number of rows returned.

The big data drivers such as ORACLE_HIVE have a different default behaviour which is to only project referenced columns and to replace data with conversion errors with NULL values and i.e. they default to returning a consistent number of rows with best performance.

Take away: In order to get consistency and fast table scan performance with ORACLE_LOADER,  the trick is to define the external table with the ‘convert_error store_null’ option and switch to ‘Project Column Referenced’. For example:

CREATE TABLE "T_XT"
(
  c0 varchar(10),
  c1 varchar(10)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY DMPDIR
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET  al32utf8
    ESCAPE
    FIELDS CSV WITH EMBEDDED terminated by ',' enclosed by '|'
    REJECT ROWS WITH ALL NULL FIELDS
    nullif equal blanks
    truncate_columns
    convert_error store_null
    (
      "C0" CHAR(10),
      "C1" CHAR(10)
    )
  )
  location
  (
    'simple.dat'
  )
) REJECT LIMIT UNLIMITED
parallel 4;

If you want more information on handling conversion errors when scanning an external table please check the Utilities Guide section on each driver type

external tables oracle Roger MacNicol SmartScan
« PL/SQL functions: Iterate and keys for associative arrays
Random thoughts on block sizes »
Page views: 2,032
photo Sayan Malakshinov

Oracle ACE Pro Oracle ACE Pro Alumni

DEVVYOracle Database Developer Choice Award winner

Oracle performance tuning expert

UK / Cambridge

LinkedIn   Twitter
sayan@orasql.org

Recent Posts

  • Oracle Telegram Bot
  • Partition Pruning and Global Indexes
  • Interval Search: Part 4. Dynamic Range Segmentation – interval quantization
  • Interval Search Series: Simplified, Advanced, and Custom Solutions
  • Interval Search: Part 3. Dynamic Range Segmentation – Custom Domain Index

Popular posts

Recent Comments

  • Oracle SQL | Interval Search: Part 4. Dynamic Range Segmentation – interval quantization on Interval Search: Part 3. Dynamic Range Segmentation – Custom Domain Index
  • Oracle SQL | Interval Search: Part 4. Dynamic Range Segmentation – interval quantization on Interval Search: Part 2. Dynamic Range Segmentation – Simplified
  • Oracle SQL | Interval Search: Part 4. Dynamic Range Segmentation – interval quantization on Interval Search: Optimizing Date Range Queries – Part 1
  • Oracle SQL | Interval Search Series: Simplified, Advanced, and Custom Solutions on Interval Search: Part 2. Dynamic Range Segmentation – Simplified
  • Oracle SQL | Interval Search: Part 2. Dynamic Range Segmentation – Simplified on Interval Search: Part 3. Dynamic Range Segmentation – Custom Domain Index

Blogroll

  • Alex Fatkulin
  • Alexander Anokhin
  • Andrey Nikolaev
  • Charles Hooper
  • Christian Antognini
  • Coskan Gundogar
  • David Fitzjarrell
  • Igor Usoltsev
  • Jonathan Lewis
  • Karl Arao
  • Mark Bobak
  • Martin Bach
  • Martin Berger
  • Neil Chandler
  • Randolf Geist
  • Richard Foote
  • Riyaj Shamsudeen
  • Tanel Poder
  • Timur Akhmadeev
  • Valentin Nikotin
©Sayan Malakshinov. Oracle SQL