Oracle SQL
  • LICENSE

External Tables Part 1 – Project Columns All vs Referenced

Posted on October 25, 2017 by Roger MacNicol Posted in External tables, oracle, SmartScan

I normally blog about table scans on Oracle native data but Oracle also supports a wide variety of features for scanning external tables and I need to cover these too. One of the ways I learn new things is being sent a bug report and saying to myself “Oh! I didn’t know you could do that”. So today I’m going to start with the grammar:

Alter Table <xt> Project Columns [ All | Referenced ]  

This DDL changes the value in the Property column displayed in user_external_tables:

SQL> select property
2    from user_external_tables
3    where table_name='C_ET';

PROPERTY
----------
ALL       

Here we have an external table defined using the legacy driver ORACLE_LOADER. This driver defaults to projecting all the columns in the base table rather than just those needed to satisfy the query (i.e. the referenced columns) and discarding rows that have data conversion issues up to the reject limit. 

So for example we have a DATE column in our external table that contains dirty data that won’t convert cleanly to Oracle internal dates using the supplied mask we can either import it as a VARCHAR2 to not lose values or import it as a date but lose rows even on queries that don’t need that date column. We can change the behaviour to only project the referenced columns by 

SQL> alter table c_et project column referenced;
 
Table altered.
 
SQL> 
SQL> select property 
2    from user_external_tables
3    where table_name='C_ET';
 
PROPERTY   
----------
REFERENCED 

The driver will now ignore unreferenced columns and if the date column is not used we will get all the rows in the external data and the query will run faster since datatype conversion is expensive but we may get an inconsistent number of rows depending on which columns are used in any given query. This is OK if the DBA knows a priori that there are no conversion errors or if they are willing to live with inconsistency in the number of rows returned.

The big data drivers such as ORACLE_HIVE have a different default behaviour which is to only project referenced columns and to replace data with conversion errors with NULL values and i.e. they default to returning a consistent number of rows with best performance.

Take away: In order to get consistency and fast table scan performance with ORACLE_LOADER,  the trick is to define the external table with the ‘convert_error store_null’ option and switch to ‘Project Column Referenced’. For example:

CREATE TABLE "T_XT"
(
  c0 varchar(10),
  c1 varchar(10)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY DMPDIR
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET  al32utf8
    ESCAPE
    FIELDS CSV WITH EMBEDDED terminated by ',' enclosed by '|'
    REJECT ROWS WITH ALL NULL FIELDS
    nullif equal blanks
    truncate_columns
    convert_error store_null
    (
      "C0" CHAR(10),
      "C1" CHAR(10)
    )
  )
  location
  (
    'simple.dat'
  )
) REJECT LIMIT UNLIMITED
parallel 4;

If you want more information on handling conversion errors when scanning an external table please check the Utilities Guide section on each driver type

external tables oracle Roger MacNicol SmartScan
« PL/SQL functions: Iterate and keys for associative arrays
Random thoughts on block sizes »
photo Sayan Malakshinov

Oracle ACE Pro Oracle ACE Pro

DEVVYOracle Database Developer Choice Award winner

Oracle performance tuning expert

UK / Cambridge

LinkedIn   Twitter
sayan@orasql.org

Recent Posts

  • CBO and Partial indexing
  • Slow index access “COL=:N” where :N is NULL
  • Where does the commit or rollback happen in PL/SQL code?
  • :1 and SP2-0553: Illegal variable name “1”.
  • ORA exceptions that can’t be caught by exception handler

Recent Comments

  • Oracle SGA 값을 증가 시킬 때 발생 장애 원인 – DBA의 정석 on Example of controlling “direct path reads” decision through SQL profile hints (index_stats/table_stats)
  • Oracle SQL | Oracle diagnostic events — Cheat sheet on Where does the commit or rollback happen in PL/SQL code?
  • Functions & Subqueries | Oracle Scratchpad on Deterministic function vs scalar subquery caching. Part 3
  • Materialized views state turns into compilation_error after refresh - kranar.top - Answering users questions... on Friday prank: select from join join join
  • Exadata Catalogue | Oracle Scratchpad on When bloggers get it wrong – part 1
  • Exadata Catalogue | Oracle Scratchpad on Serial Scans failing to offload
  • lateral join – decorrelation gone wrong – svenweller on Lateral view decorrelation(VW_DCL) causes wrong results with rownum
  • 255 column catalogue | Oracle Scratchpad on Intra-block row chaining optimization in 12.2
  • 255 column catalogue | Oracle Scratchpad on row pieces, 255 columns, intra-block row chaining in details
  • opt_estimate catalogue | Oracle Scratchpad on Correct syntax for the table_stats hint

Blogroll

  • Alex Fatkulin
  • Alexander Anokhin
  • Andrey Nikolaev
  • Charles Hooper
  • Christian Antognini
  • Coskan Gundogar
  • David Fitzjarrell
  • Igor Usoltsev
  • Jonathan Lewis
  • Karl Arao
  • Mark Bobak
  • Martin Bach
  • Martin Berger
  • Neil Chandler
  • Randolf Geist
  • Richard Foote
  • Riyaj Shamsudeen
  • Tanel Poder
  • Timur Akhmadeev
  • Valentin Nikotin

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org
©Sayan Malakshinov. Oracle SQL