I normally blog about table scans on Oracle native data, but Oracle also supports a wide variety of features for scanning external tables, and I need to cover those too. One of the ways I learn new things is by being sent a bug report and saying to myself, “Oh! I didn’t know you could do that.” So today I’m going to start with the grammar:
Alter Table <xt> Project Column [ All | Referenced ]
This DDL changes the value in the Property column displayed in user_external_tables:
SQL> select property
2 from user_external_tables
3 where table_name='C_ET';
PROPERTY
----------
ALL
Here we have an external table defined using the legacy driver ORACLE_LOADER. This driver defaults to projecting all the columns in the base table rather than just those needed to satisfy the query (i.e. the referenced columns), and to discarding any row that has a data conversion issue in any column, up to the reject limit.
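For illustration, here is a minimal sketch of such a table (the table, column, and file names are hypothetical, not taken from the original test case) with a DATE column that is converted using a fixed mask:
CREATE TABLE d_et
(
  id    NUMBER,
  hired DATE
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY DMPDIR
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    (
      id    CHAR(10),
      hired CHAR(20) date_format DATE mask "YYYY-MM-DD"
    )
  )
  location ('dates.dat')
) REJECT LIMIT UNLIMITED;
Any record whose hired field does not match the mask counts against the reject limit and, with ‘Project Column All’, that row is discarded even by queries that never touch the hired column.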
So, for example, if we have a DATE column in our external table that contains dirty data which won’t convert cleanly to Oracle internal dates using the supplied mask, we can either import it as a VARCHAR2 so that no values are lost, or import it as a DATE and lose rows even on queries that don’t need that date column. We can change the behaviour to only project the referenced columns with:
SQL> alter table c_et project column referenced;
Table altered.
SQL>
SQL> select property
2 from user_external_tables
3 where table_name='C_ET';
PROPERTY
----------
REFERENCED
The driver will now ignore unreferenced columns. If the date column is not used, we will get all the rows in the external data, and the query will run faster since datatype conversion is expensive. However, we may get an inconsistent number of rows depending on which columns are used in any given query. This is OK if the DBA knows a priori that there are no conversion errors, or if they are willing to live with inconsistency in the number of rows returned.
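To see that inconsistency concretely, suppose (hypothetically) that C_ET has a DATE column called DT and the data file contains one record with an unparseable date. With ‘Project Column Referenced’ the answer now depends on which columns each query touches:
-- DT is not referenced, so the bad record is never converted and is returned
select count(*) from c_et;

-- DT is referenced, so the record with the bad date fails conversion and is rejected,
-- and this query can return fewer rows than the one above
select count(*) from c_et where dt is not null;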
The big data drivers such as ORACLE_HIVE have a different default behaviour: they project only the referenced columns and replace data that has conversion errors with NULL values, i.e. they default to returning a consistent number of rows with the best performance.
Takeaway: in order to get both consistency and fast table scan performance with ORACLE_LOADER, the trick is to define the external table with the ‘convert_error store_null’ option and switch to ‘Project Column Referenced’. For example:
CREATE TABLE "T_XT"
(
  c0 varchar(10),
  c1 varchar(10)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY DMPDIR
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET al32utf8
    ESCAPE
    FIELDS CSV WITH EMBEDDED terminated by ',' enclosed by '|'
    REJECT ROWS WITH ALL NULL FIELDS
    nullif equal blanks
    truncate_columns
    convert_error store_null
    (
      "C0" CHAR(10),
      "C1" CHAR(10)
    )
  )
  location
  (
    'simple.dat'
  )
) REJECT LIMIT UNLIMITED
parallel 4;
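Having created the table with ‘convert_error store_null’, the remaining step is to switch the projection setting. After that, any field that fails conversion comes back as NULL instead of costing the whole row, so every query sees a consistent number of rows. A quick check, using the same dictionary query as before:
alter table t_xt project column referenced;

select property
from   user_external_tables
where  table_name = 'T_XT';

-- PROPERTY should now show REFERENCED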
If you want more information on handling conversion errors when scanning an external table, please check the Utilities Guide section on each driver type.