Saturday, June 30, 2007

Partitioned External Tables

If you ever have the need of accessing multiple similar, external files of differing sizes, and at the same time want to ensure that only those files are accessed that are actually required, here is an approach that could be interesting to you.

Before Oracle 8 introduced real partitioned tables so called "Partition Views" were the only option for support of table partitioning. It's quite interesting that the even the Oracle 10gR2 optimizer still supports this feature which has been deprecated long time ago. You can see this by looking at the execution plan shown below: It contains a line mentioning a "UNION-ALL PARTITION" operation which means that the optimizer has noticed that this is not a usual UNION ALL operator but is meant to be used as "Partition View" and therefore will be treated differently resp. special rules apply.

Using "Partition Views" you can add a kind of pseudo partitioning capabilities to external tables. This might be useful if you have external files that you want to treat similarly within Oracle but you frequently need to access only a part and not all of them.

I came recently across such a requirement where "external" feeds of very different sizes (starting from a few hundred records up to several millions) but of same layout should be loaded efficiently using a generic ETL process.

One potential solution I found should be shown here. The sample script creates three external tables. Important is that all of them have the same number and type of columns. This is one of the requirements that have to met in order to use the "Partition View" functionality. The interesting thing about external tables in this context is that the external files could be even of different physical layout given that the resulting Oracle column and datatype definition match.

On top of each of these three external tables a view is created that introduces the "partition" criteria of the underlying table. Finally the "Partition View" is created that unions all the partitions together using the "UNION ALL" operator. For the curious, the separate views on the external tables are not required, the "Partition View" can reference directly the external tables which might lower the maintenance effort if you need to work with many partitions. Another interesting thing about "Partition Views" is that you can create partition schemas that are impossible to implement using "real" partitioning, e.g. you could specify overlapping ranges in the "where" clauses of the views and the optimizer is then supposed to able to identify which of the underlying tables it needs to visit in order to fulfill the request.

The following sample shows further that you can also simulate a kind of list partitioning. If you check carefully the resulting execution plan you can see that the artificial filter expressions introduced by the optimizer ("NULL IS NOT NULL") efficiently prevent access of the table related to the filter expression. If you create the external table using the "logfile" option this can be proven very easily as Oracle appends (or creates) the logfile each time it accesses the external table. In this example you'll notice that only a single logfile is going to be created, which proves that the filter expression works as desired. It would be even nicer if the execution plan would show something like "1" or "0" rows as cardinality of the tables omitted. That would make it more obvious.

Using this approach you can handle all the external tables using the single "Partition View". If you apply suitable filter clauses when accessing the view the optimizer can efficiently prune the number of the underlying tables visited.

So in summary all the pro's and con's of "Partition Views" apply to this solution. One of con's is that you need to maintain the view definition whenever you want to modify the partition scheme, e.g. adding, dropping or merging/splitting a partition. In case you miss to keep the view definition in sync with the actual underlying tables your view will return incomplete or wrong data.

Among the pro's of the solution is that another feature can be implemented that unfortunately is not possible using real partitioning: You can mark each individual "partition" as "parallel" or "noparallel" as required. So small chunks can be accessed serially whereas large external tables could be created using the "parallel" option.

Here comes the script illustrating the solution and the corresponding required steps. It has been tested using Oracle 10.2.0.3.0 Enterprise Edition on Windows XP (32bit):

--ִcreateִanyִtextִasִfileִ"test.txt" in
[your_test_directory_goes_here]

CREATEִORִREPLACEִDIRECTORY
DATA_DIRִAS
'[your_test_directory_goes_here]';

dropִtableִext_table_test1ִpurge;

dropִtableִext_table_test2ִpurge;

dropִtableִext_table_test3ִpurge;

CREATEִTABLEִEXT_TABLE_TEST1
(
ִִTEXTִִVARCHAR2(4000ִBYTE)
)
ORGANIZATIONִEXTERNAL
ִִ(ִִTYPEִORACLE_LOADER
ִִִִִDEFAULTִDIRECTORYִDATA_DIR
ִִִִִACCESSִPARAMETERS
ִִִִִִִ(ִrecordsִdelimitedִbyִnewline
ִִִִnobadfile
ִִִִnodiscardfile
ִִִִlogfileִ'ext_table_test1.log'
ִִִִ)
ִִִִִLOCATIONִ(DATA_DIR:'test.txt')
ִִ)
REJECTִLIMITִUNLIMITED
NOPARALLEL;

CREATEִTABLEִEXT_TABLE_TEST2
(
ִִTEXTִִVARCHAR2(4000ִBYTE)
)
ORGANIZATIONִEXTERNAL
ִִ(ִִTYPEִORACLE_LOADER
ִִִִִDEFAULTִDIRECTORYִDATA_DIR
ִִִִִACCESSִPARAMETERS
ִִִִִִִ(ִrecordsִdelimitedִbyִnewline
ִִִִnobadfile
ִִִִnodiscardfile
ִִִִlogfileִ'ext_table_test2.log'
ִִִִ)
ִִִִִLOCATIONִ(DATA_DIR:'test.txt')
ִִ)
REJECTִLIMITִUNLIMITED
NOPARALLEL;

CREATEִTABLEִEXT_TABLE_TEST3
(
ִִTEXTִִVARCHAR2(4000ִBYTE)
)
ORGANIZATIONִEXTERNAL
ִִ(ִִTYPEִORACLE_LOADER
ִִִִִDEFAULTִDIRECTORYִDATA_DIR
ִִִִִACCESSִPARAMETERS
ִִִִִִִ(ִrecordsִdelimitedִbyִnewline
ִִִִnobadfile
ִִִִnodiscardfile
ִִִִlogfileִ'ext_table_test3.log'
ִִִִ)
ִִִִִLOCATIONִ(DATA_DIR:'test.txt')
ִִ)
REJECTִLIMITִUNLIMITED
NOPARALLEL;

createִorִreplaceִviewִv_ext_table_test1ִasִselectִ*ִfromִext_table_test1ִwhereִtextִ=ִ'ABC';

createִorִreplaceִviewִv_ext_table_test2ִasִselectִ*ִfromִext_table_test2ִwhereִtextִ=ִ'DEF';

createִorִreplaceִviewִv_ext_table_test3ִasִselectִ*ִfromִext_table_test3ִwhereִtextִ=ִ'GHI';

createִorִreplaceִviewִv_ext_table_allִas
selectִ*ִfromִv_ext_table_test1
unionִall
selectִ*ִfromִv_ext_table_test2
unionִall
selectִ*ִfromִv_ext_table_test3;

/*ִaboveִcanִbeִmergedִintoִoneִsingleִview
CREATEִORִREPLACEִVIEWִv_ext_table_allִ(
ִִtext
)ִAS
selectִ"TEXT"ִfromִext_table_test1ִwhereִtextִ=ִ'ABC'
unionִall
selectִ"TEXT"ִfromִext_table_test2ִwhereִtextִ=ִ'DEF'
unionִall
selectִ"TEXT"ִfromִext_table_test3ִwhereִtextִ=ִ'GHI'
/
*/

selectִ*ִfromִv_ext_table_allִwhereִtextִ=ִ'DEF';

--ִTheִplan
--
--ִִPlanִhashִvalue:ִ3090276507
--
---------------------------------------------------------------------------------------------------
--|ִIdִִ|ִOperationִִִִִִִִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִִ|ִRowsִִ|ִBytesִ|ִCostִ(%CPU)|ִTimeִִִִִ|
---------------------------------------------------------------------------------------------------
--|ִִִ0ִ|ִSELECTִSTATEMENTִִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִִִ96ִ|ִִִ187K|ִִִִ68ִִִ(2)|ִ00:00:01ִ|
--|ִִִ1ִ|ִִVIEWִִִִִִִִִִִִִִִִִִִִִִִִִ|ִV_EXT_TABLE_ALLִ|ִִִִ96ִ|ִִִ187K|ִִִִ68ִִִ(2)|ִ00:00:01ִ|
--|ִִִ2ִ|ִִִUNION-ALLִPARTITIONִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִִִִִִ|ִִִִִִִ|ִִִִִִִִִִִִ|ִִִִִִִִִִ|
--|*ִִ3ִ|ִִִִFILTERִִִִִִִִִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִִִִִִ|ִִִִִִִ|ִִִִִִִִִִִִ|ִִִִִִִִִִ|
--|*ִִ4ִ|ִִִִִEXTERNALִTABLEִACCESSִFULL|ִEXT_TABLE_TEST1ִ|ִִִִ82ִ|ִִִ160K|ִִִִ24ִִִ(0)|ִ00:00:01ִ|
--|*ִִ5ִ|ִִִִEXTERNALִTABLEִACCESSִFULLִ|ִEXT_TABLE_TEST2ִ|ִִִִ82ִ|ִִִ160K|ִִִִ24ִִִ(0)|ִ00:00:01ִ|
--|*ִִ6ִ|ִִִִFILTERִִִִִִִִִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִִִִִִ|ִִִִִִִ|ִִִִִִִִִִִִ|ִִִִִִִִִִ|
--|*ִִ7ִ|ִִִִִEXTERNALִTABLEִACCESSִFULL|ִEXT_TABLE_TEST3ִ|ִִִִ82ִ|ִִִ160K|ִִִִ24ִִִ(0)|ִ00:00:01ִ|
---------------------------------------------------------------------------------------------------
--
--PredicateִInformationִ(identifiedִbyִoperationִid):
-----------------------------------------------------
--
--ִִִ3ִ-ִfilter(NULLִISִNOTִNULL)
--ִִִ4ִ-ִfilter("TEXT"='ABC')
--ִִִ5ִ-ִfilter("TEXT"='DEF')
--ִִִ6ִ-ִfilter(NULLִISִNOTִNULL)
--ִִִ7ִ-ִfilter("TEXT"='GHI')