Sunday, July 26, 2015

12c Parallel Execution New Features: Parallel FILTER Subquery Evaluation - Part 1: Introduction

12c introduces another interesting new Parallel Execution feature - the parallel evaluation of FILTER subqueries. In pre-12c FILTER subqueries always had to be evaluated in the Query Coordinator. This had several consequences, in particular the data driving the FILTER subquery always had to flow through the Query Coordinator, and hence represented a forced serial execution part of a parallel execution plan. This limitation also meant that depending on the overall plan shape the parallel plan was possibly decomposed into multiple DFO trees, leading to other side effects I've outlined in some of my other publications already.

In 12c now the FILTER subquery can be evaluated in the Parallel Slaves, and the driving data no longer needs to be processed in the Query Coordinator. However, the resulting plan shape can be a little bit confusing. Let's have a look at a simple example:
create table t_1
compress
as
select  /*+ use_nl(a b) */
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1', method_opt=>'for all columns size 1')

alter table t_1 parallel 4;

create index t_1_idx on t_1 (id) invisible;

explain plan for
select /*+ 
           --optimizer_features_enable('11.2.0.4')
        */ count(*) from
        t_1 t
        where exists (select /*+ no_unnest */ null from t_1 where t.id = t_1.id);

-- 11.2.0.4 plan shape with index invisible
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     6 |   440M  (2)| 04:47:04 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     6 |            |          |        |      |            |
|*  2 |   FILTER               |          |       |       |            |          |        |      |            |
|   3 |    PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   4 |     PX SEND QC (RANDOM)| :TQ20000 |  2000K|    11M|   221   (1)| 00:00:01 |  Q2,00 | P->S | QC (RAND)  |
|   5 |      PX BLOCK ITERATOR |          |  2000K|    11M|   221   (1)| 00:00:01 |  Q2,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T_1      |  2000K|    11M|   221   (1)| 00:00:01 |  Q2,00 | PCWP |            |
|   7 |    PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   8 |     PX SEND QC (RANDOM)| :TQ10000 |     1 |     6 |   222   (2)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   9 |      PX BLOCK ITERATOR |          |     1 |     6 |   222   (2)| 00:00:01 |  Q1,00 | PCWC |            |
|* 10 |       TABLE ACCESS FULL| T_1      |     1 |     6 |   222   (2)| 00:00:01 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T_1" "T_1" WHERE "T_1"."ID"=:B1))
  10 - filter("T_1"."ID"=:B1)

-- 12.1.0.2 plan shape with index invisible
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |     1 |     6 |  1588M  (2)| 17:14:09 |        |      |            |
|   1 |  SORT AGGREGATE         |          |     1 |     6 |            |          |        |      |            |
|   2 |   PX COORDINATOR        |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)  | :TQ10000 |     1 |     6 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE      |          |     1 |     6 |            |          |  Q1,00 | PCWP |            |
|*  5 |      FILTER             |          |       |       |            |          |  Q1,00 | PCWC |            |
|   6 |       PX BLOCK ITERATOR |          |  2000K|    11M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| T_1      |  2000K|    11M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|*  8 |       TABLE ACCESS FULL | T_1      |     1 |     6 |   798   (2)| 00:00:01 |        |      |            |
-----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T_1" "T_1" WHERE "T_1"."ID"=:B1))
   8 - filter("T_1"."ID"=:B1)

-- 11.2.0.4 plan shape with index visible
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     6 |  5973K  (1)| 00:03:54 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     6 |            |          |        |      |            |
|*  2 |   FILTER               |          |       |       |            |          |        |      |            |
|   3 |    PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   4 |     PX SEND QC (RANDOM)| :TQ10000 |  2000K|    11M|   221   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   5 |      PX BLOCK ITERATOR |          |  2000K|    11M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T_1      |  2000K|    11M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|*  7 |    INDEX RANGE SCAN    | T_1_IDX  |     1 |     6 |     3   (0)| 00:00:01 |        |      |            |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T_1" "T_1" WHERE "T_1"."ID"=:B1))
   7 - access("T_1"."ID"=:B1)

-- 12.1.0.2 plan shape with index visible
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |     1 |     6 |  5973K  (1)| 00:03:54 |        |      |            |
|   1 |  SORT AGGREGATE         |          |     1 |     6 |            |          |        |      |            |
|   2 |   PX COORDINATOR        |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)  | :TQ10000 |     1 |     6 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE      |          |     1 |     6 |            |          |  Q1,00 | PCWP |            |
|*  5 |      FILTER             |          |       |       |            |          |  Q1,00 | PCWC |            |
|   6 |       PX BLOCK ITERATOR |          |  2000K|    11M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| T_1      |  2000K|    11M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|*  8 |       INDEX RANGE SCAN  | T_1_IDX  |     1 |     6 |     3   (0)| 00:00:01 |        |      |            |
-----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T_1" "T_1" WHERE "T_1"."ID"=:B1))
   8 - access("T_1"."ID"=:B1)

I've included two variations of the setup, one without available index for evaluating the FILTER subquery and one with index.

The pre-12c plan shape without index makes the former limitation particularly obvious: The FILTER operator is above the PX COORDINATOR and marked serial, and the table scan in the FILTER subquery gets parallelized as separate DFO tree (indicated among others by the two PX COORDINATOR operators), which means that each time this separate DFO tree starts, a separate set of Parallel Slave will be allocated/deallocated, adding possibly a lot of overhead to a probably already inefficient execution plan anyway - assuming the FILTER subquery needs to be evaluated many times.

In 12c the FILTER operator is marked parallel and the need for a separate DFO tree is gone. What might be confusing with this plan shape is that the operations of the FILTER subquery are not marked parallel. In my opinion this is misleading and should actually be marked parallel, because at runtime the operations will be performed by the Parallel Slaves, and in case of a Full Table Scan each slave will run the entire full table scan (so no PX ITERATOR for dividing the scan into chunks / granules), which is comparable to what happens when a parallel Nested Loop join runs or the new PQ_REPLICATE feature gets used - and in those cases the operations are marked parallel:
-- 11.2.0.4 / 12.1.0.2 plan shape with index invisible
-- and subquery unnested using NL SEMI join
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |     1 |    12 |   442M  (2)| 04:48:03 |        |      |            |
|   1 |  SORT AGGREGATE         |          |     1 |    12 |            |          |        |      |            |
|   2 |   PX COORDINATOR        |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)  | :TQ10000 |     1 |    12 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE      |          |     1 |    12 |            |          |  Q1,00 | PCWP |            |
|   5 |      NESTED LOOPS SEMI  |          |  2000K|    22M|   442M  (2)| 04:48:03 |  Q1,00 | PCWP |            |
|   6 |       PX BLOCK ITERATOR |          |       |       |            |          |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| T_1      |  2000K|    11M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|*  8 |       TABLE ACCESS FULL | T_1      |  2000K|    11M|   796   (2)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   8 - filter("T"."ID"="T_1"."ID")

Summary


So the operators of the FILTER subquery can now be run in the slaves, which is the main point of this feature, although being represented in a confusing way in the execution plan. Avoiding the potential decomposition into multiple DFO trees is another possible side effect. Decreased query duration should be possible if the evaluation of the FILTER subquery requires significant time and can now be run in the Parallel Slaves instead of serial execution through the Query Coordinator.

Note that depending on the plan shape and SQL features used, it's still possible that 12c reverts to the old serial FILTER subquery evaluation plan shape, so the new feature doesn't get used always.

There is more to say about this feature. In the next part of this instalment I'll focus on the different distribution methods possible with the new parallel FILTER operator - there is a new PQ_FILTER hint that allows controlling the distribution, but there are also some interesting points to make about how the optimizer seems to make its choice which distribution method to use automatically. In the examples shown here there's no separate distribution for the FILTER, by the way, but this can look differently, as I'll show in the next part.