Friday, December 7, 2012

DBMS_XPLAN.DISPLAY_CURSOR And Parallel Execution


Introduction


DBMS_XPLAN.DISPLAY_CURSOR can be used to get more insight into the actual resource consumption at execution plan operation level, either by using the GATHER_PLAN_STATISTICS hint (from 10g on) or by setting the STATISTICS_LEVEL parameter to ALL (at session level; at system level the overhead is probably prohibitive).

As soon as a SQL execution is done (either successfully, cancelled or with an error), the corresponding extended data in the child cursor gets populated/updated, and the additional information about the actual runtime profile can be accessed via V$SQL_PLAN_STATISTICS or V$SQL_PLAN_STATISTICS_ALL - this is what DISPLAY_CURSOR uses to populate the additional columns in the formatted output of the execution plan.
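As a reminder, a typical way to gather and display these extended statistics for a serial execution might look like this (a sketch; the table name T1 is just a placeholder):

```sql
-- Either activate extended rowsource statistics for the whole session...
alter session set statistics_level = all;

-- ...or request them per statement via the hint
select /*+ gather_plan_statistics */ count(*) from t1;

-- Display the runtime profile of the last execution of the
-- most recent statement executed in this session
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```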

This works well for normal, serial execution, where a single session performs the SQL execution, and allows gathering the following extended statistics at execution plan line level:

- Actual cardinalities vs. estimated cardinalities
- Actual time spent on each operation (only reliable when using STATISTICS_LEVEL = ALL or setting "_rowsource_sample_freq" to 1, which can have a significant impact on runtime due to the overhead)
- Logical I/O
- Physical I/O
- Memory usage of workareas
- TEMP space usage

One crucial piece of information that is not available via this interface is how the actual time was spent (CPU vs. wait events).


Parallel Execution


However, by default, getting information about these Rowsource Statistics doesn't work well with Parallel Execution. When using the formatting option "ALLSTATS LAST" - usually recommended to obtain the extended statistics for the last execution of a SQL statement - you'll only see the statistics related to the work performed by the Query Coordinator, but no information related to the work performed by the Parallel Slaves.

Here is an example DBMS_XPLAN.DISPLAY_CURSOR output for a Parallel Execution using the "ALLSTATS LAST" option:

-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |      1 |        |      0 |00:02:41.57 |      10 |       |       |          |         |
|   1 |  COUNT                    |          |      1 |        |      0 |00:02:41.57 |      10 |       |       |          |         |
|*  2 |   FILTER                  |          |      1 |        |      0 |00:02:41.57 |      10 |       |       |          |         |
|   3 |    PX COORDINATOR         |          |      1 |        |   2000K|00:02:29.30 |      10 |       |       |          |         |
|   4 |     PX SEND QC (RANDOM)   | :TQ10002 |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
|*  5 |      HASH JOIN BUFFERED   |          |      0 |   2000K|      0 |00:00:00.01 |       0 |    56M|  7910K|   14M (0)|   30720 |
|   6 |       PX RECEIVE          |          |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
|   7 |        PX SEND HASH       | :TQ10000 |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
|   8 |         PX BLOCK ITERATOR |          |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
|*  9 |          TABLE ACCESS FULL| T4       |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
|  10 |       PX RECEIVE          |          |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
|  11 |        PX SEND HASH       | :TQ10001 |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
|  12 |         PX BLOCK ITERATOR |          |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
|* 13 |          TABLE ACCESS FULL| T2       |      0 |   2000K|      0 |00:00:00.01 |       0 |       |       |          |         |
-------------------------------------------------------------------------------------------------------------------------------------

You can partially work around this limitation by using the format option "ALLSTATS ALL" instead. With this option the information provided by DBMS_XPLAN.DISPLAY_CURSOR is based on a different column set of V$SQL_PLAN_STATISTICS_ALL that aggregates across all executions of the SQL statement. For Parallel Execution the statistics representing the work performed by the Parallel Slaves are added to these columns, so "ALLSTATS ALL" includes that information.

However, you need to be careful, since this means that you can no longer distinguish between the last and previous executions of the same cursor. So if you execute the same cursor multiple times in parallel, "ALLSTATS ALL" will show statistics covering all of these executions. In a test case scenario you can work around this by deliberately modifying the SQL text, for example by adding a unique comment, so that a separate, unique parent cursor is created. This ensures that "ALLSTATS ALL" effectively displays only the information related to the last execution, since there is only a single (parallel) execution of that cursor.
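In a test case the pattern could look like this (a sketch; the comment text and the table names are arbitrary placeholders, the comment just needs to make the SQL text unique):

```sql
-- The unique comment creates a separate parent cursor, so the
-- aggregating "ALL" columns only cover this single parallel execution
select /* test_run_42 */ count(*)
from   t4, t2
where  t4.id = t2.id;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS ALL'));
```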

Here is again the same Parallel Execution as above, this time using the "ALLSTATS ALL" option:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   | Max-Tmp |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |      1 |        |       |   831 (100)|          |        |      |            |      0 |00:02:41.57 |      10 |      2 |      0 |       |       |          |         |
|   1 |  COUNT                    |          |      1 |        |       |            |          |        |      |            |      0 |00:02:41.57 |      10 |      2 |      0 |       |       |          |         |
|*  2 |   FILTER                  |          |      1 |        |       |            |          |        |      |            |      0 |00:02:41.57 |      10 |      2 |      0 |       |       |          |         |
|   3 |    PX COORDINATOR         |          |      1 |        |       |            |          |        |      |            |   2000K|00:02:29.30 |      10 |      2 |      0 |       |       |          |         |
|   4 |     PX SEND QC (RANDOM)   | :TQ10002 |      0 |   2000K|   223M|   831   (2)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|*  5 |      HASH JOIN BUFFERED   |          |      8 |   2000K|   223M|   831   (2)| 00:00:01 |  Q1,02 | PCWP |            |   2000K|00:16:51.21 |     815 |  29239 |  29202 |    56M|  7910K|     8/0/0|   30720 |
|   6 |       PX RECEIVE          |          |      8 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,02 | PCWP |            |   2000K|00:05:23.83 |       0 |      0 |      0 |       |       |          |         |
|   7 |        PX SEND HASH       | :TQ10000 |      0 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   8 |         PX BLOCK ITERATOR |          |      8 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | PCWC |            |   2000K|00:05:44.20 |    4838 |   4735 |      0 |       |       |          |         |
|*  9 |          TABLE ACCESS FULL| T4       |    103 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | PCWP |            |   2000K|00:04:50.92 |    4838 |   4735 |      0 |       |       |          |         |
|  10 |       PX RECEIVE          |          |      8 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,02 | PCWP |            |   2000K|00:07:16.22 |       0 |      0 |      0 |       |       |          |         |
|  11 |        PX SEND HASH       | :TQ10001 |      0 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|  12 |         PX BLOCK ITERATOR |          |      8 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | PCWC |            |   2000K|00:02:05.78 |    4748 |   4228 |      0 |       |       |          |         |
|* 13 |          TABLE ACCESS FULL| T2       |    104 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | PCWP |            |   2000K|00:01:29.86 |    4748 |   4228 |      0 |       |       |          |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Notice the difference - the work performed by the Parallel Slaves is (mostly) visible now. Apart from that, the "ALL" formatting option adds some columns that are not shown when using the "LAST" option; you can customize the output using the more granular formatting options of DBMS_XPLAN.

Reading the output is not as simple as for serial executions, in particular because there is a mixture of wall-clock / DB Time for the activities related to the Query Coordinator and aggregated DB Time for the Parallel Slaves.

Furthermore, the rule that applies to serial execution plans - that the values for time / work are cumulative - does not hold for Parallel Execution, at least not across Parallel Slave Sets / Table Queues and the Query Coordinator.


Multiple DFOs And Cross-Instance


However, depending on the exact details of the execution plan and the actual execution of the SQL statement, V$SQL_PLAN_STATISTICS_ALL, and hence DISPLAY_CURSOR, might still miss information about the Parallel Execution even when using "ALLSTATS ALL".

In particular the following two points are important to consider:

1. If the Parallel Execution is cross-instance (runs on multiple nodes of a RAC cluster), then DBMS_XPLAN.DISPLAY_CURSOR will only show information about the work performed on the local instance, since it only gathers information from the local V$SQL_PLAN_STATISTICS_ALL dynamic performance view. DBMS_XPLAN.DISPLAY_CURSOR therefore doesn't show the complete picture in such cases.
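If you want to see the work of all instances combined, you would have to aggregate the underlying data yourself, for example via the global GV$ view (a sketch; replace the &sql_id substitution variable accordingly - note that the non-LAST columns aggregate across all executions, as described above):

```sql
-- Aggregate the actual rowsource statistics across all RAC instances
select   id,
         operation,
         sum(starts)       as starts,
         sum(output_rows)  as a_rows,
         sum(elapsed_time) as elapsed_us
from     gv$sql_plan_statistics_all
where    sql_id = '&sql_id'
group by id, operation
order by id;
```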

Here is again the same execution plan as above, this time executed cross instance on two nodes participating in the execution:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   | Max-Tmp |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |      1 |        |       |   831 (100)|          |        |      |            |      0 |00:01:42.95 |      10 |      0 |      0 |       |       |          |         |
|   1 |  COUNT                    |          |      1 |        |       |            |          |        |      |            |      0 |00:01:42.95 |      10 |      0 |      0 |       |       |          |         |
|*  2 |   FILTER                  |          |      1 |        |       |            |          |        |      |            |      0 |00:01:42.95 |      10 |      0 |      0 |       |       |          |         |
|   3 |    PX COORDINATOR         |          |      1 |        |       |            |          |        |      |            |   2000K|00:01:29.20 |      10 |      0 |      0 |       |       |          |         |
|   4 |     PX SEND QC (RANDOM)   | :TQ10002 |      0 |   2000K|   223M|   831   (2)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|*  5 |      HASH JOIN BUFFERED   |          |      4 |   2000K|   223M|   831   (2)| 00:00:01 |  Q1,02 | PCWP |            |   1000K|00:04:30.07 |       0 |  14601 |  14601 |    56M|  7910K|     4/0/0|   30720 |
|   6 |       PX RECEIVE          |          |      4 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,02 | PCWP |            |   1000K|00:01:17.90 |       0 |      0 |      0 |       |       |          |         |
|   7 |        PX SEND HASH       | :TQ10000 |      0 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   8 |         PX BLOCK ITERATOR |          |      4 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | PCWC |            |   1038K|00:01:27.13 |    2585 |   2530 |      0 |       |       |          |         |
|*  9 |          TABLE ACCESS FULL| T4       |     55 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | PCWP |            |   1038K|00:01:07.22 |    2585 |   2530 |      0 |       |       |          |         |
|  10 |       PX RECEIVE          |          |      4 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,02 | PCWP |            |   1000K|00:02:11.36 |       0 |      0 |      0 |       |       |          |         |
|  11 |        PX SEND HASH       | :TQ10001 |      0 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|  12 |         PX BLOCK ITERATOR |          |      4 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | PCWC |            |   1011K|00:00:46.56 |    2402 |      0 |      0 |       |       |          |         |
|* 13 |          TABLE ACCESS FULL| T2       |     53 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | PCWP |            |   1011K|00:00:26.53 |    2402 |      0 |      0 |       |       |          |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note how only "half" of the work is reported (except for the Query Coordinator work). When running DBMS_XPLAN.DISPLAY_CURSOR for the corresponding cursor on the second participating node, I get the other "half":

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   | Max-Tmp |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |      0 |        |       |   831 (100)|          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   1 |  COUNT                    |          |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|*  2 |   FILTER                  |          |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   3 |    PX COORDINATOR         |          |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   4 |     PX SEND QC (RANDOM)   | :TQ10002 |      0 |   2000K|   223M|   831   (2)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|*  5 |      HASH JOIN BUFFERED   |          |      4 |   2000K|   223M|   831   (2)| 00:00:01 |  Q1,02 | PCWP |            |    999K|00:04:30.16 |     215 |  14590 |  14570 |    56M|  7910K|     4/0/0|   30720 |
|   6 |       PX RECEIVE          |          |      4 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,02 | PCWP |            |    999K|00:01:17.35 |       0 |      0 |      0 |       |       |          |         |
|   7 |        PX SEND HASH       | :TQ10000 |      0 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   8 |         PX BLOCK ITERATOR |          |      4 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | PCWC |            |    961K|00:01:29.64 |    2253 |   2205 |      0 |       |       |          |         |
|*  9 |          TABLE ACCESS FULL| T4       |     48 |   2000K|    11M|   180   (2)| 00:00:01 |  Q1,00 | PCWP |            |    961K|00:01:10.14 |    2253 |   2205 |      0 |       |       |          |         |
|  10 |       PX RECEIVE          |          |      4 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,02 | PCWP |            |    999K|00:01:51.26 |       0 |      0 |      0 |       |       |          |         |
|  11 |        PX SEND HASH       | :TQ10001 |      0 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|  12 |         PX BLOCK ITERATOR |          |      4 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | PCWC |            |    988K|00:00:51.52 |    2346 |      0 |      0 |       |       |          |         |
|* 13 |          TABLE ACCESS FULL| T2       |     51 |   2000K|   211M|   647   (2)| 00:00:01 |  Q1,01 | PCWP |            |    988K|00:00:30.00 |    2346 |      0 |      0 |       |       |          |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2. If the parallel execution plan consists of multiple so-called "Data Flow Operations" (DFOs - you can read more about them in my OTN mini series about Parallel Execution), indicated by multiple PX COORDINATOR operations, then these DFOs will be represented by multiple child cursors at execution time - each DFO ends up with a separate child cursor.

Since DBMS_XPLAN.DISPLAY_CURSOR cannot aggregate information across multiple child cursors, the information displayed will again be incomplete in such cases.

You can run DISPLAY_CURSOR for each of the child cursors generated, but this doesn't give you the same level of information. Furthermore, depending on the version and the actual circumstances, the additional child cursors might not inherit the corresponding rowsource statistics setting, so these child cursors might not even contain any additional information in V$SQL_PLAN_STATISTICS_ALL.
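To locate the child cursors belonging to the different DFOs, you can query V$SQL and then run DISPLAY_CURSOR explicitly per child cursor, for example (a sketch; replace the &sql_id substitution variable and the child number accordingly):

```sql
-- List all child cursors of the statement
select sql_id, child_number, plan_hash_value
from   v$sql
where  sql_id = '&sql_id';

-- Display the runtime profile of one specific child cursor
select * from table(dbms_xplan.display_cursor('&sql_id', 1, 'ALLSTATS ALL'));
```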

Here is a similar execution plan, this time using a parallel TEMP table transformation that automatically results in a separate DFO and therefore a separate child cursor at runtime. The execution in this case used a degree of 2 and was single instance:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   | Max-Tmp |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                           |      1 |        |       |  1635 (100)|          |        |      |            |      0 |00:02:23.91 |     229 |      1 |    127 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION  |                           |      1 |        |       |            |          |        |      |            |      0 |00:02:23.91 |     229 |      1 |    127 |       |       |          |         |
|   2 |   PX COORDINATOR            |                           |      1 |        |       |            |          |        |      |            |      2 |00:00:34.99 |      33 |      0 |      0 |       |       |          |         |
|   3 |    PX SEND QC (RANDOM)      | :TQ10000                  |      0 |   2000K|    11M|   721   (2)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   4 |     LOAD AS SELECT          |                           |      0 |        |       |            |          |  Q1,00 | PCWP |            |      0 |00:00:00.01 |       0 |      0 |      0 |   256K|   256K|          |         |
|   5 |      PX BLOCK ITERATOR      |                           |      0 |   2000K|    11M|   721   (2)| 00:00:01 |  Q1,00 | PCWC |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|*  6 |       TABLE ACCESS FULL     | T4                        |      0 |   2000K|    11M|   721   (2)| 00:00:01 |  Q1,00 | PCWP |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   7 |   COUNT                     |                           |      1 |        |       |            |          |        |      |            |      0 |00:01:48.83 |      13 |      0 |      0 |       |       |          |         |
|*  8 |    FILTER                   |                           |      1 |        |       |            |          |        |      |            |      0 |00:01:48.83 |      13 |      0 |      0 |       |       |          |         |
|   9 |     PX COORDINATOR          |                           |      1 |        |       |            |          |        |      |            |   2000K|00:01:35.86 |      13 |      0 |      0 |       |       |          |         |
|  10 |      PX SEND QC (RANDOM)    | :TQ20002                  |      0 |   2000K|   236M|   914   (4)| 00:00:01 |  Q2,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|* 11 |       HASH JOIN BUFFERED    |                           |      2 |   2000K|   236M|   914   (4)| 00:00:01 |  Q2,02 | PCWP |            |   2000K|00:03:01.81 |     160 |  29295 |  29295 |    55M|  7910K|     2/0/0|     123K|
|  12 |        PX RECEIVE           |                           |      2 |   2000K|    24M|   254   (4)| 00:00:01 |  Q2,02 | PCWP |            |   2000K|00:01:07.27 |       0 |      0 |      0 |       |       |          |         |
|  13 |         PX SEND HASH        | :TQ20000                  |      0 |   2000K|    24M|   254   (4)| 00:00:01 |  Q2,00 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|  14 |          VIEW               |                           |      2 |   2000K|    24M|   254   (4)| 00:00:01 |  Q2,00 | PCWP |            |   2000K|00:01:34.53 |    3355 |   3328 |      0 |       |       |          |         |
|  15 |           PX BLOCK ITERATOR |                           |      2 |   2000K|    11M|   254   (4)| 00:00:01 |  Q2,00 | PCWC |            |   2000K|00:01:09.08 |    3355 |   3328 |      0 |       |       |          |         |
|* 16 |            TABLE ACCESS FULL| SYS_TEMP_0FD9D6603_158141 |     26 |   2000K|    11M|   254   (4)| 00:00:01 |  Q2,00 | PCWP |            |   2000K|00:00:43.39 |    3355 |   3328 |      0 |       |       |          |         |
|  17 |        PX RECEIVE           |                           |      2 |   2000K|   211M|   647   (2)| 00:00:01 |  Q2,02 | PCWP |            |   2000K|00:01:16.59 |       0 |      0 |      0 |       |       |          |         |
|  18 |         PX SEND HASH        | :TQ20001                  |      0 |   2000K|   211M|   647   (2)| 00:00:01 |  Q2,01 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|  19 |          PX BLOCK ITERATOR  |                           |      2 |   2000K|   211M|   647   (2)| 00:00:01 |  Q2,01 | PCWC |            |   2000K|00:00:47.97 |    4358 |      0 |      0 |       |       |          |         |
|* 20 |           TABLE ACCESS FULL | T2                        |     26 |   2000K|   211M|   647   (2)| 00:00:01 |  Q2,01 | PCWP |            |   2000K|00:00:17.82 |    4358 |      0 |      0 |       |       |          |         |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Notice how the output suggests that the parallel part of the TEMP table transformation didn't start at all and didn't perform any work. If, however, the second child cursor related to the other DFO is analyzed, the following information gets reported:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------          
| Id  | Operation                   | Name                      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   |          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------          
|   0 | SELECT STATEMENT            |                           |      0 |        |       |  1635 (100)|          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|   1 |  TEMP TABLE TRANSFORMATION  |                           |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|   2 |   PX COORDINATOR            |                           |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|   3 |    PX SEND QC (RANDOM)      | :TQ10000                  |      0 |   2000K|    11M|   721   (2)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|   4 |     LOAD AS SELECT          |                           |      2 |        |       |            |          |  Q1,00 | PCWP |            |      0 |00:01:07.92 |    8027 |   4737 |   3198 |  1058K|  1058K|     2/0/0|          
|   5 |      PX BLOCK ITERATOR      |                           |      2 |   2000K|    11M|   721   (2)| 00:00:01 |  Q1,00 | PCWC |            |   2000K|00:00:33.10 |    4761 |   4735 |      0 |       |       |          |          
|*  6 |       TABLE ACCESS FULL     | T4                        |     26 |   2000K|    11M|   721   (2)| 00:00:01 |  Q1,00 | PCWP |            |   2000K|00:00:11.27 |    4761 |   4735 |      0 |       |       |          |          
|   7 |   COUNT                     |                           |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|*  8 |    FILTER                   |                           |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|   9 |     PX COORDINATOR          |                           |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|  10 |      PX SEND QC (RANDOM)    | :TQ20002                  |      0 |   2000K|   236M|   914   (4)| 00:00:01 |  Q2,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|* 11 |       HASH JOIN BUFFERED    |                           |      0 |   2000K|   236M|   914   (4)| 00:00:01 |  Q2,02 | PCWP |            |      0 |00:00:00.01 |       0 |      0 |      0 |    83M|  5864K|          |          
|  12 |        PX RECEIVE           |                           |      0 |   2000K|    24M|   254   (4)| 00:00:01 |  Q2,02 | PCWP |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|  13 |         PX SEND HASH        | :TQ20000                  |      0 |   2000K|    24M|   254   (4)| 00:00:01 |  Q2,00 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|  14 |          VIEW               |                           |      0 |   2000K|    24M|   254   (4)| 00:00:01 |  Q2,00 | PCWP |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|  15 |           PX BLOCK ITERATOR |                           |      0 |   2000K|    11M|   254   (4)| 00:00:01 |  Q2,00 | PCWC |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|* 16 |            TABLE ACCESS FULL| SYS_TEMP_0FD9D6603_158141 |      0 |   2000K|    11M|   254   (4)| 00:00:01 |  Q2,00 | PCWP |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|  17 |        PX RECEIVE           |                           |      0 |   2000K|   211M|   647   (2)| 00:00:01 |  Q2,02 | PCWP |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|  18 |         PX SEND HASH        | :TQ20001                  |      0 |   2000K|   211M|   647   (2)| 00:00:01 |  Q2,01 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|  19 |          PX BLOCK ITERATOR  |                           |      0 |   2000K|   211M|   647   (2)| 00:00:01 |  Q2,01 | PCWC |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
|* 20 |           TABLE ACCESS FULL | T2                        |      0 |   2000K|   211M|   647   (2)| 00:00:01 |  Q2,01 | PCWP |            |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------          

Here you can see that the work actually performed by the Parallel Slaves is missing: all operations executed by the slaves report zero rows, zero time and zero I/O, because "ALLSTATS LAST" only covers the Query Coordinator's share of the work.
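For reference, a call along these lines produces output like the above (a sketch; the NULLs stand for "the last cursor executed in this session"):

```sql
-- "ALLSTATS LAST" restricts the statistics to the last execution,
-- which for a Parallel Execution is only the Query Coordinator --
-- hence the zero A-Rows / Buffers on the slave-executed operations
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```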


Real-Time SQL Monitoring And XPLAN_ASH


If you are already on 11g and have the corresponding Diagnostic + Tuning Pack license, the best way to get a complete picture of a Parallel Execution is the "active" Real-Time SQL Monitoring report. It shows information while the statement is still executing and doesn't have the above mentioned limitations: it works with cross-instance executions and with execution plans using multiple DFOs (although there are bugs in the current versions related to such plans). Beyond that, it offers even more insight into the execution details than DBMS_XPLAN.DISPLAY_CURSOR / V$SQL_PLAN_STATISTICS_ALL.

It is interesting to note that Real-Time SQL Monitoring doesn't measure the actual time consumed on execution plan line level the way extended Rowsource Statistics does, which explains why it doesn't come with the same overhead. Since Real-Time SQL Monitoring analyzes ASH data instead, it can still come up with a reasonable picture of the work distribution on execution plan line level (including the differentiation between CPU time and waits), although not as accurate as the actual timing information that can be gathered via Rowsource Statistics.

If you don't have a Tuning Pack license but at least the Diagnostic Pack, or you're still on 10g (plus Diagnostic Pack license), then you can use my XPLAN_ASH tool to gather some interesting information about Parallel Executions. Since Active Session History is available cross-instance and isn't limited to particular child cursors, it doesn't have the above limitations and can therefore provide the full picture of a SQL execution based on ASH data. In 10g, however, the ASH data has no relation to execution plan lines and misses some other information available from 11g on, so some important analyses on execution plan line level that can be done with 11g ASH data are not available in 10g.
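To illustrate the idea (not the actual XPLAN_ASH implementation), a minimal sketch of this kind of ASH aggregation could look like this, using the 11g SQL_PLAN_LINE_ID column; the SQL_ID is a placeholder to fill in, and querying V$ACTIVE_SESSION_HISTORY requires a Diagnostic Pack license:

```sql
-- Approximate per-plan-line activity from ASH samples, including the
-- CPU vs. wait differentiation; QC and all slaves are covered because
-- ASH samples every active session, not just the Query Coordinator
select
        sql_plan_line_id
      , sql_plan_operation
      , count(*)                                            as ash_samples
      , sum(case when session_state = 'ON CPU'  then 1 end) as cpu_samples
      , sum(case when session_state = 'WAITING' then 1 end) as wait_samples
from
        v$active_session_history
where
        sql_id = '&sql_id'   -- placeholder: the SQL_ID to analyze
group by
        sql_plan_line_id
      , sql_plan_operation
order by
        ash_samples desc;
```

Since ASH is sampled (by default once per second), the sample counts are only an approximation of the actual time spent per line, which is exactly the trade-off mentioned above.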


Footnote


If you look carefully at the above execution plans you'll notice the HASH JOIN BUFFERED operations that are reported as "Optimal" Hash Joins. In principle this means that the Hash Join operation itself could be done completely in memory. Why then does DBMS_XPLAN.DISPLAY_CURSOR show Read/Write/TEMP activity for the HASH JOIN BUFFERED operation? I'll cover this in detail in my next post - I believe the explanations (including my own) published so far for that type of operation are incorrect.


Summary


DBMS_XPLAN.DISPLAY_CURSOR doesn't work very well with Parallel Execution. For simple execution plans consisting of only a single DFO, and for single-instance executions, the "ALLSTATS ALL" option can be used as a workaround.
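The workaround can be sketched like this (assuming a table T2 from the examples above and a hypothetical parallel degree of 4): because each Parallel Slave counts as an execution of the same child cursor, "ALLSTATS ALL" aggregates the Rowsource Statistics across the Query Coordinator and all slaves, rather than just the last (QC) execution:

```sql
-- Gather Rowsource Statistics for a parallel query ...
select /*+ gather_plan_statistics parallel(t 4) */
        count(*)
from
        t2 t;

-- ... then aggregate across QC and all slave executions
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS ALL'));
```

Note that this aggregates across *all* executions of the child cursor, so it only gives a clean picture if the cursor hasn't been executed multiple times.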

If available, Real-Time SQL Monitoring is the tool to use for Parallel Execution analysis. My XPLAN_ASH tool also offers some unique insights, in particular regarding systematic analysis of Parallel Execution work distribution skew.

4 comments:

  1. Hi Randolf,

    I believe the parallel execution details could be extracted using the V$EXECUTION view, which was externalized for users.

    Let me know your comments/inputs on this.

    Pavan Kumar N

    ReplyDelete
  2. Hi Pavan,

    I'm not aware of a V$EXECUTION view. You might mean V$PQ_TQSTAT, or you might mean one of the V$SQL related views, so I don't know exactly what you are referring to.

    Randolf

    ReplyDelete
  3. Hi Randolf,

    recently I have seen (in 11.1.0.7) that there is a view v$execution and the documentation says it "displays information on parallel execution" - http://docs.oracle.com/cd/B28359_01/server.111/b28320/dynviews_1119.htm#REFRN30080. But I have never seen any rows in it. According to v$fixed_view_definition it is based on x$kstex (and it's the only view based on this object). But I did not find further information on the object - so I assumed it's perhaps not functional.

    Martin

    ReplyDelete
  4. Hi Martin,

    thanks a lot, a quick look into the official documentation was something I should have done right away.

    It looks like the Dynamic Performance View was first officially documented in the Oracle 8 timeframe and can be found in most documentation versions still officially available since then (8.1, 9.2, 10.1, 10.2 etc.).

    Either it was never functional, or it ceased to function in one of the more recent releases.

    I'll have a look at some older versions maybe some time later, just out of curiosity.

    By the way, the official documentation of the view leaves it rather unclear what information it was supposed to provide.

    Randolf

    ReplyDelete