Saturday, July 28, 2012

Interesting blog notes

Just a pointer to two blog posts that I find worth mentioning:

1. Christo Kutrovsky from Pythian writes about some quirks he found regarding Parallel Distribution of aggregation and analytic functions. In particular the lower part of the post (which is not about the initial Interval Partitioning issue) gives a lot of food for thought how the chosen Parallel Distribution can influence the performance of operations

2. Alexander Anokhin gives some geek insights into how Oracle performs logical I/O and in particular how the "buffer is pinned count" statistics is maintained by Oracle. He even introduces another cool tool ("Digger") for performing such investigations based on DTrace.

That post quite nicely complements my "Logical I/O evolution" series.

Sunday, July 22, 2012

Free Webinar

In a couple of days, on Wednesday, 1st of August, I'll be presenting another free webinar hosted at AllThingsOracle.com.

Although it is called "Oracle Cost-Based Optimizer Advanced Session", don't be mislead by the title.

It is not a truly "advanced" session, but rather I'll try to delve into various topics that I could only mention briefly or had to omit completely during the first webinar on the Cost-Based Optimizer.

In principle it's going to be a selection of the most recurring issues that I come across during my consultancy work:

- I'm going to spend some time on statistics and histograms in particular and what I believe are the most important aspects to understand regarding them

- Why the clustering factor of indexes gathered by the statistics collection might be wrong (this was only briefly mentioned in the first webinar)

- Using the wrong datatype and how this impacts the estimates of the optimizer is going to be another potential topic

- Implicit datatype conversions are also an issue I come across quite frequently (Tom Kyte covered this already to some extent in one of the previous AllThingsOracle.com webinars I believe)

- Last but not least Dynamic Sampling, Virtual Columns and Extended Statistics are also worth to talk about if time permits it

This time I'm going to use only a couple of slides and mainly will run through some test cases that hopefully allow a live demonstration of what I'm talking about.

You can find more details and register here.

Wednesday, July 11, 2012

AllThingsOracle.com - Dynamic Sampling (III) Part II

Just again a short note that the third installment's final part of the series on Dynamic Sampling has been published on AllThingsOracle.com.

I show there how indexes can be used to get Dynamic Sampling working with rare / clustered values, and expand on further ideas how to deal with tricky situations for the optimizer.

Sunday, July 8, 2012

Forced Cursor Sharing And Virtual Columns

So you have that application that cannot be changed but makes use of some weird expressions that screw up the cardinality estimates of the optimizer.

Consider this simple example:

create table t as select rownum as id , case when mod(rownum, 100000) = 0 then 'y' else 'n' end as flag , rpad('x', 100) as filler from dual connect by level <= 1000000 ; exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2', estimate_percent => 30) select flag, count(*) from t group by flag; explain plan for select count(*) from t where flag = 'y' ; set linesize 200 tab off pagesize 0 select * from table(dbms_xplan.display);

We have a table with a very skewed "FLAG" column with just a few occurrences of 'y'. The histogram on the column allows the optimizer to recognize this.

Notice that I haven't used the AUTO_SAMPLE_SIZE default, due to the potential problem of 11g in particular when using low sample sizes for a histogram on a column having rare values leading to inconsistent column and histogram statistics - but for larger tables the AUTO_SAMPLE_SIZE in 11g gives you much better basic column statistics, so I don't advise in general to not use AUTO_SAMPLE_SIZE. It might be worth to use a separate GATHER_TABLE_STATS call only for the columns with histograms using an explicit sample size, but using AUTO_SAMPLE_SIZE for the remaining columns.

From running this script on 11.2.0.1 I get this output:

SQL> create table t 2 as 3 select 4 rownum as id 5 , case 6 when mod(rownum, 100000) = 0 7 then 'y' 8 else 'n' 9 end as flag 10 , rpad('x', 100) as filler 11 from 12 dual 13 connect by 14 level <= 1000000 15 ; Table created. SQL> SQL> exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2', estimate_percent => 30) PL/SQL procedure successfully completed. SQL> SQL> select flag, count(*) from t group by flag; F COUNT(*) - ---------- y 10 n 999990 SQL> SQL> explain plan for 2 select count(*) from t where flag = 'y' 3 ; Explained. SQL> SQL> set linesize 200 tab off pagesize 0 SQL> SQL> select * from table(dbms_xplan.display); Plan hash value: 2966233522 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2 | 4292 (2)| 00:00:52 | | 1 | SORT AGGREGATE | | 1 | 2 | | | |* 2 | TABLE ACCESS FULL| T | 8 | 16 | 4292 (2)| 00:00:52 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("FLAG"='y') 14 rows selected.

So the cardinality estimate for the FLAG column is in the right ballpark as I've prevented problems with the histogram. But now we have this expression, so let's see what happens then to the estimates:

explain plan for select count(*) from t where nvl(upper(flag), 'Y') = 'Y' ; select * from table(dbms_xplan.display);

which gives me this output from my 11.2.0.1 session:

SQL> explain plan for 2 select count(*) from t where nvl(upper(flag), 'Y') = 'Y' 3 ; Explained. SQL> SQL> select * from table(dbms_xplan.display); Plan hash value: 2966233522 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2 | 4308 (2)| 00:00:52 | | 1 | SORT AGGREGATE | | 1 | 2 | | | |* 2 | TABLE ACCESS FULL| T | 9992 | 19984 | 4308 (2)| 00:00:52 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(NVL(UPPER("FLAG"),'Y')='Y') 14 rows selected. SQL>

As expected, the optimizer doesn't have a clue and falls back to a hardcoded one percent default estimate.

Since this is 11g, let's generate extended statistics for the expression with a histogram and re-check the estimates:

exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for columns (nvl(upper(flag), ''Y'')) size 2', estimate_percent => 30) explain plan for select count(*) from t where nvl(upper(flag), 'Y') = 'Y' ; select * from table(dbms_xplan.display);

which gives me

SQL> exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2 for columns (nvl(upper(flag), ''Y'')) size 2', estimate_percent => 30) PL/SQL procedure successfully completed. SQL> SQL> explain plan for 2 select count(*) from t where nvl(upper(flag), 'Y') = 'Y' 3 ; Explained. SQL> SQL> select * from table(dbms_xplan.display); Plan hash value: 2966233522 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 4 | 4308 (2)| 00:00:52 | | 1 | SORT AGGREGATE | | 1 | 4 | | | |* 2 | TABLE ACCESS FULL| T | 8 | 32 | 4308 (2)| 00:00:52 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(NVL(UPPER("FLAG"),'Y')='Y') 14 rows selected.

So that looks good again. Now let's combine this with CURSOR_SHARING = FORCE:

alter session set cursor_sharing = force; explain plan for select count(*) from t where nvl(upper(flag), 'Y') = 'Y' ; select * from table(dbms_xplan.display); select count(*) from t where nvl(upper(flag), 'Y') = 'Y'; select * from table(dbms_xplan.display_cursor);

Notice in particular the EXPLAIN PLAN output and the actual execution plan used at runtime:

SQL> alter session set cursor_sharing = force; Session altered. SQL> SQL> explain plan for 2 select count(*) from t where nvl(upper(flag), 'Y') = 'Y' 3 ; Explained. SQL> SQL> select * from table(dbms_xplan.display); Plan hash value: 2966233522 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 4 | 4308 (2)| 00:00:52 | | 1 | SORT AGGREGATE | | 1 | 4 | | | |* 2 | TABLE ACCESS FULL| T | 8 | 32 | 4308 (2)| 00:00:52 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(NVL(UPPER("FLAG"),'Y')='Y') 14 rows selected. SQL> SQL> select count(*) from t where nvl(upper(flag), 'Y') = 'Y'; 10 SQL> SQL> select * from table(dbms_xplan.display_cursor); SQL_ID 8jzw0jwwuk83f, child number 0 ------------------------------------- select count(*) from t where nvl(upper(flag), :"SYS_B_0") = :"SYS_B_1" Plan hash value: 2966233522 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 4308 (100)| | | 1 | SORT AGGREGATE | | 1 | 2 | | | |* 2 | TABLE ACCESS FULL| T | 9998 | 19996 | 4308 (2)| 00:00:52 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(NVL(UPPER("FLAG"),:SYS_B_0)=:SYS_B_1) 19 rows selected.

So there are two important takeaways:

1. EXPLAIN PLAN doesn't care about CURSOR_SHARING=FORCE. You have to be very careful when CURSOR_SHARING=FORCE is active regarding the execution plans you see from EXPLAIN PLAN and the actual ones used at runtime. This also means that the AUTOTRACE feature of SQL*Plus might lie to you, too.

2. The literal replacement performed by CURSOR_SHARING=FORCE prevents the optimizer from making use of the Virtual Column / Extended Statistics, because the expression has changed, hence we are back to the bad cardinality estimate and the additional information is useless to the optimizer in this scenario

Footnote

If you really need to use CURSOR_SHARING=FORCE due to some badly written application, I always advise to minimize the scope. If feasible, don't run the whole instance in that mode, but limit the setting to either the corresponding sessions (for example, via LOGON triggers) or even to certain processing parts of the application (via corresponding ALTER SESSION SET CURSOR_SHARING = FORCE/EXACT calls).

Note that all this applies to CURSOR_SHARING=SIMILAR, too, but since its use is deprecated anyway (see MOS document 1169017.1) I haven't mentioned it here before.

Finally, related technologies like Function Based Indexes and Virtual Columns are very likely affected in a similar way regarding the resulting cardinality estimates.