Wednesday, April 18, 2012

Webinar Recording

So to wrap this up, the webinar recording (this time hosted on YouTube, the recording available from the webinar archive is hosted somewhere else) along with the webinar material for download is now officially available on AllThingsOracle.com.

I suggest that if you still have any questions regarding this webinar then you can comment on the post over there and I'll try to address them.

If you interested in more stuff like that, then stay tuned as there are more of them planned covering many of the aspects that I deliberately left out or mentioned only briefly in this basic introduction.

Thursday, April 12, 2012

Webinar Material

Thanks everyone who attended yesterday's webinar at AllThingsOracle.com.

I hope the technical difficulties with my new mic (which of course work flawlessly during several test runs) weren't too severe so that you could still understand what I said.

As usual when I do presentations in English it took me far too long so probably some of you missed the end of the session. If you did you may want to check the webinar recording once available at the webinar archive at AllThingsOracle.com, because we had a very good and extensive Q+A session with excellent questions asked.

Thanks again to AllThingsOracle.com and James Murtagh for hosting the event.

An archive containing the presentation along with the demo scripts and screenshots of the live demos will be made available at AllThingsOracle.com, but if you can't wait until it's published there, you can download the package also from here.

Tuesday, April 10, 2012

Column Groups - Edge Cases

Oracle 11g added Extended Statistics support for column groups in order to detect correlated columns for filter predicates using an equal comparison.

Note that Oracle 11g also added the ability to use the number of distinct keys of a composite index as an upper limit for the cardinality estimates for matching column predicates, which means that the optimizer is now capable of detecting correlated columns without the explicit addition of Extended Statistics / Column Groups.

The "composite index" feature was also backported to 10.2.0.4 and 10.2.0.5 but needs to be activated explicitly via a FIX_CONTROL. You can read more about that on Jonathan Lewis' blog.

The "Column Group" feature (either by explicit Extended Statistics or via a matching composite index) however has a weakness in a very special case: If there is only a single distinct value in the statistics then the "out-of-range" detection of the optimizer is not working correctly.

This can be easily demonstrated with a very simple test case:


set echo on linesize 200 pagesize 999 tab off trimspool on trimout on

drop table t;

purge table t;

create table t
as
select
rownum as id
, mod(rownum, 1) + 1 as attr1
, mod(rownum, 1) + 1 as attr2
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;

select
attr1
, attr2
, count(*) as cnt
from
t
group by
attr1
, attr2
;

exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1')

create index t_idx on t (attr1, attr2);

explain plan for
select * from t where attr1 = 1 and attr2 = 1;

select * from table(dbms_xplan.display(null, null, 'BASIC PREDICATE ROWS'));

explain plan for
select * from t where attr1 = 0 and attr2 = 0;

select * from table(dbms_xplan.display(null, null, 'BASIC PREDICATE ROWS'));

explain plan for
select * from t where attr1 = 2 and attr2 = 2;

select * from table(dbms_xplan.display(null, null, 'BASIC PREDICATE ROWS'));

explain plan for
select * from t where attr1 = 100000 and attr2 = 100000;

select * from table(dbms_xplan.display(null, null, 'BASIC PROJECTION ROWS'));

explain plan for
select * from t where attr1 = -100000 and attr2 = 100000;

select * from table(dbms_xplan.display(null, null, 'BASIC PROJECTION ROWS'));


The relevant output is:


SQL> select
2 attr1
3 , attr2
4 , count(*) as cnt
5 from
6 t
7 group by
8 attr1
9 , attr2
10 ;

ATTR1 ATTR2 CNT
---------- ---------- ----------
1 1 1000000

select * from t where attr1 = 1 and attr2 = 1;

------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ATTR1"=1 AND "ATTR2"=1)

select * from t where attr1 = 0 and attr2 = 0;

------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ATTR1"=0 AND "ATTR2"=0)

select * from t where attr1 = 2 and attr2 = 2;

------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ATTR1"=2 AND "ATTR2"=2)

select * from t where attr1 = 100000 and attr2 = 100000;

------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ATTR1"=100000 AND "ATTR2"=100000)

select * from t where attr1 = -100000 and attr2 = 100000;

------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ATTR1"=(-100000) AND "ATTR2"=100000)


As you can see the optimizer still thinks to return "all" rows even in obvious cases where no rows will be returned.

In 11g the problem does not appear if there is more than a single distinct value, however the 10g backport seems to produce similar silly estimates even with more than a single distinct value.

One possible workaround is to generate a histogram on the column group in 11g. This way the out-of-range detection works again as expected for the single distinct value case.

It's also possible to generate a histogram on one or more of the affected columns each to avoid the problem, but this "breaks" the column group and correlation detection and hence might lead to worse cardinality estimates in the case of more than a single distinct value.

Note that when generating histograms you need to carefully check if you run into the problems described here: If you end up with a histogram containing only a single bucket but the underlying basic column statistics show more than a single distinct value then in principle unfortunately again the same issue will show up as demonstrated in the other blog post.

Note that this inconsistency between basic column statistics (Number Of Distinct Values, NDV) and the corresponding histogram is a consequence of how 11g gathers statistics when using the AUTO_SAMPLE_SIZE default.

Maria Colgan has just published a note about that.

Friendly Reminder - Free Webinar

This is just a friendly reminder that tomorrow my free Webinar on Cost-Based Optimizer Basics will take place at AllThingsOracle.com.