Monday, May 21, 2012

Nested Loop Join Costing

The basic formula for calculating the costs of a Nested Loop Join is pretty straightforward and has been described and published several times.

In principle it is the cost of acquiring the driving row source plus the cost of acquiring the inner row source of the Nested Loop as many times as the driving row source dictates via the cardinality of the driving row source.

Cost (outer rowsource) + Cost (inner rowsource) * Card (outer rowsource)

Obviously there are cases where Oracle has introduced refinements to the above formula where this is no longer true. Here is one of these cases that is probably not uncommon.

Let's start with a simple two table join that shows above formula in action. It represents a parent-child relationship where the parent table has 10,000 rows with a unique identifier, and a child table with 100 rows each that map to a single parent row, having 1,000,000 rows in total.

set echo on create table t as select rownum as id , rownum as attr1 , rpad('x', 100) as filler from dual connect by level <= 10000 ; exec dbms_stats.gather_table_stats(null, 't') create table t2 as select rownum as id , mod(rownum, 10000) + 1 as fk , mod(rownum, 20) + 1 as attr1 , rpad('x', 100) as filler from dual connect by level <= 1000000 ; exec dbms_stats.gather_table_stats(null, 't2') create index t2_idx on t2 (fk); explain plan for select /*+ use_nl(t t2) leading(t) index(t2) */ * from t , t2 where t.attr1 <= 500 and t2.fk = t.id; set pagesize 0 linesize 200 tab off select * from table(dbms_xplan.display);

which gives the following execution plan in 11.2:

--------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50005 | 10M| 51087 (1)| 00:10:14 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 50005 | 10M| 51087 (1)| 00:10:14 | |* 3 | TABLE ACCESS FULL | T | 500 | 54500 | 45 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | T2_IDX | 100 | | 2 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 100 | 11300 | 102 (0)| 00:00:02 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T"."ATTR1"<=500) 4 - access("T2"."FK"="T"."ID")

There are a couple of noteworthy comments:

1. T2.FK and T.ID have the same number of distinct keys each (10,000), so assuming corresponding primary and foreign key constraints in place this means that there is at least one matching row for every parent row.

2. There is a filter on T that restricts the driving row source to 500 rows. It is interesting to note that this filter results in a "biased join" where Oracle picks the NUM_DISTINCT from the other table for the join selectivity calculation rather than using the greatest NUM_DISTINCT of both, in this case arriving at a correct cardinality estimate of approx. 50,000

3. The T2_IDX index has a worst-case clustering factor due to the scattering of T2.FK via the MOD function, hence the resulting cost of a single iteration of the Nested Loop join is 100 for picking up 100 rows from the table (plus 2 for the index access)

The overall cost for the loop is 45 for acquiring the driving row source plus 500 times 102, the remaining part being CPU costing overhead. This matches above formula.

Now let's re-create table T2 using a different distribution of the T2.FK column:

drop table t2; purge table t2; create table t2 as select rownum as id , mod(rownum, 500) + 1 as fk , mod(rownum, 20) + 1 as attr1 , rpad('x', 100) as filler from dual connect by level <= 1000000 ; exec dbms_stats.gather_table_stats(null, 't2') create index t2_idx on t2 (fk);

The only difference is now that the FK column no longer has 10,000 distinct keys but only 500.

Of course on average now 2,000 rows in T2 match a parent row in T, but obviously no longer all of them have a match in T2.

Let's review the execution plan for the same SQL statement as above:

--------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 998K| 211M| 52609 (1)| 00:10:32 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 998K| 211M| 52609 (1)| 00:10:32 | |* 3 | TABLE ACCESS FULL | T | 500 | 54500 | 45 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | T2_IDX | 2000 | | 5 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 1996 | 220K| 2007 (1)| 00:00:25 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T"."ATTR1"<=500) 4 - access("T2"."FK"="T"."ID")

Since now 2,000 rows match a single parent key on average, the cost per iteration has increased accordingly. The index range scan needs to visit some root / branch blocks plus four leaf blocks on average, resulting in a cost of 5. The table access needs to visit 2,000 rows on average, hence with the same worst-case clustering factor the cost per iteration is 2,000 plus CPU overhead.

Given the fact that we need to do this 500 times the final cost of the Nested Loop join ought to be something close to one million, but surprisingly it is only a little bit higher than the cost of the previous scenario.

So clearly the original formula doesn't apply here, although the cost for a single iteration in the execution plan seems to match the expectations.

It looks like Oracle a long way back introduced a refinement to the original formula in the case of less distinct keys of the inner row source join column than the driving row source join column.

The idea behind it seems to be that this is what Oracle calls a "sparse" join. Obviously not every row from the driving row source will find a match in the inner row source, hence some of the loop iterations should end with the index lookup, not finding a match and therefore no need to visit the inner table afterwards. The refinement hence calculates a lower average cost of the table visit per loop iteration.

This is of course true when looking at the join by itself. But if you are unlucky and a corresponding filter on the driving row source turns the "sparse" join into a (rather) "dense" join, then this refinement can lead to a significant cost underestimate of a Nested Loop join, potentially driving the optimizer towards favouring that join method over others.

And it is exactly that scenario what above query simulates: The filter on T results in a full match of all 500 driving rows, and the actual cost of the Nested Loop join ought to be closer to one million than 50,000 in this case.

The refinement to the cost calculation seems to be based on the ratio between the NUM_DISTINCT of two join coin columns: In my example the ratio is 10,000:500, so the overall cost is only approx. 1/20 of the original cost.

There are some further details how the formula deals with a small number of loop iterations. For example, the first iteration will get the full original cost, a number of further iterations (again seems to correspond to the ratio, here for example 20) get only the index cost added, and from then on the costing of the table access levels at the original cost downscaled by the ratio (2000 / 20 which is 100 in this case).

The refinement obviously has been added in release 10.1.0.3 and can be found in the Fix Control list as bug number 3120429. The text for this fix is: "account for join key sparsity in computing NL index access cost" and apparently only applies if the inner row source uses an index access.

This also means that the original costing can be activated by disabling the fix:

explain plan for select /*+ opt_param('_fix_control' '3120429:0') use_nl(t t2) leading(t) index(t2) */ * from t , t2 where t.attr1 <= 500 and t2.fk = t.id; set pagesize 0 linesize 200 tab off select * from table(dbms_xplan.display); --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 998K| 211M| 1003K (1)| 03:20:41 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 998K| 211M| 1003K (1)| 03:20:41 | |* 3 | TABLE ACCESS FULL | T | 500 | 54500 | 45 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | T2_IDX | 2000 | | 5 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 1996 | 220K| 2007 (1)| 00:00:25 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T"."ATTR1"<=500) 4 - access("T2"."FK"="T"."ID")
So if you happen to have such cases where the filter on a driving row source leads to an underestimate of the Nested Loop join cost as outlined here, using the fix control allows arriving at more reasonable costing figures.

I recently had such a case at a client where only a small part of a fully populated time dimension was used in the foreign key of a fact table, but the filter on the time dimension lead to exactly the scenario described here - the join wasn't really sparse and the Nested Loop join cost was significantly underestimated.

12 comments:

Flado said...

"fully populated time dimension" - what, all 14 billion years of past plus the infinity of future? ;-)

Randolf said...

Of course :-)

Well, at least you've proven that you've read the article to the end :-)

Martin Preiss said...

@Flado: I am sure you know that the world was created on 01.01.4712 B.C. And that the end of the world will come on '31.12.9999'.

Antony said...

Hi Randolf,

Please help me in calculating join cardinality for the below SQL.
I couldn't get it right.

SQL> @xplan
Plan hash value: 4119620020

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96199 | 53M| | 114K (1)| 00:08:48 |
|* 1 | HASH JOIN | | 96199 | 53M| 4984K| 114K (1)| 00:08:48 |
|* 2 | TABLE ACCESS BY INDEX ROWID| PS_PC_RES_PA_TA121 | 96199 | 3851K| | 6517 (1)| 00:00:31 |
|* 3 | INDEX RANGE SCAN | PSAPC_RES_PA_TA121 | 299K| | | 227 (1)| 00:00:02 |
|* 4 | TABLE ACCESS BY INDEX ROWID| PS_PC_RES_PA_TA021 | 785K| 403M| | 72751 (1)| 00:05:36 |
|* 5 | INDEX RANGE SCAN | PSAPC_RES_PA_TA021 | 2902K| | | 2308 (1)| 00:00:11 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("PR"."BUSINESS_UNIT_PO"="PRT"."BUSINESS_UNIT_PO" AND "PR"."PO_ID"="PRT"."PO_ID" AND
"PR"."LINE_NBR"="PRT"."LINE_NBR" AND "PR"."SCHED_NBR"="PRT"."SCHED_NBR" AND
"PR"."DISTRIB_LINE_NUM"="PRT"."DISTRIB_LINE_NUM" AND "PR"."DST_ACCT_TYPE"="PRT"."DST_ACCT_TYPE")
2 - filter("PRT"."ANALYSIS_TYPE"='CCR' OR "PRT"."ANALYSIS_TYPE"='CRV')
3 - access("PRT"."PROCESS_INSTANCE"=28022762)
4 - filter("PR"."ANALYSIS_TYPE"='CCR' OR "PR"."ANALYSIS_TYPE"='CRV')
5 - access("PR"."PROCESS_INSTANCE"=28022762)

Join Predicated & NDVs

PR.BUSINESS_UNIT_PO = 219
PR.PO_ID = 308320
PR.LINE_NBR = 142
PR.SCHED_NBR = 24
PR.DISTRIB_LINE_NUM = 56
PR.DST_ACCT_TYPE = 3


PRT.BUSINESS_UNIT_PO = 177
PRT.PO_ID = 139136
PRT.LINE_NBR = 133
PRT.SCHED_NBR = 23
PRT.DISTRIB_LINE_NUM = 42
PRT.DST_ACCT_TYPE = 2

Filter Predicates & NDVs

PRT.ANALYSIS_TYPE = 3
PR.ANALYSIS_TYPE = 5

Table row count

PS_PC_RES_PA_TA121 PRT Rows= 299160
PS_PC_RES_PA_TA021 PR Rows= 2902548


= 1/5 + 1/5 -(1/5 *1/5)=2/10+1/25= 0.24 = 0.24 * 2902548 = 696611
=1/3+1/3 -(1/3*1/3) = 2/6+1/9=0.444 = 0.444 * 299160 = 132817




Join Selectivity = ((num_rows(t1) - num_nulls(t1.c1)) / num_rows(t1)) * ((num_rows(t2) - num_nulls(t2.c2)) / num_rows(t2)) / greater(num_distinct(t1.c1), num_distinct(t2.c2))

Join Selectivity = 1 / greater(ndv(all join predicates)) = 1/308320 --- No NULLS

Join Cardinality = Join Selectivity * filtered cardinality(t1) * filtered cardinality(t2)

Join Cardinality = 1/308320* 132827 * 696611 = 29989

Thanks

Randolf said...

Hi Antony,

first of all you need to check the two input cardinalities of the join, which are 96199 and 785K respectively. So I believe your filtered cardinality calculations are already incorrect.

Second you need to be aware of the so called "Multi-column" join cardinality sanity checks. Very likely these checks have kicked in, see this post for more information.

You would need to check your particular database version (which you haven't mention) and the corresponding optimizer trace file (event 10053 or 11g debug framework) to see if these have been used, which would explain why your final join cardinality doesn't fall below the 96199 (the smaller of the two input cardinalities to the join).

Randolf

Antony said...

Hi Randolf,
Sorry!I was terribly wrong while calculating filtered cardinality.
It's supposed to be (a+b)-(a*b) for predicates with OR clause.

You're absolutely correct as the optimizer has applied "Multi-column" join cardinality sanity checks.

DB version:11.2.0.2

Thank you again.

Antony said...

Hi Randolf,
Referring Jonathan's CBO book on page 269,what are you supposed to do if you have two or more join columns?
In my case,I have more than two join columns.Does it mean that I can't use the standard join cardinality formula?

Thanks

Randolf said...

Antony,

I don't have Jonathan's book at hand at present and may be I'm not getting your question, but in the simplest case where you have multiple join columns ANDED together you simply multiply the different join selectivities determined for each column - it's no different from what you do with normal selectivity calculations.

I'm pretty sure Jonathan covers this... I've just had a quick look at google/books and page 272ff. should clarify your doubts.

Randolf

Antony said...

Thanks Randolf!
I am still wondering why I couldn't estimate the filtered cardinality of the tables correctly?

4 - filter("PR"."ANALYSIS_TYPE"='CCR' OR "PR"."ANALYSIS_TYPE"='CRV')
PR.Analysis_type(NDV)=5
Since there is a OR clause in the predicate,i am using (a+b)-(a*b) for calculating selectivity.
Sel= (1/5+1/5)-(1/5*1/5)=0.16
Filtered cardinality(PR)=0.16*2902548=464408

But the estimated cardinality is 785k.

Please correct me where i am wrong.

Thanks

Randolf said...

Hi Antony,

you need to (at least):

- check for NULLs
- check for histograms on the columns
- check if this is an INLIST, where the cardinality is calculated differently

Randolf

Antony said...

18Hi Randolf,
The field doesn't have null values and has a frequency histogram with a density 1.7347E-07.
It also has a INLIST in the query.

ANALYSIS_TYPE IN ('CRV', 'CCR')

NAME NDV DENS SAMPLE_SIZE #NULL AVGLEN HISTOGRAM #BKT
------------------------- ---------- ---------- ----------- ---------- ------- ---------------- ----------
ANALYSIS_TYPE 5 1.7347E-07 12951 0 4 FREQUENCY 5

Could you please explain now?

Thanks

Randolf said...

Hi Antony,

no I can't, still insufficient information.

You would need to check the exact information from the histogram - check Jonathan's book, the "Single Table Cardinality" chapter and the "Histograms" chapter.

Randolf