Wednesday, August 10, 2011

Logical I/O Evolution - Part 3: 11g

Preface (with apologies to Kevin Closson)

This blog post is too long

Introduction

In the previous part of this series I've already demonstrated that the logical I/O optimization of the Table Prefetching feature depends on the order of the row sources - and 11g takes this approach a big step further.

It is very interesting that 11g does not require any particular feature like Table Prefetching or Nested Loop Join Batching (another new feature introduced in 11g) to take advantage of the Logical I/O optimization - it seems to be available even with the most basic form of a Nested Loop join.

Note that this optimization has already been mentioned several times, but so far there has been some confusion about whether it is related to another new feature introduced with 11g - the so-called "fastpath" consistent gets.

Buffer Pinning Optimization

So, let's repeat the already familiar test case from the previous parts in 11g. Another nice feature of 11g is that we now have full control over the Nested Loop plan shapes / features used by Oracle - we can choose from the "classic" Nested Loop Join, Table Prefetching and Nested Loop Join Batching.

This is controlled via the [NO_]NLJ_BATCHING and [NO_]NLJ_PREFETCH hints which you will also find in the "outline" hint list generated for Plan Stability from 11g on.
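For example, the outline hints of the last executed statement can be displayed via the OUTLINE format option of DBMS_XPLAN:

select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL +OUTLINE'));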

Interestingly, if I wanted the "classic" Nested Loop shape I couldn't achieve that by combining the NO_NLJ_BATCHING and NO_NLJ_PREFETCH hints - one seemed to disable the other - so I had to resort to the "_nlj_batching_enabled" parameter to disable Nested Loop Join Batching.

So this is what the query hints need to look like if we want to have the classic Nested Loop Join shape in 11g:

select
        max(b_filler), max(a_filler)
from (
        select  /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */
                a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler
        from
                t2 a
              , t1 b
        where
                a.id = b.id
);


If you want to test different plan shapes you can simply modify the hint section as required - for example, you can get the Table Prefetching shape by changing the above hint from NO_NLJ_PREFETCH to NLJ_PREFETCH, and so on.
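
For reference, the hint section for the Table Prefetching shape then looks like this:

.
.
.
select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) nlj_prefetch(b) */
.
.
.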

Let's start with the data set where T1 and T2 are not in the same order, and stick to the classic plan shape:

11.2.0.1 Classic Nested Loop - Random order

Inner row source Unique Index - T1 different order than T2

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.67 | 310K|
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.67 | 310K|
| 2 | NESTED LOOPS | | 1 | 100K| 202K (1)| 100K|00:00:03.47 | 310K|
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
| 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:02.54 | 300K|
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.76 | 200K|
---------------------------------------------------------------------------------------------------------------


Inner row source Non-Unique Index - T1 different order than T2

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:04.40 | 311K|
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:04.40 | 311K|
| 2 | NESTED LOOPS | | 1 | 100K| 202K (1)| 100K|00:00:04.20 | 311K|
| 3 | TABLE ACCESS FULL | T1 | 1 | 100K| 2720 (1)| 100K|00:00:00.20 | 10010 |
| 4 | TABLE ACCESS BY INDEX ROWID| T2 | 100K| 1 | 2 (0)| 100K|00:00:03.28 | 301K|
|* 5 | INDEX RANGE SCAN | T2_IDX | 100K| 1 | 1 (0)| 100K|00:00:01.08 | 201K|
---------------------------------------------------------------------------------------------------------------


So far there is no difference from the previous results: the non-unique index variant is still slower than the unique one, and we do not see any special buffer pinning optimization apart from the one we've already seen in the baseline test.

The relevant session statistics:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..consistent gets 310,012 311,108 1,096
STAT..consistent gets from cache 310,012 311,108 1,096
STAT..session logical reads 310,012 311,108 1,096
STAT..buffer is not pinned count 200,002 100,012 -99,990
STAT..buffer is pinned count 99,999 199,993 99,994
STAT..index fetch by key 100,000 2 -99,998
STAT..rows fetched via callback 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000
STAT..consistent gets - examination 300,001 100,007 -199,994
STAT..consistent gets from cache (fastpath) 10,011 211,101 201,090
STAT..no work - consistent read gets 10,001 211,091 201,090
LATCH.cache buffers chains 320,024 522,216 202,192
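
As a reminder, these figures are obtained by diffing snapshots of the session statistics and latch counters taken before and after each execution. A minimal sketch of the idea (the actual harness from the previous parts is more elaborate) - run this from the test session before and after the query and compute the deltas:

select
        sn.name
      , st.value
from
        v$mystat st
      , v$statname sn
where
        st.statistic# = sn.statistic#
and     sn.name in (
          'consistent gets'
        , 'consistent gets - examination'
        , 'buffer is pinned count'
        , 'buffer is not pinned count'
        );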


Nothing spectacular here either, but there are at least some interesting points to mention:

- We can see that Oracle took advantage of the so-called "fastpath" consistent gets for the "normal" consistent gets - they still took two latch acquisitions per get, though. The "fastpath" seems to be a code optimization for the case when buffers get pinned that probably requires fewer CPU cycles. I don't know if the code change addresses any further contention/concurrency issues apart from being "faster" (and faster is always better, isn't it :-)

- The "buffer is pinned count" statistics are not consistent with what we've seen from 10g:

* The "unique index" variant already misses 90,000 pins, but does not produce more consistent gets, so in total we do not arrive at the anticipated 500,000 buffer visits any more - either something seems to be missing from the instrumentation or Oracle does something fundamentally different
* The "non-unique index" variant however records 10,000 excess pinned buffers, so we end up with 510,000 buffer visits recorded in total

Let's repeat the same test with the T1 and T2 data ordered in the same way - but not ordered by ID (so simply uncomment the second call to DBMS_RANDOM.SEED(0)):
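
For reference, a condensed sketch of the relevant part of the table setup - the full script comes from the earlier parts of this series, and the filler size here is an assumption chosen to yield roughly 10 rows per 8KB block, matching the 10,000 "table scan blocks gotten" above (for the "ordered" test cases later on, ORDER BY DBMS_RANDOM.VALUE simply gets replaced by ORDER BY ID):

exec dbms_random.seed(0)

create table t1
as
select * from (
        select
                rownum as id
              , rpad('x', 700) as filler
        from
                dual
        connect by
                level <= 100000
)
order by dbms_random.value;

-- the second call: uncomment to load T2 in the same random order as T1
-- exec dbms_random.seed(0)

create table t2
as
select * from (
        select
                rownum as id
              , rpad('x', 700) as filler
        from
                dual
        connect by
                level <= 100000
)
order by dbms_random.value;

create unique index t1_idx on t1 (id);

create index t2_idx on t2 (id);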

11.2.0.1 Classic Nested Loop - Same random order

Inner row source Unique Index - T1 same random order as T2

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.55 | 310K|
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.55 | 310K|
| 2 | NESTED LOOPS | | 1 | 100K| 202K (1)| 100K|00:00:03.35 | 310K|
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.22 | 10010 |
| 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:02.41 | 300K|
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.77 | 200K|
---------------------------------------------------------------------------------------------------------------


Inner row source Non-Unique Index - T1 same random order as T2

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:04.23 | 221K|
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:04.23 | 221K|
| 2 | NESTED LOOPS | | 1 | 100K| 202K (1)| 100K|00:00:04.02 | 221K|
| 3 | TABLE ACCESS FULL | T1 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
| 4 | TABLE ACCESS BY INDEX ROWID| T2 | 100K| 1 | 2 (0)| 100K|00:00:03.10 | 211K|
|* 5 | INDEX RANGE SCAN | T2_IDX | 100K| 1 | 1 (0)| 100K|00:00:01.09 | 201K|
---------------------------------------------------------------------------------------------------------------


So, that's interesting: we can already see the same optimization for the non-unique index kicking in here as we saw in 10g with Table Prefetching, even though the classic plan shape is used.

The statistics correspond to the result - but there is a slight difference from the 10.2 Table Prefetching case: the "buffer is pinned count" is at least "self-consistent" for the "non-unique index" variant, so there is no "excess" pinning recorded as there was with Table Prefetching.

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
LATCH.cache buffers chains 320,030 342,242 22,212
STAT..consistent gets 310,012 221,124 -88,888
STAT..consistent gets from cache 310,012 221,124 -88,888
STAT..session logical reads 310,012 221,124 -88,888
STAT..Cached Commit SCN referenced 110,000 20,007 -89,993
STAT..index fetch by key 100,000 2 -99,998
STAT..rows fetched via callback 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000
STAT..consistent gets from cache (fastpath) 10,011 121,117 111,106
STAT..no work - consistent read gets 10,001 121,107 111,106
STAT..buffer is not pinned count 200,002 10,028 -189,974
STAT..buffer is pinned count 99,999 289,977 189,978
STAT..consistent gets - examination 300,001 100,007 -199,994


Redundant Filter Optimization

As I've just demonstrated, the inner table lookup of the "unique index" variant does not use the buffer pinning optimization. It's an interesting little detail that in 11.1.0.7 and 11.2.0.1 putting a filter on the inner table lookup changes the result for the "unique index" variant - so running a query like this, using a redundant filter that doesn't change the overall result:

select
        max(b_filler), max(a_filler)
from (
        select  /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */
                a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler
        from
                t2 a
              , t1 b
        where
                a.id = b.id
        and     substr(b.filler, 1, 1) = 'x'
);


will result in this output:

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.25 | 220K|
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.25 | 220K|
| 2 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:03.05 | 220K|
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
|* 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:02.12 | 210K|
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.77 | 200K|
---------------------------------------------------------------------------------------------------------------


Session Statistics:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..buffer is pinned count 289,998 289,977 -21
STAT..buffer is not pinned count 10,003 10,028 25
LATCH.JS queue state obj latch 0 36 36
LATCH.row cache objects 67 110 43
STAT..CPU used when call started 59 119 60
STAT..DB time 59 119 60
STAT..CPU used by this session 56 119 63
LATCH.enqueues 2 78 76
LATCH.enqueue hash chains 3 80 77
LATCH.simulator hash latch 9,111 9,304 193
STAT..consistent gets 220,012 221,124 1,112
STAT..consistent gets from cache 220,012 221,124 1,112
STAT..session logical reads 220,012 221,124 1,112
STAT..consistent gets - examination 200,001 100,007 -99,994
STAT..index fetch by key 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000
STAT..consistent gets from cache (fastpath) 20,011 121,117 101,106
STAT..no work - consistent read gets 20,001 121,107 101,106
LATCH.cache buffers chains 240,024 342,248 102,224


The interesting part here is that the "unique index" variant now uses the same buffer pinning optimization as the "non-unique index" one - but resorts to "normal" consistent gets (using the "fastpath" version in this case) for the random table access.

I don't know whether this is a feature or a side-effect of a bug, because it ceases to work in 11.2.0.2 - there the "unique index" variant cannot be convinced to make use of the "buffer pinning" optimization; it always performs the "shortcut" logical I/O on the inner table lookup, even with a filter specified.

We'll see later on that this has some interesting consequences with concurrent executions.

Ordered Data Sets

OK, now finally the big one: let's repeat the test case with data sorted by ID, so using ORDER BY ID instead of ORDER BY DBMS_RANDOM.VALUE when populating the tables:

11.2.0.1 Classic Nested Loop - data ordered by ID

Inner row source Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.42 | 122K|
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.42 | 122K|
| 2 | NESTED LOOPS | | 1 | 100K| 202K (1)| 100K|00:00:03.21 | 122K|
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
| 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:02.27 | 112K|
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.55 | 12314 |
---------------------------------------------------------------------------------------------------------------


Inner row source Non-Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.64 | 33143 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.64 | 33143 |
| 2 | NESTED LOOPS | | 1 | 100K| 202K (1)| 100K|00:00:03.45 | 33143 |
| 3 | TABLE ACCESS FULL | T1 | 1 | 100K| 2720 (1)| 100K|00:00:00.20 | 10010 |
| 4 | TABLE ACCESS BY INDEX ROWID| T2 | 100K| 1 | 2 (0)| 100K|00:00:02.55 | 23133 |
|* 5 | INDEX RANGE SCAN | T2_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.67 | 13126 |
---------------------------------------------------------------------------------------------------------------


The result is staggering: the "non-unique index" variant apparently manages to visit 500,000 buffers with just 33K logical I/Os. It is also almost as fast as the "unique index" variant, which obviously does not keep the buffers pinned for the inner table random lookup - let's check the session statistics:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..consistent gets from cache (fastpath) 11,059 26,587 15,528
STAT..no work - consistent read gets 11,049 26,577 15,528
LATCH.cache buffers chains 133,384 60,829 -72,555
STAT..consistent gets 122,324 33,149 -89,175
STAT..consistent gets from cache 122,324 33,149 -89,175
STAT..session logical reads 122,324 33,149 -89,175
STAT..index fetch by key 100,000 2 -99,998
STAT..rows fetched via callback 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000
STAT..consistent gets - examination 111,265 5,470 -105,795
STAT..buffer is not pinned count 200,014 10,028 -189,986
STAT..buffer is pinned count 5,107 195,440 190,333


We can now tell several things from these statistics:

- The "non-unique index" variant requires just 60,000 latch acquisitions - which corresponds to the reduced number of logical I/Os

- The session statistics only "explain" 195,000 buffer visits via already pinned buffers plus 33,000 buffer visits recorded as logical I/Os, so approx. 270,000 buffer visits are missing from the statistics. Compared to the results of the "unordered" test case we actually see a "reduction" in visits to already pinned buffers (199,993 vs. 195,440), which seems questionable

- The "unique index" variant still does the "short-cut" logical I/O on the inner table random lookup and hence requires actually more logical I/O and latch acquisitions in this case than the "non-unique index" variant

As we've seen above, if a filter is put on the inner table random lookup in 11.1.0.7 and 11.2.0.1, Oracle 11g switches to "normal" consistent gets for the "unique index" variant - and indeed, when repeating this experiment with the ordered data set, we see these results:

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:02.97 | 32315 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:02.97 | 32315 |
| 2 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:02.76 | 32315 |
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.22 | 10010 |
|* 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:01.81 | 22305 |
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.56 | 12302 |
---------------------------------------------------------------------------------------------------------------


So by switching to the "normal" consistent gets, the buffer pinning optimization also gets used for the inner table lookup of the "unique index" variant (only reproducible in 11.1.0.7 and 11.2.0.1). The session statistics:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..buffer is not pinned count 10,018 10,028 10
STAT..buffer is pinned count 195,103 195,440 337
STAT..consistent gets 32,315 33,149 834
STAT..consistent gets from cache 32,315 33,149 834
STAT..session logical reads 32,315 33,149 834
STAT..consistent gets from cache (fastpath) 21,050 26,587 5,537
STAT..no work - consistent read gets 21,040 26,577 5,537
STAT..consistent gets - examination 11,265 5,470 -5,795
LATCH.cache buffers chains 53,366 60,835 7,469
STAT..index fetch by key 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000


So we now see the "unique index" variant with similar results, and also a similar "gap" in the buffer visits explained by the statistics.

A slightly funny point is that by adding a "useless" filter we actually arrive at a faster execution time due to the optimization kicking in - something that looks quite counter-intuitive and only seems to work in particular versions.

"Fastpath" consistent gets

To see if this optimization depends on the new "fastpath" consistent gets, let's turn this new feature off by setting "_fastpin_enable" to 0 and restarting the instance:

alter system set "_fastpin_enable" = 0 scope = spfile;


I'm showing here the results for the "inner table filter" variation - but those for the original case without the additional filter likewise correspond to those with "fast pinning" enabled:

11.2.0.1 Classic Nested Loop - data ordered by ID, fast pins disabled, inner table filter

Inner row source Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:02.90 | 32315 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:02.90 | 32315 |
| 2 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:02.70 | 32315 |
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
|* 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:01.79 | 22305 |
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.53 | 12302 |
---------------------------------------------------------------------------------------------------------------


Inner row source Non-Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.86 | 33143 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.86 | 33143 |
| 2 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:03.67 | 33143 |
| 3 | TABLE ACCESS FULL | T1 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
|* 4 | TABLE ACCESS BY INDEX ROWID| T2 | 100K| 1 | 2 (0)| 100K|00:00:02.74 | 23133 |
|* 5 | INDEX RANGE SCAN | T2_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.71 | 13126 |
---------------------------------------------------------------------------------------------------------------


So the same optimization kicked in, and we can tell from the session statistics that the "fastpath" consistent gets have indeed not been used:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..buffer is not pinned count 10,018 10,028 10
STAT..buffer is pinned count 195,103 195,440 337
STAT..consistent gets 32,315 33,149 834
STAT..consistent gets from cache 32,315 33,149 834
STAT..session logical reads 32,315 33,149 834
STAT..no work - consistent read gets 21,040 26,577 5,537
STAT..consistent gets - examination 11,265 5,470 -5,795
LATCH.cache buffers chains 53,372 60,829 7,457
STAT..index fetch by key 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000


The only significant difference is the absence of the "consistent gets from cache (fastpath)" statistic.

Nested Loop Join Batching

Finally, let's check whether the new "Nested Loop Join Batching" optimization has any additional effects on this test case by enabling it. Changing the hints like this does the job:

.
.
.
select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 1) no_nlj_prefetch(b) */
.
.
.


11.2.0.1 Nested Loop Batching - data ordered by ID, inner table filter

Inner row source Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:02.89 | 32306 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:02.89 | 32306 |
| 2 | NESTED LOOPS | | 1 | | | 100K|00:00:02.70 | 32306 |
| 3 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:01.43 | 22306 |
| 4 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.20 | 10010 |
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.53 | 12296 |
|* 6 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:00.57 | 10000 |
---------------------------------------------------------------------------------------------------------------


Inner row source Non-Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.05 | 33128 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.05 | 33128 |
| 2 | NESTED LOOPS | | 1 | | | 100K|00:00:02.85 | 33128 |
| 3 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:01.57 | 23128 |
| 4 | TABLE ACCESS FULL | T1 | 1 | 100K| 2720 (1)| 100K|00:00:00.20 | 10010 |
|* 5 | INDEX RANGE SCAN | T2_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.67 | 13118 |
|* 6 | TABLE ACCESS BY INDEX ROWID| T2 | 100K| 1 | 2 (0)| 100K|00:00:00.57 | 10000 |
---------------------------------------------------------------------------------------------------------------


Apart from some minor differences in the number of logical I/Os it doesn't change the outcome. The same applies to the session statistics:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..buffer is not pinned count 10,006 10,010 4
STAT..buffer is pinned count 195,115 195,458 343
STAT..consistent gets 32,306 33,134 828
STAT..consistent gets from cache 32,306 33,134 828
STAT..session logical reads 32,306 33,134 828
STAT..consistent gets from cache (fastpath) 21,041 26,572 5,531
STAT..no work - consistent read gets 21,031 26,562 5,531
STAT..consistent gets - examination 11,265 5,470 -5,795
LATCH.cache buffers chains 53,348 60,816 7,468
STAT..index fetch by key 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000


What is interesting to see, however, is that it seems to perform faster - in particular the "non-unique index" variant is now really pretty close to the "unique index" variant. So although Nested Loop Join Batching doesn't show any significant changes in the statistics and latch acquisitions, it seems to save CPU cycles and performs better even without any physical I/O involved.

As a side note, if you want to check the effects of "Nested Loop Join Batching" on physical I/O you need to be aware of an odd behaviour I've experienced during my tests: if any kind of row source statistics sampling was enabled - by using STATISTICS_LEVEL = ALL, the GATHER_PLAN_STATISTICS hint or even enabling (extended) SQL trace - the optimized, batched form of physical I/O could not be reproduced. You could tell this from the session statistics that start with "Batched IO%" - these all stayed at 0. Only when all these things were disabled were the effects visible and the corresponding statistics non-zero. I don't know why this is the case, but it is an important detail when testing this feature. I'll probably publish a separate post on the physical I/O optimizations of the Vector/Batched I/O at some time in the future.
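
You can check whether the batched form of physical I/O kicked in with a simple query against the session statistics, for example:

select
        sn.name
      , st.value
from
        v$mystat st
      , v$statname sn
where
        st.statistic# = sn.statistic#
and     sn.name like 'Batched IO%';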

Scalability

When running the "data ordered by ID" version concurrently it can be seen that the "non-unique index" variant scales now almost equally well as the "unique index" variant - so these two variants are now quite close not only in single-user mode, but they both scale very well, too.

There is another interesting effect that can only be observed when running the test case with the unordered data set concurrently: in recent code releases (10.2.0.5, 11.2.0.1 and 11.2.0.2) the "shortcut" consistent gets on the inner table lookup that are used with the "unique index" variant get "downgraded" to "normal" consistent gets if there is concurrent access to the block. This can be observed in the session statistics and latch acquisitions:

Statistics Name Value
----------------------------------------------------- -----------
STAT..shared hash latch upgrades - no wait 99,995
STAT..RowCR attempts 100,000
STAT..RowCR hits 100,000
STAT..consistent gets from cache (fastpath) 10,011
STAT..no work - consistent read gets 10,000
STAT..consistent gets - examination 200,013
STAT..consistent gets 310,024
STAT..consistent gets from cache 310,024
LATCH.cache buffers chains 1,680,173


Note in particular how the "consistent gets - examination" statistic has decreased from 300,000 to 200,000. So with four concurrent executions this "unique index" variant suddenly requires approx. 420,000 latch acquisitions per execution in contrast to the usual 320,000. Since 11.2.0.2 does not support the "filter" trick to make use of the buffer pinning optimization for the inner table lookup with the ordered data set and the "unique index" variant, it suffers twice: not only does it require single latch acquisitions for the inner table lookup, but due to the "downgrade" it performs two latch acquisitions per iteration, requiring a whopping 200,000 excess latch acquisitions per concurrent execution with the "ordered by ID" data set.

It's also interesting to note that the "RowCR" optimization is recorded in the session statistics. I couldn't find much information about this - it seems to have been in the code since 10.2 (with partial support already in 9.2 RAC), but until 10.2.0.5 it was only enabled in RAC mode and not in single-instance mode (see MOS note "Bug 4951888 - Row CR is not enabled for non RAC systems"). I could reproduce this only in 10.2.0.5, 11.2.0.1 and 11.2.0.2. According to the description it has been specifically introduced to use row-level consistent gets instead of rolling back complete block versions for read-consistency in RAC environments, where generating the previous version of a block might require undo blocks from remote instances. Why this optimization shows up in the above single-instance, read-only scenario where no rollback to a previous block version is required is not clear to me. It is however measurable that the "fallback" seems to slow down execution.

Whether this is a side-effect or a deliberate design choice that performs better in RAC environments or certain consistent read scenarios I can't tell yet. However, when switching off this optimization via "alter system set "_row_cr" = false", this "downgrade" with concurrent execution no longer happens, and 11.2.0.2 performs better in my test cases - although it doesn't bring back the "filter" trick, so 11.2.0.2 is the only release where the "non-unique index" variant scales better with the ordered data set than the "unique index" variant.

A final word on scalability in general: I think it is important to point out that the test harness provided so far only checks concurrent read access. Since it is interesting to see whether the observed "buffer pinning" optimization has any negative side effects on mixed read/write access to the buffers, I've published an updated script set that includes new versions of the concurrent execution master and slave scripts. These allow running a SELECT FOR UPDATE on both tables involved as the first session, with all other sessions in read-only mode, in order to test the effects of a mixed read/write concurrency scenario.
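
In essence the writing session does something along these lines - a simplified sketch, not the actual script:

-- first session: lock all rows of both tables, forcing current mode gets,
-- clone copies in the buffer cache and undo generation for the readers
select * from t1 for update;

select * from t2 for update;

-- keep the transaction open while the read-only sessions run, then
rollback;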

The result of this quite simple test shows that the buffer pinning optimization not only scales very well for read-only concurrency but also scales very well for the tested mixed read/write scenario. The provided test case might be a specific and simplistic one (there are some specialities with SELECT FOR UPDATE) and there might be other concurrency scenarios where the buffer pinning does not scale that well (for example, potentially "free buffer waits" due to many blocks being pinned), but at least with this test case the result is quite impressive.

As a side note, the mixed read-write test is very interesting on its own in several ways, for example:

- It adds additional pressure on the buffer cache due to the clone copies created. A query similar to the one provided by Jonathan Lewis here can be quite revealing (see the sketch after this list). You'll find out that you need a much larger cache to still have a fully cached test case (with an 8KB block size, at least 512MB to keep two 80MB segments fully cached!)

- It requires additional buffer cache for the undo blocks

- It will generate a much higher contention on the "cache buffers chains" latches due to the additional buffer cache activity (creating clone copies, rollbacks for consistent reads, current mode gets etc.)

- It requires applying undo to the blocks to arrive at a read-consistent version

- The buffers will have to be accessed in exclusive mode for write access
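
A simple sketch of such a buffer cache analysis - counting the buffer copies per data object via V$BH (the query linked above is more sophisticated):

select
        objd
      , count(*) as buffers
      , count(distinct file# || '.' || block#) as distinct_blocks
from
        v$bh
where
        status != 'free'
group by
        objd
order by
        buffers desc;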

The updated script set also contains an Excel sheet with results from my test runs on different hardware and Oracle versions as well as a sample query to analyse the buffer cache.

Summary

Oracle 11g extends the logical I/O optimizations that could already be seen in Oracle 10g when using the Table Prefetching Nested Loop shape - and it is available without any further features like Table Prefetching or Nested Loop Join Batching. Nor does it depend on the new "fastpath" consistent gets introduced with 11g.

The efficiency of the optimization largely depends on the order of the data, so predicting it is not that easy - a bit like predicting the efficiency of the Subquery / Filter caching feature, which also depends on data patterns.

However, this knowledge might offer additional options for taking advantage of this optimization. Of course, introducing additional sort operations might easily outweigh the benefits achieved, but there might be cases where a sort is not that costly and allows improving scalability/concurrency in extreme cases.

Closing remarks

This blog post got way too long

Monday, August 8, 2011

Multi-Column Joins, Expressions and 11g

Introduction

I've already outlined in one of my previous posts that getting a reasonable cardinality estimate for multi-column joins can be tricky, in particular when dealing with correlated column values in the join columns.

Since Oracle 10g, several "Multi-Column Join Cardinality" sanity checks have been in place that prevent a multi-column join from producing join cardinalities that are too low - this is controlled via the "_optimizer_join_sel_sanity_check" internal parameter, which defaults to true from 10g on.

It looks like 11g adds just another twist to this issue when you upgrade: if you happen to have expressions as part of your join predicates, then in 10g these are still covered by the multi-column join cardinality sanity checks as long as at least one side of the join refers to simple columns - but this no longer seems to be the case from 11g on.

Note that if those expressions are already covered by corresponding function-based indexes in pre-11g, then this problem will not show up as described here - in fact, adding corresponding indexes is one of the possible fixes, as I'll outline below.

A working example

Let's have a look at a simple example to demonstrate the potential upgrade issue. This code snippet creates a table with 1000 rows - ID_50 and ID_CHAR_50 both hold 50 distinct values and the two columns are correlated.

create table t
as
select
        rownum as id
      , mod(rownum, 50) + 1 as id_50
      , 'ABC' || to_char(mod(rownum, 50) + 1) as id_char_50
      , case when mod(rownum, 2) = 0 then null else mod(rownum, 100) + 1 end as id_50_null
      , case when mod(rownum, 2) = 0 then null else 'ABC' || to_char(mod(rownum, 100) + 1) end as id_char_50_null
from
        dual
connect by
        level <= 1000
;

exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1')


In 10.2.0.4, if you check the cardinality estimates of the following query, you'll see the "Multi-Column Join Cardinality" check kicking in:

explain plan for
select  /*+ optimizer_features_enable('10.2.0.4') */
        /* opt_param('_optimizer_join_sel_sanity_check', 'false') */
        count(*)
from
        t t1
      , t t2
where
        t1.id_50 = t2.id_50
and     t1.id_char_50 = t2.id_char_50
;

select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 1000 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


If you activate the commented hint to disable the sanity check, you'll end up with a different estimate that simply corresponds to the selectivities of the individual join predicates multiplied together: 1/50 * 1/50 * 1000 * 1000 = 400.

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 400 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


If one or more expressions are now introduced on one side of the join, in 10.2.0.4 the result will still correspond to the one with the sanity check enabled:

explain plan for
select  /*+ optimizer_features_enable('10.2.0.4') */
        /* opt_param('_optimizer_join_sel_sanity_check', 'false') */
        count(*)
from
        t t1
      , t t2
where
        t1.id_50 = case when t2.id_50 is null then -1 else t2.id_50 end
and     t1.id_char_50 = t2.id_char_50
;

select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 1000 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


But if the same is repeated in 11.1 or 11.2, you'll end up with the result below - as you can see, the sanity checks have not been used and we get the same result as in 10.2.0.4 with the sanity checks disabled. By the way, depending on the expressions (and on which sides of the join they get used), you might even end up with a different cardinality estimate based on default selectivities like 1/100 - this is controlled via the "_use_column_stats_for_function" parameter, which defaults to true in recent releases, so some simpler join expressions still use the underlying column statistics:

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 400 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


This change in behaviour can lead to dramatic changes in the cardinality estimates and hence to different execution plans - potentially performing much worse than before. The change in this example is not that significant, but it can easily lead to very low cardinality estimates if the join columns have a high number of distinct values.

New 11g Features

I believe the issue has been introduced by the new Oracle 11g feature of virtual columns and extended statistics on expressions and column groups. In fact these new features provide a possible workaround for the issue: by creating a corresponding virtual column or extended statistics on the expressions used as part of the join, the sanity check can be re-enabled in 11g.

exec dbms_stats.gather_table_stats(null, 't', method_opt =>'for columns (case when id_50 is null then -1 else id_50 end) size 1')


or alternatively:

alter table t add (virtual_col1 as (case when id_50 is null then -1 else id_50 end));

exec dbms_stats.gather_table_stats(null, 't', method_opt =>'for columns virtual_col1 size 1')


As already outlined above, another possible workaround is adding a corresponding function-based index:

create index t_idx_func1 on t (case when id_50 is null then -1 else id_50 end);


Since adding a function-based index adds a hidden virtual column to the table similar to the one created by extended statistics, the net effect will be the same - but of course with the additional overhead of maintaining the index.

Column Groups with Expressions - Correlated Column Values

Of course, if we really want to take advantage of the new features with correlated column values, what we should try is to create a column group on the combined expressions to allow the optimizer to detect the correlation. Unfortunately, mixing expressions/virtual columns with column groups is explicitly mentioned in the documentation as not supported (yet), which can be confirmed:

SQL> exec dbms_stats.gather_table_stats(null, 't', method_opt =>'for columns (case when id_50 is null then -1 else id_50 end, id_char_50) size 1')
BEGIN dbms_stats.gather_table_stats(null, 't', method_opt =>'for columns (case when id_50 is null then -1 else id_50 end, id_char_50) size 1'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension - missing right parenthesis

SQL> exec dbms_stats.gather_table_stats(null, 't', method_opt =>'for columns (virtual_col1, id_char_50) size 1')
BEGIN dbms_stats.gather_table_stats(null, 't', method_opt =>'for columns (virtual_col1, id_char_50) size 1'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension - virtual column is referenced in a
column expression
ORA-06512: at "SYS.DBMS_STATS", line 20337
ORA-06512: at "SYS.DBMS_STATS", line 20360
ORA-06512: at line 1


Yet the strange thing is that the desired effect can easily be achieved by adding a corresponding multi-column function-based index like this:

create index t_idx3 on t (case when id_50 is null then -1 else id_50 end, id_char_50);


So this is one area where virtual columns / extended statistics are not yet equivalent to function-based indexes. However, there is one significant difference between the index and the extended statistics column group approach: whereas the former can be used to derive the number of distinct values if the index is an exact match for the column group, the latter creates a virtual column combining the columns into a single expression using the undocumented SYS_OP_COMBINED_HASH function. Histograms can be generated on that virtual column, which can be helpful in the case of correlated and skewed column values. Note that in my tests the join cardinality calculation based on column groups did not take any existing histograms on the virtual column into account - single-table access predicates could make use of the histogram, though - so this might become possible in future releases, whereas such information cannot be derived from an index on the column group.
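
As an aside, for plain columns (no expressions involved) the column group approach works as intended, and the generated SYS_OP_COMBINED_HASH based virtual column can be inspected afterwards - a simple sketch:

select
        dbms_stats.create_extended_stats(null, 't', '(id_50, id_char_50)')
from
        dual;

select
        column_name
      , data_default
from
        user_tab_cols
where
        table_name = 'T'
and     hidden_column = 'YES';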

Repeat the above EXPLAIN PLAN now, first with 10.2.0.4 optimizer settings:

explain plan for
select  /*+ optimizer_features_enable('10.2.0.4') */
        count(*)
from
        t t1
      , t t2
where
        t1.id_50 = case when t2.id_50 is null then -1 else t2.id_50 end
and     t1.id_char_50 = t2.id_char_50
;

select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 1000 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


No change. However, if you repeat the same now with 11.1 or 11.2 optimizer settings:

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 20000 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


In this particular case the 11g cardinality estimate with the multi-column function-based index is spot on.

As already explained in the previous post, Oracle 11g now takes advantage of indexes in ways that 10g didn't - in 10g this required unique indexes.

Although this is good news, and the cardinality estimates in general should change for the better, it still means that even with suitable indexes in place you might end up with significant cardinality estimate changes after the upgrade to 11g that require testing.

The Single-Column Workarounds

With a single, non-combined statistics expression (using one of the methods shown above) in 11g we are at least back to the 10.2.0.4 cardinality estimate with the sanity checks enabled:

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 1000 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


Interestingly, if the column group is covered by an index then in 11g the sanity check is also still enabled - and the order and position in the index apparently don't matter in this particular case; it just has to be an index covering the columns/expressions used, possibly among other columns/expressions.

Of course this workaround can have other side effects: first of all, you introduce more work, because DBMS_STATS needs to gather statistics for the added underlying virtual columns - and if you want to use extended statistics on expressions rather than virtual columns, you can only have a limited number of statistics extensions per table (I'm not sure why this restriction exists; it can be worked around by using virtual columns instead). Also, the additional virtual columns count towards the hard limit of 1,000 columns per table.
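
The extensions already defined on a table - and counting towards that limit - can be checked like this:

select
        extension_name
      , extension
from
        user_stat_extensions
where
        table_name = 'T';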

Furthermore, if you happen to use the same expressions as filter predicates, the cardinality estimates will very likely change again with the workaround in place - mind you, it will probably lead to improved cardinality estimates, but nevertheless it is a change that needs to be tested.

Here is a cardinality estimate for the sample join expression used as a filter, without the workaround:

explain plan for
select count(*) from t where case when id_50 is null then -1 else id_50 end = :b1;

select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));

-------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | TABLE ACCESS FULL| T | 10 |
-------------------------------------------


But with the workaround in place:

explain plan for
select count(*) from t where case when id_50 is null then -1 else id_50 end = :b1;

select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));

-------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | TABLE ACCESS FULL| T | 20 |
-------------------------------------------


More Complex Expressions

The new features allow 11g to use the sanity checks (or cardinality estimates derived from index statistics) even in cases where 10g would not be able to use them. If you, for example, happen to have expressions on both sides of the join, the sanity checks get disabled in 10g, but creating the corresponding expression statistics in 11g allows the sanity checks to stay enabled.

For more complex expressions - which are probably a clear indication of a design issue - the 11g extended statistics / virtual columns also allow improved cardinality estimates in general. 10g would resort to some hard-coded selectivity like 1/100 for equi-joins; 11g is able to cover that as well:

explain plan for
select  /* optimizer_features_enable('10.2.0.4') */
        /* opt_param('_optimizer_join_sel_sanity_check', 'false') */
        count(*)
from
        t t1
      , t t2
where
        nvl(t1.id_50 + t1.id_50_null, -1) = nvl(t2.id_50 + t2.id_50_null, -1)
and     nvl(t1.id_char_50 || t1.id_char_50_null, 'bla') = nvl(t2.id_char_50 || t2.id_char_50_null, 'bla')
;

select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 100 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


Expressions like these will disable the sanity check and also not use any underlying column statistics, therefore falling back to built-in, hard-coded defaults.

But when creating corresponding extended statistics / virtual columns / multi-column function-based indexes in 11g we are able to take advantage of the sanity checks (at least) and get improved cardinality estimates in general:

exec dbms_stats.gather_table_stats(null, 't', method_opt =>'for columns (nvl(id_50 + id_50_null, -1)) size 1')

exec dbms_stats.gather_table_stats(null, 't', method_opt =>'for columns (nvl(id_char_50 || id_char_50_null, ''bla'')) size 1')

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 1000 |
| 3 | TABLE ACCESS FULL| T | 1000 |
| 4 | TABLE ACCESS FULL| T | 1000 |
--------------------------------------------


Summary

Expressions used as part of multi-column join predicates can cause a lot of trouble when upgrading from 10g to 11g - fortunately there are viable workarounds available if you want to keep the 11g optimizer features enabled. In other words, restricting the optimizer to 10g features is of course also a workaround, but usually not a desired long-term solution.

Note that there are cases where multi-column function-based indexes offer improved cardinality estimates in 11g over those of virtual columns or extended statistics - but at the price of maintaining an additional (potentially wide) index, requiring additional storage, and at the risk of other plans changing by either using the index or being indirectly influenced by the additional index statistics available.

Some of the side-effects of the additional index could be addressed by leaving such an index in unusable state, but this again might have other, undesirable side effects, like statistics gathering jobs failing with error messages about unusable indexes etc.

Monday, August 1, 2011

HCC And Virtual Columns

This is just a short heads-up note to those dealing with HCC-enabled tables (so at present this applies only to Exadata customers).

As already outlined in a previous post about compression restrictions, tables with HCC enabled do not support dropping columns - DROP COLUMN gets silently converted into SET UNUSED, and DROP UNUSED COLUMNS throws an error saying it is unsupported.

I've recently come across an interesting variation of this restriction. Obviously Oracle treats virtual columns the same way in this case: if you drop a virtual column of an HCC-enabled table it doesn't get dropped but is also silently turned into an unused column - which doesn't really make sense to me, since dropping it doesn't require any physical modification of the underlying data structures.
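
A quick sketch demonstrating the effect - this obviously requires a system supporting HCC, and the table and column names are made up for illustration:

create table t_hcc
compress for query high
as
select
        rownum as id
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000;

alter table t_hcc add (id_virtual as (id * 2));

-- on the HCC-enabled table this does not drop the virtual column
-- but silently converts it into an unused column
alter table t_hcc drop column id_virtual;

select * from user_unused_col_tabs where table_name = 'T_HCC';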

Now you might wonder why this could be relevant. Well, it can be important for several reasons:

1. All the unused columns, whether virtual or not, count towards the 1,000 column limit of a table - so while frequently adding and dropping virtual columns is a no-brainer with non-HCC tables, it can become relevant with HCC enabled

2. Extended Statistics also use virtual columns under the covers. So if you create and drop extended statistics the same happens - the dropped virtual columns stay there. What is even more annoying: there is an upper limit on the number of extensions per table. The limit itself is defined in a quite interesting way (greatest(20, 10% of the non-virtual columns)), but the problem here is that the dropped extensions count towards this limit, so you can easily end up in a situation where you cannot add any further extended statistics, yet you can't see any of them in the DBA/ALL/USER_STAT_EXTENSIONS dictionary view. What you can see, however, in DBA/ALL/USER_TAB_COLS are the remaining dropped virtual hidden columns - see the query sketch after this list
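
The leftover hidden virtual columns from both cases can be spotted like this - a sketch using the hypothetical table from above:

select
        column_name
      , virtual_column
      , hidden_column
from
        user_tab_cols
where
        table_name = 'T_HCC'
and     hidden_column = 'YES';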

Since you can't drop unused columns on HCC-enabled tables, there is no easy way around this apart from uncompressing the table / all table partitions, dropping the unused columns and re-compressing - nothing you usually want to (or can) do with HCC-compressed segments...

Note, by the way, that this nuisance doesn't affect exchange partition operations. Virtual columns are correctly handled there - which means that only the physical column definitions need to be in sync between the two segments exchanged, not any virtual columns. You can happily exchange partitions between tables with different numbers and types of virtual columns.