Deferred Statistics, Dynamic Sampling & Extended Statistics:
Lets explore how the deferred statistics works and what impact does the dynamic sampling has on the optimizer when the statistics are missing.
SQL> show user
USER is "ARCHIVE"
SQL> create table stats (c1 number,c2 number,c3 number,c4 number,c5 number,
2 c6 number,c7 number,c8 number,c9 number);
Table created.
SQL> begin
for a in 1..5
loop
for b in 1..100 loop
insert into stats values (a,a,a,a,a,b,b,b,b);
end loop;
end loop;
end;
/
PL/SQL procedure successfully completed.
SQL> select count(*) from stats;
COUNT(*)
----------
500
SQL> commit;
Commit complete.
SQL> @sql.sql
SQL> set lines 100
SQL> set linesize 100
SQL> select table_name,last_analyzed,sample_size,num_rows from user_tables where table_name='STATS';
TABLE_NAME LAST_ANAL SAMPLE_SIZE NUM_ROWS
------------------------------ --------- ----------- ----------
STATS
SQL> select index_name from user_indexes where table_name='STATS';
no rows selected
SQL> desc user_tab_pending_stats
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
TABLE_NAME VARCHAR2(30)
PARTITION_NAME VARCHAR2(30)
SUBPARTITION_NAME VARCHAR2(30)
NUM_ROWS NUMBER
BLOCKS NUMBER
AVG_ROW_LEN NUMBER
SAMPLE_SIZE NUMBER
LAST_ANALYZED DATE
SQL> select table_name,last_analyzed,sample_size,num_rows from user_tab_pending_stats;
no rows selected
Default method_opt options doesn’t create the histograms on the columns:
SQL> exec dbms_stats.gather_table_stats('ARCHIVE','STATS');
PL/SQL procedure successfully completed.
SQL> select table_name,last_analyzed,sample_size,num_rows from user_tables where table_name='STATS';
TABLE_NAME LAST_ANAL SAMPLE_SIZE NUM_ROWS
------------------------------ --------- ----------- ----------
STATS 16-AUG-15 500 500
SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';
TABLE_NAME COLUMN_NAME LAST_ANAL SAMPLE_SIZE HISTOGRAM
------------------------------ ------------------------------ --------- ----------- ---------------
STATS C1 16-AUG-15 500 NONE
C2 16-AUG-15 500 NONE
C3 16-AUG-15 500 NONE
C4 16-AUG-15 500 NONE
C5 16-AUG-15 500 NONE
C6 16-AUG-15 500 NONE
C7 16-AUG-15 500 NONE
C8 16-AUG-15 500 NONE
C9 16-AUG-15 500 NONE
9 rows selected.
SQL> select table_name,column_name,last_analyzed,sample_size from user_col_pending_stats where table_name='STATS';
no rows selected
Delete the stats:
SQL> exec dbms_stats.delete_table_stats('ARCHIVE','STATS');
PL/SQL procedure successfully completed.
SQL> select table_name,column_name,last_analyzed,sample_size from user_col_pending_stats where table_name='STATS';
no rows selected
SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';
no rows selected
SQL> select dbms_stats.get_prefs('publish','archive','stats') publish from dual;
PUBLISH
--------------------------------------------------------------------------------
TRUE
SQL> exec dbms_stats.set_table_prefs('ARCHIVE','STATS','PUBLISH','FALSE');
PL/SQL procedure successfully completed.
SQL> select dbms_stats.get_prefs('publish','archive','stats') publish from dual;
PUBLISH
--------------------------------------------------------------------------------
FALSE
SQL> exec dbms_stats.gather_table_stats('ARCHIVE','stats',method_opt=>'for all columns size 1 for columns c1 size 254 for columns c2 size 254');
PL/SQL procedure successfully completed.
SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';
no rows selected
SQL> select table_name,column_name,last_analyzed,sample_size from user_col_pending_stats where table_name='STATS';
TABLE_NAME COLUMN_NAME LAST_ANAL SAMPLE_SIZE
------------------------------ ------------------------------ --------- -----------
STATS C1 16-AUG-15 500
STATS C2 16-AUG-15 500
STATS C3 16-AUG-15 500
STATS C4 16-AUG-15 500
STATS C5 16-AUG-15 500
STATS C6 16-AUG-15 500
STATS C7 16-AUG-15 500
STATS C8 16-AUG-15 500
STATS C9 16-AUG-15 500
9 rows selected.
SQL> show parameter pending
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_use_pending_statistics boolean FALSE
SQL>
SQL> show parameter dynamic
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling integer 2
SQL> alter session set optimizer_dynamic_sampling=0;
Session altered.
With dynamic sampling disabled and without any statistics on the table, optimizer makes a wrong guess on the number of rows.
Actual number of rows that satisfy the condition c1=1 & c2=1 are 100:
SQL> set autot on exp
SQL> select count(*) from stats where c1=1 and c2=1;
COUNT(*)
----------
100
Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
|* 2 | TABLE ACCESS FULL| STATS | 1 | 26 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=1 AND "C2"=1)
Turn on Dynamic sampling:
Optimizer guessed the number of rows correctly when using the dynamic sampling:
SQL> alter session set optimizer_dynamic_sampling=2;
Session altered.
SQL> select count(*) from stats where c1=1 and c2=1;
COUNT(*)
----------
100
Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
|* 2 | TABLE ACCESS FULL| STATS | 100 | 2600 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=1 AND "C2"=1)
Note
-----
- dynamic sampling used for this statement (level=2)
Turn off dynamic sampling:
& Turn on optimizer_use_pending_statistics:
Optimizer guessed the number of rows wrong even with the pending statistics:
SQL> alter session set optimizer_dynamic_sampling=0;
Session altered.
SQL> alter session set optimizer_use_pending_statistics=true;
Session altered.
SQL> select count(*) from stats where c1=1 and c2=1;
COUNT(*)
----------
100
Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| STATS | 20 | 120 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=1 AND "C2"=1)
How the dynamic sampling affects the optimizer and its options:
Level | When Dynamic Sampling will be used | Sample size (blocks) |
0 | Switches off dynamic sampling | N/A |
1 | At least one non-partitioned table in the statement has no statistics | 32 |
2 (default) | One or more tables in the statement have no statistics | 64 |
3 | Any statement that meets level 2 criteria and any statement that has one or more expressions used in the where clause predicates e.g. Where substr(CUSTLASTNAME,1,3) or Where a + b =5 | 64 |
4 | Any statement that meets level 3 criteria and any statement that has complex predicates. An OR or AND operator between multiple predicates on the same table | 64 |
5 | Any statement that meets level 4 criteria | 128 |
6 | Any statement that meets level 4 criteria | 256 |
7 | Any statement that meets level 4 criteria | 512 |
8 | Any statement that meets level 4 criteria | 1024 |
9 | Any statement that meets level 4 criteria | 4086 |
10 | All statements | All Blocks |
Create extended statistics:
We know that c1 and c2 are corelated and has the same data so lets create extended statistics on these two columns so that optimizer can guess and report the num of rows efficiently.
Extended statistics can help the Optimizer improve the accuracy of cardinality estimates for queries that as predicates with a function(e.g. UPPER(LastName)) or multiple columns from the same table used in filter predicates,
By creating extended statistics on a group of columns, the Optimizer can determine accurate cardinality estimate when the columns are used together as the predicates.You can use DBMS_STATS.CREATE_EXTENDED_STATS to define the column group you want to have statistics gathered on as a whole. Once the group has been established Oracle will automatically maintain the statistics on that column group when statistics are gathered on the table.
SQL> SELECT dbms_stats.create_extended_stats('ARCHIVE','STATS','(C1,C2)') FROM DUAL;
DBMS_STATS.CREATE_EXTENDED_STATS('ARCHIVE','STATS','(C1,C2)')
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STUF3GLKIOP5F4B0BTTCFTMX0W
SQL> select * from user_stat_extensions;
TABLE_NAME EXTENSION_NAME EXTENSION CREATO DRO
------------------------------ ------------------------------ -------------------------------------------------------------------------------- ------ ---
STATS SYS_STUF3GLKIOP5F4B0BTTCFTMX0W ("C1","C2") USER YES
SQL> exec dbms_stats.gather_table_stats('ARCHIVE','STATS',method_opt=>'for all columns size 1 for columns(c1,c2) size 254');
PL/SQL procedure successfully completed.
SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';
no rows selected
SQL> select table_name,column_name,last_analyzed,sample_size from user_col_pending_stats where table_name='STATS';
TABLE_NAME COLUMN_NAME LAST_ANAL SAMPLE_SIZE
------------------------------ ------------------------------ --------- -----------
STATS C2 16-AUG-15 500
STATS C3 16-AUG-15 500
STATS C4 16-AUG-15 500
STATS C5 16-AUG-15 500
STATS C6 16-AUG-15 500
STATS C7 16-AUG-15 500
STATS C8 16-AUG-15 500
STATS C9 16-AUG-15 500
STATS C1 16-AUG-15 500
STATS SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 16-AUG-15 500
10 rows selected.
SQL> select count(*) from stats where c1=1 and c2=1;
COUNT(*)
----------
100
Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| STATS | 100 | 600 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=1 AND "C2"=1)
Woow.. Optimizer picked up the correct cardinality now...
Let’s publish the stats:
SQL> select dbms_stats.get_prefs('publish','archive','stats') publish from dual;
PUBLISH
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TRUE
SQL> exec dbms_stats.set_table_prefs('ARCHIVE','STATS','PUBLISH','TRUE');
PL/SQL procedure successfully completed.
SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';
no rows selected
SQL> exec dbms_stats.gather_table_stats('ARCHIVE','STATS');
PL/SQL procedure successfully completed.
SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';
TABLE_NAME COLUMN_NAME LAST_ANAL SAMPLE_SIZE HISTOGRAM
------------------------------ ------------------------------ --------- ----------- ---------------
STATS C1 16-AUG-15 500 FREQUENCY
STATS C2 16-AUG-15 500 FREQUENCY
STATS C3 16-AUG-15 500 NONE
STATS C4 16-AUG-15 500 NONE
STATS C5 16-AUG-15 500 NONE
STATS C6 16-AUG-15 500 NONE
STATS C7 16-AUG-15 500 NONE
STATS C8 16-AUG-15 500 NONE
STATS C9 16-AUG-15 500 NONE
STATS SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 16-AUG-15 500 FREQUENCY
10 rows selected.
Now the stats are published and will be picked up by the optimizer
Let’s reset the dynamic sampling value back to the default value:
SQL> alter session set optimizer_dynamic_sampling=2;
Session altered.
SQL> set autot on exp
SQL> select count(*) from stats where c1=1 and c2=1;
COUNT(*)
----------
100
Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| STATS | 100 | 600 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=1 AND "C2"=1)
Good article on method_opt:
https://blogs.oracle.com/optimizer/entry/how_does_the_method_opt