A statistic (on a table, on a column or set of columns, or on an index) provides information on:
– the table (number of rows, average row size, number of blocks/pages, density),
– the distribution of values in the column(s) (number of distinct values, number of duplicates, presence of null values, histogram of the value distribution),
– the composition of the index (depth, number of blocks, cluster ratio).
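To make the column-level items concrete, here is a minimal Python sketch that computes the same kind of summary for an in-memory list of values. The function name `column_statistics`, the equi-width histogram and the sample data are assumptions made for illustration; real database engines use their own, usually more elaborate, formats.

```python
from collections import Counter


def column_statistics(values, buckets=4):
    """Summarize a numeric column (illustrative only, not a real engine's format)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    stats = {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        # Rows that repeat a value already seen elsewhere in the column.
        "duplicates": len(non_null) - len(counts),
        "histogram": [0] * buckets,
    }
    if non_null:
        lo, hi = min(non_null), max(non_null)
        # Equi-width buckets; fall back to width 1 when all values are equal.
        width = (hi - lo) / buckets or 1
        for v in non_null:
            idx = min(int((v - lo) / width), buckets - 1)
            stats["histogram"][idx] += 1
    return stats


sample = [1, 1, 2, 3, 5, 8, 8, 8, None, 13]  # hypothetical column content
stats = column_statistics(sample)
print(stats)
```

The histogram is what lets an optimizer tell a value that matches a handful of rows from one that matches most of the table.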
These statistics are used by the database optimizer to evaluate the different execution plans that can be used for a query.
The cost-based optimizer (CBO) chooses the best (least expensive) execution plan using estimates derived from these statistics, in particular the number of rows and data blocks to be processed at each stage of each plan (access strategies).
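The idea of cost-based plan selection can be sketched with a deliberately simplified model. The `TableStats` class, the two cost formulas and the numbers below are all assumptions chosen for illustration; an actual optimizer weighs many more factors (caching, CPU cost, clustering, join order).

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical statistics for one table and one indexed column."""
    rows: int
    blocks: int
    distinct_values: int
    index_depth: int


def full_scan_cost(s: TableStats) -> float:
    # A full scan reads every block of the table once.
    return s.blocks


def index_lookup_cost(s: TableStats) -> float:
    # Walk the index, then (worst case) read one block per matching row.
    matching_rows = s.rows / s.distinct_values
    return s.index_depth + matching_rows


def choose_plan(s: TableStats):
    costs = {
        "full_scan": full_scan_cost(s),
        "index_lookup": index_lookup_cost(s),
    }
    # The cheapest estimated plan wins.
    return min(costs, key=costs.get), costs


selective = TableStats(rows=1_000_000, blocks=10_000, distinct_values=100_000, index_depth=3)
unselective = TableStats(rows=1_000_000, blocks=10_000, distinct_values=2, index_depth=3)

print(choose_plan(selective)[0])
print(choose_plan(unselective)[0])
```

With these assumed figures, the selective column favors the index while the two-valued column favors the full scan. If the recorded number of distinct values were wrong, the same arithmetic would pick the wrong plan.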
Statistics are essential for correctly computing query execution plans. They must therefore be kept up to date so that they reflect changes to the data, and hence to the indexes, columns and tables.
With up-to-date statistics, the optimizer has reliable data for choosing the best-performing execution plan to serve data to applications.
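How statistics go stale can be observed with SQLite, used here only because it ships with Python; the text above is not specific to any engine, and other systems expose their statistics differently. In SQLite, the `ANALYZE` command stores index statistics in the `sqlite_stat1` table. The table and index names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")


def load(n):
    # Insert n rows spread over 10 distinct status values.
    conn.executemany(
        "INSERT INTO orders (status) VALUES (?)",
        [(f"s{i % 10}",) for i in range(n)],
    )


def index_stat():
    # The stored statistic for the index, as text (row count first).
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_orders_status'"
    ).fetchone()
    return row[0]


load(1000)
conn.execute("ANALYZE")   # collect statistics
before = index_stat()

load(1000)                # the data changes...
stale = index_stat()      # ...but the stored statistics do not

conn.execute("ANALYZE")   # refresh the statistics
fresh = index_stat()

print(before, "|", stale, "|", fresh)
```

Between the second load and the second `ANALYZE`, the optimizer would still be planning against the old row count, which is the situation regular statistics maintenance is meant to avoid.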