During the years I have a lot of times encountered performance problems that ended up to be fast queries (less than 10ms) which are executed very much. In these situations the execution plans, from such queries can look like:
OPERATION OPTIONS OBJECT# NAME
——————– ——————– ———- ———————————–
SELECT STATEMENT .
SORT ORDER BY .
TABLE ACCESS BY INDEX ROWID 120713 XXX.TABLE_WITH_MANY_COLUMNS
INDEX RANGE SCAN 121558 XXX.INDEX_WITH_FEW_COLUMNS
In a lot of cases we deal with a query for example which is returning fewer columns than exists in the involved table like four, from an involved table having twenty columns. Although the execution plan looks already pretty “OK”, it still results in a query which is in top three most resource taking queries.
To enhance the query we can use “Fat Indexes” or nowadays better known “Tapio Indexes” (I call them like that nowadays). What are these kind of indexes… In principle nothing really new, but for me a couple of years ago an (re)eye opener by Tapio Lahdenmaki. This Finish independent database performance consultant and instructor gave a presentation at Miracle Open World 2007 in Denmark, I attended. Tapio can present this topic in a very special way, and in my opinion “his message” should be known by all developers and dba’s.
The most important thing of “his message” is the fact that we should create an index which is highly selective and contains all the columns asked in the query. So in principle we create an index with forexample two selective columns, with the most selective one in front and behind these two columns the other selected columns. When creating such an index an execution plan would look like:
OPERATION OPTIONS OBJECT# NAME
——————– ——————– ———- ——————————–
SELECT STATEMENT .
SORT ORDER BY .
INDEX RANGE SCAN 128140 XXX.IND_WITH_ALL_SELECT_COLS
The big advantage is that only the index is accessed and not the table anymore, what can result in a decrease of elapsed time for the query with a factor four or even more. Why this is much better? Let we go back how indexes and tables are stored on the physical storage.
An index is stored in an ordered way on the storage, so an index range scan will mean that data is retrieved from the storage in a sequential read from a storage point of view and so a read-head from a spindle can stay on almost the same place. This eliminates seek times and latency on the physical storage spindles.
A Table is stored in a non ordered way on the storage, so accessing table rows will almost always result in accessing the data in a scattered read from a storage point of view. This implies that a lot of seek and read times are wasted.
Of course the existence of caches makes it a bit different but still the basic principles are valid, this means also for striping.
So when only an index is accessed instead of index and table, we don’t waste read times for columns we don’t need (table rows), and we eliminate a lot of seek times for retrieving the different table rows.
If you want to know more about this kind of index design, you can read the book “Relational Database Index Design and the Optimizers” from Tapio Lahdenmaki.