Sunday, July 15, 2007

SQL Query optimization

To begin optimization, select the SQL query that
- returns max number of rows
- has most number of executions
- most I/O

Optimizations:
- The basic tenet is to avoid full table scans. Add an index to the fields mentioned in the where clause. Indexs are created as an additional file typically that the database can consult to filter the table based on the values of the corresponding field. Be careful, don't add to many indexes as every index requires an additional change to the database whenever data is inserted in the table. The indexs are typically stored as BTree indexes.

Use EXPLAIN Select query... to examine if the database is using the indexes correctly.

If you do any kind of operation on a field in the where clause, the full table is scanned since the field value for each record needs to be computed to be compared. As a result the index on such a field is a waste. Try using the field as is.

- Joins should be ordered so that the table that has the max rows to be processed is filtered the most. The order is also imp. If you are joining A, B and C where C is 10 rows and A and B are 100000000 rows each, then it makes sense to join B and C first instead of joinging A and C first.
Apply max filtering to table that has max rows.


Other Database trivia:
1NF - normalization in which repeated fields (score1, team1, score2, team2) are moved into their own table. One to many relationship.
2NF - normalization in which repeated values (publisher and publisher address for a list of books) are moved into their own table. Many to Many relationship.
3NF - TO be written.

No comments: