Just Barebones: Oracle Query optimization

We are working on a RoR application that needs to crunch data across multiple tables (> 7 million records) from a huge Oracle database. The sql queries were initially taking a long time, upto 5 minutes for some queries.
Following are some links and tips to optimize the sql queries - specific to Oracle.

I added the following snippet to the environment.rb which gives me some stats for the queries:

# only run this code in development
if ENV["RAILS_ENV"] == "development"

# modify MysqlAdapter to track transfer stats
class ActiveRecord::ConnectionAdapters::OracleAdapter
@@stats_queries = @@stats_bytes = @@stats_rows = 0

def self.get_stats
{ :queries => @@stats_queries,
:rows => @@stats_rows,
:bytes => @@stats_bytes }
end

def self.reset_stats
@@stats_queries = @@stats_bytes = @@stats_rows = 0
end

def select_with_stats(sql, name)
bytes = 0
rows = select_old(sql, name)
rows.each do |row|
row.each do |key, value|
bytes += key.length
bytes += value.length if value && value.respond_to?('length')
end
end
@@stats_queries += 1
@@stats_rows += rows.length
@@stats_bytes += bytes
@logger.info sprintf("%d rows, %.1fk", rows.length, bytes.to_f / 1024)
rows
end

alias :select_old :select
alias :select :select_with_stats
end

# modify ActionController to reset/print stats for each request
class ActionController::Base
def perform_action_reset
ActiveRecord::ConnectionAdapters::OracleAdapter::reset_stats
perform_action_old
end

alias :perform_action_old :perform_action
alias :perform_action :perform_action_reset

def active_record_runtime(runtime)
stats = ActiveRecord::ConnectionAdapters::OracleAdapter::get_stats
"#{super} #{sprintf("%.1fk", stats[:bytes].to_f / 1024)}"
end
end

# trim blob logging
#class ActiveRecord::ConnectionAdapters::OracleAdapter
#def format_log_entry(message, dump = nil)
#if dump
#dump = dump.gsub(/x.([^.]+)./) do |blob|
#(blob.length > 32) ? "x.#{$1[0,32]}. (#{blob.length} bytes)." : $0
#end
#end
#super
#end
#end

end

Check out this link on Oracle.com and download the Oracle version of query_analyzer. This plugin automatically runs an explain plan query for the select query.
Analyzing
plan_table_output
--------------------------------------------------------------------

--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 31 | 8598 |
| 1 | SORT UNIQUE | | 1 | 31 | 8598 |
| 2 | TABLE ACCESS FULL | XXX | 128 | 3968 | 8583 |
--------------------------------------------------------------------
First level of optimization: the output of the explain plan shows you which tables columns in the WHERE clause cause a FULL table access. These columns are targets for optimizations by dropping indexes on them. This will result in a explain plan output as shown below. Note the INDEX RANGE SCAN bit.
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 675 | 332 |
| 1 | TABLE ACCESS BY INDEX ROWID| XX_XXXXX | 25 | 675 | 332 |
| 2 | INDEX RANGE SCAN | XX_X_INDEX1 | 430 | | 4 |
----------------------------------------------------------------------------
However, being too aggressive in creating indexes is not recommended. An index is essentially a seperate file on the filesystem and every time the table data is altered, each of the index files need to be updated.
Second level of optimization: Read this link to understand Index selectivity. Here are the steps I took to gather statistics and compute selectivity:

Download and install SQL Developer from oracle.com and connect to the database.
Right click on Table -> Statistics -> Gather statistics
Alternately, you can run the following query:
analyze table XXXX compute statistics;
select column_name, num_distinct from user_tab_columns where table_name='XXXXX'
Note the number of distinct values of the data in the column you are interested in. Alternately run a sql query such as
select count(distinct ) from XXXXX
Run a sql query to get number of records in the table:
select count(*) from XXXXX
Index selectivity = Distinct Column values / Num rows in table
A larger value for Index selectivity is good. This means there are fewer rows corresponding to the column's each distinct value.

In my case, due to a large number of records in the table and the num distinct values of the columns being small, the index selectivity was less than 0.01. So even after adding indexs on the columns in the WHERE clause, the query was slow. In fact it made loading data into the database slower.
However if the query is returning only the indexed column's data in the SELECT clause, the data can be returned from the index file directly (the table data is not referred) and that indeed may result in a faster query.
Another option is to consider a Bitmap index instead of a B-Tree index. Bitmap indexes are useful when the column to be indexed has low cardinality, e.g. gender or marital status. Bitmap indexes work well with Data warehousing kind of apps which B-Tree indexes (typical) are more common with OLTP systems. So consider the Bitmap index characteristics and impact on your application carefully before deciding on using Bitmap indexes. Especially the fact that inserting data into tables with bitmap indexes is a lot slower than tables with B-Tree indexes.
Third level of optimization: Composite indexes. Sometimes indexes with lower selectivity can be combined into a composite index to create an overall index with a higher selectivity. Oracle also supports Index Skip scan which allows a composite index to be used even when all the columns specified in the composite index are not present in the SQL WHERE clause.
Note that using LIKE '%xxx' and other terms in the WHERE clause can cause FULL Table scans. However LIKE 'xxx%' can potentially use an index. There are several options as listed here and here.

Links:

Just Barebones

Tuesday, May 13, 2008

Oracle Query optimization

No comments: