Tuesday, May 13, 2008

Oracle Query optimization

We are working on a RoR application that needs to crunch data across multiple tables (> 7 million records) from a huge Oracle database. The sql queries were initially taking a long time, upto 5 minutes for some queries.
Following are some links and tips to optimize the sql queries - specific to Oracle.
  1. I added the following snippet to the environment.rb which gives me some stats for the queries:

    # only run this code in development
    if ENV["RAILS_ENV"] == "development"

    # modify MysqlAdapter to track transfer stats
    class ActiveRecord::ConnectionAdapters::OracleAdapter
    @@stats_queries = @@stats_bytes = @@stats_rows = 0

    def self.get_stats
    { :queries => @@stats_queries,
    :rows => @@stats_rows,
    :bytes => @@stats_bytes }
    end

    def self.reset_stats
    @@stats_queries = @@stats_bytes = @@stats_rows = 0
    end

    def select_with_stats(sql, name)
    bytes = 0
    rows = select_old(sql, name)
    rows.each do |row|
    row.each do |key, value|
    bytes += key.length
    bytes += value.length if value && value.respond_to?('length')
    end
    end
    @@stats_queries += 1
    @@stats_rows += rows.length
    @@stats_bytes += bytes
    @logger.info sprintf("%d rows, %.1fk", rows.length, bytes.to_f / 1024)
    rows
    end

    alias :select_old :select
    alias :select :select_with_stats
    end

    # modify ActionController to reset/print stats for each request
    class ActionController::Base
    def perform_action_reset
    ActiveRecord::ConnectionAdapters::OracleAdapter::reset_stats
    perform_action_old
    end

    alias :perform_action_old :perform_action
    alias :perform_action :perform_action_reset

    def active_record_runtime(runtime)
    stats = ActiveRecord::ConnectionAdapters::OracleAdapter::get_stats
    "#{super} #{sprintf("%.1fk", stats[:bytes].to_f / 1024)}"
    end
    end

    # trim blob logging
    #class ActiveRecord::ConnectionAdapters::OracleAdapter
    #def format_log_entry(message, dump = nil)
    #if dump
    #dump = dump.gsub(/x.([^.]+)./) do |blob|
    #(blob.length > 32) ? "x.#{$1[0,32]}. (#{blob.length} bytes)." : $0
    #end
    #end
    #super
    #end
    #end

    end

  1. Check out this link on Oracle.com and download the Oracle version of query_analyzer. This plugin automatically runs an explain plan query for the select query.
    Analyzing
    plan_table_output
    --------------------------------------------------------------------

    --------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost |
    --------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 31 | 8598 |
    | 1 | SORT UNIQUE | | 1 | 31 | 8598 |
    | 2 | TABLE ACCESS FULL | XXX | 128 | 3968 | 8583 |
    --------------------------------------------------------------------

  2. First level of optimization: the output of the explain plan shows you which tables columns in the WHERE clause cause a FULL table access. These columns are targets for optimizations by dropping indexes on them. This will result in a explain plan output as shown below. Note the INDEX RANGE SCAN bit.
    ----------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost |
    ----------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 25 | 675 | 332 |
    | 1 | TABLE ACCESS BY INDEX ROWID| XX_XXXXX | 25 | 675 | 332 |
    | 2 | INDEX RANGE SCAN | XX_X_INDEX1 | 430 | | 4 |
    ----------------------------------------------------------------------------
    However, being too aggressive in creating indexes is not recommended. An index is essentially a seperate file on the filesystem and every time the table data is altered, each of the index files need to be updated.

  3. Second level of optimization: Read this link to understand Index selectivity. Here are the steps I took to gather statistics and compute selectivity:
    1. Download and install SQL Developer from oracle.com and connect to the database.
    2. Right click on Table -> Statistics -> Gather statistics
      Alternately, you can run the following query:
      analyze table XXXX compute statistics;
      select column_name, num_distinct from user_tab_columns where table_name='XXXXX'
    3. Note the number of distinct values of the data in the column you are interested in. Alternately run a sql query such as
      select count(distinct ) from XXXXX
    4. Run a sql query to get number of records in the table:
      select count(*) from XXXXX
    5. Index selectivity = Distinct Column values / Num rows in table
    6. A larger value for Index selectivity is good. This means there are fewer rows corresponding to the column's each distinct value.

  4. In my case, due to a large number of records in the table and the num distinct values of the columns being small, the index selectivity was less than 0.01. So even after adding indexs on the columns in the WHERE clause, the query was slow. In fact it made loading data into the database slower.
    However if the query is returning only the indexed column's data in the SELECT clause, the data can be returned from the index file directly (the table data is not referred) and that indeed may result in a faster query.
    Another option is to consider a Bitmap index instead of a B-Tree index. Bitmap indexes are useful when the column to be indexed has low cardinality, e.g. gender or marital status. Bitmap indexes work well with Data warehousing kind of apps which B-Tree indexes (typical) are more common with OLTP systems. So consider the Bitmap index characteristics and impact on your application carefully before deciding on using Bitmap indexes. Especially the fact that inserting data into tables with bitmap indexes is a lot slower than tables with B-Tree indexes.

  5. Third level of optimization: Composite indexes. Sometimes indexes with lower selectivity can be combined into a composite index to create an overall index with a higher selectivity. Oracle also supports Index Skip scan which allows a composite index to be used even when all the columns specified in the composite index are not present in the SQL WHERE clause.

  6. Note that using LIKE '%xxx' and other terms in the WHERE clause can cause FULL Table scans. However LIKE 'xxx%' can potentially use an index. There are several options as listed here and here.

Links:

No comments: