Gather table stats at the end of a data load: useful or not?

Hi,

We have DS jobs which load data to Oracle tables that are used for reporting. Is it useful to gather table stats against those target tables at the end of the data load?

We are planning to run table stats against the whole schema on a weekly basis.

Is it going to be helpful for performance improvement?


kgecdeep (BOB member since 2012-07-25)

You should update the Oracle statistics whenever the load has changed something fundamental.

For example, suppose you gathered the statistics while one table was empty and, after the load, it contains millions of rows. Knowing that would help the optimizer.

On the other hand, if the table had 1 million rows before the load and 1.1 million after, it will not matter much.


Werner Daehn :de: (BOB member since 2004-12-17)

That’s a question to be discussed with an Oracle DBA rather than on this forum. If you have performance issues which are due to stale or incorrect table statistics then you should be investigating how those table statistics have got into the state they are in and what action you, or your DBA, should be taking to remedy the situation.

Investigate the problem first, rather than just throwing random solutions at it. If there isn’t a problem you are best leaving things alone.


dastocks (BOB member since 2006-12-11)

<Oracle_DBA_HAT = ON>
My criteria for gathering statistics at the end of an ETL job are:

  1. Does it make a difference?
  2. Do you have time to do it?
  3. Are you allowed to do it?

The only way to answer #1 is to run Explain Plan against the report queries. The Oracle optimizer is awfully smart but without good statistics it can be a drooling idiot - especially if the Oracle DBA doesn’t have the default optimizer settings configured just right.
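
As a rough sketch of that check (the query, table and column names below are only placeholders, not from your environment), you can run a report query through EXPLAIN PLAN, gather statistics, then run the same two statements again and compare the plans:

-- Hypothetical report query; substitute one of your real report queries.
EXPLAIN PLAN FOR
  SELECT SUM(f.amount)
    FROM dw.sales_fact f
   WHERE f.transaction_date >= DATE '2013-01-01';

-- Display the plan the optimizer chose with the statistics it has now.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

If the plan (or its cardinality/cost estimates) changes after gathering statistics, the gather is earning its keep.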

Let’s say you load data daily that has a DATE column called TRANSACTION_DATE that is frequently used in the WHERE clause of report queries. After the initial load you gather statistics. Every day after that the current value gets further and further from the max value that the statistics gathering process identified. The Oracle optimizer extrapolates from the established max value (from the statistics) to the query value. As time goes on the optimizer will make a less accurate decision.
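
You can see how far the optimizer's idea of the maximum value has drifted by looking at the column statistics. A minimal sketch, assuming a SALES_FACT table with that TRANSACTION_DATE column (both names are placeholders; HIGH_VALUE is stored as RAW, so it needs decoding to be human-readable):

SELECT column_name, last_analyzed, num_distinct, high_value
  FROM user_tab_col_statistics
 WHERE table_name  = 'SALES_FACT'
   AND column_name = 'TRANSACTION_DATE';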

With regard to criterion #2, it has to be understood that gathering statistics takes time and resources. The data is there and ready to use, so you aren’t preventing the users from reporting, but it will take time before the up-to-date statistics are available. Also, gathering statistics can be costly in terms of CPU and disk I/O. Gathering statistics right at the end of the job may be the best way to reduce disk I/O, as the data may still be in memory.

There are some slick techniques in Oracle 11g for greatly reducing the time it takes to gather statistics. I presented on this topic at an Oracle conference this year so if you’re interested we can discuss that.

With regard to the automatic weekly statistics gathering job, did you know that it may not be gathering statistics on all the tables that need it? It is usually configured to run within a specific time window; if it goes beyond that it just stops. The DBA at one of my client sites never checked how long the automatic statistics gathering job was running until we asked. It turned out that it ran for 12 hours and then gave up. It still had quite a few tables to go, so the statistics were stale for another week.
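
In 11g you can ask the dictionary which tables Oracle itself currently considers stale, which is a quick way to see whether the weekly job is keeping up (the schema name here is just a placeholder):

SELECT table_name, last_analyzed, stale_stats
  FROM dba_tab_statistics
 WHERE owner = 'DW'
   AND stale_stats = 'YES';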

Finally, if Data Services isn’t logging into the target schema as the schema owner then you may not have the necessary permissions to execute DBMS_STATS. There are a couple of solutions for that.
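
As a sketch of two possible solutions (account and object names are placeholders): either have the DBA grant the Data Services login the ANALYZE ANY system privilege, or have the schema owner expose a small definer-rights wrapper procedure that the login is allowed to execute.

-- Option 1: system privilege for the ETL login.
GRANT ANALYZE ANY TO ds_etl_user;

-- Option 2: definer-rights wrapper owned by the schema owner.
CREATE OR REPLACE PROCEDURE dw.gather_stats(p_table IN VARCHAR2) AS
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'DW', TABNAME => p_table);
END;
/
GRANT EXECUTE ON dw.gather_stats TO ds_etl_user;
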
</Oracle_DBA_HAT>


eganjp :us: (BOB member since 2007-09-12)

Thanks all for the details.

Hi Eganjp,

Thanks for the details. I am very much interested in the Oracle performance improvement discussion you mentioned. Please share the best practices we should use. Our environment is Oracle throughout: the target database is Oracle and the repository database is Oracle too (Oracle 11g), and we are using Unix as the job server.

One of our DS jobs does a full load: a table gets truncated and loaded again. After the job completes, is gathering table stats useful in this case to improve the performance of the BO queries that run against this target table? The difference in record count between the previous day and today is not significant, but I am not sure whether a truncate-and-reload by the DS job can hurt performance if table stats are not gathered.

Also, you are right, I saw that gathering table stats against the whole schema takes a long time, and I am not sure if it is the correct approach. My DBA said the automatic stats gathering only works below 10 million records, so we are going for gathering table stats against the whole schema for performance improvement. Is it going to help? Or what would be the alternative way? I am confused.


kgecdeep (BOB member since 2012-07-25)

I suggest this might be a good starting point:

http://www.amazon.co.uk/books/dp/1590596366


dastocks (BOB member since 2006-12-11)

Truncating a table does NOT wipe out the statistics. Dropping the table does. So in this case a weekly gather of statistics may be good enough. The only objective way to tell if current statistics help or not is to run the explain plan before and after gathering statistics to see if the plan is improved.


eganjp :us: (BOB member since 2007-09-12)

As eganjp has pointed out, truncating and reloading a table won’t change the stats, and if the reloaded data isn’t changing very much the existing stats should still be representative.

YOU NEED TO ANALYSE THE PROBLEM FIRST.

Here are some of the things I have looked at when I have had problems with Oracle query performance in the past:

  1. Are the table indexes appropriate? Index usage can be monitored, so you can identify and get rid of unused indexes (see the sketch after this list).

  2. Are the table indexes being used in typical queries and, if not, is this due to inaccurate statistics? Changing the order in which a table is loaded can dramatically change stats held against indexes and this can determine whether they are used in queries. Using the parallel loader may mean you get the data in faster, but it can also mean seriously degraded query performance when you’re trying to run your reports later on.

  3. What are the default settings for auto-gathered stats, and are they appropriate? For example, do you really need histograms calculated for all columns in all tables? I believe this is an ‘out-of-the-box’ default for Oracle that many DBAs never get around to modifying (the sketch after this list also shows how to check the current default).
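
For point 1, a rough sketch of index usage monitoring (the index name is a placeholder): switch monitoring on, let a representative reporting period go by, then check whether the index was ever used. For point 3, DBMS_STATS.GET_PREFS shows the METHOD_OPT default the auto-gather job is currently using.

-- Point 1: monitor a suspect index.
ALTER INDEX dw.sales_fact_ix1 MONITORING USAGE;

-- After a representative period of report activity
-- (query V$OBJECT_USAGE as the index owner):
SELECT index_name, monitoring, used, start_monitoring
  FROM v$object_usage
 WHERE index_name = 'SALES_FACT_IX1';

ALTER INDEX dw.sales_fact_ix1 NOMONITORING USAGE;

-- Point 3: what METHOD_OPT will auto-gathered stats use by default?
SELECT DBMS_STATS.GET_PREFS('METHOD_OPT') AS method_opt_default FROM dual;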

If you can answer (and understand) the above you might be part of the way to answering your initial question.


dastocks (BOB member since 2006-12-11)

My usual approach to gathering statistics is to use this procedure:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    OWNNAME          => 'DW',
    TABNAME          => 'STUDENT_DIM',
    ESTIMATE_PERCENT => NULL,
    METHOD_OPT       => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
    CASCADE          => TRUE,
    DEGREE           => 4);
END;
/

I HIGHLY recommend using NULL for the ESTIMATE_PERCENT parameter when using 11G. Oracle has done some fantastic work on coming up with a way to not have to scan every row but still come up with very accurate statistics.

The METHOD_OPT parameter value used most often is ‘FOR ALL INDEXED COLUMNS’. The value I used above will gather histograms, which will cause the process to take longer and also increase the amount of storage required to hold statistics. So use it carefully and only if it makes a difference in your explain plan.
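
One way to see whether that setting actually produced histograms, and on which columns, is to check the column statistics afterwards (using the STUDENT_DIM table from the example above):

SELECT column_name, num_buckets, histogram
  FROM user_tab_col_statistics
 WHERE table_name = 'STUDENT_DIM'
   AND histogram <> 'NONE';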

In my opinion gathering statistics on a weekly basis in a data warehouse is like closing the barn door after the cows have already made their escape. An OLTP database is a different matter. For one of my clients we gather stats either within the ETL job or with a secondary job. By the time the users start running reports we have 100% up to date statistics. The DBA still runs the weekly gather stats job but it now runs in just a few minutes as all it has to gather stats on are system tables and some tables that aren’t relevant to the reports.
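
As a rough sketch of the within-the-job approach (owner and table names are placeholders, and in Data Services the call could come from a script step or a post-load SQL command): gather stats on just the tables the job loaded, immediately after the load finishes.

BEGIN
  -- Only the tables this job touched, so reports see fresh statistics.
  DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'DW', TABNAME => 'STUDENT_DIM', CASCADE => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'DW', TABNAME => 'SALES_FACT',  CASCADE => TRUE);
END;
/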


eganjp :us: (BOB member since 2007-09-12)