Aggregate Aware Functions

Hello,

Forgive me if I am a tad Unaware at the moment, but I could use some help determining whether I should use aggregate awareness for a particular problem.

We have to calculate a percentage of actual hours worked out of total available hours by week, by month-to-date, and by year-to-date, for 6 different criteria and 2 different calendars. The data is currently represented within the universe at the lowest level (not aggregated). Before going to BO, the query to generate this report took hours to run… literally.

Would this situation be a good candidate for aggregate awareness?

Thanks,

MVB


mvbokker (BOB member since 2002-09-03)

Yes, but you have to build and maintain the tables first.

What @Aggregate_Aware() does is provide the ability to make “smart” objects that will pick the aggregate value from the best table available. But you have to have several different tables (base level up through top level summary) available in the database before this function is useful.

This function doesn’t create the tables, it simply allows the designer to make objects that switch from one to another. Make sense?
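
The general shape of such an object is something like this (just a sketch – the argument list runs from the most summarized table down to the most detailed, using whatever tables and measure columns you actually have):

@Aggregate_Aware(sum(summary_table.measure), sum(detail_table.measure))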

Dave


Dave Rathbun :us: (BOB member since 2002-06-06)

Does this mean that for every possible combination of criteria we will need a table with the aggregated totals? The BO (report) solution used data providers; we need a solution within Designer for ad-hoc reports.

I think I’m a little confused.

MVB


mvbokker (BOB member since 2002-09-03)

You need “appropriate” combinations, but you don’t need “all” combinations. For example, if you have daily invoice data in your most detailed table, you can create a monthly summary version that drops the daily detail and rolls everything up to the month. Do the same for quarter and annual values, and you have 3 possible summary tables.
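
For instance, the monthly table might be built with something along these lines (table and column names here are made up purely for illustration – your DBA’s script will look different):

CREATE TABLE monthly AS
SELECT TRUNC(invoice_date, 'MM') AS invoice_month,  -- roll daily rows up to the month
       SUM(revenue)              AS revenue
FROM   daily
GROUP BY TRUNC(invoice_date, 'MM');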

Your “revenue” object would then look like:

@Aggregate_Aware(sum(annual.revenue), sum(quarter.revenue), sum(monthly.revenue), sum(daily.revenue))

You have other steps involved in setting this up. My point in the original response was that Busobj does not create the annual, quarterly, and monthly summary tables. Your DBA has to do that. Once those tables have been created, you can use this function to pick the “best” place for the revenue, based on the other objects selected in the ad-hoc query.

If the user picks the quarter, for example, you obviously can’t use the annual table.

There are several presentations on using Aggregate Awareness that other Integra consultants have done at prior user conferences. Click the “Integra” link in my sig, and look for the conference presentations in our “Library”. That should help get you started. :slight_smile:

Dave


Dave Rathbun :us: (BOB member since 2002-06-06)

See also https://bobj-board.org/t/16232 for my reasons why @Aggregate_Aware is obsolete :wink:


slimdave :uk: (BOB member since 2002-09-10)

I heartily disagree with you that “@Aggregate_Aware is obsolete” for many reasons, a couple of which are:

Not everyone is using Oracle, much less Oracle 8.1.6!
Nor does every designer of a BusinessObjects universe have the rights to create views (materialized or otherwise) in the database to meet such “aggregate-aware” needs.

I cannot argue whether or not Oracle’s materialized views are better or worse than pre-aggregated summary tables.

However, in the case of a data warehouse, where financial data is, say, refreshed monthly – it sure seems that a one-time build of one or two or three well-planned aggregate tables would be well-advised. And the use of @Aggregate_Aware would then be beneficial.

Any other takers?


Anita Craig :us: (BOB member since 2002-06-17)

Thanks for the info.

I just found out I will not have to aggregate or use views…for this one anyway. Apparently this was built within the warehouse, but now I have bad data to contend with.

I’m sure this info will be useful for other requirements.


mvbokker (BOB member since 2002-09-03)

Ok, ok, I exaggerate. @Aggregate_Aware is not entirely obsolete.

But, if you …

i) Are using a recent version of Oracle (and 8.1.6 ain’t that recent, either)
ii) Have caring ETL and database staff who would rather save themselves and you much time and effort OR can just do the ETL and database stuff yourself
iii) Have data suitable for aggregation summary tables

… then MV’s make so much sense that any query tool’s method of aggregate awareness is obsolete.

The issue of how often or how large the aggregates are is not relevant, as the aggregation has to be performed and saved whatever the method. However as the number of metrics to be aggregated increases, and/or the number of effective aggregation levels increase, the case for MV’s becomes stronger.

One more posting by me dissing @Aggregate_Aware, and it’ll be – “Slimdave? Materialized view nut. :roll_eyes: Goes on about them all the time. Hates aggregate awareness.:sleeping:” Hey, don’t make me start a new “MV vs. @aggregate_aware” thread, 'cos I’m ready man, I’m ready!


slimdave :uk: (BOB member since 2002-09-10)

:cheers:


Cindy Clayton :us: (BOB member since 2002-06-11)

I need a little education on this one.

If you’re using MVs, do you have to do ANYTHING in the universe? Does Oracle automatically navigate to the MV instead of using the detail tables?

If so, I think you’re right that MVs would be better for Oracle users. If not, then an MV is really just an aggregate table and you still need the @AggAware stuff.


Steve Krandel :us: (BOB member since 2002-06-25)

Me too. :slight_smile:

I’ve been out of the Oracle DBA side of things for a while now, so I’m going to admit I’m not up on the latest whizz-bang features. But I’m concerned by the “V” part of “MV”. Is this really a View? Meaning “dynamic”?

Because if so, I can’t see where the performance benefit comes from. The point of summary tables is to pre-aggregate values on some regular (scheduled) basis. That way, when you ask for the answer, it comes back right away and doesn’t have to wade through the detailed entries. If this is the case, then the size and frequency of the updates does matter.

You said

Yes, this is true. But if I’m creating aggregate tables I’m looking for a 10:1 reduction ratio at each step. Assuming I start with a 100 million row table, I can reduce that down to 10,000 rows (100M, 10M, 1M, 100K, 10K) with four aggregate tables. Fewer, if I skip a step. Scanning a permanent 10K row table has got to be faster than aggregating a 100M row table on the fly. If you’re talking about a warehouse, where the 100M row table is updated on some schedule, then fine. Update the MV on the same schedule. What’s the difference between that and creating summary tables?

Feeling the need for education, I went to the Oracle web site, and found this:

Hmmm, sounds like aggregate awareness with summary tables to me. They go on to say:

That’s exactly what the @Aggregate_Aware function does: eliminate the need for the user to switch tables.

I then found this article on refreshing materialized views where it points out that there are four options for updating the data in your MV: Complete, Fast, Force, and Never. Based on the amount (and I guess type) of data that has changed, Oracle will attempt to update the MV on the fly. But it could fail, at which point you would (I presume) be switched over to the base 100M row table instead.

Well, why not simply update the summary table when the data is updated? Then don’t worry about it?

I guess my point is this… Oracle provides materialized views, other database vendors might not. With materialized views, any query from any source (busobj or otherwise) might take advantage of the summary data. Without materialized views, BusObj will still provide the option using @AA. And I know what’s happening with @AA, whereas the database is making the decision for me with views.

And since materialized views take up storage space just like regular tables, why not use regular tables? What’s the difference between defining a view and running a load script? What steps does Oracle take to guarantee that the summary view and the base table will agree as far as total aggregate values go?

Anyway, I’m not convinced. I’ll have to see about creating some materialized views and playing with them before I’m completely sold. :wink:

Very cool discussion… please keep it going.

Dave


Dave Rathbun :us: (BOB member since 2002-06-06)

No, it’s a view definition that sits on top of an actual table, which is where the “Materialized” comes in. When I build them I tend to create the table first, then define a materialized view on top of it. MV’s were called “snapshots” in earlier versions of Oracle.
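
For example, that build-the-table-first approach looks roughly like this (table, column, and MV names are invented here, and the exact options available depend on your Oracle version):

-- 1. Build and load the aggregate table yourself
CREATE TABLE agg_sales_month AS
SELECT TRUNC(date_of_transaction, 'MM') AS sales_month,
       SUM(amount)                      AS amount
FROM   sales
GROUP BY TRUNC(date_of_transaction, 'MM');

-- 2. Register it as a materialized view so Oracle can rewrite queries against it
CREATE MATERIALIZED VIEW agg_sales_month
ON PREBUILT TABLE
ENABLE QUERY REWRITE
AS
SELECT TRUNC(date_of_transaction, 'MM') AS sales_month,
       SUM(amount)                      AS amount
FROM   sales
GROUP BY TRUNC(date_of_transaction, 'MM');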

Because the definition of the query that populated the underlying table is known to Oracle, and because of the presence of PK’s, FK’s, and change-tracking on the source tables for the query definition, Oracle can work out whether the aggregate table can be used to answer the query.
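
The change-tracking piece is a materialized view log on the source table – something along these lines (names are illustrative, and the available options again depend on the Oracle version):

-- Records changes to the base table so dependent MV's can be fast refreshed
CREATE MATERIALIZED VIEW LOG ON sales
WITH ROWID (date_of_transaction, customer_id, amount)
INCLUDING NEW VALUES;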

You still have to store the aggregated data somewhere, but with an MV Oracle itself can track whether the data in the underlying table has changed, and therefore whether the MV data can be reliably used to answer a query. Furthermore you get access to the supporting packages that can tell you exactly how much I/O has been saved by rewriting queries to use particular MV’s, and you can ask Oracle to predict the I/O saving that you would get by having a particular MV. It’s also transparent to the query tool. Works with absolutely anything that sends SQL to the database, including SQL*Plus, BusObj, Microstrategy, C programs, whatever.

Indeed, but MV’s eliminate the need for the universe designer to define objects as being aggregate aware, which is a painful exercise when you have, for example, fifty aggregatable metrics and five aggregate tables. I see aggregates purely as a database tuning subject, not unlike indexing. In fact it is almost an exact parallel. Knowledge of query patterns and likely access paths is helpful for both, and both are invisible to the end user and application.

Aggregate refresh failures can and do occur – they have happened to me using Informatica to load up aggregates, and when I didn’t get a warning of a failure we had users drilling down and getting “strange results”. If I had been using MV’s then the queries would never have been re-written against the aggregates, and we would have got immediately noticeable poor performance instead of reporting the wrong numbers, which I would prefer.

Refresh on the fly is not the only option – I exclusively use refresh on demand, and I can command the refresh of a group of twenty aggregate tables (MV’s) with a single PL/SQL procedure call. I can even control whether all of the refreshes rollback should a single one fail.
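
A rough sketch of that call (MV names invented; parameter names and defaults vary a little between Oracle versions):

BEGIN
  -- One refresh-method character per MV in the list: 'C' = complete, 'F' = fast.
  -- atomic_refresh => TRUE rolls the whole group back if any single refresh fails.
  DBMS_MVIEW.REFRESH(
    list           => 'AGG_SALES_MONTH, AGG_SALES_QUARTER',
    method         => 'CC',
    atomic_refresh => TRUE);
END;
/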

The database is making decisions for you all the time. Hands up all those who want to take over the job of specifying which indexes their database will use for every individual query they submit. Any takers? You want to make the choice on hash joins, star transformations, merge joins, fast full index scans for every SQL statement? I think not.

Oracle has complete insight into the refresh process, the constraints on the tables, etc. The presence of PK’s and FK’s (either enforced or unenforced) is essential, and to get maximum rewrite potential you should look at “CREATE DIMENSION …”, which gives Oracle richer metadata about the relationships between tables and columns.

e.g.

CREATE DIMENSION DMN_DATE
  LEVEL MONTH   IS DMN_DATE.MONTH_CD
  LEVEL QUARTER IS DMN_DATE.QUARTER_CD
  LEVEL YEAR    IS DMN_DATE.YEAR_CD
  -- MONTH rolls up to QUARTER, which rolls up to YEAR
  HIERARCHY CALENDAR_ROLLUP (
    MONTH CHILD OF
    QUARTER CHILD OF
    YEAR
  )
  ATTRIBUTE MONTH DETERMINES (DMN_DATE.DATE_OF_MONTH, DMN_DATE.MTH_OF_FISCAL_YEAR);

As an example of a sophisticated rewrite decision, suppose you created two MV’s on the same table, with exactly the same query definition, but with one MV creation ORDER BY’d on column “date_of_transaction” and the other ORDER BY’d on “customer_id”, and both columns indexed in each MV. Oracle’s statistics will show a high degree of index clustering on “date_of_transaction” for the first MV, and on “customer_id” for the second.

If you send a rewritable query using “where customer_id = 12345” against the base table then Oracle will use the second MV, because it will give a lower cost than using the first. Use “date_of_transaction between trunc(sysdate) and sysdate”, and you rewrite against the first MV.
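
Sketched out, the setup looks something like this (the “sales” table and MV names are invented for illustration, plus the indexes on both columns of each MV as described above):

-- Same defining query, different physical ordering at creation time
CREATE MATERIALIZED VIEW mv_sales_by_date
ENABLE QUERY REWRITE AS
SELECT date_of_transaction, customer_id, SUM(amount) AS amount
FROM   sales
GROUP BY date_of_transaction, customer_id
ORDER BY date_of_transaction;

CREATE MATERIALIZED VIEW mv_sales_by_cust
ENABLE QUERY REWRITE AS
SELECT date_of_transaction, customer_id, SUM(amount) AS amount
FROM   sales
GROUP BY date_of_transaction, customer_id
ORDER BY customer_id;

-- Both queries are written against the base table; Oracle rewrites each one
-- against whichever MV has the better index clustering for its predicate.
SELECT SUM(amount) FROM sales WHERE customer_id = 12345;
SELECT SUM(amount) FROM sales
WHERE  date_of_transaction BETWEEN TRUNC(SYSDATE) AND SYSDATE;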

There is absolutely no way that any query tool will predict that. It is all Oracle.

That is absolutely the best approach. I found that once a query was transparently re-written for me, I was hooked. It was so simple, safe and fast that I couldn’t live without it any more. As with all these unfamiliar technologies you get almost nothing from reading documents – you’ve got to try it in order to get understanding.


slimdave :uk: (BOB member since 2002-09-10)

Well, we have 8.1.7 and if I have to, now I know what to do…so thank you for that.

:slight_smile:


mvbokker (BOB member since 2002-09-03)

Found another reason to use MV’s, in the 5.1.6 readme file

LOV queries do of course use MV’s. Just a minor point.


slimdave :uk: (BOB member since 2002-09-10)

IMHO:
For LOVs it is fine if they come from the highest/least detailed aggregate table… so I do not see the point regarding LOVs and MVs.


Andreas :de: (BOB member since 2002-06-20)

Well, you’re quite right as far as single column LOV’s go – you would just want the highest aggregation level used. However it looked like it was going to be a problem with multi-column LOV’s (for hierarchical selections), where the child column might not be aggregated to the same level as the parent, and the query might try to run against both the aggregate and the base table.

I tested it, expecting either a cartesian product or an error, but it seems that multi-column LOV queries are aggregate aware, and there is no issue here.

So now I don’t know what the point of the note in the readme is.


slimdave :uk: (BOB member since 2002-09-10)