Incremental loading of an Aggregated Fact Table

We currently have a summarized fact table that stores actuals per account.

The source for this table is a large transaction table (approx. 50 million
records). Instead of recalculating the whole table on every load, we only want to apply the changes to it.

We have a transaction timestamp on the source table which should allow us to use incremental loads.

The problem is that I don’t know how to add the changes to the summarized Fact table.

Any advice would be greatly appreciated.


RGeurts :uk: (BOB member since 2006-12-29)

My advice? Load a transaction table and build an Oracle materialized view on top of it. Then the database deals with the delta logic, depending on the information it has (MView log, partition change tracking, …).
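For the MView route, a minimal sketch might look like this, assuming a staged source table TXN(account_id, txn_ts, amount); all names here are placeholders, not tables from your system:

  -- MV log so Oracle can fast-refresh the aggregate from the delta only
  CREATE MATERIALIZED VIEW LOG ON txn
    WITH ROWID (account_id, amount)
    INCLUDING NEW VALUES;

  -- The aggregated "fact" as a fast-refreshable materialized view
  CREATE MATERIALIZED VIEW fact_actuals_mv
    REFRESH FAST ON DEMAND
  AS
  SELECT account_id,
         SUM(amount)   AS actual_amount,
         COUNT(amount) AS amount_cnt,   -- COUNT columns are required for fast refresh of SUM
         COUNT(*)      AS row_cnt
  FROM   txn
  GROUP BY account_id;

  -- After each load of TXN, refresh only the delta:
  -- EXEC DBMS_MVIEW.REFRESH('FACT_ACTUALS_MV', 'F');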

As the above is the standard way of doing it, there is no special transform for this. And a hand-built delta can be tricky.

Basically, for a hand-built delta you take your aggregated input, e.g. 2008 / 100 USD, and perform an outer join against the existing target to find the matching cell, e.g. 2008 / 800 USD. The two values are added via "source.value + nvl(target.value, 0)" and the result is loaded with the include-in-transaction flag, using auto-correct load or a Table Comparison.

Be very careful about failures: if you rerun the data flow, you will add another 100 USD on top of what was already added. The include-in-transaction option protects you against partial loads.
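Outside of DI, the same add-the-delta logic can be written as a single MERGE; this is just a sketch with placeholder names (FACT_ACTUALS, TXN_DELTA), not your actual tables:

  MERGE INTO fact_actuals tgt
  USING (
          -- aggregate only the rows loaded since the last run
          SELECT account_id,
                 EXTRACT(YEAR FROM txn_ts) AS fiscal_year,
                 SUM(amount)               AS delta_amount
          FROM   txn_delta
          GROUP  BY account_id, EXTRACT(YEAR FROM txn_ts)
        ) src
  ON (tgt.account_id = src.account_id AND tgt.fiscal_year = src.fiscal_year)
  WHEN MATCHED THEN
    UPDATE SET tgt.actual_amount = tgt.actual_amount + src.delta_amount
  WHEN NOT MATCHED THEN
    INSERT (account_id, fiscal_year, actual_amount)
    VALUES (src.account_id, src.fiscal_year, src.delta_amount);

The MERGE plays the role of the outer join plus the nvl(): matched cells are incremented, unmatched cells are inserted.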


Werner Daehn :de: (BOB member since 2004-12-17)

Thanks Werner. This makes sense, I will give this a shot and let you know how I get on.

Regards
Rim


RGeurts :uk: (BOB member since 2006-12-29)

Unfortunately I can't create materialised views in the source system (if that is what you meant).
I will have to load the transaction table into the data warehouse and then do the logic there. The problem is that the warehouse runs on SQL Server 2000, and I am pretty sure it won't support materialised views.

Do you think a normal view would do the trick as well, or would it cause performance issues?

Regards
Rim


RGeurts :uk: (BOB member since 2006-12-29)

A normal view will cause performance problems, IMO.

We have something like what you are talking about that recalculates only the necessary changed rows, but as Werner pointed out it is by necessity pretty complex.

Basically, I load the changes from the transaction table on a daily basis. Then I build a table of all the recalculations that need to be done. In our case the aggregate has daily sums, MTD sums and YTD sums, so if a day changes I recalculate that day, then all the days after it in the month for the MTD, and all the days after it in the year for the YTD. I process in three loops: first updating the day, then using it to update the MTD, then using the twelve MTDs and the needed days to recalculate the YTD.
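In DI these are separate data flows, but in Oracle-flavoured SQL the first two passes look roughly like this; AGG_DAILY, CHANGED_DAYS, TXN and their columns are invented purely for illustration:

  -- Pass 1: recalculate the daily sum for every changed day
  UPDATE agg_daily a
  SET    a.day_amt = (SELECT NVL(SUM(t.amount), 0)
                      FROM   txn t
                      WHERE  t.account_id = a.account_id
                      AND    TRUNC(t.txn_ts) = a.cal_date)
  WHERE  EXISTS (SELECT 1 FROM changed_days c
                 WHERE  c.account_id = a.account_id
                 AND    c.cal_date   = a.cal_date);

  -- Pass 2: rebuild MTD for each changed day and every later day in its month
  UPDATE agg_daily a
  SET    a.mtd_amt = (SELECT SUM(d.day_amt)
                      FROM   agg_daily d
                      WHERE  d.account_id = a.account_id
                      AND    d.cal_date BETWEEN TRUNC(a.cal_date, 'MM') AND a.cal_date)
  WHERE  EXISTS (SELECT 1 FROM changed_days c
                 WHERE  c.account_id = a.account_id
                 AND    TRUNC(c.cal_date, 'MM') = TRUNC(a.cal_date, 'MM')
                 AND    a.cal_date >= c.cal_date);

  -- Pass 3 repeats the pattern for YTD, adding the month-end MTDs of the
  -- earlier months to the MTD of each affected day.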

HTH.

  • Ernie

eepjr24 :us: (BOB member since 2005-09-16)

I meant you load the target database with the transaction data - call it a stage table if you like - and then you use the MView to roll up the detail data into an aggregation. Well, in SQL Server this is called Analysis Services, isn't it? SQL Server 2000 could be a problem, though.

http://www.akadia.com/services/sqlsrv_matviews.html ???
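What that link describes for SQL Server is an indexed view; a minimal sketch with placeholder names would be something like:

  -- The view must be schema-bound and, with GROUP BY, must include COUNT_BIG(*).
  -- The summed column must not be nullable (or wrap it in ISNULL()).
  CREATE VIEW dbo.vw_fact_actuals
  WITH SCHEMABINDING
  AS
  SELECT account_id,
         SUM(amount)  AS actual_amount,
         COUNT_BIG(*) AS row_cnt
  FROM   dbo.stage_txn
  GROUP BY account_id;
  GO

  -- The unique clustered index is what actually materializes the view.
  CREATE UNIQUE CLUSTERED INDEX ix_vw_fact_actuals
      ON dbo.vw_fact_actuals (account_id);

Note that in SQL Server 2000 only Enterprise Edition uses indexed views automatically in the optimiser; other editions need the NOEXPAND hint when querying the view.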


Werner Daehn :de: (BOB member since 2004-12-17)

Hi Werner,

Thanks for the help. I have now given up on the idea of incrementally loading summary tables.
It would be too time-consuming to implement, as I have over 20 summary tables.

Regards
Rim


RGeurts :uk: (BOB member since 2006-12-29)

The other point is, once you have the data at the transaction level, you can create all kinds of aggregate tables. You figure more and more queries would benefit from an aggregate table with a certain set of columns? You just add it. In fact, some databases even do that automatically!
Or in case of Analysis Services, store the data in a cube.
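For example, adding one more aggregate over the staged transactions is just another rollup; the names below are illustrative only:

  -- SQL Server style (SELECT ... INTO); on Oracle it would be CREATE TABLE ... AS SELECT
  SELECT account_id,
         cost_center,                 -- whatever extra grouping the new queries need
         SUM(amount) AS actual_amount
  INTO   agg_account_costcenter
  FROM   stage_txn
  GROUP BY account_id, cost_center;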


Werner Daehn :de: (BOB member since 2004-12-17)

Hi Werner,

The problem with using Analysis Services is that you need to create an OLAP universe on top of it.
The limitation is that it won't support contexts etc. (as far as I am aware).
Since we have row-level security in our data warehouse, this might be a problem in Analysis Services as well.

The last time I did an SSAS proof of concept I noticed WebI was very unstable when using an OLAP universe (it always crashed when a date was used as a filter).

BTW, I am using BO XI 3.1 and DI 3.1.

Please let me know your thoughts on the above.

Regards
Rim


RGeurts :uk: (BOB member since 2006-12-29)

Hi Werner,

In order to incrementally load the aggregated fact tables, I have done the following:

1. Identify all jobs* that changed.

2. Join Query 1 (the changed jobs from step 1) to the transaction table and recalculate the totals.

3. Table Comparison: jobs that already exist in the aggregated fact table are updated, new ones are inserted.

* A "job" is a term from our source system (nothing to do with DI jobs).
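In plain SQL, steps 2 and 3 boil down to something like this sketch (all names are placeholders for what the data flow does):

  -- Step 2: recalculate the totals, but only for the changed jobs
  SELECT t.job_id,
         t.account_id,
         SUM(t.amount) AS actual_amount
  FROM   txn t
  JOIN   changed_jobs c          -- output of step 1
    ON   c.job_id = t.job_id
  GROUP BY t.job_id, t.account_id;

  -- Step 3: the Table Comparison / auto-correct load then updates the rows
  -- for existing jobs in the aggregated fact table and inserts the new ones.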

This seems to work perfectly. Can you see anything wrong with this?
I have also attached a screenshot.

Regards
Rim


RGeurts :uk: (BOB member since 2006-12-29)

If it does not take too long to recalculate the totals from the transaction table, that is a perfectly fine approach in my opinion. Simple to use, flexible, safe, …


Werner Daehn :de: (BOB member since 2004-12-17)

Thanks Werner,

I have just done a proof of concept and it seems to be working fine.
The only comment/question I have is that when I join the "changed jobs" table to the transaction table, it seems to scan through every single record (approx. 7 million). I was hoping it would only scan the records that need updating, but that doesn't seem to be the case.

Is there any way I can "pimp" (optimise) this join? (I had a look at your wiki page but couldn't find anything relating to my problem.)

Thanks Werner

Rim


RGeurts :uk: (BOB member since 2006-12-29)

I would play with the join ranks until you get nested SQL pushed down to the database.
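The aim is that DI pushes the small changed-jobs set into the database as a nested subquery, so the optimiser can drive the big table through an index (assuming one exists on the join column, e.g. TXN.JOB_ID) instead of scanning all 7 million rows. Roughly, the pushed-down statement should end up looking like this sketch with placeholder names:

  SELECT t.*
  FROM   txn t
  WHERE  t.job_id IN (SELECT c.job_id FROM changed_jobs c);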


Werner Daehn :de: (BOB member since 2004-12-17)

Hi Werner,

I am trying to capture the changed records on a source table, and I have created an MV log on that table. The MV log is giving me all the changed records for the delta load. Is there any problem with using the MV log for delta extraction in Oracle?
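A minimal log of the kind I mean would be something like this (TXN and the column list are placeholders for my actual table):

  CREATE MATERIALIZED VIEW LOG ON txn
    WITH PRIMARY KEY, SEQUENCE (amount)
    INCLUDING NEW VALUES;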


kgecdeep (BOB member since 2012-07-25)

Hi Werner,

I need your suggestion, please. I am using an Oracle MV log (created against a table) to track the DML changes and am using it as the source for delta extraction instead of a Table Comparison, for performance reasons.

Now, in the MV log we have a sequence number and the DML operation type; I am thinking of using the sequence number to track what has been loaded each time.
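Roughly, the delta read I have in mind looks like this sketch (MLOG$_TXN being the log Oracle created for table TXN; the key column and the bind variable are placeholders):

  SELECT m.account_id,                  -- key column(s) recorded in the log
         m.dmltype$$,                   -- I / U / D
         m.sequence$$
  FROM   mlog$_txn m
  WHERE  m.sequence$$ > :last_loaded_sequence
  ORDER  BY m.sequence$$;

  -- after a successful load, store MAX(sequence$$) back in a load-control table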

The DBA cannot turn on the archive logging option in the database for various reasons, so I avoided the Oracle CDC feature and went for the materialized view log option.

Please let me know whether the MV log approach is fine in DS? I would like some confidence before going ahead.


kgecdeep (BOB member since 2012-07-25)

Werner hasn't been active on the forum for a while.


eganjp :us: (BOB member since 2007-09-12)