We currently have a summarized fact table that stores actuals per account.
The source for this table is a large transaction table (approx 50 million
records). Instead of recalculating this table on every load, we only want to add the changes to it.
We have a transaction timestamp on the source table which should allow us to use incremental loads.
The problem is that I don't know how to apply the changes to the summarized fact table.
My advice? Load a transaction table and build an Oracle materialized view on top of it. The database then deals with the delta logic depending on the information it has: MView log, partition change tracking, and so on.
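A minimal Oracle sketch of that setup, assuming a made-up detail table TXN(account_id, txn_ts, amount); the extra COUNT columns and log options are what Oracle needs to fast-refresh a SUM:

    -- MView log on the detail table so the database can fast-refresh the aggregate
    CREATE MATERIALIZED VIEW LOG ON txn
      WITH ROWID, SEQUENCE (account_id, amount)
      INCLUDING NEW VALUES;

    -- Aggregate materialized view; COUNT(*) / COUNT(amount) are required for fast refresh of a SUM
    CREATE MATERIALIZED VIEW fact_actuals_mv
      REFRESH FAST ON DEMAND
      ENABLE QUERY REWRITE
    AS
    SELECT account_id,
           SUM(amount)   AS actual_amount,
           COUNT(amount) AS cnt_amount,
           COUNT(*)      AS cnt_rows
    FROM   txn
    GROUP  BY account_id;

    -- After each load of new transactions into TXN:
    -- EXEC DBMS_MVIEW.REFRESH('FACT_ACTUALS_MV', 'F');   -- 'F' = fast (delta) refresh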
As the above is the standard way of doing it, there is no special transform for it, and building the delta yourself can be tricky.
Basically, you take your aggregated input, e.g. (2008, 100 USD), and perform an outer join against the existing target to find the matching cell, e.g. (2008, 800 USD).
The two values are added ("source.value + nvl(target.value, 0)") and loaded with the Include in Transaction flag via auto correct load or a Table Comparison.
Be very careful in case of failure: if you rerun the dataflow, you will add another 100 USD on top of what was already added. Include in Transaction only covers the partial-load case.
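If you build that delta yourself, the plain-SQL equivalent looks roughly like this (made-up table names; in DS you would model it as a Query with the outer join plus Table Comparison or auto correct load rather than a raw MERGE):

    -- Aggregate the delta, then add it onto the existing target cell (or insert it if none exists)
    MERGE INTO fact_actuals tgt
    USING (
            SELECT account_id, fiscal_year, SUM(amount) AS delta_amount
            FROM   stg_txn_delta                  -- only the new/changed transactions
            GROUP  BY account_id, fiscal_year
          ) src
    ON    (tgt.account_id = src.account_id AND tgt.fiscal_year = src.fiscal_year)
    WHEN MATCHED THEN UPDATE SET tgt.amount = tgt.amount + src.delta_amount
    WHEN NOT MATCHED THEN INSERT (account_id, fiscal_year, amount)
                          VALUES (src.account_id, src.fiscal_year, src.delta_amount);
    -- Rerunning this with the same delta rows would double-count them, hence the warning above.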
Unfortunately, I can’t create materialised views in the source system (if that was what you meant).
I will have to load the transaction table into the data warehouse and then do the logic there. The problem is that the warehouse runs on SQL Server 2000, and I am pretty sure it won’t support materialised views.
Do you think a normal view would do the trick as well, or would it cause performance issues?
We have something like what you are talking about that recalculates only the necessary changed rows, but as Werner pointed out it is by necessity pretty complex.
Basically, I load the changes from the transaction table on a daily basis. Then I build a table of all the recalculations that need to be done. In our case, the aggregate has daily sums, MTD sums and YTD sums. So if a day changes, I recalculate that day, then all the days after it in the month for the MTD, and all the days after it in the year for the YTD. I process in three loops: first updating the day, then using it to update the MTD, then using the twelve MTDs and the needed days to recalculate the YTD.
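A simplified sketch of those three passes in SQL (table and column names here are made up, and the real job runs as loops in the ETL tool, but the set logic is the same):

    -- Pass 1: rebuild the daily sums for the days that changed
    UPDATE fact_daily f
    SET    f.day_amt = (SELECT NVL(SUM(t.amount), 0)
                        FROM   txn t
                        WHERE  t.account_id = f.account_id
                        AND    TRUNC(t.txn_ts) = f.cal_date)
    WHERE  f.cal_date IN (SELECT c.cal_date FROM changed_days c);

    -- Pass 2: MTD = sum of daily amounts from the start of the month,
    --         for every day on or after a changed day in the same month
    UPDATE fact_daily f
    SET    f.mtd_amt = (SELECT SUM(d.day_amt)
                        FROM   fact_daily d
                        WHERE  d.account_id = f.account_id
                        AND    d.cal_date BETWEEN TRUNC(f.cal_date, 'MM') AND f.cal_date)
    WHERE  EXISTS (SELECT 1 FROM changed_days c
                   WHERE  TRUNC(c.cal_date, 'MM') = TRUNC(f.cal_date, 'MM')
                   AND    c.cal_date <= f.cal_date);

    -- Pass 3: YTD, same idea from the start of the year
    UPDATE fact_daily f
    SET    f.ytd_amt = (SELECT SUM(d.day_amt)
                        FROM   fact_daily d
                        WHERE  d.account_id = f.account_id
                        AND    d.cal_date BETWEEN TRUNC(f.cal_date, 'YYYY') AND f.cal_date)
    WHERE  EXISTS (SELECT 1 FROM changed_days c
                   WHERE  TRUNC(c.cal_date, 'YYYY') = TRUNC(f.cal_date, 'YYYY')
                   AND    c.cal_date <= f.cal_date);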
I meant you load the target database with the transaction data - call it a stage table if you like - and then you use the MView to roll up the detail data into an aggregation. Well, in SQL Server this is called Analysis Services, isn’t it? SQL Server 2000 could be a problem, though.
Thanks for the help. I have given up on the idea of incrementally loading summary tables now.
It will be too time-consuming to implement, as I have over 20 summary tables.
The other point is that once you have moved the data over at the transaction level, you can create all kinds of aggregate tables. You figure more and more queries would benefit from an aggregate table with these columns? You just add it. In fact, some databases even do that automatically!
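For example (made-up names), another aggregate is just one more rollup over the same detail table:

    -- A new aggregate is simply another rollup of the detail-level data
    CREATE TABLE agg_actuals_by_account_month AS
    SELECT account_id,
           TRUNC(txn_ts, 'MM') AS period_month,
           SUM(amount)         AS total_amount
    FROM   txn
    GROUP  BY account_id, TRUNC(txn_ts, 'MM');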
Or in case of Analysis Services, store the data in a cube.
The problem with using Analysis Services is that you need to create an OLAP universe on top of it.
The limitation is that it won’t support contexts, etc. (as far as I am aware).
Since we have row-level security in our data warehouse, this might be a problem in Analysis Services as well.
The last time I did an SSAS proof of concept, I noticed Web-I was very unstable when using an OLAP universe (it always crashed when using dates as a filter).
If it does not take too long for you to recalculate the totals from the transaction table, that is a perfectly fine approach in my opinion. Simple to use, flexible, safe,…
I just did a proof of concept, and it seems to be working fine.
The only comment/question I have is that when I join the ‘changed jobs’ table to the transaction table, it seems to scan every single record (approx. 7 million). I was hoping it would only scan the records that need updating, but that doesn’t seem to be the case.
Is there any way I can “pimp” (optimise) this join? (I had a look at your wiki page but couldn’t find anything relating to my problem.)
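In case it helps to show what I mean, the join is basically this shape (table and column names changed); I am wondering whether an index on the transaction table’s join key, and pushing the whole join down to the database, would make it probe just the changed jobs instead of scanning everything:

    -- Index the join key on the big transaction table so the small driver table can probe it
    CREATE INDEX ix_txn_job_id ON txn (job_id);

    -- Oracle-style hints to force a nested-loop probe from the small 'changed jobs' table into the index
    SELECT /*+ LEADING(c) USE_NL(t) INDEX(t ix_txn_job_id) */
           t.*
    FROM   changed_jobs c
    JOIN   txn t
      ON   t.job_id = c.job_id;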
I am trying to capture the changed records of a source table, and I have created an MV log on that table. The MV log gives me all the changed records for the delta load. Is there any problem if I use the MV log for delta extraction in Oracle?
I need your suggestion, please. I am using an Oracle MV log (created against a table) to track the DML changes and using it as the source for delta extraction instead of a Table Comparison, for better performance.
Now, in the MV log we have a sequence number and the DML operation type, and I am thinking of using the sequence number to track what has been loaded each time.
The DBA cannot turn on the archive logging option in the database for various reasons, so I avoided the Oracle CDC feature and went with the materialized view log option.
Please let me know if using the MV log is fine in DS? I need some confidence, please.
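To illustrate what I have set up (names are made up here; my real source table has a surrogate primary key), the log and the delta query look roughly like this:

    -- One-time: MV log with primary key, sequence and the columns needed downstream
    -- (the assumed primary key here is TXN_ID, not one of the filter columns)
    CREATE MATERIALIZED VIEW LOG ON acct_txn
      WITH PRIMARY KEY, SEQUENCE (account_id, amount)
      INCLUDING NEW VALUES;

    -- Each delta run: pick up everything after the last sequence processed;
    -- :last_seq comes from my own load-control table, and DMLTYPE$$ ('I'/'U'/'D')
    -- drives the insert/update/delete handling in the dataflow
    SELECT m.*
    FROM   mlog$_acct_txn m
    WHERE  m.sequence$$ > :last_seq
    ORDER  BY m.sequence$$;

    -- After a successful load I store MAX(sequence$$); the log rows stay until a dependent
    -- MView refreshes or DBMS_MVIEW.PURGE_LOG clears them.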