IMHO the biggest weak link in the DELTA processing is the use of a text file on SAP to identify new and changed candidates. Usually this is “joined” after extraction in SAP against create/change dates in VBAK/VBAP, for example. That approach is really inefficient at any significant volume, even for a single day’s incremental load. Also make sure you’re only looking back a day or whatever is appropriate; IIRC the default was to look 6 months back for changes based on Order date or something like that. And make sure you’re not running that DF that chews through all of CDHDR/CDPOS looking for changes, I think it was related to the partner function. On a mature system that will run for days.
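Just to be concrete about narrowing the window, the delta selection should boil down to something like the sketch below. The VBAK columns are the standard created-on/changed-on dates, but the bind variable is a placeholder for whatever your job actually uses:

    -- Hypothetical delta selection: only orders created or changed
    -- inside the load window, instead of a months-long lookback.
    SELECT vbeln
      FROM vbak
     WHERE erdat >= :load_start_date   -- created in the window
        OR aedat >= :load_start_date;  -- changed in the window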
As for DELTA processing, the R3_DeltaInsertSlsOrds “as-is”, for example, is a real dog with any significant volume. The combination of VBAK, VBAP, VBUK, VBUP in a join with OrderNum.dat works “OK” when you have tens or hundreds of thousands of Sales Order/Line Item records, but eventually the SAP job R3_DeltaInsertSlsOrds will just run and run for hours. A low-effort fix is to put this information into database tables in SAP and join those tables instead - e.g. SalesDeltaKeys, DeliveryDeltaKeys, etc. That pretty effectively allows you to tune the R3 transform; see the sketch below.
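Roughly what I mean (the table and column names here are made up, and in practice you’d define the table through the SAP data dictionary and fill it with a scheduled job there):

    -- Hypothetical delta-keys table, populated by a nightly SAP job
    -- with the order/item keys that were created or changed.
    CREATE TABLE zsales_delta_keys (
      vbeln     VARCHAR2(10) NOT NULL,  -- sales document number
      posnr     VARCHAR2(6)  NOT NULL,  -- line item number
      load_date DATE         NOT NULL
    );

    -- The R3 transform then joins tables to tables, which can be
    -- optimized, instead of joining transparent tables to a flat file.
    SELECT h.vbeln, p.posnr, p.matnr, p.kwmeng
      FROM zsales_delta_keys k
      JOIN vbak h ON h.vbeln = k.vbeln
      JOIN vbap p ON p.vbeln = k.vbeln AND p.posnr = k.posnr;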
The second problem, again IMHO, with this approach is the direct feed of this data into the Table Compare transform. That takes the “new rows” and forces at least a scan of the entire index (unless you’ve dropped it, then it’s the entire fact table) just to see whether each row exists. I prefer to split new transactions into a direct append load with the Bulk Loader and apply the changes to existing rows as a separate step; that way all my new order/items are just pushed into the table without regard for whether they exist or not. The changed rows can then be updated (the other order works too), except that the changed-row processing runs much faster with the index/constraint enabled, so the sequence is: disable the PK, bulk load, enable the PK and rebuild, then apply the updates. If the model suits your needs in general, I would rework parts of the ETL. It’s not going to perform with just simple “tuning”, whatever that means. You need to shift more of the work back to SAP on DELTA loads.
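For what it’s worth, that load sequence is nothing fancier than the following; the table and constraint names are placeholders, and the append itself would normally come from the BODI bulk loader rather than plain SQL:

    -- 1. Disable the PK so the append load doesn't pay for index maintenance.
    ALTER TABLE sales_order_fact DISABLE CONSTRAINT sales_order_fact_pk;

    -- 2. Direct-path append the brand-new order/items.
    INSERT /*+ APPEND */ INTO sales_order_fact
      SELECT * FROM stg_new_sales_orders;
    COMMIT;

    -- 3. Re-enable the PK (which rebuilds its index), then apply the
    --    updates to changed rows with the constraint back in place.
    ALTER TABLE sales_order_fact ENABLE CONSTRAINT sales_order_fact_pk;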
I do not like the lack of table partitioning or the indexing strategy in the database Physical Model, either. Not partitioning the Sales Orders Fact by Fiscal Period, for example, or whatever makes sense, is a real lost opportunity. Since the DDL is Oracle-specific, I don’t think it’s asking too much for it to ship with a shell that partitions the table and uses partitioned indexes, probably range-based on date like the table. Far too much of the DELTA ETL misses the PK and ends up doing multiple table scans during updates. In an ideal world I wouldn’t even depend on the PK, as the DIM keys would all be surrogate and I would not have to deal with fact table keys to figure out where to update rows. The model is fairly old, maybe as much as 8-9 years, and hasn’t changed a whole lot until perhaps recently (your version is newer than mine). The indexing strategy is more appropriate to Oracle 7 or early Oracle 8: you have dimension FKs in the fact tables left unindexed, and you have UNV objects that are attributes reading directly from the Fact Table instead of a dimension. Again, lots of scans.
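Something along these lines is all I’m asking for - the column names and period boundaries are illustrative, not the RM’s actual DDL:

    -- Hypothetical range partitioning of the Sales Orders Fact by fiscal
    -- period, with local (partition-aligned) indexes on the dimension FKs.
    CREATE TABLE sales_order_fact (
      fiscal_period_date DATE         NOT NULL,
      sales_doc_id       VARCHAR2(10) NOT NULL,
      sales_item_id      VARCHAR2(6)  NOT NULL,
      customer_key       NUMBER       NOT NULL,
      material_key       NUMBER       NOT NULL,
      net_value          NUMBER
    )
    PARTITION BY RANGE (fiscal_period_date) (
      PARTITION p2007p01 VALUES LESS THAN (TO_DATE('2007-02-01','YYYY-MM-DD')),
      PARTITION p2007p02 VALUES LESS THAN (TO_DATE('2007-03-01','YYYY-MM-DD')),
      PARTITION pmax     VALUES LESS THAN (MAXVALUE)
    );

    CREATE INDEX sof_customer_ix ON sales_order_fact (customer_key) LOCAL;
    CREATE INDEX sof_material_ix ON sales_order_fact (material_key) LOCAL;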
The other thing to ditch is the default views. The Sales Order Header view, for example, may seem like a good idea, but not the way it is implemented; it is a very poor performer. If you don’t agree, go run an explain plan on a query generated off the UNV using this view and see the cost.
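If you’ve never done that, it’s just the following (substitute a real query captured from the universe for the placeholder view and predicate here):

    -- Capture a query the UNV generates against the header view,
    -- then look at the optimizer's plan and cost for it.
    EXPLAIN PLAN FOR
      SELECT *
        FROM sales_order_header_view
       WHERE order_date > SYSDATE - 30;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);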
The RM approach does give you a lot of good insight into where to find the data in SAP, and to some degree how to extract it, but the incremental processing is just not scalable “as-is”. By adding tables in SAP and scheduling jobs there to set up the changed-record keys, you can make it work pretty well. If you spend some time tuning the physical model and look for pseudo-dimensional things and replace them, query performance can be pretty good. The other thing you run into pretty quickly is the lack of absolutely necessary fields in the Data Model - for example, Customer Request Ship date exists only in the Schedule Lines, not at the Order Item level in the Sales Orders Fact. Schedule Lines is another place where the table needs to be set up right; who wants to keep around ancient history for completed orders forever?
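On that last point, some kind of periodic purge of completed orders’ schedule-line history is worth building in. The column names and the retention rule here are hypothetical - use whatever completion flag and cutoff fit your model:

    -- Hypothetical housekeeping: drop schedule-line rows for orders that
    -- completed more than 24 months ago.
    DELETE FROM schedule_line_fact s
     WHERE s.completion_flag = 'C'
       AND s.fiscal_period_date < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -24);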
I’ve been able to make this work well, but it took from roughly September to March, alongside a new SAP implementation coming online, to add the fields and tables not covered in Sales and Distribution by the RM and to replace most of the DELTA logic for all the fact tables. It’s deceptive with a dev system and 60k order/items: you can load everything in the entire RM in 30 minutes once you fix the stupid things, but that falls off a cliff as the volume ramps up. Now I have tens of millions of fact table records, usually ~30,000 Orders with activity in any given load cycle, and the out-of-the-box RM would never handle that.
I do want to emphasize that it can be made to work, and work well. It can get you from nothing to a Production deployment in 3-6 months, including multiple test environments, setting up version management with a CR, etc. I’ve done it with a total team of 2 or 3 people working on the RM changes and enhancements required for the specific SAP deployment. It will scale well if you understand BODI and how to make it perform, and the code is reasonably clean and runs quickly enough on SAP for a typical load window. I’d hate to recreate all the ETL from scratch, especially giving up the BODI visibility into SAP metadata.
HTH. If your volume is low and your current processing still runs on and on, it’s probably due to the variable settings for start/end dates, delete period, etc. Also check that DF option that chews through CDPOS - what a bad idea that was. Sorry for the length of this post, but I hope my experience can help you make a better-informed decision rather than finding the weak spots as they show themselves.
Stracy
(BOB member since 2006-07-14)