IMHO the biggest weak link in the DELTA processing is the use of a text file on SAP to identify new and changed candidates. Usually this is “joined” after extraction in SAP against create/change dates in VBAK/VBAP, for example. That approach is really inefficient at any significant volume, even for a single day’s incremental load. Also make sure you’re only looking back a day or whatever is appropriate; IIRC the default was to look 6 months back for changes based on Order date or something like that. And make sure you’re not running that DF that chews through all of CDHDR/CDPOS looking for changes, I think it was related to the partner function. On a mature system that will run for days.
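Just to be concrete about narrowing the window, the delta selection should boil down to something like the sketch below. The VBAK columns are the standard created-on/changed-on dates, but the bind variable is a placeholder for whatever your job actually uses:

    -- Hypothetical delta selection: only orders created or changed
    -- inside the load window, instead of a months-long lookback.
    SELECT vbeln
      FROM vbak
     WHERE erdat >= :load_start_date   -- created in the window
        OR aedat >= :load_start_date;  -- changed in the window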
As for DELTA processing, the R3_DeltaInsertSlsOrds “as-is”, for example, is a real dog with any significant volume. The combination of VBAK, VBAP, VBUK, VBUP in a join with OrderNum.dat works “OK” when you have tens or hundreds of thousands of Sales Order/Line Item records, but eventually the SAP job R3_DeltaInsertSlsOrds will just run and run for hours. A low-effort fix is to put this information into database tables in SAP and join those tables instead - e.g. SalesDeltaKeys, DeliveryDeltaKeys, etc. That pretty effectively allows you to tune the R3 transform; see the sketch below.
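Roughly what I mean (the table and column names here are made up, and in practice you’d define the table through the SAP data dictionary and fill it with a scheduled job there):

    -- Hypothetical delta-keys table, populated by a nightly SAP job
    -- with the order/item keys that were created or changed.
    CREATE TABLE zsales_delta_keys (
      vbeln     VARCHAR2(10) NOT NULL,  -- sales document number
      posnr     VARCHAR2(6)  NOT NULL,  -- line item number
      load_date DATE         NOT NULL
    );

    -- The R3 transform then joins tables to tables, which can be
    -- optimized, instead of joining transparent tables to a flat file.
    SELECT h.vbeln, p.posnr, p.matnr, p.kwmeng
      FROM zsales_delta_keys k
      JOIN vbak h ON h.vbeln = k.vbeln
      JOIN vbap p ON p.vbeln = k.vbeln AND p.posnr = k.posnr;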
The second problem, again IMHO, with this approach is the direct feed of this data into the Table Compare transform. That takes the “new rows” and forces at least a scan of the entire index (unless you’ve dropped it, then it’s the entire fact table) just to see whether each row exists. I prefer to split new transactions into a direct append load with the Bulk Loader and apply the changes to existing rows as a separate step; that way all my new order/items are just pushed into the table without regard for whether they exist or not. The changed rows can then be updated (the other order works too), except that the changed-row processing runs much faster with the index/constraint enabled, so the sequence is: disable the PK, bulk load, enable the PK and rebuild, then apply the updates. If the model suits your needs in general, I would rework parts of the ETL. It’s not going to perform with just simple “tuning”, whatever that means. You need to shift more of the work back to SAP on DELTA loads.
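For what it’s worth, that load sequence is nothing fancier than the following; the table and constraint names are placeholders, and the append itself would normally come from the BODI bulk loader rather than plain SQL:

    -- 1. Disable the PK so the append load doesn't pay for index maintenance.
    ALTER TABLE sales_order_fact DISABLE CONSTRAINT sales_order_fact_pk;

    -- 2. Direct-path append the brand-new order/items.
    INSERT /*+ APPEND */ INTO sales_order_fact
      SELECT * FROM stg_new_sales_orders;
    COMMIT;

    -- 3. Re-enable the PK (which rebuilds its index), then apply the
    --    updates to changed rows with the constraint back in place.
    ALTER TABLE sales_order_fact ENABLE CONSTRAINT sales_order_fact_pk;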
I do not like the lack of table partitioning or the indexing strategy in the database Physical Model, either. Not partitioning the Sales Orders Fact by Fiscal Period, for example, or whatever makes sense, is a real lost opportunity. Since the DDL is Oracle-specific, I don’t think it’s asking too much for it to ship with a shell that partitions the table and uses partitioned indexes, probably range-based on date like the table. Far too much of the DELTA ETL misses the PK and ends up doing multiple table scans during updates. In an ideal world I wouldn’t even depend on the PK, as the DIM keys would all be surrogate and I would not have to deal with fact table keys to figure out where to update rows. The model is fairly old, maybe as much as 8-9 years, and hasn’t changed a whole lot until perhaps recently (your version is newer than mine). The indexing strategy is more appropriate to Oracle 7 or early Oracle 8: you have dimension FKs in the fact tables left unindexed, and you have UNV objects that are attributes reading directly from the Fact Table instead of a dimension. Again, lots of scans.
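Something along these lines is all I’m asking for - the column names and period boundaries are illustrative, not the RM’s actual DDL:

    -- Hypothetical range partitioning of the Sales Orders Fact by fiscal
    -- period, with local (partition-aligned) indexes on the dimension FKs.
    CREATE TABLE sales_order_fact (
      fiscal_period_date DATE         NOT NULL,
      sales_doc_id       VARCHAR2(10) NOT NULL,
      sales_item_id      VARCHAR2(6)  NOT NULL,
      customer_key       NUMBER       NOT NULL,
      material_key       NUMBER       NOT NULL,
      net_value          NUMBER
    )
    PARTITION BY RANGE (fiscal_period_date) (
      PARTITION p2007p01 VALUES LESS THAN (TO_DATE('2007-02-01','YYYY-MM-DD')),
      PARTITION p2007p02 VALUES LESS THAN (TO_DATE('2007-03-01','YYYY-MM-DD')),
      PARTITION pmax     VALUES LESS THAN (MAXVALUE)
    );

    CREATE INDEX sof_customer_ix ON sales_order_fact (customer_key) LOCAL;
    CREATE INDEX sof_material_ix ON sales_order_fact (material_key) LOCAL;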
The other thing to ditch is the default views. The Sales Order Header view, for example, may seem like a good idea, but not the way it is implemented; it is a very poor performer. If you don’t agree, go run an explain plan on a query generated off the UNV using this view and see the cost.
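If you’ve never done that, it’s just the following (substitute a real query captured from the universe for the placeholder view and predicate here):

    -- Capture a query the UNV generates against the header view,
    -- then look at the optimizer's plan and cost for it.
    EXPLAIN PLAN FOR
      SELECT *
        FROM sales_order_header_view
       WHERE order_date > SYSDATE - 30;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);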
The RM approach does give you a lot of good insight into where to find the data in SAP, and to some degree how to extract it, but the incremental processing is just not scalable “as-is”. By adding tables in SAP and scheduling jobs there to set up the changed-record keys, you can make it work pretty well. If you spend some time tuning the physical model and look for pseudo-dimensional things and replace them, query performance can be pretty good. The other thing you run into pretty quickly is the lack of absolutely necessary fields in the Data Model - for example, Customer Request Ship date exists only in the Schedule Lines, not at the Order Item level in the Sales Orders Fact. Schedule Lines is another place where the table needs to be set up right; who wants to keep around ancient history for completed orders forever?
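On that last point, some kind of periodic purge of completed orders’ schedule-line history is worth building in. The column names and the retention rule here are hypothetical - use whatever completion flag and cutoff fit your model:

    -- Hypothetical housekeeping: drop schedule-line rows for orders that
    -- completed more than 24 months ago.
    DELETE FROM schedule_line_fact s
     WHERE s.completion_flag = 'C'
       AND s.fiscal_period_date < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -24);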
I’ve been able to make this work well, but it took from roughly September to March, alongside a new SAP implementation coming online, to add the fields and tables not covered in Sales and Distribution by the RM and to replace most of the DELTA logic for all the fact tables. It’s deceptive with a dev system and 60k order/items: you can load everything in the entire RM in 30 minutes once you fix the stupid things, but that falls off a cliff as the volume ramps up. Now I have tens of millions of fact table records, usually ~30,000 Orders with activity in any given load cycle, and the out-of-the-box RM would never handle that.
I do want to emphasize that it can be made to work, and work well. It can get you from nothing to a Production deployment in 3-6 months, including multiple test environments, setting up version management with a CR, etc. I’ve done it with a total team of 2 or 3 people working on the RM changes and enhancements required for the specific SAP deployment. It will scale well if you understand BODI and how to make it perform, and the code is reasonably clean and runs quickly enough on SAP for a typical load window. I’d hate to recreate all the ETL from scratch, especially giving up the BODI visibility into SAP metadata.
HTH. If your volume is low and your current processing still runs on and on, it’s probably due to the variable settings for start/end dates, delete period, etc. Also check that DF option that chews through CDPOS - what a bad idea that was. Sorry for the length of this post, but I hope my experience can help you make a better-informed decision rather than finding the weak spots as they show themselves.
Stracy
(BOB member since 2006-07-14)