We implemented our extracts from SAP last year. We use about 75% standard extractors, 20% custom extractors and 5% table pulls. We do not use any ABAP data flows at present.
More and more often we are being asked for the data to be updated through the day. Based on the architecture we have now, that isn’t possible for 3 reasons:
A fair few of the standard extractors (several of them key ones) only provide a delta for the prior day; they cannot run for smaller time windows.
A lot of the other tables and/or extractors do not have a last-modified timestamp, so we have to pull the whole table and run a comparison (a sketch of one way we keep that comparison cheap follows this list).
Our loads currently take about 3 hours just to get the data out of SAP. Our staging and warehousing process adds another 90 minutes to that.
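For context, the "pull everything and compare" step can at least be kept cheap on the comparison side by hashing rows. A minimal sketch in Python, assuming both extracts are available as dicts keyed by primary key; all names here are illustrative, not our actual jobs:

```python
import hashlib

def row_hash(row: dict) -> str:
    """Stable hash of a row's values, used to spot changed rows cheaply."""
    payload = "|".join(str(row[k]) for k in sorted(row))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def diff_extracts(old: dict, new: dict):
    """Compare yesterday's and today's full extracts, keyed by primary key.

    Returns (inserts, updates, deletes) as lists of keys.
    """
    old_hashes = {key: row_hash(r) for key, r in old.items()}
    inserts = [key for key in new if key not in old_hashes]
    updates = [key for key, r in new.items()
               if key in old_hashes and row_hash(r) != old_hashes[key]]
    deletes = [key for key in old_hashes if key not in new]
    return inserts, updates, deletes
```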
We need to address this, but I am not sure how.
Has anyone tackled this problem with Data Services? I know there are HANA solutions, but those are outside our budget.
I was wondering whether ABAP data flows would give us any more options, but given that most extractors represent several tables joined together, it is going to take a long time to rewrite them.
What do you do? Are there options I haven't considered?
Hi Leigh, I can imagine the choices you are facing. It is always a tough nut to crack: you deliver the data, then the business wants it twice a day, until the business wants it in real time (and why is it then still management reporting, rather than operational reporting on the SAP system itself…).
3 hours for loading a DWH is not so bad, assuming reasonable volumes. You could look into some alternatives:
Extract nightly, and then during the day provide the intraday updates in a separate cube (assuming you can work around your extractors)
For the parts of the DWH that really require faster updates, create separate extraction logic that runs multiple times a day
Anyway, we use R/3 flows all the way. For example: we have an interface that extracts open invoices from SAP, run per company. The biggest company extracts in 7 minutes for 7,000 lines (checked this morning).
This is about as fast as I consider possible, with only one thing left: the extracted tables are a bit wide, and if you trim down to just the required fields there is a performance gain to be made. From BSAS (cleared documents) we used to extract all fields for 30 days of changes: 11 minutes. I trimmed it down to 7 fields and it went down to 5 minutes. Then the filter on BSAS was also changed so that the table's index was used correctly: 1m 15s…
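To make the trimming idea concrete, here is a minimal Python sketch using pyrfc and the generic RFC_READ_TABLE function module. This is not how our R/3 flows are built (those are ABAP data flows inside Data Services); the connection details, field list and filter values are all illustrative assumptions:

```python
from pyrfc import Connection  # Python bindings for the SAP NW RFC SDK

# Connection parameters are placeholders.
conn = Connection(ashost="sapapp01", sysnr="00", client="100",
                  user="DS_USER", passwd="secret")

# Project only the fields the DWH needs; the narrow field list is
# the kind of trimming that took the BSAS pull from 11 minutes to 5.
fields = [{"FIELDNAME": f} for f in
          ("BUKRS", "HKONT", "AUGDT", "AUGBL", "BELNR", "GJAHR", "BUDAT")]

# Filter on the leading columns of BSAS's primary index
# (company code, G/L account, clearing date) so the database
# can avoid a full table scan.
options = [{"TEXT": "BUKRS = '1000' AND HKONT = '0000140000'"},
           {"TEXT": "AND AUGDT >= '20120101'"}]

result = conn.call("RFC_READ_TABLE", QUERY_TABLE="BSAS",
                   DELIMITER="|", FIELDS=fields, OPTIONS=options)
rows = [line["WA"].split("|") for line in result["DATA"]]
```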
Anyway, when doing a delta with R/3 flows you need a few things:
A delta framework that knows what the last successful (!) extraction date/time was
Use change pointers in SAP: create date, change date, but also the change document tables CDHDR and possibly CDPOS (a sketch combining this with the delta framework follows this list)
Do not do complex transformations in SAP, or even too many joins. SAP is a transactional system and is slow compared to a decent database; transfer the data into the database and transform it there.
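A minimal sketch of the first two points, assuming a small control table on the warehouse side. The schema, the interface name and the extract_changed_since callable are illustrative assumptions, not a real Data Services delta framework:

```python
import sqlite3
from datetime import datetime

# A tiny control table on the warehouse side holds the watermark.
db = sqlite3.connect("dwh_control.db")
db.execute("""CREATE TABLE IF NOT EXISTS delta_watermark
              (interface TEXT PRIMARY KEY, last_ok TEXT)""")

def run_delta(interface: str, extract_changed_since):
    row = db.execute("SELECT last_ok FROM delta_watermark WHERE interface = ?",
                     (interface,)).fetchone()
    since = row[0] if row else "1900-01-01 00:00:00"

    # Stamp the start, not the end, so rows changed while we run are
    # picked up again next time rather than lost.
    started = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")

    # In SAP terms: filter on a create/change date, or read CDHDR
    # (and CDPOS for field-level detail) for objects changed
    # after `since`.
    extract_changed_since(since)

    # Only a successful extraction moves the watermark forward, so a
    # failed run is simply re-covered by the next one.
    db.execute("INSERT OR REPLACE INTO delta_watermark VALUES (?, ?)",
               (interface, started))
    db.commit()
```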
Then regarding the DWH: I do not know how big it is or how much data is processed daily, but there are always tricks to trim down loading times. This week we brought the rebuild time of a cube down to half just by modifying technical settings (API bulk load, outer joins to dimensions instead of lookups, insert/appends, etc.). But this is totally dependent on what you use and what you want to use.
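For one of those settings, the lookup-to-outer-join swap: the idea is to resolve dimension keys with one set-based statement on the database rather than a lookup per fact row. A sketch with made-up table and column names:

```python
# Set-based dimension key resolution on the warehouse database: one
# outer join for the whole batch instead of a lookup per fact row.
# All table and column names here are made up for illustration.
STAGE_TO_FACT = """
INSERT INTO fact_invoice (customer_key, amount, posting_date)
SELECT COALESCE(d.customer_key, -1),   -- -1 = 'unknown' dimension member
       s.amount,
       s.posting_date
FROM stage_invoice AS s
LEFT OUTER JOIN dim_customer AS d
       ON d.customer_id = s.kunnr      -- natural key coming from SAP
"""
# cursor.execute(STAGE_TO_FACT) resolves the whole batch in one
# statement, and an insert/append load replaces row-by-row updates.
```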
Oh, and do not use table comparison transforms for big tables; they tend to be slow. For small amounts of changed data they are fine, but bigger sets are slow. We are looking into getting rid of all of them (one alternative is sketched below)…
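One way to avoid the table comparison transform is to let the database compute the difference as a set operation. A self-contained sketch with SQLite standing in for the staging database; table names and values are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dwh_bsas   (belnr TEXT PRIMARY KEY, dmbtr REAL);
    CREATE TABLE stage_bsas (belnr TEXT PRIMARY KEY, dmbtr REAL);
    INSERT INTO dwh_bsas   VALUES ('100001', 50.0), ('100002', 75.0);
    INSERT INTO stage_bsas VALUES ('100001', 50.0), ('100002', 80.0),
                                  ('100003', 12.5);
""")

# The database computes new-or-changed rows as a set operation,
# so no row-by-row comparison runs inside the ETL job.
changed = con.execute("""
    SELECT * FROM stage_bsas
    EXCEPT
    SELECT * FROM dwh_bsas
""").fetchall()
print(changed)  # the changed row 100002 and the new row 100003
```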
Leigh, we did this! But as you said, the extractors were a blocking factor! We bypassed the application layer and worked directly at the database layer for a 10-minute refresh!
ganeshxp… that is a nasty teaser… could you please go into more detail? Do you query the underlying database directly? What were the challenges? And does this not void some of your SAP support? In the past SAP was very unwilling to allow this kind of thing…
It's funny: SAP won't let us query the underlying database, but when they wanted to do it for HANA… it's exactly what they did (with Sybase Replication Server).
The fact that you can do in ten minutes what takes hours using the supported methods suggests some architectural limitations of those mechanisms; the ABAP layer is a strong suspect as the bottleneck.