Speed of persistent cache

Hi,

After spending the holidays with my girlfriend and the ‘BODS Performance Optimization Guide’ :wink: I was eager to benchmark some theories against a basic setup (a zero measurement) and see what the real difference in performance is.

Regarding the persistent cache I didn’t notice any performance advantage. Here is what I did…

Dimension table ‘DIM’: 1 PK column (integer, populated from a sequence), 2 UK columns (integer and date). Both keys are indexed; 10 million rows.

1. Normal lookup flow (zero-measurement):
Rowgen (10 million) --> lookup in dimension (pre-load cache) --> bulk load into fact.

2. Build up persistent cache table:
‘DIM’ (Oracle source table) --> ‘DIM_PERS’ (persistent cache). I defined the key columns on the same columns as I use in the lookup (UK).

3. Lookup Persistent cache flow:

Exactly the same as the normal lookup flow, but with the lookup against the persistent cache table.

Results
The persistent-cache-lookup flow and the normal-lookup flow take exactly the same time to finish (I did several runs).

I would expect the persistent cache to be faster, because it doesn’t have to query the database and there is no extra I/O. Obviously I am missing something here, but what?
When should the persistent cache be faster?


BBatenburg :netherlands: (BOB member since 2008-09-22)

I would expect a lookup with pre-load cache to run in the same time as a lookup against the persistent cache. All you ‘saved’ is the time to load the cache - although you had to do that in another DF, which itself takes time.

If there are a lot of values to load into the cache, and lookups spread throughout the job, you might save some time by loading the cache only once. However, if you’re doing this sort of thing, the chances are the lookup table itself is being modified, and every time that happens the persistent cache needs to be reloaded.


dastocks (BOB member since 2006-12-11)

Exactly - I should save the time it takes to build the lookup cache. Example:

Pre-load cache lookup flow:

  1. Build up cache: 10 minutes
  2. Lookup values: 15 minutes
    Total: 25 minutes

In this case I would expect the PC (persistent cache) flow to take 15 minutes instead of 25 (compared to the normal cached lookup), because the PC table is already built and the flow only has to execute step #2.

In reality it doesn’t - it takes 25 minutes… I don’t get it…


BBatenburg :netherlands: (BOB member since 2008-09-22)

I believe that the jobserver only tries to keep the PC data in memory on a ‘best endeavours’ basis. If the PC data is flushed from memory and then read from disk again for each lookup DF you’re probably not gaining anything.

Is the DF doing the lookups running with in-memory or pageable cache? This might make a difference. You could also try setting the lookup itself to run as a separate process within the DF.

Some years ago I tried using the PC for all static (or near static) dimension lookups in a CDC job that runs in a continuous loop. There was no perceptible difference in performance, and I couldn’t see any benefits elsewhere (e.g. reduced load on DB server) so I decided it wasn’t worth the bother.

It would be interesting to hear from anyone who lurks on this forum who does use the persistent cache.


dastocks (BOB member since 2006-12-11)

Assuming we’re talking about batch jobs…

Persistent Cache doesn’t keep anything in memory across dataflows. Every time you do a lookup against the persistent cache datastore, it has to re-read it from disk. Your database engine (Oracle, MySQL), by contrast, has substantial caching capability, assuming you have plenty of memory.

Where Persistent Cache is useful is where you have to build a very complex lookup that can be reused several times in a job, or where the looked-up data is remote (over a slow WAN connection) and you want a local copy. In both cases, the benefit of having the cache on disk local to DI has to exceed the cost/time of building it. As you experienced, it often doesn’t.
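As a back-of-the-envelope check of that trade-off (plain Python with made-up timings - none of this is DS syntax), building the cache once only pays off when the per-dataflow saving, multiplied by the number of reuses, exceeds the one-time build cost:

```python
def persistent_cache_pays_off(build_min, reuses, db_load_min, disk_read_min):
    """True if building the persistent cache once beats loading the
    lookup cache from the database in every dataflow that needs it."""
    without_pc = reuses * db_load_min              # each DF pre-loads its cache from the DB
    with_pc = build_min + reuses * disk_read_min   # build once, then read from local disk
    return with_pc < without_pc

# One lookup DF, as in the benchmark above: no win (assumed timings).
print(persistent_cache_pays_off(build_min=10, reuses=1,
                                db_load_min=10, disk_read_min=10))   # False

# Many reuses over a slow WAN, where the DB load dominates: a win.
print(persistent_cache_pays_off(build_min=10, reuses=8,
                                db_load_min=10, disk_read_min=2))    # True
```

With only one consuming dataflow and a disk read as slow as the DB load, the persistent cache can only break even at best - which matches the benchmark result above.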

What would be much more useful IMO is for DI to allow In-Memory Datastores to be usable with batch jobs (not just realtime). Then you could pin your frequently-used dimension lookups into memory.


dnewton :us: (BOB member since 2004-01-30)

See the table at the bottom
https://wiki.sdn.sap.com/wiki/display/BOBJ/DI+Caching+Example

My first question would be what the trace log said in the first case. If it used the pageable cache anyway, the lookup times will be identical, as the same code is executed. The only differences will be:

  • How long it takes to read the lookup table and build the cache - Case 1
  • How long it takes to read the prebuilt cache structure from disk into memory - Case 3

If the disk where the pageable cache is placed is very slow, Case 1 can be faster than the sum of Cases 2+3, because in Case 1 no file is created unless the cache exceeds the 1.5 GB memory threshold. But usually, if you cache millions of rows, Case 2 takes a long time to build the cache table, and that time is included in Case 1 as well.

To answer your question why both show the same execution time:
The lookup is not the bottleneck; the bulk load is. Imagine for a second that the load took 1 minute per row. Would you expect to see a difference depending on the lookup method? No, not really. You might see that one consumes a little more CPU than the other, as “slow” means “more to do”, so one sits idle more than the other.
And in fact we have exactly that situation: the lookup is many times faster than the bulk load. Hence, for such tests, I discard the data instead of loading it.
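The pipelining argument can be sketched in a few lines of Python (a toy model with invented row rates, not DS internals): once the stages run concurrently, elapsed time is governed by the slowest stage, so speeding up a non-bottleneck stage changes nothing.

```python
def pipeline_minutes(rows, rates_per_min):
    """Elapsed time of a fully pipelined flow, ignoring fill/drain time:
    throughput is set by the slowest stage."""
    return rows / min(rates_per_min.values())

rows = 10_000_000  # as in the benchmark above; all rates below are made up
slow_lookup = {"rowgen": 5_000_000, "lookup": 1_000_000, "bulkload": 400_000}
fast_lookup = {"rowgen": 5_000_000, "lookup": 2_000_000, "bulkload": 400_000}

print(pipeline_minutes(rows, slow_lookup))  # 25.0 - the bulk load sets the pace
print(pipeline_minutes(rows, fast_lookup))  # 25.0 - doubling lookup speed changes nothing
```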


Werner Daehn :de: (BOB member since 2004-12-17)

I want to give a thumbs up to what dnewton said above about the persistent cache having to be re-read from disk in every dataflow, as I very much agree with it.

I also personally agree with his other suggestion, about making In-Memory Datastores usable in batch jobs. However, that is (IMHO) an extremely advanced feature that should be used only if the developer can answer ten questions that only an experienced developer would know. In other words, this feature could get a lot of inexperienced developers in trouble because of its perceived “Coolness Factor”, without their realizing the implications.

Anyhow, great thread on persistent cache that should be required reading.


eganjp :us: (BOB member since 2007-09-12)

:yesnod: Yes, you’d only do this if you thought you were smarter than your DBMS’ resource-allocation algorithms. :wink:


dnewton :us: (BOB member since 2004-01-30)

My list of things in that area:

  • The persistent cache database is moved out of the dataflow into the jobserver process. Hence when one dataflow is done, the cache is still available for the next dataflow and does not need to be read from disk again.

  • The optimizer collects the usage of caches and tables up front and caches the data once. Example: DF1 does a lookup on CUSTOMER with KEY=? returning C_ID; DF2 does a lookup on CUSTOMER with KEY=? returning LASTNAME; DF5 loads the CUSTOMER table; and DF6 does a lookup on CUSTOMER after DF5. Why should both DF1 and DF2 read the CUSTOMER table? We read the table once, save KEY, C_ID and LASTNAME in the cache when executing DF1, and keep that cache in the persistent cache database. After DF2 is done, the cache is released - even though DF6 does a lookup as well - because in the meantime DF5 will have inserted new rows into the table, so we have to read it again. (And later we can even optimize that.)

  • Cache reader can have multiple reader threads

  • The cache reader uses partitioning information both for reading and for the lookup. For example, if I am reading all order partitions of region1 and have a lookup on C_ID=C_ID and REGION=REGION, I know that since REGION can only be region1, I need only the region1 partition of the lookup.
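That partition-pruning idea can be illustrated with a toy Python sketch (the data layout and semantics are my assumptions, just to show the principle): when REGION is part of the lookup condition and the source is read one region-partition at a time, only the matching cache partition ever has to be loaded.

```python
# Lookup cache split by REGION, as if each partition lived in its own file.
cache_partitions = {
    "region1": {("region1", 1): "Alice", ("region1", 2): "Bob"},
    "region2": {("region2", 1): "Carol"},
}

loaded = set()  # track which partitions we actually had to read


def lookup(region, c_id):
    loaded.add(region)  # simulate reading that partition from disk
    return cache_partitions[region].get((region, c_id))


# Reading only region1's order partition -> only region1's cache is touched.
for c_id in (1, 2):
    lookup("region1", c_id)

print(loaded)  # {'region1'} - region2 was never read
```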


Werner Daehn :de: (BOB member since 2004-12-17)

I ran into an interesting (to me at least) issue today in a job that has been failing over the past few days, after being in production for almost two years running the same version of DS 12.1.1.4 the entire time.

The Dataflow uses DOP=10 and has a lookup() (pre-load cache) that uses a persistent cache table as a source. The persistent cache table has 25 million rows in it. Looking at the monitor log, I’m unable to clearly determine what is going on with the lookup and its effect on memory.

The persistent cache table is loaded by a separate job (Job A) and only the Dataflow (above) reads from the persistent cache table in Job B.

  1. Is the data in the persistent cache table loaded into memory by the Dataflow that uses it in a lookup() that is set to pre-load cache? Or is the lookup() smart enough to know that the persisted data is already in a local database so there is no need to load it into memory?
  2. What is the effect of DOP=10? Will it load the 25 million rows of data 10 times - once for each parallel thread? I think in this case - see below - the answer is that the lookup is instantiated only once, not ten times.

Job B fails when running with “Use collected statistics”, but it works when “Collect statistics for optimization” is turned on. Subsequent runs with “Use collected statistics” still fail every time.

This is the monitor log. Note that most of the transforms have 10 threads, while the transform “Query_previous_row_and_lookup” appears to be single-threaded, with additional threads for each of the three lookups.


eganjp :us: (BOB member since 2007-09-12)