Speed of persistent cache

Hi,

After spending the holidays with my girlfriend and the ‘BODS Performance Optimization Guide’ :wink: I was eager to benchmark some theories against a basic setup (a zero measurement) and see what the real difference in performance is.

Regarding the persistent cache I didn’t notice any performance advantage. Here is what I did…

Dimension table ‘DIM’: 1 PK column (integer, populated from a sequence), 2 UK columns (integer and date). Both keys are indexed; 10 million rows.

1. Normal lookup flow (zero-measurement):
Rowgen (10 million) --> lookup in dimension (pre-load cache) --> bulk load into fact.

2. Build up persistent cache table:
‘DIM’ (Oracle source table) --> ‘DIM_PERS’ (persistent cache). I defined the key columns on the same columns as I use in the lookup (UK).

3. Lookup Persistent cache flow:

Exactly the same as the normal lookup flow, but with the lookup against the persistent cache table.

Results
The persistent-cache-lookup flow and the normal-lookup flow take exactly the same time to finish (I did several runs).

I would expect the persistent cache to be faster, because it doesn’t have to query the database and there is no extra I/O. Obviously I am missing something here, but what?
When should the persistent cache be faster?


BBatenburg :netherlands: (BOB member since 2008-09-22)

I would expect a lookup with pre-load cache to run in the same time as a lookup against the persistent cache. All you ‘saved’ is the time to load the cache - although you had to do that in another DF, which itself takes time.

If there are a lot of values to load into the cache, and lookups spread throughout the job, you might save some time by loading the cache only once. However, if you’re doing this sort of thing, the chances are the lookup table itself is being modified, and every time that happens the persistent cache needs to be reloaded.


dastocks (BOB member since 2006-12-11)

Exactly - I should save the time it takes to build the lookup cache. Example:

Pre-load cache lookup flow:

  1. Build up cache: 10 minutes
  2. Lookup values: 15 minutes
    Total: 25 minutes

In this case I would expect the PC (persistent cache) flow to take 15 minutes instead of 25 (compared to the normal cached lookup), because the PC table is already built and the flow only has to execute step #2.

In reality it doesn’t - it takes 25 minutes… I don’t get it…


BBatenburg :netherlands: (BOB member since 2008-09-22)

I believe that the jobserver only tries to keep the PC data in memory on a ‘best endeavours’ basis. If the PC data is flushed from memory and then read from disk again for each lookup DF you’re probably not gaining anything.

Is the DF doing the lookups running with in-memory or pageable cache? This might make a difference. You could also try setting the lookup itself to run as a separate process within the DF.

Some years ago I tried using the PC for all static (or near static) dimension lookups in a CDC job that runs in a continuous loop. There was no perceptible difference in performance, and I couldn’t see any benefits elsewhere (e.g. reduced load on DB server) so I decided it wasn’t worth the bother.

It would be interesting to hear from anyone who lurks on this forum who does use the persistent cache.


dastocks (BOB member since 2006-12-11)

Assuming we’re talking about batch jobs…

Persistent Cache doesn’t keep anything in memory across dataflows. Every time you do a lookup against the persistent cache datastore, it has to re-read it from disk. Your database engine (Oracle, MySQL), by contrast, has substantial caching capability, assuming you have plenty of memory.

Where Persistent Cache is useful is where you have to build a very complex lookup that can be reused several times in a job, or where the looked-up data is remote (over a slow WAN connection) and you want a local copy. In both cases, the benefit of having the cache on disk local to DI has to exceed the cost/time of building it. As you experienced, it often doesn’t.
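As a back-of-the-envelope check of that trade-off (plain Python with made-up timings - none of this is DS syntax), building the cache once only pays off when the per-dataflow saving, multiplied by the number of reuses, exceeds the one-time build cost:

```python
def persistent_cache_pays_off(build_min, reuses, db_load_min, disk_read_min):
    """True if building the persistent cache once beats loading the
    lookup cache from the database in every dataflow that needs it."""
    without_pc = reuses * db_load_min              # each DF pre-loads its cache from the DB
    with_pc = build_min + reuses * disk_read_min   # build once, then read from local disk
    return with_pc < without_pc

# One lookup DF, as in the benchmark above: no win (assumed timings).
print(persistent_cache_pays_off(build_min=10, reuses=1,
                                db_load_min=10, disk_read_min=10))   # False

# Many reuses over a slow WAN, where the DB load dominates: a win.
print(persistent_cache_pays_off(build_min=10, reuses=8,
                                db_load_min=10, disk_read_min=2))    # True
```

With only one consuming dataflow and a disk read as slow as the DB load, the persistent cache can only break even at best - which matches the benchmark result above.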

What would be much more useful IMO is for DI to allow In-Memory Datastores to be usable with batch jobs (not just realtime). Then you could pin your frequently-used dimension lookups into memory.


dnewton :us: (BOB member since 2004-01-30)

See the table at the bottom
https://wiki.sdn.sap.com/wiki/display/BOBJ/DI+Caching+Example

My first question would be what the trace log said in the first case. If it used the pageable cache anyway, the lookup times will be identical, as the same code is executed. The only differences will be:

  • How long it takes to read the lookup table and build the cache - Case 1
  • How long it takes to read the prebuilt cache structure from disk into memory - Case 3

If the disk where the pageable cache is placed is very slow, Case 1 can be faster than the sum of Cases 2+3, because in Case 1 no file is created unless the cache exceeds the 1.5 GB memory threshold. But usually, if you cache millions of rows, Case 2 takes a long time to build the cache table, and that time is included in Case 1 as well.

To answer your question why both show the same execution time:
The lookup is not the bottleneck; the bulk load is. Imagine for a second that the load took 1 minute per row. Would you expect to see a difference depending on the lookup method? No, not really. You might see that one consumes a little more CPU than the other, as “slow” means “more to do”, so one sits idle more than the other.
And in fact we have exactly that situation: the lookup is many times faster than the bulk load. Hence, for such tests, I discard the data instead of loading it.
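The pipelining argument can be sketched in a few lines of Python (a toy model with invented row rates, not DS internals): once the stages run concurrently, elapsed time is governed by the slowest stage, so speeding up a non-bottleneck stage changes nothing.

```python
def pipeline_minutes(rows, rates_per_min):
    """Elapsed time of a fully pipelined flow, ignoring fill/drain time:
    throughput is set by the slowest stage."""
    return rows / min(rates_per_min.values())

rows = 10_000_000  # as in the benchmark above; all rates below are made up
slow_lookup = {"rowgen": 5_000_000, "lookup": 1_000_000, "bulkload": 400_000}
fast_lookup = {"rowgen": 5_000_000, "lookup": 2_000_000, "bulkload": 400_000}

print(pipeline_minutes(rows, slow_lookup))  # 25.0 - the bulk load sets the pace
print(pipeline_minutes(rows, fast_lookup))  # 25.0 - doubling lookup speed changes nothing
```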


Werner Daehn :de: (BOB member since 2004-12-17)

I want to give a thumbs up to what dnewton said above about the persistent cache having to be re-read from disk in every dataflow, as I very much agree with it.

I also personally agree with his other suggestion, about making In-Memory Datastores usable in batch jobs. However, that is (IMHO) an extremely advanced feature that should be used only if the developer can answer ten questions that only an experienced developer would know. In other words, this feature could get a lot of inexperienced developers in trouble because of its perceived “Coolness Factor”, without their realizing the implications.

Anyhow, great thread on persistent cache that should be required reading.


eganjp :us: (BOB member since 2007-09-12)

:yesnod: Yes, you’d only do this if you thought you were smarter than your DBMS’ resource-allocation algorithms. :wink:


dnewton :us: (BOB member since 2004-01-30)

My list of things in that area:

  • The persistent cache database is moved out of the dataflow into the jobserver process. Hence when one dataflow is done, the cache is still available for the next dataflow and does not need to be read from disk again.

  • The optimizer collects the usage of caches and tables up front and caches the data once. Example: DF1 does a lookup on CUSTOMER with KEY=? returning C_ID; DF2 does a lookup on CUSTOMER with KEY=? returning LASTNAME; DF5 loads the CUSTOMER table; and DF6 does a lookup on CUSTOMER after DF5. Why should both DF1 and DF2 read the CUSTOMER table? We read the table once, save KEY, C_ID and LASTNAME in the cache when executing DF1, and keep that cache in the persistent cache database. After DF2 is done, the cache is released - even though DF6 does a lookup as well - because in the meantime DF5 will have inserted new rows into the table, so we have to read it again. (And later we can even optimize that.)

  • Cache reader can have multiple reader threads

  • The cache reader uses partitioning information both for reading and for the lookup. For example, if I am reading all order partitions of region1 and have a lookup on C_ID=C_ID and REGION=REGION, I know that since REGION can only be region1, I need only the region1 partition of the lookup.
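That partition-pruning idea can be illustrated with a toy Python sketch (the data layout and semantics are my assumptions, just to show the principle): when REGION is part of the lookup condition and the source is read one region-partition at a time, only the matching cache partition ever has to be loaded.

```python
# Lookup cache split by REGION, as if each partition lived in its own file.
cache_partitions = {
    "region1": {("region1", 1): "Alice", ("region1", 2): "Bob"},
    "region2": {("region2", 1): "Carol"},
}

loaded = set()  # track which partitions we actually had to read


def lookup(region, c_id):
    loaded.add(region)  # simulate reading that partition from disk
    return cache_partitions[region].get((region, c_id))


# Reading only region1's order partition -> only region1's cache is touched.
for c_id in (1, 2):
    lookup("region1", c_id)

print(loaded)  # {'region1'} - region2 was never read
```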


Werner Daehn :de: (BOB member since 2004-12-17)

I ran into an interesting (to me at least) issue today in a job that has been failing over the past few days, after being in production for almost two years running the same version of DS 12.1.1.4 the entire time.

The Dataflow uses DOP=10 and has a lookup() (pre-load cache) that uses a persistent cache table as a source. The persistent cache table has 25 million rows in it. Looking at the monitor log, I’m unable to clearly determine what is going on with the lookup and its effect on memory.

The persistent cache table is loaded by a separate job (Job A) and only the Dataflow (above) reads from the persistent cache table in Job B.

  1. Is the data in the persistent cache table loaded into memory by the Dataflow that uses it in a lookup() that is set to pre-load cache? Or is the lookup() smart enough to know that the persisted data is already in a local database so there is no need to load it into memory?
  2. What is the effect of DOP=10? Will it load the 25 million rows of data 10 times - once for each parallel thread? I think in this case - see below - the answer is that the lookup is instantiated only once, not ten times.

Job B fails when running with “Use collected statistics”, but it works when “Collect statistics for optimization” is turned on. Subsequent runs with “Use collected statistics” still fail every time.

This is the monitor log. Note that most of the transforms have 10 threads, while the transform “Query_previous_row_and_lookup” appears to be single-threaded, with additional threads for each of the three lookups.


eganjp :us: (BOB member since 2007-09-12)