BusinessObjects Board

DS 4.2: DF fails "possibly due to insufficient memory"

Environment: Data Services 4.2 (64-bit Linux).

We are testing the new version (preparing for an upgrade from 3.2) and observing quite strange behavior.

DF fails as soon as we enable either one of two Execution Properties:

  • Collect statistics for optimization
  • Collect statistics for monitoring

The error message is vague:

The error is reproduced only if one of the transforms inside the DF uses caching, i.e.:

  • DEMAND_LOAD_CACHE of the lookup_ext function.
  • “Cached comparison table” of the Table_Comparison transform.

As soon as I switch off the caching option it works fine.

The type of target load does not affect the problem (bulk load/row-by-row).

So, my questions would be:

  1. Could it be due to some missing or incorrect options in DSConfig.txt?
  2. What exactly is happening when I tick those two checkboxes in execution properties to collect statistics for monitoring and for optimization?
  3. Do these options really have any significant impact on performance? Can we simply not use them at all and still be fine? :slight_smile:

I would greatly appreciate any advice on which direction to look in. It does look a bit like a bug. :slight_smile:

Thanks!


shamanoff :new_zealand: (BOB member since 2010-06-17)

It’s probably not a memory issue. See this thread for how I resolved the error:


eganjp :us: (BOB member since 2007-09-12)

Thank you for your answer! Unfortunately re-importing objects didn’t help me to resolve the issue.

The error appears as soon as I tick one of the checkboxes to generate statistics in the job execution properties AND the data flow uses in-memory caching somewhere inside (I tested it with the Table Comparison transform and the lookup_ext() function).

If either of these two conditions is not met, the error does not appear. I am still testing, and other parameters of the environment do not seem to affect this behaviour. I will continue to test and experiment further, i.e.:

  1. Testing the behaviour on a repository created in 4.2, not upgraded to it.
  2. Testing the behaviour for different types of source/target database.
  3. Testing the behaviour when the cached table and the target table are different.

I have a suspicion that maybe our DSConfig.txt file is not configured fully/properly.

I was trying to find information on WHAT exactly happens when we enable statistics generation for job execution, to help me in troubleshooting, but unfortunately I didn’t find any useful resources.


shamanoff :new_zealand: (BOB member since 2010-06-17)

Go into the DSConfig.txt file on the job server and change Global_DOP from 2 to 1. In DS 3.2 the default was 1. With a number of my clients I have found that this small change in the default between 3.x and 4.x can create a large number of problems, and switching back to 1 fixes many of them.

Often, if Dataflows are not coded with DOP > 1 in mind, there can be issues. One example is where the initial Query transform performs an ORDER BY to support the Sorted Input option of the Table Comparison; if there are other Query transforms between the initial one and the Table Comparison, the rows can become unsorted when DOP > 1.

Alternatively, just change the Dataflow’s properties to force DOP = 1.
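
For reference, a rough sketch of how I check and change Global_DOP on a Linux job server (the $LINK_DIR/bin location of DSConfig.txt and the [AL_Engine] section name are how my installs look - verify against yours):

  # Back up DSConfig.txt before editing
  cp $LINK_DIR/bin/DSConfig.txt $LINK_DIR/bin/DSConfig.txt.bak

  # Find the current setting (on my installs it sits in the [AL_Engine] section)
  grep -n "Global_DOP" $LINK_DIR/bin/DSConfig.txt

  # Edit that line so it reads:
  #   Global_DOP = 1
  # and restart the job server afterwards to be safe.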


eganjp :us: (BOB member since 2007-09-12)

DOP didn’t affect the behavior at all.

I just noticed that during my tests the error message changed to:

I also reproduced the problem under the following conditions:

  1. Oracle, Netezza, mixed source/targets/lookups.
  2. DS repository created in 4.2.

shamanoff :new_zealand: (BOB member since 2010-06-17)

Additional info:

After enabling “Transform Tracing” I can see that during job execution Data Services performs the following operation as many times as the cached comparison table has rows:

In my case the target tables had 60k rows and this operation was performed 60k times.

The failed job log (with the “generate statistics” options enabled) looks like this:

The log of the healthy run (with the “generate statistics” options disabled) looks like this:

The first part of the log (and I enabled memory tracing as well) looks exactly the same.


shamanoff :new_zealand: (BOB member since 2010-06-17)

Remove the table compare and check if it works. Table compares can be a disaster.

And are you testing a bulk load with a table compare? That would be a bit strange.


Johannes Vink :netherlands: (BOB member since 2012-03-20)

What are the volumes of these tables? I’ve had some nasty crashes with one development job myself until I realised one lookup actually had the caching option enabled even though the lookup table was very large, and our poor little development DS 4.1 box ran out of RAM … and crashed.

(I could reproduce this problem every single time - it really was the caching option in that lookup, against that table - changing it to no cache resolved the issue immediately.)

How did you upgrade your code from 3.2 to 4.2? Did you perform a repository upgrade (the right way) or did you export an ATL directly into your new DS 4.2 testing environment (the wrong way)?


ErikR :new_zealand: (BOB member since 2007-01-10)

The problem is reproduced on very small tables (3 columns, a few records).

As long as we do not tick those two checkboxes in the job execution properties, the problem does not appear and everything works fine. By everything I mean “Caching”, “Bulk load”, all transforms, etc.

Only two things need to be enabled to reproduce the problem: “caching” (in any form, it does not have to be the TC transform) and “Generate statistics” … nothing else (bulk load, database type, target and source objects) affects the behaviour.

It would be really great to understand what exactly this “Generate statistics” does under the hood, i.e.:

  • where DS stores the generated statistics.
  • which objects and structures DS generates statistics on.
  • how exactly DS uses the generated statistics.
  • which processes/scripts/jobs perform the statistics generation, at which points of the job execution, and how.

shamanoff :new_zealand: (BOB member since 2010-06-17)

Hi Erik, we upgraded using DS Repository Manager. I actually didn’t know about the ATL option as the documentation is quite clear about the right way. :slight_smile:

The problem does not seem to be related to the upgrade itself. I have created a new repository in 4.2 from scratch and reproduced the same behavior in it.


shamanoff :new_zealand: (BOB member since 2010-06-17)

The generated statistics are held in the repository tables AL_CACHE and AL_CACHE_STATS. If some knucklehead locked those tables down such that you can’t write to them, that could be a problem, but perhaps not with the symptoms you’ve described.

When the Dataflow runs it stores metadata about the execution in AL_STATISTICS. This is different from gathering or using gathered statistics.
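
If you want to confirm the engine can actually write to those tables, a quick sanity check is to count their rows before and after a run. A rough sketch from the job server, assuming an Oracle repository with sqlplus on the path (swap in your own client, owner and connect string):

  # repo_owner/repo_pwd@REPO are placeholders for your repository credentials
  sqlplus -s repo_owner/repo_pwd@REPO <<'EOF'
  SELECT COUNT(*) FROM AL_CACHE;
  SELECT COUNT(*) FROM AL_CACHE_STATS;
  SELECT COUNT(*) FROM AL_STATISTICS;
  EOF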

Is it possible that your job server is horribly starved for memory?


eganjp :us: (BOB member since 2007-09-12)

Hi Jim,

The server has 8 GB, which I guess should be more than enough for test purposes. There is no other workload on the server whatsoever.

I checked the AL_CACHE and AL_CACHE_STATS tables and they are both empty. The schema owner has full access to his own schema, of course.

I am trying to locate the core dump of the failed job process.

The stack_trace.txt file in the $LINK_DIR/log folder has nothing informative:

There are no files created in the /var/ folders (I would think that on Red Hat core dumps should be created there).
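
In case it is useful to anyone else hunting for the dump, this is roughly what I am checking on the job server (plain Linux settings, nothing DS-specific; the abrt spool path is just where Red Hat usually puts piped cores):

  ulimit -c                          # 0 means core files are suppressed for this shell
  cat /proc/sys/kernel/core_pattern  # a plain pattern, or a pipe to a handler such as abrt

  # Allow core files before re-running the job from this shell
  ulimit -c unlimited

  # On Red Hat with abrt enabled, piped cores usually end up under:
  ls /var/spool/abrt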


shamanoff :new_zealand: (BOB member since 2010-06-17)

On a 64 bit O/S it’s pretty easy to exhaust 8 GB of memory. The job server is pretty good about allocating memory based on how much is available. But I would still configure more than just 8 GB. I run 32 GB on my desktop PC.


eganjp :us: (BOB member since 2007-09-12)

When you say that you recreated the problem in a new DS 4.2 repository - did you also create a new data flow to test with, or did you import one of the existing (DS 3.2) data flows into that new repository?

I think it would be good to isolate whether this is a problem with migrating DS 3.2 objects to DS 4.2 or a bug in DS 4.2 that occurs with new DFs as well.


ErikR :new_zealand: (BOB member since 2007-01-10)

I was creating new DFs from scratch for every test case.


shamanoff :new_zealand: (BOB member since 2010-06-17)

OK, in that case definitely log a call with SAP. This may well be a bug with DS 4.2 SP1 and your specific configuration.


ErikR :new_zealand: (BOB member since 2007-01-10)

I am having exactly the same issue as you are.

I just did an upgrade from 3 to 4.2 and now on certain DFs I get the error message you are getting.

I have also recreated everything from scratch and still get the same message.

I wish I could shed some light on the issue, but I have had no luck yet… I’m hoping you may have found something out by now.


bungy :us: (BOB member since 2007-10-04)

OK, so I just found a very odd workaround.

If I place a Merge transform before my Table Comparison, the dataflow works.

I must be tricking DS into doing what it was supposed to do all along, but I am not sure how or why it works.


bungy :us: (BOB member since 2007-10-04)

Look at the Display Optimized SQL in the Dataflow, both with and without the Merge transform. I suspect that the Merge is forcing a different approach to how the Dataflow is using the database.


eganjp :us: (BOB member since 2007-09-12)

I’ve double checked and I have the same optimized SQL whether I use the Merge transform or not. The only difference is that it actually runs with the Merge transform (doing nothing but passing the data through) and it doesn’t run without it.

In fact it doesn’t even get into the DF; I just get the insufficient memory error. :hb:


bungy :us: (BOB member since 2007-10-04)