I am getting an Access Violation error, and sometimes a Virtual Memory error too.
For the Access Violation error, I went through various pages, and the advice was either to avoid using the Windows cache or to change the DF.
I have no rights to change the DF, so the only option left is to deal with the memory side.
I am unclear about how to switch off this Windows cache memory usage.
Please clarify this.
In my production system, lines like the ones below do not appear. Does that mean it does not use the memory cache?
(12.0) 06-29-09 13:59:33 (28320:20352) DATAFLOW: Cache statistics determined that data flow <DF_CHECK> uses <0> caches with a total size of <0> bytes. This is less than (or equal to) the virtual memory <1608515584> bytes available for caches. Statistics is switching the cache type to IN MEMORY.
(12.0) 06-29-09 13:59:33 (28320:20352) DATAFLOW: Data flow <DF_CHECK> using IN MEMORY Cache.
It goes straight to the "DF starts" and then "DF completed successfully" messages only.
In short, you have the job property "use collected statistics for optimization" turned on.
DI has two caching methods: in-memory and pageable cache.
With in-memory, DI allocates as much OS memory as it wants. This method is simple, but if the amount of data is large, the system starts to swap or simply runs out of memory.
With pageable cache turned on, we use the Oracle Berkeley database, an in-memory database that can overflow data to disk. You set the maximum amount of memory you allow it to use, and if more is required, the data overflows to disk.
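To make the overflow idea concrete, here is a minimal sketch in Python of a cache that spills to disk once a memory budget is exceeded. It only illustrates the concept; it is not the Berkeley DB engine DS actually uses, and the class name and budget are invented:

```python
# A minimal sketch of the "pageable cache" idea: keep rows in memory up to a
# configured budget, then spill further rows to a file on disk. This is only
# an illustration; DS actually uses the Oracle Berkeley DB for this.
import os
import sys
import shelve
import tempfile

class PageableCache:
    def __init__(self, max_in_memory_bytes):
        self.max_bytes = max_in_memory_bytes
        self.used_bytes = 0
        self.memory = {}                                  # fast, in-memory part
        path = os.path.join(tempfile.gettempdir(), "pageable_cache_demo")
        self.disk = shelve.open(path)                     # disk-backed overflow

    def put(self, key, row):
        size = sys.getsizeof(row)
        if self.used_bytes + size <= self.max_bytes:
            self.memory[key] = row                        # still within budget
            self.used_bytes += size
        else:
            self.disk[key] = row                          # overflow to disk

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        return self.disk.get(key)

# With a tiny 1 KB budget, most of the 1000 rows end up on disk instead of
# inflating the process memory.
cache = PageableCache(max_in_memory_bytes=1024)
for i in range(1000):
    cache.put(str(i), ("row", i))
print(cache.get("999"))
cache.disk.close()
```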
The big question now is which method to use; in the worst case the same dataflow might require different settings, as one run deals with large amounts of data and another does not. So we implemented a feature to collect memory statistics in the repository during execution, and the next time the dataflow is executed the engine can automatically decide which method to pick.
Your message now says that memory statistics are available - you have executed the job with the execution property "collect statistics for optimization" - and this dataflow requires 0 bytes of memory. Secondly, you execute the job with the property "use collected statistics for optimization" turned on, and therefore the dataflow is automatically set to the in-memory cache method - it is very unlikely that a dataflow with nothing to cache runs out of cache memory.
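For what it's worth, the decision the trace message describes can be read as a simple comparison. The following is only a reconstruction from the log text, not the actual optimizer code:

```python
# A rough sketch (assumed logic, reconstructed from the trace message above,
# not the actual DS optimizer code) of the cache-type decision: if the cache
# sizes collected on a previous run fit into the virtual memory available for
# caches, the engine switches the dataflow to the in-memory method.
def choose_cache_type(collected_cache_bytes, virtual_memory_for_caches):
    if collected_cache_bytes <= virtual_memory_for_caches:
        return "IN MEMORY"   # everything is expected to fit in RAM
    return "PAGEABLE"        # expected to be too big, keep the disk-backed cache

# Numbers from the trace log: 0 bytes of collected cache statistics vs
# 1,608,515,584 bytes available, hence the switch to IN MEMORY.
print(choose_cache_type(0, 1_608_515_584))   # -> IN MEMORY
```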
But I would like to know if this is newly implemented in Data Services, because I can find none of those checkboxes in DI version 11.0.1. Correct me if I am wrong.
If that is the case, how did DI 11.0.1 deal with this?
I tried this out in DEV 12.0. The message I attached is from 12.0!!!
Thanks Werner.
In spite of this improved memory usage method, my job does not give an error in 11.0.1 (production) but does give an error in 12.0.
Does this mean the Job Server machine is running out of memory and I need to increase the virtual memory? Because sometimes I am getting the Virtual Memory error as well.
I would rather check the dataflow for improvements. Caching large volumes of data is not good performance-wise anyhow. So post a screenshot of your dataflow and describe it a bit; then we can suggest optimizations.
To be frank, I have no rights to change the design!!! That's the sad part for me.
Anyhow, I will put the same question to you.
If my production server can run the same job successfully on DI version 11.0.1 (where there is no concept of in-memory cache or pageable cache),
why is my DS 12.0 unable to run it (in spite of the improved performance of the application)?
I am asking this because, if the problem is purely the server's memory, then my first action should be to ask for a memory increase and not to look into optimizing the DF (they will say it's not my job).
Assuming the amount of data is the same in both environments, most likely the execution plan chosen by the DI optimizer is different due to the different versions and their capabilities.
It’s unlikely adding memory to the server will fix the problem… it’s probably that the dataflow is set to memory-only (and not pageable) and you’re exceeding the 2 GB limit for a 32-bit process.
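As a rough sanity check on that limit (just back-of-the-envelope arithmetic, assuming a 2 GiB address space for the 32-bit al_engine process; the variable names are mine):

```python
# 32-bit process address space vs the cache memory reported in the trace log.
process_limit = 2 * 1024**3            # 2 GiB ceiling for a 32-bit process
available_for_caches = 1_608_515_584   # bytes, taken from the trace log above

print(f"{available_for_caches / 1024**3:.2f} GiB of {process_limit / 1024**3:.0f} GiB")
# -> about 1.50 GiB of 2 GiB; a dataflow forced to the in-memory cache that
#    needs more than this hits the address-space ceiling and can crash.
```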
When the job is run, be sure “use collected statistics” is un-checked.
If you don’t have the power to change the job… what is your role here? Why are you debugging it? Can you find someone who does own the job/dataflow, and convince them they need to update the design?
The message to your employer should be: it’s unrealistic to expect that a huge jump from 11.0 to 12.x would involve no design changes.
Haha, I understand very well, but what can I do in this case? That is my access level here.
Anyhow, I can suggest these solutions to them.
In the meantime, I wanted to highlight that the Access Violation error pops up for the DF that is executed using the pageable cache, not in-memory!!!
So I thought of changing the DFs to use in-memory directly and running them.
Cache statistics determined that data flow <DF_CHECK> uses <0> caches with a total size of <0> bytes. This is less than (or equal to) the virtual memory <1608515584> bytes available for caches. Statistics is switching the cache type to IN MEMORY.
This means that even though the dataflow might be set to pageable, DI thinks, incorrectly, that the dataflow uses so little memory that it can switch and force an in-memory-only process. So you need to run the dataflow/job with “collect statistics” ON, but “use collected statistics” OFF. That will get fresh stats for DI to use in future runs. Or always just run it with “use collected stats” OFF and never worry about it again.
Hey, that's fine. When I began this thread, I was unaware of this concept!!!
I tried all of these, but no luck.
But I did learn that with the pageable cache, AL_ENGINE eats much less memory than with in-memory. That's great.
To give an outline of my DF:
There are 2 source tables coming from 2 different Oracle DBs!!
Before the join with the 2nd table, the 1st table goes through a Query transform. In that Query transform, they do lookups against a table that lies in another DB (a 3rd DB), and for all these lookups they use the cache spec PRE_LOAD_CACHE. When they join this Query transform's output with the 2nd table, the lookup there uses the cache spec NO_CACHE.
Can you help me with these inputs?
To be frank, I don’t know what goes on here functionally.
With that number of rows it does not matter; that’s processed within seconds. The problem has to be somewhere else: table 3, the cached lookups, the row-by-row lookup performance.
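To show what those two cache specs mean in practice, here is a simplified sketch (not the actual DS lookup implementation; the table, column, and function names are invented):

```python
# PRE_LOAD_CACHE vs NO_CACHE, illustrated against a small in-memory SQLite
# table standing in for the translate table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE translate (code TEXT PRIMARY KEY, descr TEXT)")
conn.executemany("INSERT INTO translate VALUES (?, ?)",
                 [(str(i), f"desc-{i}") for i in range(1000)])

# PRE_LOAD_CACHE: read the whole lookup table once, then every dataflow row
# resolves against the in-memory copy. Fast per row, but the entire table
# (1M+ rows for the view in question) must fit in the engine's memory.
preloaded = dict(conn.execute("SELECT code, descr FROM translate"))
def lookup_preload(code):
    return preloaded.get(code)

# NO_CACHE: every incoming row issues its own query against the source
# database. No memory cost, but a 1M-row view behind a slow query turns this
# row-by-row round trip into the bottleneck.
def lookup_no_cache(code):
    row = conn.execute("SELECT descr FROM translate WHERE code = ?", (code,)).fetchone()
    return row[0] if row else None

print(lookup_preload("42"), lookup_no_cache("42"))
```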
As you said, there are many lookup calls, and lookups are done against around 3 or 4 DS connections.
But there is one lookup where I have a doubt.
The translate table involved has 1 million+ records.
Moreover, it is not a table but a VIEW. When I tried "View Data" on it from DS, it takes approximately 3 to 5 minutes to load. It sounds like the problem is arising at this point.
So I changed it from NO_CACHE to PRE_LOAD_CACHE here. Still no use.
Anyway, I am helpless at this juncture. Please pour in your ideas.
I will not be able to post the DF here; it is way too huge.
Anyhow, please help me understand 2 things:
→ For DS 12.0, you mentioned the Oracle Berkeley database. When it is used, which memory does it take: the system memory of the machine the Job Server runs on, or something else?
→ For a 32-bit application, I understand that a maximum of 2 GB of memory can be used. But if the system's virtual memory is being used by other applications, how can we say 2 GB is available to AL_ENGINE? It won't be, right?