We developed a job whose source is a table with 40 crore (400 million) records. We have a business scenario where, every quarter, a huge number of updates suddenly happen in the source system. We need to capture these changes in our data warehouse environment as well.
We receive approximately 1,600 files on that day, and our job processes them one after the other. For the first few files the job performs well, but after some files (or some time) it takes much longer to do the same type of operation:
1st file: 20 sec
100th file: 1,000 sec (processing time for that single file, not for 100 files)
500th file: 30,000 sec (processing time for that single file, not for 500 files)
I am not able to understand the reason behind this. Is there a problem with the Job Server load? Any ideas? Please share your views.
Check whether there is any option to give it a break in between. If you put the whole load on the server at once, it may get stuck at some point. Try dividing the total operation into batches and then check; that should work.
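To picture the batching idea outside of the Data Services designer, here is a minimal Python sketch. The file pattern, batch size, and the process_file placeholder are illustrative assumptions, not part of the original job; the point is only the checkpoint between fixed-size batches instead of one long uninterrupted run:

```python
import glob
import time

BATCH_SIZE = 100  # assumed batch size; tune to what the server can handle


def process_file(path):
    # Placeholder for the real read/transform/load work done per file.
    ...


def run_in_batches(pattern="incoming/*.dat"):
    files = sorted(glob.glob(pattern))
    for start in range(0, len(files), BATCH_SIZE):
        batch = files[start:start + BATCH_SIZE]
        t0 = time.time()
        for path in batch:
            process_file(path)
        # Checkpoint between batches, so work and resources (transaction
        # logs, locks, temp space) do not pile up across the whole run.
        print(f"batch {start // BATCH_SIZE + 1}: "
              f"{len(batch)} files in {time.time() - t0:.1f}s")
```

If the per-batch times stay roughly flat, the slowdown is caused by something accumulating over the whole run rather than by the later files themselves.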
We need to identify the problem first. My assumption, since you are talking about updates and those are expensive for the target database, is that the loader is the bottleneck. So why don't you take your dataflow and add a Map_Operation right before the target where you discard the data? That way all files will be read and completely transformed, but not loaded. If that is quick, we know we have to do something on the loader side, e.g. check the database settings such as indexes, or use a different loading method. And if I am wrong, we need to continue the search.
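The same divide-and-conquer idea can be sketched in Python to show the reasoning. The function names and the discard flag below are illustrative assumptions; the discard flag plays the role of the Map_Operation that throws rows away so the read/transform time can be measured separately from the load time:

```python
import time


def read_and_transform(path):
    # Stand-in for reading one file and applying all transformations.
    with open(path) as f:
        return [line.rstrip("\n").split(",") for line in f]


def load(rows, discard=False):
    # Stand-in for the target loader. discard=True mimics the Map_Operation
    # test: rows are thrown away instead of being written to the warehouse.
    if discard:
        return 0
    # Real insert/update logic would go here.
    return len(rows)


def time_one_file(path, discard):
    t0 = time.time()
    rows = read_and_transform(path)
    t1 = time.time()
    loaded = load(rows, discard=discard)
    t2 = time.time()
    print(f"{path}: transform {t1 - t0:.1f}s, "
          f"load {t2 - t1:.1f}s ({loaded} rows)")
```

Running the job with the discard in place and seeing the times stay flat would point squarely at the loader (indexes, update strategy, growing transaction logs); if the times still climb, the problem is upstream in the read or transform stage.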