Issue
We are migrating our production ETL code from SQL Server 2000 / Data Integrator 11.7 to SQL Server 2008 / Data Services 12.2.1, and replacing the old hardware with new. We are seeing significant performance degradation despite the new hardware being considerably more powerful than the old.
Environment
SQL Server 2008 Enterprise SP1 on 64-bit Windows 2008 Enterprise SP2, on a 2-node physical cluster. The local repository is on this server.
Data Services 12.2.1 on 64-bit Windows 2008 Enterprise SP2, running as a VM on VMware ESX. The virtual server has 4 GB RAM and 4 processors.
Anti-virus software is running on the servers but not in on-access mode; weekly scans are conducted instead.
Our old ETL server was physical whereas the new one is virtual, but our benchmarking last summer showed that this made no difference in our scenario.
Test Harness
Consists of 2 dataflows executed in series. The first copies the claim_line_movement table in the KDR to KDRStaging Claim_line_movement_bak_20087 with a delete before load: 4 million rows in total.
The second dataflow selects from the backup table, does a GROUP BY to remove duplicates (selecting MAX(repos_id)), and then loads back into the KDR (again with a delete before load).
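For clarity, the second dataflow is roughly equivalent to the following T-SQL. This is a sketch only: the real logic is built in Data Services, and the key columns claim_id and line_no are assumptions, not the actual table definition.

```sql
-- Remove duplicates from the backup table and reload the KDR.
-- Key columns are illustrative; MAX(repos_id) picks one row per key.
DELETE FROM KDR.dbo.claim_line_movement;

INSERT INTO KDR.dbo.claim_line_movement (claim_id, line_no, repos_id)
SELECT claim_id, line_no, MAX(repos_id)
FROM   KDRStaging.dbo.Claim_line_movement_bak_20087
GROUP BY claim_id, line_no;
```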
The copy dataflow takes roughly 50% longer in new production than in old production (8 min 37 s as opposed to 5 min 39 s).
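The "50% longer" figure can be sanity-checked with a little arithmetic:

```python
# Compare the two copy-dataflow timings quoted above.
old = 5 * 60 + 39   # old production: 5 min 39 s = 339 s
new = 8 * 60 + 37   # new production: 8 min 37 s = 517 s
slowdown = (new - old) / old
print(f"new run is {slowdown:.1%} slower")  # about 52%, i.e. roughly 50% longer
```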
Tests
Select from SQL Server – 2 min 14 s
Here we take the SQL executed by the copy dataflow (select * from claim_line_movement) and run it in SQL Server Management Studio. No problem.
Select from SQL Server and job server discards the data – under 2 min
Here the copy dataflow is changed so that we read the data into a map_operation component and discard all of it. This tests getting the data as far as the job server.
Strangely, this is quicker than running the SQL on SQL Server itself (possibly a caching effect), but no problem.
Conclusion: SQL Server is fine for reads.
Load 4 million records generated on the ETL server into the db server – 7.1 min
Created a job with a row_gen component generating 4 million rows into a query component, which hard-codes the extra columns and loads into the target. This is a problem.
Conclusion: The write is taking too long, so it must be the SQL write, the network, or the ETL server.
Load 4 million records into SQL Server using SQL Server Management Studio – 26 s
Generated a 160-row table and did an INSERT ... SELECT from 3 copies of that table with no join condition (160³ ≈ 4 million rows), adding the extra columns. No problem.
Conclusion: The db seems to be able to write fine, so it is the network or the ETL server.
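The cross-join trick in this test can be sketched as follows (table and column names are illustrative, not the ones we actually used): joining a 160-row table to itself twice with no join condition yields 160 × 160 × 160 = 4,096,000 rows, roughly 4 million, in one statement.

```sql
-- seed160 is a hypothetical 160-row table; the cross join of three
-- copies produces 160^3 = 4,096,000 rows in a single INSERT ... SELECT.
INSERT INTO claim_line_movement_test (claim_id, line_no, extra_flag, load_dt)
SELECT a.claim_id, a.line_no, 'X', GETDATE()  -- hard-coded extra columns
FROM   seed160 a
CROSS JOIN seed160 b
CROSS JOIN seed160 c;
```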
Moved the (virtual) ETL server onto a different physical host – marginal improvement, but not sufficient
Conclusion: The physical host doesn't seem to matter.
I think there are a few things we could do to improve the performance of isolated jobs, and using the bulk loader is certainly something we could try here. However, the problem is bigger than isolated jobs, because all jobs run slower than on our existing system even though our new hardware is far more powerful. I think there must be an environmental problem somewhere causing the general degradation, in addition to isolated bits of non-optimal code.
Why don't you run the benchmark job I have in my signature? It would give you the option to compare with others: are others getting the same load performance if they load across the network, if DS is on VMware, if …
Here are our results from running the benchmark tests:
The results to look at are in columns L and M:
M is our existing production system (DI 11.7, SQL 2000, Win 2003, 32-bit, physical)
L is our aspiring production system (DS 12.2.1, SQL 2008, Win 2008, 64-bit, virtual)
The specs are further detailed in the tabs.
As you can see, M (on hardware 4 years old) is significantly quicker than L (4 months old).
We have tried L on physical hardware, on 32-bit, and on different physical hosts, but nothing is as quick as M.
ETL Benchmark Results.xls (57.0 KB)
Dev/QA environment: 4-processor server, 12 GB RAM, Windows 2003, SQL Server 2005, DS 12.2
Production environment: 48 processors, 256 GB RAM, Windows 2008, SQL Server 2008, DS 12.2
We expected better performance in our production environment, but it is worse than in Dev/QA. In the benchmark test, the job completed in 2h 9s in Dev/QA, while the identical job ran for 2h 39m in prod.
Could you please share your experience? What did you do to improve the performance? Thanks.
Hi,
I am the originator of this thread, and in the end it was as simple as making our ETL server physical. This is in spite of the benchmarking we did (now) two years ago.
Just shows you need to be careful with benchmarks, because at the end of the day it's the production jobs that matter. That said, the benchmarks are very useful and we still use them.