BusinessObjects Board

Benchmark YOUR hardware and compare the results V1.1

I have prepared a job with a few dataflows (DI 6.5 and above) to compare your hardware with others.

Please post in this thread and provide the following information, as in this example:

Name of the Test: VMWare image
Source: Oracle 9i
Network source to DI: 10MBit Ethernet
DI server: Windows 2003SP1 32bit /1 CPU @ 2.4GHz/512MB/1Disk (VMWare)
DI: 6.5
Network DI to Target: shared with above
Target server: unknown
Target: Oracle 9iR2

and the number of seconds as printed in the trace log. I will then update the table with the results.

Installation details are found here: http://wiki.sdn.sap.com/wiki/display/EIM/Performance+characteristics+at+customers


Werner Daehn :de: (BOB member since 2004-12-17)

Test 1 - Source and DI on same server, different target server

Source: MS SQL 2005

Source Server: Win2003 r2 SP1 x64, (2) 2.4Ghz AMD CPUs (dualcore), 16GB RAM, small RAID 5 array

DI Server: Same as above (on same machine as source db)

Network, source to DI: n/a

DI Version: 11.7

Network, DI to Target: Gigabit ethernet with fiber backbone

Target db: MS SQL 2000

Target server: Win2003 SP1 32-bit, (2) 3.4Ghz XEON (dualcore), 4Gb RAM, large RAID 10 array

Results

<DF_Benchmark_read_MS>: 94 seconds
<DF_Benchmark_API_bulkloader_MS>: 269 seconds
<DF_Benchmark_regular_load_MS>: 725 seconds
<DF_Benchmark_single_thread_MS>: 99 seconds
<DF_Benchmark_lookup_DOP1_MS>: 57 seconds
<DF_Benchmark_lookup_DOP10_MS>: 27 seconds

The delta between the bulk load and regular load is surprising to me.

However, I should say that your “regular load” dataflow had 10 loaders in it. We’ve found that more than 2 or 3 target loaders overwhelm our target SQL environment.

Also - the bulkload works well in this test because you have no index at all on your target table. In real-world testing, we haven’t found bulk loading to be that much faster in SQL, at least on our servers.


dnewton :us: (BOB member since 2004-01-30)

Test 2 - Source, Target, and DI on same server

Source: MS SQL 2005

Single Server: Win2003 r2 SP1 x64, (2) 2.4Ghz AMD CPUs (dualcore), 16GB RAM, small RAID 5 array

DI Server: Same as above

Network, source to DI: n/a

DI Version: 11.7

Network, DI to Target: n/a

Target db: MS SQL 2005

Target server: Same as above

Results

<DF_Benchmark_read_MS>: 86 seconds
<DF_Benchmark_API_bulkloader_MS>: 216 seconds
<DF_Benchmark_regular_load_MS>: 389 seconds
<DF_Benchmark_single_thread_MS>: 23 seconds
<DF_Benchmark_lookup_DOP1_MS>: 55 seconds
<DF_Benchmark_lookup_DOP10_MS>: 25 seconds

The last 2 tests show virtually the same results as above. Most other tests improved, which suggests either that the network was a bottleneck in the previous scenario (my other post), or that the disk subsystem is faster on the single machine (normally I’d say that is unlikely, but we suspect the machine in Test 1 has a problem).


dnewton :us: (BOB member since 2004-01-30)

Regarding:

<DF_Benchmark_regular_load_MS>: 389 seconds

This dataflow had 10 loaders (DOP of 0), each with 10000 rows per commit.

Changed to 2 loaders @ 10000 rows per commit = 368 seconds

Changed to 2 loaders @ 1500 rows per commit = 477 seconds

For what it’s worth, at least in this test the number of loaders (unlike the rows per commit) doesn’t seem to matter much, whether it’s 2 or 10. Well, 2 loaders is slightly faster than 10.


dnewton :us: (BOB member since 2004-01-30)

In your test1, I would really love to see the reverse order: read over the network and load locally… if you have a spare minute :oops:


Werner Daehn :de: (BOB member since 2004-12-17)

Test 3 - Target and DI on same server, different Source server

Source: MS SQL 2000

Source Server: Win2003 SP1 32-bit, (2) 3.4Ghz XEON (dualcore), 4Gb RAM, large RAID 10 array

DI Server: Win2003 r2 SP1 x64, (2) 2.4Ghz AMD CPUs (dualcore), 16GB RAM, small RAID 5 array

Network, source to DI: Gigabit ethernet with fiber backbone

DI Version: 11.7

Network, DI to Target: n/a

Target db: MS SQL 2005

Target server: (same as DI server above)

Results

<DF_Benchmark_read_MS>: 185 seconds
<DF_Benchmark_API_bulkloader_MS>: 177 seconds
<DF_Benchmark_regular_load_MS>: 497 seconds
<DF_Benchmark_single_thread_MS>: 22 seconds
<DF_Benchmark_lookup_DOP1_MS>: 54 seconds
<DF_Benchmark_lookup_DOP10_MS>: 24 seconds

Reading is quite a bit slower, but writing is quite a bit faster. It’s odd that the read_MS dataflow took longer, since all it does is read 200 rows. Well, maybe not, since the target disk subsystem is a lowly RAID 5 array.

Also note that while the “regular_load” DF was running (with 10 loaders), it completely brought the SQL server to its knees. There was substantial blocking as all of the loaders fought to do INSERTs to the same table. As noted in other posts, the problem seems to be worse in SQL 2005 for some reason.


dnewton :us: (BOB member since 2004-01-30)

Check the SQL that is pushed to the database: it is a 3-way cartesian product of the 200-row table. So you actually transfer 8 million rows over the network, and that was the goal: to measure network throughput only.
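
To see where the 8 million rows come from, here is a minimal Python sketch of the row count of a 3-way cartesian product. The function name and constants are illustrative only and are not part of the benchmark job; the 200 rows and the 3-way join are taken from the post above.

```python
from itertools import product

ROWS_IN_SOURCE_TABLE = 200  # from the post: the benchmark table has 200 rows
JOIN_WAYS = 3               # from the post: 3-way cartesian product


def cartesian_row_count(rows: int, ways: int) -> int:
    """Rows produced when a table of `rows` rows is cross-joined with itself `ways` times."""
    return rows ** ways


# Sanity check on a tiny table by actually enumerating the product.
tiny_table = range(4)
assert sum(1 for _ in product(tiny_table, repeat=3)) == cartesian_row_count(4, 3)

total_rows = cartesian_row_count(ROWS_IN_SOURCE_TABLE, JOIN_WAYS)
print(total_rows)  # 8000000
```

So although the source table is tiny, the database sends 200 x 200 x 200 rows to DI, which is why this dataflow exercises the network rather than the source disks.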


Werner Daehn :de: (BOB member since 2004-12-17)

Here is my result at the client I’m working with. This test is against our development database environment. I will rerun the tests overnight against production. This test runs entirely in a VMware environment; however, the images are on different host servers.

Source: Oracle 9i
Source Server: 2003SP1 32bit - 2 CPU /2.8Gb/2 Disk (VMWare Server 2CPU’s 12 GB Memory - Unsure of other VMWares sharing environment)

Network source to DI: Gigabit Ethernet
DI server: Windows 2003SP1 32bit - 2 CPU /3.6Gb/3Disk (VMWare Server 2CPU’s 12 GB Memory - Unsure of other VMWares sharing environment)
DI: 11.5.2
Network DI to Target: Gigabit Ethernet
Target server: Same as Source
Target: Oracle 9i

<DF_Benchmark_read_MS>: 108 seconds
<DF_Benchmark_API_bulkloader_MS>: 329 seconds
<DF_Benchmark_regular_load_MS>: 509 seconds
<DF_Benchmark_single_thread_MS>: 30 seconds
<DF_Benchmark_lookup_DOP1_MS>: 92 seconds
<DF_Benchmark_lookup_DOP10_MS>: 68 seconds

Cheers

Glenn


GlennL :australia: (BOB member since 2005-12-29)

Source: SQL Server
Source Server: 2000 32bit - 4 CPU (2.2GHz) / 4Gb
Network source to DI: 100MB Ethernet

DI server: Windows 2003SP1 32bit - 2 CPU (3.2GHz) / 4Gb
DI: 11.0.1
Network DI to Target: 100MB Ethernet
Target server: Same as Source
Target: SQLServer

<DF_Benchmark_read_MS>: 498 seconds
<DF_Benchmark_API_bulkloader_MS>: 546 seconds
<DF_Benchmark_regular_load_MS>: 1270 seconds
<DF_Benchmark_single_thread_MS>: 25 seconds
<DF_Benchmark_lookup_DOP1_MS>: 35 seconds
<DF_Benchmark_lookup_DOP10_MS>: 15 seconds


mctiger (BOB member since 2007-03-08)

Hi all,

Here are the results of my benchmark.


Test 01:
Source and target are on the same server
DI server is on another machine

Source: Oracle 9i
Network source to DI: 100MB Ethernet

DI server: Windows 2000SP4 32bit - 2 CPU (1.5GHz) / 4Gb RAM
DI: 6.5
Network DI to Target: 100MB Ethernet

Target server: Same as Source
Target: Oracle 9i

<DF_Benchmark_read> 129 seconds
<DF_Benchmark_API_bulkloader> 936 seconds
<DF_Benchmark_regular_load> 984 seconds
<DF_Benchmark_single_thread> 73 seconds
<DF_Benchmark_lookup_DOP1> 161 seconds
<DF_Benchmark_lookup_DOP10> failed


Test 02: => same as test 01
Source and target are on the same server
DI server is on another machine

Source: Oracle 9i
Network source to DI: 100MB Ethernet

DI server: Windows 2000SP4 32bit - 2 CPU (1.5GHz) / 4Gb RAM
DI: 6.5
Network DI to Target: 100MB Ethernet

Target server: Same as Source
Target: Oracle 9i

<DF_Benchmark_read> 1261 seconds
<DF_Benchmark_API_bulkloader> 1540 seconds
<DF_Benchmark_regular_load> 1398 seconds
<DF_Benchmark_single_thread> 72 seconds
<DF_Benchmark_lookup_DOP1> 185 seconds
<DF_Benchmark_lookup_DOP10> 99 seconds


Test 03:
Source, target and DI are on the same server

Source: Oracle 9i

DI server: Windows 2000SP4 32bit - 2 CPU (1.5GHz) / 4Gb RAM
DI: 6.5

Target server: Same as Source
Target: Oracle 9i

<DF_Benchmark_read> 147 seconds
<DF_Benchmark_API_bulkloader> 608 seconds
<DF_Benchmark_regular_load> 742 seconds
<DF_Benchmark_single_thread> 71 seconds
<DF_Benchmark_lookup_DOP1> 160 seconds
<DF_Benchmark_lookup_DOP10> 79 seconds


Test 04: => same as test 03
Source, target and DI are on the same server

Source: Oracle 9i

DI server: Windows 2000SP4 32bit - 2 CPU (1.5GHz) / 4Gb RAM
DI: 6.5

Target server: Same as Source
Target: Oracle 9i

<DF_Benchmark_read> 145 seconds
<DF_Benchmark_API_bulkloader> 1035 seconds
<DF_Benchmark_regular_load> 1108 seconds
<DF_Benchmark_single_thread> 71 seconds
<DF_Benchmark_lookup_DOP1> 159 seconds
<DF_Benchmark_lookup_DOP10> 80 seconds

I also tried on my development environment, but the second DF generated an error on my di server (out of memory)… :crazy_face:

Any comment would be greatly appreciated :yesnod:

I suspect that my DI server is not powerful enough… any advice?

Regards.


cedrickb :fr: (BOB member since 2005-08-19)

None of the DFs consume any memory! So how is it possible one failed with out of memory???

What happened in test1, the DOP10 case?

I’d say with your DI server everything is in order - relatively. The tests with the lookups are consistent, the single-threaded one as well. Part of the reason you are not getting numbers similar to the others will be the slower memory access, part will be the DI 6.5 version.

The read over the network in test1 in 129secs is excellent, but test2 took 10 times longer??? Is it possible that you are using network hubs rather than switches, and other applications were using the bandwidth at that time?

Can you disable the antivirus application when running test3&4 for a try?


Werner Daehn :de: (BOB member since 2004-12-17)

-> :crazy_face:
honestly I don’t know why!!! and that point is rather strange…

-> nothing changed but the time of day the job was launched

-> I’ll ask the system guy; he’ll tell me tomorrow (he has the day off today).

-> yes, I’ll run a new test with the antivirus disabled. I’ll give you the results tomorrow

Regards


cedrickb :fr: (BOB member since 2005-08-19)

Hi all,

Here are the new results without antivirus:

Test :
Source, target and DI are on the same server

Source: Oracle 9i

DI server: Windows 2000SP4 32bit - 2 CPU (1.5GHz) / 4Gb RAM
DI: 6.5

Target server: Same as Source
Target: Oracle 9i

Results:
<DF_Benchmark_read> 121 seconds
<DF_Benchmark_API_bulkloader> 598 seconds
<DF_Benchmark_regular_load> 600 seconds
<DF_Benchmark_single_thread> 72 seconds
<DF_Benchmark_lookup_DOP1> 160 seconds
<DF_Benchmark_lookup_DOP10> 79 seconds

These last results seem to be correct!

As for the network problems, they have some issue with their switch, so that might be the reason why test 02 took 10 times longer than test 01!

Regards


cedrickb :fr: (BOB member since 2005-08-19)

Couple of things…

First, you should configure your virus scanner to exclude a couple of directories: the ones where Oracle keeps its log files and database files, and the DI\logs\ directory.

Second, I can’t believe the Test1 result: <DF_Benchmark_read> 129 seconds. That’s better than what others get with GBit network cards. The idea of this test was to read a cartesian product of 3 times 200 rows = 8 million rows, each with a row length of about 700 bytes (we select 7 columns, each filled with 100 chars). So that would be about 5,600 MByte of data. In 129 secs that is about 43 MByte/sec, or roughly 350 MBit/sec. And this on a 100MBit line??? There is something wrong with that number…
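
The plausibility check above can be redone as a small Python sketch. It assumes one byte per character, decimal units (1 MBit = 1,000,000 bits) and no protocol overhead, and it uses the 129 seconds reported for Test 1; the function names are illustrative only.

```python
ROWS = 200 ** 3          # 3-way cartesian product of the 200-row table
BYTES_PER_ROW = 7 * 100  # 7 columns x 100 chars, assuming 1 byte per char


def required_mbit_per_sec(rows: int, bytes_per_row: int, seconds: float) -> float:
    """Network rate needed to move the whole result set in `seconds`."""
    return rows * bytes_per_row * 8 / seconds / 1_000_000


def min_seconds_on_link(rows: int, bytes_per_row: int, link_mbit: float) -> float:
    """Best-case transfer time on a link of `link_mbit` MBit/sec (no overhead)."""
    return rows * bytes_per_row * 8 / (link_mbit * 1_000_000)


print(round(required_mbit_per_sec(ROWS, BYTES_PER_ROW, 129)))  # 347
print(min_seconds_on_link(ROWS, BYTES_PER_ROW, 100))           # 448.0
```

Under these assumptions a 100MBit line could not finish the read in less than about 448 seconds, so a 129-second result points to a faster link than the one reported.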

Third, I don’t like the fact that the API bulkloader is taking as long as the regular loader. One explanation would be that you are running the target database in archive log mode. If that’s the case, make sure all tables have the NOLOGGING attribute set if - and only if - you are loading them via the API bulkloader: “alter table xyz nologging”.

Fourth, while executing your regular loads, I would collect the CPU usage of the database server and the DI server. If you find that the database server does not require much CPU during those loads, why not install DI on the database machine? You would avoid the network, and DI would consume CPU that would otherwise have been idle.


Werner Daehn :de: (BOB member since 2004-12-17)

Hi,

I excluded the Oracle directories and the DI directories from the antivirus.

For the network results, the network is in fact Gigabit… sorry for the mistake.

For the API bulkloader: I don’t use it in the dedicated dataflow. I’m doing a “regular load” there, so that might explain the results. I didn’t find any information about how to use the API bulkloader… if you have some, can you give me a link?

And finally, I have a server where I installed both Oracle AND DI; the results are shown in my previous post.

Regards


cedrickb :fr: (BOB member since 2005-08-19)

DI JOB SERVER:

Microsoft Windows server 2003 Standard edition SP1
AMD Opteron processor 875
2.20 GHz, 3.51 GB of RAM


DI DESIGNER:

Microsoft Windows XP PROFESSIONAL SP2
INTEL® CPU T4200 @ 1.83 GHZ 1.00 GB of RAM

SOURCE AND TARGET DATABASE: ORACLE 10g

SOURCE/TARGET DATABASE AND DI JOB SERVER ARE ON DIFFERENT MACHINES.

CONNECTED by Gig port SWITCH

<DF_Benchmark_read_MS>: 146 seconds
<DF_Benchmark_API_bulkloader_MS>: 269 seconds, WITHOUT PARTITION OPTION
<DF_Benchmark_API_bulkloader_MS>: 4520 seconds, WITH PARTITION OPTION
<DF_Benchmark_regular_load_MS>: 1606 seconds
<DF_Benchmark_single_thread_MS>: 43 seconds
<DF_Benchmark_lookup_DOP1_MS>: 91 seconds
<DF_Benchmark_lookup_DOP10_MS>: 88 seconds

Any suggestions?

Thanks,


data_guy :us: (BOB member since 2006-08-19)

Is there enough data yet to draw any conclusions?

Everyone else: C’mon, give it a try, it only takes maybe 10-15 minutes to do the test. :mrgreen:


dnewton :us: (BOB member since 2004-01-30)

@Data_Guy: Sorry, overlooked this one. Partitioned load into Oracle is slower than without the partitions??? I wonder if you really really really used API bulkloader when enabling the partitioned loader.

Reimport the table, check the loader settings and execute again. By the way, I have already filed the bug that you cannot set “enable partitions” when the API bulkloader is turned on. I set the flag first and then turn on the bulkloader, so that the field is greyed out but checked - that works for me at least. Please check whether you have multiple lines saying “bulkloaderAPI” in the trace log.


Werner Daehn :de: (BOB member since 2004-12-17)

Conclusions…
  • GBit Ethernet is important (150secs vs. 90secs local).
  • 10GBit Ethernet cards will not help much, as one reader thread will then consume one CPU entirely - so a partitioned reader is required.
  • Loading, even with API bulkloader, is several times slower than reading.
  • Yes, avoid loading via the network.
  • Loading gets faster with disk arrays - so I/O has to be the bottleneck for the DB.
  • DOP scales well with the number of CPUs - within natural borders obviously.
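
As a rough check of the DOP point, the lookup timings posted earlier in this thread can be turned into speedup factors. This is only a sketch: the labels are shorthand for the posts above, only a subset of the posted results is included, and the helper name is made up for illustration.

```python
# (DOP1 seconds, DOP10 seconds) for the lookup dataflows, copied from the posts above.
lookup_timings = {
    "dnewton Test 1": (57, 27),
    "dnewton Test 2": (55, 25),
    "dnewton Test 3": (54, 24),
    "GlennL": (92, 68),
    "mctiger": (35, 15),
    "cedrickb Test 03": (160, 79),
    "data_guy (first run)": (91, 88),
}


def dop_speedup(dop1_seconds: float, dop10_seconds: float) -> float:
    """How many times faster the DOP10 lookup ran compared with DOP1."""
    return dop1_seconds / dop10_seconds


for name, (dop1, dop10) in lookup_timings.items():
    print(f"{name}: {dop_speedup(dop1, dop10):.2f}x")
```

A factor near 2 on a two-CPU DI server is in line with the statement above; factors close to 1 suggest that something other than CPU limited that run.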

Too little data to understand
  • Difference between 6.5, 11.5 and 11.7
  • VMWare vs. native
  • Windows vs. Unix
  • 8CPU servers (jobs are not built to get better numbers for > 8CPU)

Werner Daehn :de: (BOB member since 2004-12-17)

Yes Werner…
There was a problem with the partitioning.
As you suggested, here are the new statistics:


DI JOB SERVER:

Microsoft Windows server 2003 Standard edition SP1
AMD Opteron processor 875
2.20 GHz, 3.51 GB of RAM


DI DESIGNER:

Microsoft Windows XP PROFESSIONAL SP2
INTEL® CPU T4200 @ 1.83 GHZ 1.00 GB of RAM

SOURCE AND TARGET DATABASE: ORACLE 10g

SOURCE/TARGET DATABASE AND DI JOB SERVER ARE ON DIFFERENT MACHINES CONNECTED by Gig port SWITCH

Process to execute data flow <DF_Benchmark_read> is completed
Time______________________________ 80 secounds

Process to execute data flow <DF_Benchmark_API_bulkloader> is completed.
Time______________________________ 752 secounds

Process to execute data flow <DF_Benchmark_regular_load> is completed.
Time______________________________ 1728 secounds

process to execute data flow <DF_Benchmark_single_thread> is completed.
Time______________________________ 40 secounds

Process to execute data flow <DF_Benchmark_lookup_DOP1> is completed.
Time______________________________ 66 secounds

Process to execute data flow <DF_Benchmark_lookup_DOP10> is completed.
Time______________________________ 56 secounds

Thanks,


data_guy :us: (BOB member since 2006-08-19)