Bulk Load in MS SQL Server vs Oracle

system · June 9, 2009, 10:00pm

When using the Bulk Loader feature in a dataflow going against an Oracle table I see the following:

 (11.7) 06-02-09 18:46:10 (27105:0006) BLKLDAPI: Starting Bulk Loader API in <Dataflow df_LU_TF_U_CS_CHANGRP_TAB; 4096>.

 (11.7) 06-02-09 18:46:10 (27105:0006) BLKLDAPI: Bulk Loader API in <Dataflow df_LU_TF_U_CS_CHANGRP_TAB>  completed successfully.

When using Bulk Loader against a MS SQL Server table I don’t see any message indicating that bulk load is being used. However, when I turn on low level tracing I do see bulk load messages.

Is this just an API thing? We spun our wheels for a while thinking that bulk load wasn’t working. We spent time trying to figure out what was wrong with our dataflow and/or table that was preventing bulk load from working.

eganjp (BOB member since 2007-09-12)

system · June 11, 2009, 12:52am

Note the conditions, per MS, on getting ideal bulk load performance:

Db should be in bulk-logged recovery mode (DI does not switch it for you)
Even so, if the table isn’t already empty, most of the time, the load will be fully logged
Incoming data should be in the order of the clustered PK index, if one exists
Ideally you should use the TABLOCK locking hint, which DI does not do (as far as I’m aware)
If you have several indexes or more, consider dropping them if the incoming data exceeds ~25% of the volume of the existing rows, and add them back later.

Also, constraints in the db are automatically ignored and not enforced, but this sets them to a “suspect” state once the load is done. (Or some other term, I forget what it is.)

So SQL Server bulk loading seems rather picky and labor intensive, and it’s not clear that all of the conditions are met for optimal performance, when DI is using it.

dnewton (BOB member since 2004-01-30)

system · June 11, 2009, 1:02am

We have bulk load working against SQL Server, but it’s spotty. Pulling in columns that DI sees as LONG data types - even though we convert the data type to something like VARCHAR(255) - seems to prevent the bulk load feature from being used.

This was using DI 11.7.3.

eganjp (BOB member since 2007-09-12)

system · August 15, 2009, 8:31pm

New minimally-logged options exist in SQL 2008:

http://www.sqlmag.com/Articles/Print.cfm?ArticleID=102462

(Account required)

I wonder if DS 12.x supports them?

dnewton (BOB member since 2004-01-30)

system · April 11, 2012, 4:28pm

" Db should be in bulk-logged recovery mode (DI does not switch it for you) "

Au Contraire.

In DS 4.0 it (unfortunately) DOES switch it for you.

To see this… create a simple source > target mapping dataflow. Start by using Bulk Load. Run the job and then look in your EventLog. You will see the Recovery Model change message.

Now turn OFF Bulk Load and rerun the job. You will not see the EventLog message.

This needs to be an OPTION for us to check on the Bulk Load screen… or at least give us a parameter to tweak!

fblau (BOB member since 2005-02-07)

system · April 18, 2012, 4:59pm

But is it the SQL Server API that is doing this or is it DS? If you can find that statement in the trace log (when doing a full SQL trace on the job) then it is DS. But if it doesn’t show up in the log then it’s the API and you can’t do anything about it.

For example…
The Oracle Bulk Loader API disables indexes under the covers. It took me a while to figure this out. DS does perform a check to see if there are unusable indexes after the load is complete and that query shows up in the trace log.

eganjp (BOB member since 2007-09-12)