When using the Bulk Loader feature in a dataflow going against an Oracle table I see the following:
(11.7) 06-02-09 18:46:10 (27105:0006) BLKLDAPI: Starting Bulk Loader API in <Dataflow df_LU_TF_U_CS_CHANGRP_TAB; 4096>.
(11.7) 06-02-09 18:46:10 (27105:0006) BLKLDAPI: Bulk Loader API in <Dataflow df_LU_TF_U_CS_CHANGRP_TAB> completed successfully.
When using Bulk Loader against a MS SQL Server table I don’t see any message indicating that bulk load is being used. However, when I turn on low level tracing I do see bulk load messages.
Is this just an API thing? We spun our wheels for a while thinking that bulk load wasn’t working. We spent time trying to figure out what was wrong with our dataflow and/or table that was preventing bulk load from working.
Note the conditions, per MS, on getting ideal bulk load performance:
Db should be in bulk-logged recovery mode (DI does not switch it for you)
Even so, if the table isn’t already empty, most of the time, the load will be fully logged
Incoming data should be in the order of the clustered PK index, if one exists
Ideally you should use the TABLOCK locking hint, which DI does not do (as far as I’m aware)
If you have several indexes or more, consider dropping them if the incoming data exceeds ~25% of the volume of the existing rows, and add them back later.
Also, constraints in the db are automatically ignored and not enforced, but this sets them to a “suspect” state once the load is done. (Or some other term, I forget what it is.)
So SQL Server bulk loading seems rather picky and labor intensive, and it’s not clear that all of the conditions are met for optimal performance, when DI is using it.
We have bulk load working against SQL Server, but it’s spotty. Pulling in columns that DI sees as LONG data types - even though we convert the data type to something like VARCHAR(255) - seems to prevent the bulk load feature from being used.
" Db should be in bulk-logged recovery mode (DI does not switch it for you) "
Au Contraire.
In DS 4.0 it (unfortunately) DOES switch it for you.
To see this… create a simple source > target mapping dataflow. Start by using Bulk Load. Run the job and then look in your EventLog. You will see the Recovery Model change message.
Now turn OFF Bulk Load and rerun the job. You will not see the EventLog message.
This needs to be an OPTION for us to check on the Bulk Load screen… or at least give us a parameter to tweak!
But is it the SQL Server API that is doing this or is it DS? If you can find that statement in the trace log (when doing a full SQL trace on the job) then it is DS. But if it doesn’t show up in the log then it’s the API and you can’t do anything about it.
For example…
The Oracle Bulk Loader API disables indexes under the covers. It took me a while to figure this out. DS does perform a check to see if there are unusable indexes after the load is complete and that query shows up in the trace log.