Hi,
I have a Type 1 SCD target in Oracle, and my DF looks like this:
Delta -> TC -> KeyGen -> MapOper -> Trgt.
Now the problem is that my target table has 35 million records and the delta brings about 5k records daily. I have created the DF, and in the test run the job starts and never ends.
I have tried all 3 comparison options in TC and nothing worked. What would be the best option to select in TC to compare against the target? Or am I doing it the wrong way?
I don’t use TC for huge datasets. Instead I outer join the target table with the source table and fetch the target primary/surrogate key column. Based on the target key I determine whether a record is an insert or an update.
Pseudo logic:
1. Drag in both tables (src and tgt).
2. Outer join them in the Query step, fetching the required columns from the source plus the surrogate key column from the target.
3. Add a Case transform to route the records:
   - If tgt.surrogate_key is null, send the row to Key_Generation and insert it.
   - If tgt.surrogate_key is not null, update.
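As a rough illustration, this is the SQL the outer join boils down to (a minimal sketch; src_delta, tgt_table, natural_key, and tgt_key are hypothetical names, and in DI you would build this in a Query transform rather than hand-code it):

-- Left outer join the delta to the target; a NULL target key means the row is new
select s.natural_key,
       s.attr1,
       s.attr2,
       t.tgt_key,                          -- surrogate key from the target
       case
         when t.tgt_key is null then 'I'   -- route to Key_Generation + insert
         else 'U'                          -- route to update
       end as dml_flag
from src_delta s
left outer join tgt_table t
  on t.natural_key = s.natural_key;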
BTW, nothing is wrong in your design as far as I know. Find the exact bottleneck, i.e. which step of the job is taking the time.
Also check whether you have proper indexes on the target.
I am bouncing 5,000 records an hour into a table with over 5 million records. I make sure to base the compare on the RecID and two other key columns (not set as the PK, but known distinct record columns).
A couple of my co-workers even take the MapOp to another level, splitting it into two.
For all the people doing the map operations: why? Is it to turn UPDATEs into INSERTs (for a slowly-changing-dimension kind of scenario)? The original poster said it was a Type 1 SCD, which means no history preservation, just updates, so a Map_Operation should be unnecessary.
Adding 5,000 rows to a 5M-row table should, in my experience, take far less than an hour; even with a Table Compare and row-by-row select, it should be more like 5 minutes. If it’s slow, the most likely problem is insufficient indexes on the target to support the table compare lookup.
Outer joining the target with the source can be a good approach, particularly if the source and target are in the same datastore, or a linked datastore, so that DI can push the SQL down. Otherwise it’s less compelling (and not really much faster than a table compare).
When you say check the indexes, I have one more question. Let’s go through an example:
I have a target table containing 5 million records, I have created indexes on the compare columns, and I apply a Table Comparison (TC) to it. The TC detects 1 million inserts or updates. How does it work now? Does it take less time to do the table comparison, and more time than usual to do the inserts and updates, because of the indexes on the key columns?
Meaning: whatever columns are in the lower-left pane of the Table Compare are the columns used in the row-by-row select against the target.
So for TARGET_TABLE_A, if you had COL1 and COL2 as the logical key for the lookup (in the lower left), DI will issue this SQL for each table compare row:
select * from TARGET_TABLE_A
where COL1 = 'BLAH'
and COL2 = 'BLAHBLAH'
To make this work fast in the database, you should have a composite index on (COL1, COL2).
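For example, something like this (the index name is made up):

-- Composite index so the row-by-row select is an index lookup, not a full scan
create index idx_target_a_compare on TARGET_TABLE_A (COL1, COL2);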
As always with indexes, the answer is “it depends”. But given the choice between a full-table scan (which is what you’d get with no indexes) and the overhead of maintaining one more index, adding the one index to support the table compare generally improves performance far beyond the extra maintenance cost the database incurs.
Try adding a script to disable the index before the load and re-enable it afterwards, then see what works best. Like a lot of things, it’s trial and error to some extent, and all part of performance testing.
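In Oracle, a minimal sketch of such a script might look like this (index name hypothetical; don’t do this with a unique index that is enforcing a constraint):

-- Before the load: skip index maintenance during the inserts
alter index idx_target_a_compare unusable;
alter session set skip_unusable_indexes = true;

-- ... run the load ...

-- After the load: rebuild the index so the next compare can use it
alter index idx_target_a_compare rebuild;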