Hello All:
Is there a way to implement SCD type2 with out surrogate key and use a combination of say ID and END_DATE as the key? say for example a table DEPT with columns:
DEPTID
DEPTNAME
BEGIN_DATE
END_DATE
and for the first time we get an entry something like
DEPTID DEPTNAME BEGIN_DATE END_DATE
10 ENGLISH 1/1/1000 1/1/2020
but today if we receive a new row for DEPTID = 10 with DEPTNAME = 'BIOLOGY
The table compare correctly says it is an update and history preserve creates two rows one for update and one for insert… but I believe the update statement should be something like
but I believe DI is submitting a query something like
which does not update any row and when it tries to insert a row it fails with unique constraint…
If end_date is column that you want to update. y is that it is a part of your key columns, and y not begin date.
The TC will tell you an update and you might not need a history preserving. just put a map operation to make it from update to normal to do the SCD type 2.and you can update the previous record with sysdate before doing this operation by using a transactional flow.
I have had problems with Type 2 SCDs that do not have surrogate keys. It seems to me that the default way that Data Services implements history preserving functionality will not work for tables without surrogate keys - unless I’ve been doing something wrong which is always a distinct possibility. To get around this I have found a workable method but I think it may be a little time/resource consuming and could be improved on.
I create such DataFlows as follows:
Source querys etc…
Sort on Business Key and Effective Date
Table Comparison 1 (Sorted Input) - Compare against the full target table and use the Business Key and the Effective Date as the key columns.
Map operation - turn inserts back to normal and discard anything else. I find that trying to update existing records causes problems. Also if you have “late arriving records” this can also be a problem so before my sort I do a lookup to see if my incoming record falls within the dates of a non-current record and if it does I get rid of it.
Table Comparison 2 (Sorted Input) - Compare against a view of the current rows of the target table (ie trunc(sysdate) between EFFECTIVE_FROM_DATE and EFFECTIVE_TO_DATE)
History Preserve
Target Table
In your example you will see the following:
After initial Load
DEPTID DEPTNAME BEGIN_DATE END_DATE
10 ENGLISH 01/01/1000 31/12/9000
After second load
DEPTID DEPTNAME BEGIN_DATE END_DATE
10 ENGLISH 01/01/1000 28/01/2010
10 BIOLOGY 29/01/2010 31/12/9000
If on the third load you get record effective on 1/1/2010 then it would be ignored unless you truncated your target table and did a full reload. It would then be included correctly thus:
I have used this in several places and it works for my needs but as I say I think it could use some tweaking to improve performance, especially if dealing with large volumes of data.