TYPE 2 without surrogate

Hello All:
Is there a way to implement SCD type2 with out surrogate key and use a combination of say ID and END_DATE as the key? say for example a table DEPT with columns:
DEPTID
DEPTNAME
BEGIN_DATE
END_DATE

and for the first time we get an entry something like

 
DEPTID   DEPTNAME    BEGIN_DATE    END_DATE
10           ENGLISH       1/1/1000          1/1/2020

but today if we receive a new row for DEPTID = 10 with DEPTNAME = 'BIOLOGY

The table compare correctly says it is an update and history preserve creates two rows one for update and one for insert… but I believe the update statement should be something like

but I believe DI is submitting a query something like

which does not update any row and when it tries to insert a row it fails with unique constraint…

Any thoughts on this one??

Thanks for all your inputs…


broadcast (BOB member since 2004-11-18)

If end_date is column that you want to update. y is that it is a part of your key columns, and y not begin date.

The TC will tell you an update and you might not need a history preserving. just put a map operation to make it from update to normal to do the SCD type 2.and you can update the previous record with sysdate before doing this operation by using a transactional flow.


dragonwhiz :cn: (BOB member since 2009-09-25)

I have had problems with Type 2 SCDs that do not have surrogate keys. It seems to me that the default way that Data Services implements history preserving functionality will not work for tables without surrogate keys - unless I’ve been doing something wrong which is always a distinct possibility. To get around this I have found a workable method but I think it may be a little time/resource consuming and could be improved on.

I create such DataFlows as follows:

  • Source querys etc…

  • Sort on Business Key and Effective Date

  • Table Comparison 1 (Sorted Input) - Compare against the full target table and use the Business Key and the Effective Date as the key columns.

  • Map operation - turn inserts back to normal and discard anything else. I find that trying to update existing records causes problems. Also if you have “late arriving records” this can also be a problem so before my sort I do a lookup to see if my incoming record falls within the dates of a non-current record and if it does I get rid of it.

  • Table Comparison 2 (Sorted Input) - Compare against a view of the current rows of the target table (ie trunc(sysdate) between EFFECTIVE_FROM_DATE and EFFECTIVE_TO_DATE)

  • History Preserve

  • Target Table

In your example you will see the following:

After initial Load
DEPTID DEPTNAME BEGIN_DATE END_DATE
10 ENGLISH 01/01/1000 31/12/9000

After second load
DEPTID DEPTNAME BEGIN_DATE END_DATE
10 ENGLISH 01/01/1000 28/01/2010
10 BIOLOGY 29/01/2010 31/12/9000

If on the third load you get record effective on 1/1/2010 then it would be ignored unless you truncated your target table and did a full reload. It would then be included correctly thus:

DEPTID DEPTNAME BEGIN_DATE END_DATE
10 ENGLISH 01/01/1000 31/12/2009
10 MATHEMATICS 01/01/2010 28/01/2010
10 BIOLOGY 29/01/2010 31/12/9000

I have used this in several places and it works for my needs but as I say I think it could use some tweaking to improve performance, especially if dealing with large volumes of data.


moldy72 (BOB member since 2009-03-17)