Problems with performing a Type 2 SCD in DI

Hi All,

I know its a fairly easy question, but is there anyway of implementing a Type 2 SCD given the following (made up) dimension table structure.

Surrogate_Key
Natural_Key
Description
Price

I want to insert a record in to the dimension table for each new record that doesn’t already exist in the dimension table, and insert a record when the Description/Price changes for records in the dimension table with the same Natural_Key.

I.E.

Dimension Table (Surrogate_Key, Natural_Key,Desc,Price)
1,1,‘Blue Jeans’,£25
2,2,‘Red Shirt’,£15 <- current version
3,3,‘Greenish Socks’,£2
4,1,‘Blue Jeans’,£27 <- current version
5,3,‘Greenish Socks’,£3 <- current version

Source - To be checked/loaded
(Natural_Key, Desc, Price) - surrogate key is only in the dimension
1,‘Blue Jeans’,£27
2,‘Red Shirt’,£15
3,‘Green Socks’,£2 <- changed
4,‘Black Hat’,£7 <- new

What I want to go in to my dimension table is:
(Surrogate_Key, Natural_Key,Desc,Price)
6,3,‘Green Socks’,£2
7,4,‘Black Hat’,£7

So I end up with:
(Surrogate_Key, Natural_Key,Desc,Price)
1,1,‘Blue Jeans’,£25
2,2,‘Red Shirt’,£15 <- Current Version
3,3,‘Greenish Socks’,£2
4,1,‘Blue Jeans’,£27 <- Current Verison
5,3,‘Greenish Socks’,£3
6,3,‘Green Socks’,£2 <- Current Version
7,4,‘Black Hat’,£7 <- Current Version

What usually happens is that the ‘Blue Jean’ source record gets inserted again in to the dimension table (because £27 - Source is not the same as £25 ONE of the dimension record values)

I don’t have any current_flag type columns, or effective_date columns.

When comparing the Desc/Price I always want to compare the Natural_Key in the source with the record in the dimension table with the same Natural_Key and the greatest (max) Surrogate_Key - Thats how I find the current version in the dimension.

I can’t seem to find a Source–>TC–>HP/MO–>KG–>Target combination that works…I end up with multiple records going in to my dimension table, as the TC compares all previous occurances of my Natural_Key and not just the ones with the greatest Surrogate_Key.

I hope this makes sense !!!

Thanks
Graham


GSmith :uk: (BOB member since 2006-10-25)

Remove the Map_Operation - normally not required.
In the TC, do you have the generated_key field filled with the surrogate key? That’s causing the TC to look for the highest key instead of anyone.


Werner Daehn :de: (BOB member since 2004-12-17)

Thanks wdaehn for you reply.

As suggested I removed the MO, so was left with the following DF

Source -> Query -> TC -> HP -> KG -> Target

I ensured that the Natural_Key was in the Input Primary Key Columns list and the Generated Key field in the TC was populated with the Surrogate_Key ( as I only ever want to check the LATEST Surrogate_Key per Natural_Key) and that my compare columns were in the HP transform and I all worked how I wanted it to.

I never realised that putting the Surrogate_Key in the Generated Key Field would only match the lastst records in the compare table. I do now though !!

Thanks again.
GS


GSmith :uk: (BOB member since 2006-10-25)