I have a scenario where the geography dimension value(primary key)
changes due to a new system being used. Its not SCD actually but the DWH is designed for it.
For eg: if I have a geography id A in the current world, I will be having geography id B in the future world but would want to report all data based on B (as A and B are the same, just because the new system chose to call it B)
Current scenario
Geography Sales Jan Feb Mar
A 10 11 12
Future scenario when from April Geography id changes to B
Geography Sales Jan Feb Mar April
B 13
In my report going forward I would want
Geography Sales Jan Feb Mar April
B 10 11 12 13
I wanted to know if a mapping table can be used in BO to solve the issue?
Being a new geography id, a new record gets inserted into the dimension table. There is a current record flag to identify the latest record. But in this case there is no evidence to say geography A and B are the same.
Hence the confusion, should we add anything in the load process?
Or can we manage this in reporting itself?Can we have a mapping table in the database which I can use in the universe(where we define a=b). I will have to hit the fact multiple times though with separate queries and then join up at report level. WOuld this be ideal?
Let’s talk about African countries.
Take Sudan as an example.
Before 2011
Country Name: Sudan
Country Area Rank: 1
After 2011
Country Name: Sudan
Country Area Rank: 3
After South Sudan was created, Sudan went from the largest to the third largest country in Africa by area but it is still Sudan.
What you would have in data warehousing in a SCD to reflect this is a surrogate key. This surrogate key is an integer and the end user will never care what it is.
You’d have pre-2011
Country Key: 34
Country Name: Sudan
Country Area Rank: 1
After 2011
Country Key: 87
Country Name: Sudan
Country Area Rank: 3
There is no significance to those numbers but you would also have an effective start and end date for that row.
Have a look through this page to see the different ways of implementing them:
Essentially you want to match the different facts over time against the correct version of Sudan (such as area and population drop to reflect the emergence of South Sudan as a country) while all the end user cares about is the population trend of Sudan steadily rising, a massive drop in 2011 and then continuing to rise.
But I think I didnt make my question clear enough.
With the example you gave,
Year Country SK Country Rank Country ID
2011 34 1 SU
2012 87 3 SSU
and there is Effective from, Effective to dates etc to track the period the record was active.
Please note the new record (country id) is SSU now and the report should look at the filter for SSU and bring back data for SU and SSU.
We havent defined anywhere that SU and SSU are literally the same.
I understand that if the attributes of your main key/natural key (Country ID) changes, it will be tracked (in my case - as an SCD 2). But here the main key itself is changing.
Hope what I am talking makes sense?
:?
just wanted add that the reason the ID is changed is because the source is changing. There is no way the new source is going to have the same id as that of the old source. we tried that option.
– Create a VIEW and group such records…
– It is better to take care of this issue at the database level by adding a new column that will group such countries in future…
– Use this VIEW in creating filters and joining tables to get your reports.
– You will manually group the countries in your VIEW code
For example, Sudan A and Sudan B
SK PARENT_GROUP ACTUAL_COUNTRY START_DATE END_DATE
34 Sudan A 1-1-1900 12-31-2011
87 Sudan B 1-1-2012 12-31-2999
In that case, you would probably need to build a bridging table to serve as a translation table (using the same principle as the recent topic posted on bridging tables).