Dealing with slowly changing dimensions in reporting.

system · January 13, 2014, 3:28pm

Hi

I have a scenario where the geography dimension value(primary key)
changes due to a new system being used. Its not SCD actually but the DWH is designed for it.

For eg: if I have a geography id A in the current world, I will be having geography id B in the future world but would want to report all data based on B (as A and B are the same, just because the new system chose to call it B)

Current scenario
Geography Sales Jan Feb Mar
A 10 11 12

Future scenario when from April Geography id changes to B
Geography Sales Jan Feb Mar April
B 13

In my report going forward I would want

Geography Sales Jan Feb Mar April
B 10 11 12 13

I wanted to know if a mapping table can be used in BO to solve the issue?

botech (BOB member since 2006-08-10)

MarkP · January 13, 2014, 3:47pm

If you’re doing proper SCDs you’d have a surrogate key join between your fact and dimension table. What does your database structure look like?

system · January 13, 2014, 3:57pm

Thanks for your reply Mark.

Being a new geography id, a new record gets inserted into the dimension table. There is a current record flag to identify the latest record. But in this case there is no evidence to say geography A and B are the same.

Hence the confusion, should we add anything in the load process?

Or can we manage this in reporting itself?Can we have a mapping table in the database which I can use in the universe(where we define a=b). I will have to hit the fact multiple times though with separate queries and then join up at report level. WOuld this be ideal?

botech (BOB member since 2006-08-10)

MarkP · January 13, 2014, 6:36pm

Let’s talk about African countries.
Take Sudan as an example.

Before 2011
Country Name: Sudan
Country Area Rank: 1
After 2011
Country Name: Sudan
Country Area Rank: 3

After South Sudan was created, Sudan went from the largest to the third largest country in Africa by area but it is still Sudan.

What you would have in data warehousing in a SCD to reflect this is a surrogate key. This surrogate key is an integer and the end user will never care what it is.
You’d have pre-2011
Country Key: 34
Country Name: Sudan
Country Area Rank: 1
After 2011
Country Key: 87
Country Name: Sudan
Country Area Rank: 3

There is no significance to those numbers but you would also have an effective start and end date for that row.

Have a look through this page to see the different ways of implementing them:

Essentially you want to match the different facts over time against the correct version of Sudan (such as area and population drop to reflect the emergence of South Sudan as a country) while all the end user cares about is the population trend of Sudan steadily rising, a massive drop in 2011 and then continuing to rise.

system · January 13, 2014, 7:04pm

Thanks Mark, this was very detailed and helpful.

But I think I didnt make my question clear enough.

With the example you gave,
Year Country SK Country Rank Country ID
2011 34 1 SU
2012 87 3 SSU

and there is Effective from, Effective to dates etc to track the period the record was active.

Please note the new record (country id) is SSU now and the report should look at the filter for SSU and bring back data for SU and SSU.

We havent defined anywhere that SU and SSU are literally the same.

I understand that if the attributes of your main key/natural key (Country ID) changes, it will be tracked (in my case - as an SCD 2). But here the main key itself is changing.

Hope what I am talking makes sense?

:?

just wanted add that the reason the ID is changed is because the source is changing. There is no way the new source is going to have the same id as that of the old source. we tried that option.

botech (BOB member since 2006-08-10)

system · January 13, 2014, 8:07pm

– Create a VIEW and group such records…
– It is better to take care of this issue at the database level by adding a new column that will group such countries in future…
– Use this VIEW in creating filters and joining tables to get your reports.
– You will manually group the countries in your VIEW code

For example, Sudan A and Sudan B

SK   PARENT_GROUP                  ACTUAL_COUNTRY             START_DATE       END_DATE
 34       Sudan                         A                      1-1-1900       12-31-2011
 87       Sudan                         B                      1-1-2012       12-31-2999

Hope that helps.

BO_Chief (BOB member since 2004-06-06)

MarkP · January 14, 2014, 9:16am

botech:

Thanks Mark, this was very detailed and helpful.

But I think I didnt make my question clear enough.

With the example you gave,
Year Country SK Country Rank Country ID
2011 34 1 SU
2012 87 3 SSU

and there is Effective from, Effective to dates etc to track the period the record was active.

Please note the new record (country id) is SSU now and the report should look at the filter for SSU and bring back data for SU and SSU.

We havent defined anywhere that SU and SSU are literally the same.

I understand that if the attributes of your main key/natural key (Country ID) changes, it will be tracked (in my case - as an SCD 2). But here the main key itself is changing.

Hope what I am talking makes sense?

:?

just wanted add that the reason the ID is changed is because the source is changing. There is no way the new source is going to have the same id as that of the old source. we tried that option.

In that case, you would probably need to build a bridging table to serve as a translation table (using the same principle as the recent topic posted on bridging tables).

system · January 15, 2014, 9:30pm

Thank you Mark and BO chief.

botech (BOB member since 2006-08-10)