Is it always necessary to have one context per fact table?

system · June 25, 2004, 1:28am

If there is a flight transportation data mart, and you have a fact table for the itinerary, another fact table for the trip, and a third for the individual flight segment, must you create 3 separate contexts (I’ve read all about the 1 context per fact table rule)?
These 3 fact tables could probably have all been part of one large fact table, but then the measures for the parent levels (itinerary, trip) would be repeated, and not additive at the lowest level (segment). The thing is, each has different dimensionality.
[list]Itinerary Fact => Frequent Flyer Key, Travel Agency, Airline Office, Booking Date, Itinerary ID (degenerate dimension, key for this fact table)
Trip Fact => Itinerary ID (degenerate dimension - foreign key to Itinerary Fact), Trip ID (degenerate dim, key for this fact table), Location Group, Trip Start Date, Trip End Date
Segment Fact => Itinerary ID, Trip ID, Location Group - all degenerate dimensions plus the Location Group, here to avoid a join if possible, Segment Start Date, Segment End Date[/list]
Do I really need 3 contexts in this case - or can I avoid this golden rule?
Thanks
Nile

Nile (BOB member since 2004-02-12)

system · June 25, 2004, 2:27am

You don’t have to follow the rule if the data permits it. If you have situations where getting data from all fact tables would cause data to be dropped, then you’ll have to worry.

You can use outer joins to reduce the need for contexts.

If you are not going to use contexts, you should also uncheck the "Multiple SQL Statements for each Measure option. Otherwise you’ll get poor-man’s contexts.

Steve Krandel (BOB member since 2002-06-25)

system · June 25, 2004, 2:33pm

Steve, can you please discuss this in further detail.

Many thanks,
Kendy

Kendy (BOB member since 2003-08-27)

system · June 25, 2004, 2:38pm

Nile,

A better solution in this case, IMPO, is not to drop your contexts, thus also dropping all their great advantages and help, but rather to adapt them to your situation.

You correctly linked these fact tables together, since as you point out they could be combined into one giant fact table instead of being exploded out. So now you can add these joins to the 2 contexts for the fact tables they connect, …et voila! (We also join some of our fact tables like this.)[/u]

JP Brooks (BOB member since 2002-10-22)

system · June 25, 2004, 3:25pm

No one like contexts more that I do. So, I hesitate to do this.

But… It is possible (depending on your database) to use outer joins to solve the problem. Consider Island Resorts. If I have a customer with history, but with no future reservations I either need 2 queries (to prevent data from being dropped) or I need to outer join to each are (history and future reservations). I would still need to resolve my loops using aliases.

Don’t do it this way. Use contexts. Your users will be MUCH happier.

Steve Krandel (BOB member since 2002-06-25)

system · June 25, 2004, 10:06pm

Well, actually I do need to create a couple of contexts because in the universe, there are those 3 fact tables, and then there is another fact table - so I needed 2 contexts for those.
The reason why I was avoiding contexts in the 3 fact tables that are related is that I don’t see why there would be a loss of data in this case. Seems more like a fan trap, which would inflate the results and therefore I have the choice of either creating an alias for the fact table and then creating a context or to use "Poor man’s contexts (or in my case, poor woman’s).
I know which joins to put in for the 3 fact tables vs the 1 fact table, but I’m not so sure about the joins for the other 3 contexts in relation to the context for the 1 fact table that would have to be created. Which joins in which context do you see in this picture? I left the dimension tables out.
Also, won’t adding contexts affect performance of the query?
http://home.comcast.net/~biexperts/FactTablesImage.html
Thanks
Nile

Nile (BOB member since 2004-02-12)

system · June 28, 2004, 1:02pm

Contexts per se do not affect query performance, as they are merely a map telling BusObj what the correct SQL path is, out of multiple choices.

JP Brooks (BOB member since 2002-10-22)

system · June 28, 2004, 1:18pm

Nile,

Each fact table having different dimensions is another reason to have more than 1 context, because using only 1 context will often return incorrect results. Let’s say a user wants to see the Booking Date and Segment Start Date, along with measures from only one of the fact tables. Writing this query as only one SELECT statement can return incorrect results, based on your design; it needs to be broken into separate queries, which separate contexts will automatically do for the user.

You should test this out with different queries/data, so you can see for yourself and not just take our word for it.

You lost me. But why don’t you first test what I suggested so you can get a better understanding of the need for contexts.

JP Brooks (BOB member since 2002-10-22)

MarkP · June 28, 2004, 4:28pm

As previously mentioned, the granularity of the fact tables must be the same to even consider the approach that Steve doesn’t like to consider Think about the scenario of two data providers, one with month, the other with year and the problems associated with combining the two. The user could be innocently multiplying figures up and believing (fairly so) that they are correct, unless they really know their data.

system · June 28, 2004, 11:42pm

I’m in the process of running several tests to see where the need for contexts come in. I had previously aliased the Date dimension for each of the 5 scenarios to take care of that potential issue. So far, I haven’t seen any incorrect numbers yet, but I’ll keep trying.

Thanks
Nile

Nile (BOB member since 2004-02-12)

system · June 30, 2004, 11:04pm

I have added contexts but am getting synchronized queries (between the context for the Personnel fact table and the other contexts - referring to the previous image) , as as Cindy Clayton has written previously, “It is very rarely ever acceptable to have synchronized queries”.
So, can someone please tell me -

Am I just better off combining the 3 fact tables into 1 even though I would have a bunch of redundant data repeating and have to treat the nonadditive measures in a special way in my indexed view
Is it true that 1 context per fact table means the context = [list]the dimension table joins to the fact table plus
1 join to the next fact table [/list](or should I be leaving that join out?)
Thanks
Nile

Nile (BOB member since 2004-02-12)

system · June 30, 2004, 11:06pm

No, you should only have: the dimension table joins to the fact table

Andreas (BOB member since 2002-06-20)

system · June 30, 2004, 11:28pm

Ok - I took out that join, and now I have 3 synchonized queries instead of 2, and the data is all lumped together - it doesn’t understand the hierarchy of Itinerary<=Trip<=Segment at all!

Nile (BOB member since 2004-02-12)

system · July 1, 2004, 5:25am

According to JP Brooks’s earlier reply “You correctly linked these fact tables together,…So now you can add these joins to the 2 contexts for the fact tables they connect” - so I’m thinking it’s the fact table to fact table join being implied here.
But traditionally and according to Andreas, yes, you should only have the dimension joins and not that extra join.
But either way it all becomes one big mess (Synchronization - which by the way is a problem because I need to use WebI reports, and in 1 case I would have to link the dimensions) all because there is another fact table for Personnel that needs to be linked in.

It’s a Many to One to Many to Many case

Help please?
Is there an easy solution that I am completely overlooking?

Thanks
Nile

Nile (BOB member since 2004-02-12)

system · July 1, 2004, 11:51am

A) You really should revisit your data model

B) The problem you must look out for is a join path starting with any fact table and having 1: many (or even worse a many:many) join down the line, for example (looking at join path from FactTable 1 to FactTable 2):

FactTable 1 >-- LookUpDimension –< FactTable 2

In this example you ran into a classical chasm trap, which must be resolved by using contexts.

You need to know the cardinalities of all joins beforehand (not just guessing).

Andreas (BOB member since 2002-06-20)

system · July 1, 2004, 1:34pm

Don’t give up the ship, Nile, you can figure it out! And as G.K. Chesterton reminded us, “An inconvenience is merely an adventure wrongly considered, and an adventure is merely an inconvenience rightly considered.” You’re merely in the middle of an intellectual adventure. (BTW, let me urge you to read Ralph Kimball’s The Data Warehouse Toolkit if you haven’t already; it is a tremendous help. Not to mention anything by Chesterton!)

“Redundant data = wasted space” does not compute in a data warehouse: disk space is cheap, far cheaper than the users’ time wasted waiting for a query to return their data. A star schema explicitly creates tons of redundant data for exactly this reason. And even tables with many millions of rows are not a problem as long as the DB is organized and optimized correctly and the server is powerful enough for the DB size.
Non- or semi-additive measures do have to be handled with care, but Designer gives a lot of options to do this. Or you can create aggregate tables, where the ones with a lower grain repeat all the data in the higher-level ones, and then add their lower-level details. (Again, redundant data cannot be allowed to be an issue here.)

Yes, I meant your join between the 2 fact tables. Andreas is correct that normally this should not be done, but in your case where your fact tables are actually/also 1 exploded fact table, joining them is another way of expressing their relatedness. But then you still have to watch out for possible multiplication of returned rows when querying between them, due to the fan trap created by joining them.

Hmmm, this is making me think… (“Amazing!” ) I’m basing my statements on what I have learned over the past months from our (imperfect) star schema where we also have an exploded fact table that we have joined. One problem with our schema is that all higher-level measures and dates are not duplicated on the lower-level tables, a decision made before my time, precisely to keep from duplicating data, and thus save space. (I have suggested several times that we copy these fields to the other tables, but so far without success.) Fortunately we don’t normally have a problem with this fan trap returning multiple rows, but only because we very rarely query the lowest-level measures; we normally only query the same-level measures on either fact table. But we do often run into queries between these fact tables taking much longer than we or our users want; this is caused by that space-saving decision to not copy to all relevant fields to all relevant fact tables.

HTH. Cheers!

JP Brooks (BOB member since 2002-10-22)

system · July 1, 2004, 3:45pm

This is an interesting theoretical and practical puzzle to me: the idea of directly joining fact tables that are directly related. I only discovered this concept due to our pre-existing star schema, as I mentioned.

Initially I completedly rejected joining these tables (Shipments and Orders in our case) as being ideologically incorrect , but since I couldn’t change reality , I had to deal with it. As I asked questions and thought about this join, I came to the realization that 2 such facts could also potentially be expressed in 1 table, as Nile pointed out, or if the grain is different, as aggregate tables. Then after working with many queries from these tables I realized that our schema can actually work (although as I mentioned above, one must be very careful of the fan trap.)

So although I think that ours is not an ideal design, given the caveat that we rarely query on the lower level measures and thus don’t experience the results of this fan trap, I now see no major problems with joining these tables, since I now see both the theoretical rationale and practical sense of it. Maybe this is an instance of the old principle: once one knows and accepts the rules and understands the reasons behind them, then and only then is one able to know when an exception to those rules is valid.

But I would be very interested in hearing others’ comments and experience with such contexts/joins, and especially a book or article that discusses them; I would like to check my personal experience and theorizing with someone who is an expert in these matters. Maybe a seminar such as Dimensional Modelling Beyond the Basics at the Data Warehousing Institute conference addresses this issue.

Cheers!

JP Brooks (BOB member since 2002-10-22)

system · July 2, 2004, 12:13am

Thanks JP for your encouraging posting on Bob.

Kimball’s book, yes it is very informative. There is something in Chapter 11 on the transportation model. However, he doesn’t discuss different level facts in the model, only higher level dates being added.

There are only 2 options when there are facts of differing granularities, allocation to the lowest level, or creation of another fact table. Since the 1st isn’t an option, I did the second - thus causing my current problem. It’s not really an aggregate table, it’s a fact table at a higher level - although I guess I could sum up the lower level measures and stick them in there.

So what are the options when you have multiple fact tables?
I guess if you have the same exact dimensionality (repeat all dimensionality), then if you have separate contexts for each fact table they will not synchronize, hopefully anyway.

If you do not have the same dimensionality,
1)you need that join between the fact tables or
2)to create separate data providers and link them (which then rules out report creation in WebI).
3)create a view with all the information necessary for the report and use that in the universe

I need to use WebI so 2 is out. I’m still exploring option 3, but it seems to imply my model is not perfect then , and option 1 causes a fan trap for the 3 linked fact tables.
A fan trap can be solved with an alias + contexts or alias + aggregate awareness - but then the 1st fact table (Personnel) has an issue with synchonization.

So it seems to me, I need to redesign the Personnel to be part of the Itinerary - and allow for that major increase in width. Then I need to solve the Itinerary=>Trip=>Segment using alias + contexts or aggregate awareness. I hope. Any ideas on which performs better alias + contexts or alias + aggregate awareness in this case?

And maybe I should aggregate the lower level measures and stick them in each higher level table. Maybe performance would be better - although maintenance may get kinda tricky. Any ideas?

Thanks,
Nile

Nile (BOB member since 2004-02-12)

system · July 2, 2004, 7:19pm

Hello Nile,

This is getting somewhat convoluted, with many different criss-crossing possibilities and ideas; I’ll try to keep from getting lost in the labyrinth of mirrors. 8)

This is exactly why we use full-client for report creation, and WebI only for viewing/distribution: since so many of our reports have multiple DPs. Why can’t you do this? If it were me, I would seriously re-examine this decision in light of the problems you have discovered and of all the reports/universe changes you will have to make in the future. Changing this one decision would quickly and easily solve significant problems for you, apparently.

Isn’t Personnel just a normal Dimension table, like Customers? Any measures about them, if any, in this schema I would think would really be measures of the “fact”, Trip. But if it is a fact table, then yes, a view (or materialized view with Oracle) can be a good/necessary solution, when truly needed. (You definitely want to avoid making it of the entire report, though, if at all possible. This is to start down the slippery slope of creating a separate new view for each new report that looks like it might take a little time to fit into the existing universe design. You eventually end up not only with a universe design that is a long way from your reporting needs, but dozens of views that have to be individually maintained. Then you eventually have to completely redo the universe anyway! )

I think I would combine Itinerary and Trip, since their grains are the same. Yes, it makes a bigger table with more empty fields on the Itinerary records, but to paraphrase Saint Ralph: “so what?”; You just need a flag column to allow for easy differentiation between Trips and Itinieraries; and multi-million record, wide fact tables are no big deal with modern servers and DBs.

Cheers!

JP Brooks (BOB member since 2002-10-22)

system · July 2, 2004, 7:59pm

Solve it, yes. But is it an option, unfortunately no, WebI created reports is the only option and although multiple DP is supposed to be in version 6.5, I don’t thinking linking them is part of that release.

Personnel is a fact table (time allowances, some bit flags, etc.) and is different for each member of the crew, but the maximum crew size is 6, so I’m thinking of increasing the width of the Itinerary table by 6 fields, plus measures (4) for each (so 4 measures * 6 crew members = 24 measures) = a total of 30 fields - I don’t like this, but it may be the only solution. It’s Itinerary, not Trip because the time is summarized across all Trips on the Itinerary.

Unfortunately again, the grain is not the same between Itinerary and Trip, an Itinerary has 1 or more trips (no maximum, at least imposed in the database). The idea of mixing grains in a fact table has long been considered a taboo but maybe it’s my only choice at this point, and I’m actually thinking of combining Trip with Segment since they are the largest, furthest downstream and most of the time there could be 1 segment per trip.

Thanks for letting me share my nightmare

Nile (BOB member since 2004-02-12)