Connecting Fact Tables

system · May 31, 2011, 6:27am

Hi,

I was trying to find out the reason of why Fact tables should not connected in Universe. Can someone confirm whether I am correct or not:

Fact Tables represent information on different grain, so if a Fact Table joins with another one, they both represent information in same level (grain). If they both are at different grain, we may not get useful results.
Fact tables are very huge in size, so this will be terrible in terms of performance.
Usually, to avoid multiple Universes, we join all Fact tables via confirmed dimension.

As I understand, if I take one Context/Fact Table approach, I can query multiple Fact tables with dimensions? (This will use multiple contexts). But, by this method, will I be able to generate a single table with data from different fact tables along with dimensions from dimension tables?

Smith85 (BOB member since 2009-12-10)

system · June 1, 2011, 5:09pm

All depends on the design/model.

Report performance is determined by many other factors not just the size of the fact tables, but obviously ther bigger the fact table, the more performance concern you may have

Yea many times

katullus (BOB member since 2009-08-21)

system · June 1, 2011, 9:50pm

It completely depends on the business model (which will translate into a data model) and your requirements.

Personally, I am allergic to brush stroke statements like “never join fact tables directly” or “always use surrogate keys” etc. They usually come from “textbook experts” - people who read a few books but have very little actual experience.

In banking EDW’s, I have joined fact tables on serviced loans with the fact table(s) associated with loan applications. In manufacturing, I have joined production and shipment fact tables - it just depends on how the data model works and the requirments behind that particular Universe.

One thing I do not recommend is to create the “Universe from hell” by stuffing everything into a single, complex, snowflaked and heavily bloated universe … which might be “easy” to maintain but could be so complex that end users cannot use it with confidence - this would completely defeat the purpose of self service BI.

Yes there are cases where super users might want to “free flow” across all the data in the EDW for discovery, data mining etc but that’s not reporting - other tools may be more effective here. The output of which could then lead to structured information requirements which can be more easily modelled into a BO Universe.

ErikR (BOB member since 2007-01-10)

system · June 2, 2011, 5:29am

Great inputs katullus and ErikR

I have always thought that the Fact tables being joined together should have exactly the same grain of data, for a perfect 1-1 mapping.

A lot of databases I have seen are not actually based on Dimensional Model, more as of OLTP tables; there is hardly a difference between a dimension and a fact table. We just join tables based upon business logic (and of course, considering cardinality and grain of tables).

Smith85 (BOB member since 2009-12-10)

system · June 2, 2011, 7:57am

Well you of course have to keep record grain into account to avoid fact blow-out when aggregating.

What is a dimension and what is a fact - there are lots of thoughts about that. Sure, if you read Kimball’s books or attend his dimensional modelling course, things are very clear cut - sometimes way too clear cut or simple for my taste as in my experience requirements and processes are never that clear or simple.

At a higher level, dimensions and facts are just data elements that each have a part in business processes throughout the enterprise. Some may be tied to a specific process whereas other elements are shared across multiple processes.

When modelling business processes, nobody talks about “facts” or “dimensions” - and one person’s dimension, is the other person’s “fact”. Customer is just another dimension for a finance model - a controller will be more interested in the financial transactions than the customer details. But for someone in marketing, all the details of that customer are really important and all interactions with the company are core facts - with the actual sales transaction just one of many interesting facts.

ErikR (BOB member since 2007-01-10)

MarkP · June 2, 2011, 8:13am

Smith85:

Hi,

I was trying to find out the reason of why Fact tables should not connected in Universe. Can someone confirm whether I am correct or not:

Fact Tables represent information on different grain, so if a Fact Table joins with another one, they both represent information in same level (grain). If they both are at different grain, we may not get useful results.

Fact tables are very huge in size, so this will be terrible in terms of performance.

Usually, to avoid multiple Universes, we join all Fact tables via confirmed dimension.

As I understand, if I take one Context/Fact Table approach, I can query multiple Fact tables with dimensions? (This will use multiple contexts). But, by this method, will I be able to generate a single table with data from different fact tables along with dimensions from dimension tables?

If you have an imbalance of cardinalities between two tables containing measures, you will create a fan trap. This leads to incorrect results.

It’s simply not best practice to join fact tables. Joining really big table to really big table isn’t a good idea.

If you have conformed dimensions that relate to both fact tables, then your SQL will be generated with two statements that join. You will be able to use all objects in the same table without problems. If you use an object that only belongs in one context, then you will get a synchronisation of your sql statements and you must know how to use the “odd” object.

system · June 2, 2011, 9:12am

Thank you Mark for your inputs. Just to make sure I understand this completely:

So, this means that I can use the common dimension along with the measures from the different fact tables (cross-context operation) to get results in a single table?

I am not sure if I understood this point

Supposing a Customer Dimension table (D1) is joined with Fact Table (F1) along with other Dimension tables in Context (C1) and is not included in Context (C2) when the tables are joined with Fact Table (F2) implying that D1 is not part of C2, will I get a synchronisation if I take objects from D1, C1 and C2?

Its interesting to know what essentially happens during synchronisation .

My apologies for such a trivial question, I haven’t done cross context operations in Universe.

Smith85 (BOB member since 2009-12-10)

MarkP · June 2, 2011, 4:17pm

Yes. Say you have a Showroom dimension and you have a rental fact table and a sales fact table. You could drag objects Showroom Name, Sales Value and Rental Value into your result objects area. With a context for each fact table, you would see two SQL Statements generated:
1/ Rental context

select D_Showroom.Showroom_Name, sum(F_Rentals.Rental_Value)
from D_Showroom join F_Rentals on D_Showroom.ShowroomID = F_Rentals.FK_ShowroomID
group by D_Showroom.Showroom_Name

2/ Sales context

select D_Showroom.Showroom_Name, sum(F_Sales.Sale_Value)
from D_Showroom join F_Sales on D_Showroom.ShowroomID = F_Sales.FK_ShowroomID
group by D_Showroom.Showroom_Name

You will need to amend your universe parameters to allow the selection of multiple contexts though (it’s a simple parameter setting, you’ll find it )

I am not sure if I understood this point

Supposing a Customer Dimension table (D1) is joined with Fact Table (F1) along with other Dimension tables in Context (C1) and is not included in Context (C2) when the tables are joined with Fact Table (F2) implying that D1 is not part of C2, will I get a synchronisation if I take objects from D1, C1 and C2?

Its interesting to know what essentially happens during synchronisation .

My apologies for such a trivial question, I haven’t done cross context operations in Universe.[/quote]

Take my first example. If F_Rentals contained return date, you wouldn’t be able to use it with Sales Value because it would be out of context. Similarly your queries would produce synchronisation because they have been aggregated at different levels of granularity - one just by showroom, the other by showroom and return date.

If you need it clarifying any further, please shout.

system · June 2, 2011, 5:51pm

Thank you Mark, that was a great explanation.

As I understand, I would end up with different SQL Queries for the fact tables (with Multiple SQL settings on), and in Synchronization, I would actually get multiple tables.

I suspect Synchronization can be little dangerous, if the joins are not handled carefully (in right contexts) we may get incorrect results.

Let me play with my test Universe for a while, probably I can learn a lot by trying different combinations (and can come back to this thread with my observations/doubts, if any)

Thank you for your time Mark, appreciate your replies.

Smith85 (BOB member since 2009-12-10)

MarkP · June 3, 2011, 7:47am

No problem. Synchronisation occurs when you select non-conformed dimensions, simple as that.

system · January 30, 2014, 11:12am

You are absolutely right about the use of facts and dimension tables in the practical environment. Although, the connection between facts is condemned, how do you avoid the Loop Join and/or the ambiguous query issue between them when they are also connected via conformed dimension tables, when fulfilling a reporting requirement?

Please elaborate more on the scenario and the model design.

y_allawala (BOB member since 2014-01-30)

MarkP · January 30, 2014, 11:36am

No, he’s absolutely wrong. There’s not a single instance where I would join fact tables.

You resolve loops with contexts.

system · January 31, 2014, 9:42am

[Moderator Note: Moved from General Discussion to Semantic Layer / Universe Designer]

Marek Chladny (BOB member since 2003-11-27)

system · January 31, 2014, 10:43am

Chiming in: I would for example join the fact tables ORDER HEADER with ORDER LINE ITEM or the facts INCIDENTS with INCIDENT LOGS. The resulting fan traps must be resolved then… Of course.

Andreas (BOB member since 2002-06-20)

system · January 31, 2014, 11:23am

Depends on the reporting requirements, of course, but when dealing with the Orders and Order Lines example, I have created Orders and Order Lines as seperate contexts.
This seemed to work well for me.

In fact we had a big discussion in the past on this very topic:-

https://bobj-board.org/t/166900

Mak 1 (BOB member since 2005-01-06)

MarkP · February 3, 2014, 5:04pm

Yes, so long as they are cascading, but only to create and resolve a fan trap, not just because they have columns that match.