I would be particularly interested to hear from people who have experience
in designing universes and building reports based upon the ‘star’ database
topology (of data warehousing / data mart fame)…
Being new to this database methodology, I would be interested to discuss
specific considerations when designing universes…
e.g. I have four fact tables that share the same dimension tables. I
resolve loops by creating a different alias of each dimension table for each
corresponding fact table (four separate star structures)…
Alternatively, I would be grateful if someone pointed me to info or archives
on ‘do’s and dont’s’ when designing with star schemas…
Thanks for your help
Ferd
Ferdose Ahmed
Logica UK Ltd
Station Approach
Leatherhead
Surrey KT22 7LG
Hi Ferd,
One method I frequently use with a star schema model is building universe
contexts for each “star” or fact table and associated dimension tables.
This removes the need of creating aliass over each dimension table. In
your example, there would be 4 contexts. Ensure the universe SQL parameter
Multiple SQL statements for each context is checked. When the user queries
information from different fact tables, BusinessObjects will automatically
split the query based on the “star” or context.
> We resolved loops by creating a context for each fact table.
> This keeps the universe structure far more simple and easy to
manage.
If your fact tables are at different grains, creating contexts and
having BusinessObjects create separate SQL also allows you to easily
combine facts from different fact tables.
I’m also for using the contexts instead of aliases in your setup. I’d just
like to point out to be careful with one particular dimension … “Time”.
“Time” usually appears in the fact table alongside the measures. It’s pretty
common to have stuff like InvoiceAmount, InvoiceQuantity, and InvoiceDate in
the fact tables. If your InvoiceDate (or derivates like “Month”, “Year”,
etc…) appear only in the fact table, all the benefits of “Multiple Select
Statements for Each Context” are pretty much lost.
Why?
You will need to create multiple “InvoiceDate” objects, one for each fact
table. Your users will feel the “cognitive pain” when trying to pick the
correct “InvoiceDate” object.
If your users pick measures from multiple fact tables, and any of the
“InvoiceDate” objects, BusinessObects will go orbital … not seriously
(actually by design), but it’s scaring enough for the users.
So, my solution would be to have a single “Time” dimension table, containing
all the possible invoice dates from all the fact tables, and joining it with
all the fact tables, and deriving my “InvoiceDate” object from the “Time”
dimension table, as with any other dimension.
Ok, hope I didn’t mess your head. This “Time” stuff is something that pops
up every now and then, I must admit I don’t have any better solution than
the one I gave.
Any suggestions, listers?
-Harri
Ps. Actually, having all the possible dates in the dimensions table,
regardless if there is a corresponding “InvoiceDate” in the fact table, and
then doing an outer join between “Time” and the fact table would be even
neater.
Pps. Ferd, if you are having trouble creating the contexts, just let BO do
it (“Detect Contexts”). It’s pretty smart with it, assuming you have your
cardinalities set.
-----Original Message-----
Behalf Of Ioannis Hasandras
Sent: 27. tammikuuta 1999 19:19
[snip]
One of your last statements was: “It’s pretty smart with it,
assuming you have your cardinalities set.”
Could you elaborate on how BO is using cardinalities to detect contexts?
Ionnais,
sorry, I cannot tell you about the context detection. It’s classified.
Not really. I just don’t know.
It just seems to be pretty smart with it. Except for that self-joins (joins
that affect only one table) are not included by default.
The Designer help documentation says:
‘Contexts can be used to resolve loops’
and
‘The purpose of an Alias is to resolve loops’.
When is it best to use what between the two?
Aliases: I would use aliases in “shared lookup table” cases. In star
schemas, this could be called “shared dimension table”.
For a classic example, assume you have a table called “Country”, which has
columns CountryID and CountryName. Now assume that in your fact table
“Flights” you have accounting data about company flight reservations. This
data includes OriginatingCountryID and DestinationCountryID, and a measure,
say “NumberOfFlights”.
Ok, you make a join
from Flights.OriginatingCountryID to Country.CountryID.
Then you make a join
from Flights.DestinationCountryID to Country.CountryID.
Now, this will cause some trouble. Why?
How would you define your “From Country” and “To Country” objects to be
used by end users? Both as “Country.Countryname”?
If you defined them as such, the resulting SQL would return only those
trips that are made within a country. Why?
-Because your join definition will produce following WHERE statements in
the SQL:
Flights.OriginatingCountryID=Country.CountryID
AND
Flights.DestinationCountryID=Country.CountryID
, meaning same as:
Flights.OriginatingCountryID=Flights.DestinationCountryID,
which is true only for in-country flights.
Resolution: Define aliases for the “Country” table, for example
“DestinationCountries” and “FromCountries”, and make joins from the fact
table columns to their respective dimension table columns.
Issues solved:
For object definition you can now derive these “semantically different”
objects from their respective tables.
For correct results from SQL , they now become:
Flights.OriginatingCountryID=FromCountry.CountryID
AND
Flights.DestinationCountryID=DestinationCountry.CountryID,
which isn’t the same as:
“Flights.OriginatingCountryID=Flights.DestinationCountryID”, that caused the
problem earlier.
damn … this is starting to look like a lecture… oh, well I’ll finish off
what I’ve started…
Contexts:
I would look at contexts when the loop appears in the middle of a longer
join path. You can see an example of this at the Beach.unv (Island Resorts
Marketing), where Reservations and Orders form a “mini-loop” in the
universe. Making an alias of a table in the middle of a join path doesn’t
solve your problem, it just moves the problem to the next step in the join
path. You would still have an loop, it would just be longer. To resolve loop
entirely, you would need to make an alias of each table in in the join path.
I would also look at contexts in the star schema thing, benefits as
discussed in the earlier posts. To think about this in the flight
information context, consider that you have two fact tables “AirlinePrice”
and “AirlineQuality”, and a bunch of dimension tables, one of them being
“Airline”. Now, you would just do the join based on “AirlineCode”. You would
derive the airline name only once from “Airline” table. It’s the same
airline we’re talking about all the way, there is no “DestinationAirline” or
“FromAirline”.
Ok,
summa sumarum (yes, it’s a lecture… ):
I would make an alias:
When the same dimension table contains “semantically different”
information, i.e. County table is used to get the names of borh destination
countries and originating countries.
When I’m in a hurry and need a quick-and-dirty -solution.
I would make an context:
When the loop appears in the middle of a join path.
When the same same dimension table contains “sematically same”
information. As the airline information, just one airline per flight. If
there were codes for “Airline” and “ConnectingAirline”, then I would do an
alias, again.
Thanks Harri that was a great “lecture” on the use of a context or an alias.
In fact your example fit my Travel System Universe very well. I do use an
alias for the origin/destination city too.
We take context one step further. I have two star schemas that are joined.
One schema deals with the detail about a ticket, the other deals with travel
expense and their relation to a customer. Because both use many of the same
tables we decided to join them. However, because the level of granularity in
each schema is not the same we use context to prevent a user from creating
joins they should not. Each context includes the joins that are specific to
that context as well as the joins that can be used by either context.
It really works well.
We take context one step further. I have two star schemas that are joined.
One schema deals with the detail about a ticket, the other deals with travel
expense and their relation to a customer. Because both use many of the same
tables we decided to join them. However, because the level of granularity in
each schema is not the same we use context to prevent a user from creating
joins they should not. Each context includes the joins that are specific to
that context as well as the joins that can be used by either context.
It really works well.
And, to mention it again: Designer automatically can detect these
sub-star(snowflake) schemes if you set the cardinalities of your joins correctly.
There are two drawbacks:
The first one is that self-joins (i.e. joins with constant values, like
"Country.country=‘US’ ") will NOT be included in the detected contexts and thus
will be missing in the generated SQL, so you have to include them manually.
The seond is: if you already have defined shortcut-joins, they too, will not
be included in the detected contexts, unless you re-set them to “normal” joins,
then detect context and set them again to short-cut.
Walter
DI Walter Muellner
Delphi Software GmbH, Vivenotgasse 48, A-1120 Vienna / Austria
Tel: +43-1-8151456-12, Fax: +43-1-8151456-21
e-mail: w.muellner@delphi.at, WEB: http://www.delphi.at