Outer joins

Hi all,

Is it right that if you have no outer joins between tables A, B and C say (in a line with A being top of the hierarchy in your database) then if you display information from a report that is in tables A and B and sum up a column in C using a measure to display it at the level of the B table then it will return no rows at all if there are no rows to sum up in table C? Anyone know a way round this if this is right?

This is causing confusion with some of our users because they can’t understand why they should need to have rows in table C to get back a B hierarchy level report.

What ideas do people use on when to use outer joins in the universe. Obviously on the ends where reference data may not be recorded but what about between main database tables? In a C level report we would not really want to return rows from A and B where no rows in C exist so we don’t use outer joins between A, B and C.

If outer joins are used on all joins, is there any cost to report optimisation or any database issues I need to be aware of?

Louise Priest
Fraser Williams Pharma Systems (UK)
lpriest@pharma.fraser-williams.com


Listserv Archives (BOB member since 2002-06-25)

Is it right that if you have no outer joins between tables A, B and C say (in a line with A being top of the hierarchy in your database) then if you display information from a report that is in tables A and B and sum up a column in C using a measure to display it at the level of the B table then it will return no rows at all if there are no rows to sum up in table C?

Yes.

Anyone know a way round this if this is right?

Not with a relational database. A relational database doesn’t really understand the concept of a “hierarchy”. A relational database isn’t a hierarchical database. You can MIMIC a hierarchy in a relational database, but it isn’t really a hierarchy.

This is causing confusion with some of our users because they can’t understand why they should need to have rows in table C to get back a B hierarchy level report.

Because you have defined the join between B and C to be MANDATORY, not optional. If you are querying tables B and C, for every row in B there MUST be a row in C; and for every row in C there MUST be a row in B.

What ideas do people use on when to use outer joins in the universe. Obviously on the ends where reference data may not be recorded but what about between main database tables?

If you have properly and fully defined the cardinalities in the joins between your tables, then deciding whether to make a join outer or not should not be a problem. The key is looking at the minimum cardinality, which is not usually what we look at when we consider our cardinalities. We typically say things like, “These two tables have a one-to-many relationship.” But that’s an expression of the maximum cardinalities between the two tables, not the minimum. The true, full expression would be something like, “Table A has a (zero-or-one)-to-(one-or-many) relationship with table B.”

You have two tables, B and C, with a maximum cardinality of one-to-many, i.e., one B record may have many corresponding C records. First consider: Must every record in B have at least one corresponding record in C as a rule? If so, then the cardinality becomes one-to-(one-or-many). If not, then the cardinality becomes one-to-(zero-or-many). Assume it is the latter, one-to-(zero-or-many). Now consider: Must every record in C have a corresponding record in B as a rule? If so, then the cardinality becomes (one-and-only-one)-to-(zero-or-many). If not, then the cardinality becomes (zero-or-one)-to-(zero-or-many).

There’s your 2-minute course in defining cardinalities in relational databases!

It should be obvious now, that once your cardinalities are fully defined, deciding whether to make a join inner or outer is trivial. In our case, assume the full cardinality is (one-and-only-one)-to-(zero-or-many). In other words, every B record has zero or more corresponding C records; while every C record must have one and only one corresponding B records. If the two tables are joined on the field F1, the SQL would look like this (in Oracle):

B.F1 = C.F1 (+)

Keep in mind, though, that if your cardinality becomes something like (zero-or-one)-to-(zero-or-many), you have a problem. No relational database that I’m aware of will let you define a join like:

B.F1 (+) = C.F1 (+)

If you really have two tables with that cardinality, introduce a third table to act as an intermediary:

B.F1 (+) = Z.F1 AND
C.F1 (+) = Z.F1

Where Z.F1 contains all possible values of B.F1 and C.F1 (SELECT B.F1 UNION SELECT C.F1).

In a C level report we would not really want to return rows from A and B where no rows in C exist so we don’t use outer joins between A, B and C.

Well, then you’re stuck. Keep in mind that a “C-level” report includes a “B-level” report, since C is a lower level of the hierarchy, and you said before that you DID want to make C optional for your B-level reports.

I shouldn’t say you’re stuck. You can always create pre-defined conditions such as “C.F1 is not null” to effectively force the outer join back to an inner join.

If outer joins are used on all joins, is there any cost to report optimisation or any database issues I need to be aware of?

Well, it may be different for other RDBMS’s, but for Oracle there really isn’t much of a cost to outer joins. The key is to have proper indexing.


Erich Hurst
Compaq Computer Corporation


Listserv Archives (BOB member since 2002-06-25)

Louise,

Your understanding is correct about inner-joins, ie, no rows are returned from A/B when no rows exist on C if A-B-C are related via an inner or equi join. Using an outer-join at the database level will preserve all the rows from one table. In your case you could outer-join tables B and C preserving rows in B when no match in C exists. When doing so, however, when you have columns from C that you are agregating, then you have to handle the NULL values from column C that will get returned when no rows in table C exists; there are DBMS functions that help to do this (eg, Sybase function IsNull() and Oracle function NVL(); use them to return 0 when the column’s value is NULL).

I wouldn’t recommend using ourter joins for all joins in your universe for the simple reason that the results from outer joins are not the same as the results from inner-joins since you always get ALL the rows from one of the tables; that is usually not the behavior you always want.

What we recommend to users that need this capability is handle the joins at the report level by the data provider linking option under Data => View menu (Definition tab) when multiple data providers exist. Linking data providers essentially performs a full outer join between two result sets. This will allow you to have an equi-join Universe without having to support all possible outer-join senarios that might exist and then have to potentially create contexts to handle loops. So in your case, you would have one data provider relating A-B via an inner-join, and a second data provider selecting from C the columns that relate to B and the columns you wish to sum. Be sure to select the NVL(columnname,0) value of the columns you wish to sum. Then link the data provider containing table C columns by its foreign key fields to the data provider containing tables A/B.

Refer to the thread entitled ‘Outer joins in synchronised data providers’ which concisely describes BO behavior with synchronized or linked data providers.

Hopefully that helps.

Donald May
MIS Eagle Services
Pratt & Whitney
Ph: (860) 565-5253 Fax: (860) 565-4347 M/S: 117-35
E-mail: maydp@pweh.com


Listserv Archives (BOB member since 2002-06-25)

Is it right that if you have no outer joins between tables A, B and C say (in a line with A being top of the hierarchy in your database) then if
you
display information from a report that is in tables A and B and sum up a column in C using a measure to display it at the level of the B table
then
it will return no rows at all if there are no rows to sum up in table C?

Yes.

I have a (one-and-only-one)-to-(zero-or-many) situation between B and C with the join if on column F1 set as B.F1 = C.F1, ie. not outer at present.

In a C level report we would not really want to return rows from A and B where no rows in C exist so we don’t use outer joins between A, B and C.

Well, then you’re stuck. Keep in mind that a “C-level” report includes a “B-level” report, since C is a lower level of the hierarchy, and you said before that you DID want to make C optional for your B-level reports.

I shouldn’t say you’re stuck. You can always create pre-defined conditions such as “C.F1 is not null” to effectively force the outer join back to an inner join.

If users insist that they have to see all records on a B level report which includes summing of records in C even if there are no records but don’t want to see B rows on a C level report if there is no rows in C then I am going to have to use outer joins everywhere and then add conditions as necessary to reports to stop blank type rows being displayed. This is a bit of a pain really but if there is no other way I will have to do it like this.

Well, it may be different for other RDBMS’s, but for Oracle there really isn’t much of a cost to outer joins. The key is to have proper indexing.

What does proper indexing entail? I’m not sure what we have actually, but could ask a DBA if necessary or look at our database creation scripts. Sorry, I should have said we are using 4.1.3 and oracle 7.3.

Louise Priest
Fraser Williams Pharma Systems (UK)
lpriest@pharma.fraser-williams.com


Listserv Archives (BOB member since 2002-06-25)

I wouldn’t recommend using ourter joins for all joins in your universe for the simple reason that the results from outer joins are not the same as the results from inner-joins since you always get ALL the rows from one of the tables; that is usually not the behavior you always want.

That is very much my feeling as well. Users do not understand that changing all the joins to fix what they think is a fault with one particular report will affect every other report as well.

What we recommend to users that need this capability is handle the joins at the report level by the data provider linking option under Data => View
menu
(Definition tab) when multiple data providers exist. Linking data
providers
essentially performs a full outer join between two result sets. This will allow you to have an equi-join Universe without having to support all possible outer-join senarios that might exist and then have to potentially create contexts to handle loops. So in your case, you would have one data provider relating A-B via an inner-join, and a second data provider selecting from C the columns that relate to B and the columns you wish to sum. Be sure to select the NVL(columnname,0) value of the columns you wish to sum. Then link the data provider containing table C columns by its foreign key fields to the data provider containing tables A/B.

This sounds like a better idea except I seem to have problems linking data providers like that. I typically have a section of level A type detail in my report which in turn contains a section of level B detail which in turn contains a section of level C detail. I always seem to have to include all details in the section A query in all data providers used in the report else other tables in the main section A part of the report seem to return errors (multivalue I think). Am I doing something wrong? We do not have many reports with multiple data providers so I am not entirely too familiar with them.

Refer to the thread entitled ‘Outer joins in synchronised data providers’ which concisely describes BO behavior with synchronized or linked data providers.

I will definitely have a look at this and see if it increases my understanding.

Louise Priest
Fraser Williams Pharma Systems (UK)
lpriest@pharma.fraser-williams.com


Listserv Archives (BOB member since 2002-06-25)

If users insist that they have to see all records on a B level report which includes summing of records in C even if there are no records but don’t want to see B rows on a C level report if there is no rows in C then I am going to have to use outer joins everywhere and then add conditions as necessary to reports to stop blank type rows being displayed. This is a bit of a pain really but if there is no other way I will have to do it like this.

I haven’t tried this, but you may be able to resolve this by aliasing C.

Keep the mandatory join on C, and use an outer join on alias of C.

Steven Jones
Consultant at BT Office Products International sjones@btopi.com


Listserv Archives (BOB member since 2002-06-25)

Your understanding is correct about inner-joins, ie, no rows are returned from A/B when no rows exist on C if A-B-C are related via an inner or equi join. Using an outer-join at the database level will preserve all the
rows
from one table. In your case you could outer-join tables B and C
preserving
rows in B when no match in C exists. When doing so, however, when you
have
columns from C that you are agregating, then you have to handle the NULL values from column C that will get returned when no rows in table C
exists;
there are DBMS functions that help to do this (eg, Sybase function
IsNull()
and Oracle function NVL(); use them to return 0 when the column’s value is NULL).

Be careful here. You don’t ALWAYS want to translate a null to a zero. Doing so would change how aggregate functions work, both at the database level and in BusinessObjects.

SUM() is not much of a change. If all of the values being summed are null, then SUM() returns a null. Otherwise, SUM() will add up the non-null values. For example, the sum of (1,2,3,null) is 6, just as you would expect.

AVG() and COUNT(), though, will be significantly affected. The count of (1,2,3,null) is 3, while the count of (1,2,3,0) is 4. For the same reason, the average of (1,2,3,null) is 2 (6 divided by 3), but the average of (1,2,3,0) is 1.5 (6 divided by 4).


Erich Hurst
IQDC Systems Analyst
Compaq Computer Corporation


Listserv Archives (BOB member since 2002-06-25)

Well, it may be different for other RDBMS’s, but for Oracle there really isn’t much of a cost to outer joins. The key is to have proper indexing.

What does proper indexing entail?

Oh, all I meant was that the key to report performance is indexing, not the outer/inner question. You should have the same indexes whether using outer or inner joins; the outer vs. inner question does not affect your indexing decisions.


Erich Hurst
Compaq Computer Corporation


Listserv Archives (BOB member since 2002-06-25)

Be careful here. You don’t ALWAYS want to translate a null to a
zero.
Doing so would change how aggregate functions work, both at the
database
level and in BusinessObjects.

This is true. Thanks for the further clarification Erich.

The issue is whether or not you wish to have, in this case, tuples of A-B-C accounted for although there was no C contribution for a given tuple; that will vary by context.

Donald May
MIS Eagle Services
Pratt & Whitney
Ph: (860) 565-5253 Fax: (860) 565-4347 M/S: 117-35
E-mail: maydp@pweh.com


Listserv Archives (BOB member since 2002-06-25)