Outer Joins in Synchronised data providers (long reply)

The original poster sent a question about writing a query from two universes, and linking the two queries in BusinessObjects…

In a message dated 98-08-05 06:53:44 EDT, you write:

Possibly missing something simple -

I have data coming from two universes, say Col A from Universe A and Col B from Universe B.

The objects are linked Col A to Col B.

The default action of synchronisation is for a combined result set to show only those records where the values are in Both A and B.

Part of this is true; the linking is done between the two columns. But, it is not limited to those rows with values from both queries (i.e. an Outer Join). Not only is it an outer join, it should be a full outer join. More details below.

What I need is for an outer join so that the result set is all values in Col A and those in B where they match.

ie
Col A Col B
A A
B B
C
D D
E
F F
G G

I can’t see an obvious way to do this especially since it is going between two universe so sub-queries etc can’t work.

This should already be happening!

When you combine two queries in BusObj you will see all rows of data, even if there is missing data from A or B! In fact, I have spoken with several people that have complained about this. (solution: filter out all of the #EMPTY rows). The only way that BusObj knows how to link data from two different data providers is a full outer join (an outer join that goes in two different directions). In other words,

DP1 DP2
A A found a match
B no row from universe 1
C no row from universe 2
D no row from universe 1
E no row from universe 2
F F found a match
G G found a match

Be aware, however, that if your data looks like the example above when you get to the actual report you will see instead:

DP1 DP2
A A
B B
C C
D D
E E
F F
G G

That’s because when you link a column, BusObj duplicates the data whenever a linked column is found missing in another data provider. So, you may think that you are not seeing an outer join when in fact you are. Check the actual data for the two different data providers by building a block that shows only data from one data provider or the other. That’s the only way that you can verify if you have missing data from the linked columns.

Perhaps synchronisation isn’t the right way to go about it.

I’m concerned about your use of the word “synchronization”. If you have an improperly defined universe, then BusinessObjects will sometimes run two separate pieces of SQL and “synchronize” the result sets for a single query. Very often this gives you the wrong answer. In your original question you specifically mentioned two universes, so I am assuming two totally separate data providers that are linked via the Data Manager screen.

Regards

Jonathan

Project Leader
Global Medical, Regulatory and Product Strategy (GMRPS) IS

n a message dated 98-08-05 10:14:59 EDT, PIMorris writes:

I’ve had this problem myself very recently. If your data sources are all
ODBC (i.e. not Oracle SQL*Net or similar) then you could use Microsoft Access, with linked tables to provide a ‘virtual’ database containing the tables you require. You can then do an actual outer join between them.

This is also true. But, as a matter of fact, Access can attach Oracle tables via SQL*Net as well. But the bottom line is that a trick like this is not required to have an outer join when linking two data providers. It already works that way.

Perhaps an ‘outer’ type of synchronisation is required… and Business Objects folks listening?

As I said earlier in this post, this outer join linking is already happening. That is the default behavior of the product. Try this to verify for yourself:

Using two different universes…
Query 1
Build a query from Island Resorts with the following result objects: Resort
Service Line
Revenue
(should be 9 rows of data)

Query 2
Build a second query from Evaluation Kit with: Resort
Department
Number of Guests
(should be 3 rows of data)

Link Query 1 to Query 2…
Link Resort from Query 1 to Resort from Query 2. Link Service Line from Query 1 to Department from Query 2.

Now build a table with Resort, Service Line, Revenue, and Number of Guests. Notice the outer join from Query 1 to Query 2, where Number of Guests is displayed only for the Accomodation lines? No data is displayed for the other rows for number of guests, yet the rows are still available in the report.

Now, Edit Query 1 and place a condition on Service Line for “Different From ‘Accomodation’” and run the query. Notice that Accomodation still shows up? The outer join from Query 2 to Query 1 forces that entry into the block so that the Number of Guests can still be displayed. Notice also that the Revenue data only exists for the rows that came back from the first query now.

If enough people are interested, I will post this report to the mailing list.

I mentioned this once before (and should keep my mouth shut until it is actually ready!) but I am working on a white paper titled, “The Mystery of Multiple Data Providers - Explained!” which will be available on our web site. At this point I hope to have it finished and posted on our site before the user conference in Orlando. The paper explains this outer join behavior, the reason why dimensions, details, and measures behave differently, and other topics along these lines. The ability to leverage data from multiple data sources is an excellent improvement in the 4.x product over prior versions, and really understanding how to use that feature will enhance your reports!

Regards,
Dave Rathbun
Integra Solutions
www.islink.com See you in Orlando in '98!


Listserv Archives (BOB member since 2002-06-25)

David,

Thanks for the very thorough and correct explanation of how the multiple data providers work.

It just proves that I needed to look at the combined data more closely as certainly the items that I assumed were missing (hence not outer-joined) were in fact present BUT, as you say, BusObj duplicates the data whenever a linked column is found missing in another data provider.

Now, although your example shows the behaviour perfectly with the ‘missing’ rows of data for the measures, I still don’t really have a way of getting the results that I need displayed in the report. This really comes back to the ‘duplication of data’ issue.

Lets look at the problem again (using your example tables)

What we expect to get…
DP1 DP2
A A found a match
B no row from universe 1
C no row from universe 2
D no row from universe 1
E no row from universe 2
F F found a match
G G found a match

What we get…

DP1 DP2
A A
B B
C C
D D
E E
F F
G G

What I need is to find the values from DP2 where there isn’t a match in DP1 (or vice-versa) ie I expect to get a listing like :-

DP1 DP2
B
D

You suggest that the solution is as simple as placing a filter on the relevant column and filter out all the #EMPTY rows. The problem is that the ‘false’ duplicates also appear in my list to filter (ie B and D appear in the list of values for DP1)! So, unless I am missing something obvious, I can’t see how to get the above listing.

p.s If I still can’t sort this out I might be able to justify a trip to Orlando to speak to you!!

Regards

Jonathan

Project Leader
Global Medical, Regulatory and Product Strategy (GMRPS) IS


Listserv Archives (BOB member since 2002-06-25)

In a message dated 98-08-06 05:49:14 EDT, you write:

David,

Thanks for the very thorough and correct explanation of how the multiple data providers work.

You are welcome!

<>

What I need is to find the values from DP2 where there isn’t a match in DP1 (or vice-versa) ie I expect to get a listing like :-

DP1 DP2
B
D

You suggest that the solution is as simple as placing a filter on the relevant column and filter out all the #EMPTY rows. The problem is that the ‘false’ duplicates also appear in my list to filter (ie B and D appear in the list of values for DP1)! So, unless I am missing something obvious, I can’t see how to get the above listing.

I think that the “made up” data (for filling in the outer join) is done as the combined (linked) report data is created. That means that by the time you get to actually displaying the report, there is no indication (using the linked fields) that something is missing. Because it really isn’t missing!

Now, if you could add a third object (a detail or measure of some kind) then you could filter the empty rows. For example, lets say that DP1 has the join column and a measure M1. DP2 has the join column and M2. The data would look like (cut this message and paste into notepad if the format looks strange…)

DP1 M1 DP2 M2
A 10 A 1
B 2
C 20
D 3
E 30
F 40 F 4
G 50 G 5

And the report results would look like (note the “duplicated” key and the missing support information):

DP1 M1 DP2 M2
A 10 A 1
B #EMPTY B 2
C 20 C #EMPTY
D #EMPTY D 3
E 30 E #EMPTY
F 40 F 4
G 50 G 5

So, you could filter on the #EMPTY values for M1 or M2, but not on the linked dimension itself. I don’t know for sure if this is exactly what is going on, but it seems to be a good guess.

Regards,
Dave Rathbun
Integra Solutions
www.islink.com See you in Orlando in… a few weeks!


Listserv Archives (BOB member since 2002-06-25)