How does the aggregation/sql actually work?

I’ve been told I’m wrong about how something works in Business Objects and I’m convinced I’m right. If I’m wrong, I need somebody to explain it to me because it changes everything I ever thought I understood. If you help prove me right, I’ll be forever grateful…I’ve been told part of my misunderstanding is because of the data so I’ll try to give the whole story…

We have a data warehouse. It is not normalized. I am retrieving sales data for products. Each product has a product number and most have a size. The data warehouse folks have added a class with regulatory data for me. The regulatory data can send as many as three records (currently) for each product/size. These are unique based on the “Regulatory Location”. Obviously if I query sales based on product, size, and regulatory information, I will get the sales three times…once for each regulatory location.

The query I wrote included objects from my class for product number and size and from other classes for quantity, source, uom, and some other miscellaneous objects. I then limited the query to the Regulatory Location of “SC”.

It has been my understanding that when I choose a combination such as product number and size that the sum of the quantity will be based on the unique combination of product and size. They are insisting the query contained duplicates in the sum. Here is the SQL…

SELECT
  SPA_ADMIN.HEARS_SAM_REGULATORY_INFO.HEARS_FILL_REX,
  SPA_ADMIN.HEARS_SAM_REGULATORY_INFO.HEARS_SIZE_CD,
  SPA_ADMIN.PRODUCT.STD_CONVERSION_FACTOR,
  sum(SPA_ADMIN.SHIPMENT_FACT_QAD.ORDERED_QTY),
  sum(SPA_ADMIN.SHIPMENT_FACT_QAD.SHIPPED_QTY),
  sum(SPA_ADMIN.SHIPMENT_FACT_QAD.DUE_QTY),
  sum(SPA_ADMIN.SHIPMENT_FACT_QAD.ORDERED_UNITS),
  sum(SPA_ADMIN.SHIPMENT_FACT_QAD.SHIPPED_UNITS),
  sum(SPA_ADMIN.SHIPMENT_FACT_QAD.DUE_UNITS),
  SPA_ADMIN.CUSTOMER.SOURCE_ID,
  SPA_ADMIN.PRODUCT.UNIT_OF_MEASURE,
  SPA_ADMIN.PRODUCT.SOURCE_ID,
  SPA_ADMIN.SHIPMENT_FACT_QAD.SOURCE_ID
FROM
  SPA_ADMIN.HEARS_SAM_REGULATORY_INFO,
  SPA_ADMIN.PRODUCT,
  SPA_ADMIN.SHIPMENT_FACT_QAD,
  SPA_ADMIN.CUSTOMER
WHERE
  ( SPA_ADMIN.SHIPMENT_FACT_QAD.CUSTOMER_KEY=SPA_ADMIN.CUSTOMER.CUSTOMER_KEY  )
  AND  ( SPA_ADMIN.SHIPMENT_FACT_QAD.PRODUCT_KEY=SPA_ADMIN.PRODUCT.PRODUCT_KEY  )
  AND  ( SPA_ADMIN.PRODUCT.SKU=SPA_ADMIN.HEARS_SAM_REGULATORY_INFO.HEARS_AOM_PARTNO  )
  AND  (
  SPA_ADMIN.CUSTOMER.SHIP_TO_STATE  =  'CA'
  AND  SPA_ADMIN.SHIPMENT_FACT_QAD.INVOICE_DATE_KEY  BETWEEN  20030101 AND 20030131
  AND  SPA_ADMIN.HEARS_SAM_REGULATORY_INFO.HEARS_REG_GROUP_ID  =  'SOUTH COAST'
  )
GROUP BY
  SPA_ADMIN.HEARS_SAM_REGULATORY_INFO.HEARS_FILL_REX, 
  SPA_ADMIN.HEARS_SAM_REGULATORY_INFO.HEARS_SIZE_CD, 
  SPA_ADMIN.PRODUCT.STD_CONVERSION_FACTOR, 
  SPA_ADMIN.CUSTOMER.SOURCE_ID, 
  SPA_ADMIN.PRODUCT.UNIT_OF_MEASURE, 
  SPA_ADMIN.PRODUCT.SOURCE_ID, 
  SPA_ADMIN.SHIPMENT_FACT_QAD.SOURCE_ID

Because I selected objects from both classes this caused the join

SPA_ADMIN.PRODUCT.SKU=SPA_ADMIN.HEARS_SAM_REGULATORY_INFO.HEARS_AOM_PARTNO  )

which I believe created a query to sum based on product and size and was unique on that basis.

I said that the whole purpose of Business Objects is that it will sum based on the scenario created by the objects chosen. I’m really confused…

Help?


Eileen King :us: (BOB member since 2002-07-10)

What is the cardinality between these two tables:
SPA_ADMIN.PRODUCT.SKU=SPA_ADMIN.HEARS_SAM_REGULATORY_INFO.HEARS_AOM_PARTNO

Looks like Many to Many…correct??


dcdas :us: (BOB member since 2002-06-17)

Hi Eileen, ok, if you are joined to the table that has 3 records for each product, then you will get 3 records for each product, each with a duplicate amount field, if you pull only product and amount and not a field from the regulatory table, but the table exists in the query, you will still get 3 records, so when you sum, you will sum all the amounts and get double or triple the expected amount.

Lets say you have Product A, it has a amount value of 10. Product A has 3 regulatory blah blahs. then your you would have a record for each blah blah with the same product ID and size. Now, in your case it looks like you have objects from the regulatory table, are these 3 values unique that you pulled? If so, you wont sum your amount, it will give 3 records for each. So in your report, if you only pull product ID and amount, it is going to sum those amounts up and give you incorrect results.

This help any?

So, I am thinking that you are pulling Products for a specific regulatory loacation and the location exists in the regulatory table. So when you put the condition on location, the sql had to include the table and create a join for it. Now, this is where your multiple records come in. The join says give me records from table a and table b where a.id = b.id. Well, b has 3 records that are equal to the one in a, so there are your 3 records giving you multiple amounts for one ID.


Scott Bowers :us: (BOB member since 2002-09-30)

It’s not a universe that I have designed, so I’m not completely sure. I believe that the SKU is supposed to be unique based on the SKU and the PRODUCT.SOURCE ID. We can have the same product in many source systems but it should be unique for each source. It is currently set in the universe as a one to one…but I believe it may have been previously set as a many to many.

It would be one to many if the regulatory location was not set but it was so that should have created a one to one.


Eileen King :us: (BOB member since 2002-07-10)

This should have made it pull only one record for each product/size and given me only the ones that have a Regulatory Location = “SC”. Right?


Eileen King :us: (BOB member since 2002-07-10)

If there is one record per location, then you should only get one record yes. Then you would be ok.


Scott Bowers :us: (BOB member since 2002-09-30)

Now…any ideas how I prove that I was right? This was stated in a meeting (with my boss who now thinks I don’t have a clue)…and I said that based on everything I knew the aggregation would be based on the objects selected…that’s what business objects does.

The other person’s insistence is that in the group by it will sum up for all of the fill rexes, then all of the size codes…and that it is the group by that matters over and above the join.


Eileen King :us: (BOB member since 2002-07-10)

Just create 2 temporary tables with a few test rows… and run your query against it…


Andreas :de: (BOB member since 2002-06-20)

All the group by does is group common values of the fields in it. If you have an aggregate in a select, you have to do something else with the rest of the fields in the select, otherwise your query, wont know what to do if there are multiples. So know, it doesn’t over ride the join.

A way easy way to prove this. Limit the query you posted to one Account that you know has multiple regulatory records. You should get one row. Create a second query and take out the group by and the aggregations in the select, limit this query to the same account. Should get the exact same results.


Scott Bowers :us: (BOB member since 2002-09-30)

I have one more comment that could prove the other person correct. In a query with aggregation, if a field doens’t have an aggregation, it must be in the select, you can also add other fields that are not in the select. So lets say there is a field in Regulatory that would give you multiple records for one location, not sure if you have this, and it is added to the group by, not, BO wont do this on its own, you would have to manually edit the SQL, then, you would get multiple records. I dont think this is the case, just wanted to mention it.


Scott Bowers :us: (BOB member since 2002-09-30)

The source problem seems to be an unresolved fantrap in the universe…
Any chance of resolving it at the universe level (aliases and contexts)?

Regarding how the SQL is executed:
Does the DB engine not just create some kind of temporary table to store all the rows from the joins, applies the WHERE conditions (order of joins and WHERE conditions dependant on DB optimizer), then does a Group by, then a Having by?


Andreas :de: (BOB member since 2002-06-20)

The following SQL will validate whether or not there are “duplicate” rows in your REGULATORY_INFO table. If that is the case, then you will have multiple rows with the same value returned (in essence fanning your results out).


select TOT_ROWS,
		count(PART_NUM)
from 
	(
		select 
			HEARS_AOM_PARTNO PART_NUM,
			count(*) as TOT_ROWS
		from 
			SPA_ADMIN.HEARS_SAM_REGULATORY_INFO
		group by 
			HEARS_AOM_PARTNO 
	)
group by 
	TOT_ROWS
order by 1;

The TOT_ROWS represents the total_number of rows per PARTNO. The count(PART_NUM) shows how many parts have x number of rows. So, fanning is probably not always 3x the amount. Some Parts may have 1 line, other 2 and yet others 3.

You can either COUNT these “rows” and divide your value by the count (for each part), or you can ensure that you go off a view that only selects distinct rows for obtaining the parts.

-RM


digpen :us: (BOB member since 2002-08-15)

I am having difficulty running free hand sql against that universe because I don’t have the rights…

The other thing I did was to make sure that “avoid duplicate row aggregation” was checked and did count all on the product number and the sku. They’re equal.


Eileen King :us: (BOB member since 2002-07-10)

This isn’t really a BusinessObjects question, it’s a fundamental SQL question. BusinessObjects simply generates the SQL, it’s up to the database to run it.

And my understanding is that the query FROM clause and WHERE clause are evaluated first. In other words, where am I going, and how do I get there. The rows are then retrieved based on the items in the Select clause. Once that has been done, the group by (order by, having) all happen last. So as you (Eileen) suggested, the items in the select clause have a direct bearing on how the group by is implemented.

However, if you have multiple rows per measure (fan trap) you will see inflated values due to the “child” (detail) table inflating the parent (master) table rows. But as long as you can ensure that there is only one child per parent, then you don’t have a problem.

Again, this is not a BusinessObjects discussion, really. Any SQL tool would work the same way.

Dave


Dave Rathbun :us: (BOB member since 2002-06-06)