Inner Join results used in criteria of Outer Join Query

system · June 16, 2008, 5:32pm

I’m sure this has been discussed extensively, but I each post I try seems to lead me into another problem. So, here it goes.

Job: Create daily flat file of Surgery Logs based on Surgery date and Posting Date

Query 1: Inner Join query on Logs and Audit trail. Returns ~100 logs IDs. Log table 1,000,000. Audit table 10,000,000.

Query 2: Outer Join query on Log table, providers, diagnosis, providers, etc where I would ideally like to limit to just the 100 logs identified in Query 1.

Query 1 is efficient. I cannot store the results of Query 1 in the same datastore as Query 2 because of existing busines policies. If I could, I would just include the Query1_Results table in Query 2 as the first table and everything would be fine.

So instead I chained query 1 into Query 2 in the same dataflow where Query2.LogID = Query1.LogID. When I do this, Query 2 decides that it is going to query every table individually and perform the joins in memory. This causes major performance problems due to the size of the other tables in Query 2.

All of the tables feeding Query1 and Query2 are listed as “Cached” = No. Join Rankings don’t seem to be helping.

In a different enviornment I would normally make an array with the 100 IDs and include them in a sub-select; or nest the Query1 in a subselect; or combine Inner/Outer joines in a single query. I don’t believe these are options in DI.

Query 1 returns the Primary Key of Log. I am attempting to use this PK to drive Query 2. Any suggestions? Should I be able to do this with join ranks?

I could push down the entire Query 2 WHERE clause, but that is less than optimal.

BTW, Query2 always generated a nice query until I tried to add the results of Query 1. Query 1 was a late modification to the selection critiera.

Redirects to prior posts are always welecomed.
Thanks

hellmanj (BOB member since 2007-09-14)

system · June 18, 2008, 11:20am

Why is a pushdown less than optimal? If the where clause can be pushed down, that would seem to solve the problem entirely?

And yes, DI can do a nested subselect in the where clause. I use it all the time to accomplish exactly what you are doing here. It uses pushdown_sql to do it.

Example where clause:

pushdown_sql('DS', 'query2.LogID in (select Log_ID from table1 where criteria1 = \'A\' and criteria2 = 52 and criteriaDate >= \'01-JAN-08\' and criteriaDate < \'01-FEB-08\')' and query2.criteria1 = \'B\')

Ernie

eepjr24 (BOB member since 2005-09-16)

system · June 19, 2008, 8:44pm

Regarding my usage of the term ‘optimal’, I am using DI to track table usage and join criteria. The less I put into pushdownsql or SQL transforms should translate into more robust change managment reports detailing where and how objects are used. Not sure if this is going to pay off, but I’m giving it a shot.

Your example is amazingly close, too bad I didn’t give enough info. Query2 is an outer join query with a number of tables. To the best of my knowledge, you can’t use the pushdown_sql() in an outer join QUERY object. The following error results.

[Query:GET_LOG_INFO]
BODI-1111077: Invalid query <Constant predicate <pushdown_sql(‘datastore’, ‘select 1’ )> is not allowed in the WHERE clause of an outer join query transform.>.

Am I wrong about this?

hellmanj (BOB member since 2007-09-14)

system · June 20, 2008, 12:01pm

Sort of. =)

You can use a pushdown in an outer join, as long as you are not doing a self join.

You have to cheat and do all the join criteria yourself, though. Check this thread for a more detailed discussion:

If you need an assist, post the SQL that is being generated and I can try to point you in the right direction.

Ernie

eepjr24 (BOB member since 2005-09-16)

system · March 11, 2010, 3:40pm

Thanks eepjr24. I did read and attempt this when you posted, but failed to post my results.

In our case the DBA has set the database to Sql 2005 compliant and thus cannot use the *= type syntax.

Msg 4147, Level 15, State 1, Line 4
The query uses non-ANSI outer join operators ("=" or "="). To run this query without modification, please set the compatibility level for current database to 80 or lower, using stored procedure sp_dbcmptlevel. It is strongly recommended to rewrite the query using ANSI outer join operators (LEFT OUTER JOIN, RIGHT OUTER JOIN). In the future versions of SQL Server, non-ANSI join operators will not be supported even in backward-compatibility modes.

hellmanj (BOB member since 2007-09-14)

system · March 11, 2010, 3:57pm

Wow, talk about a necropost. =P

I don’t know how to get around it if the DBA is excluding this type of join. The only thing that comes to mind is trying a function call lookup with a custom SQL and pre-cache?

Ernie

eepjr24 (BOB member since 2005-09-16)

system · March 11, 2010, 6:22pm

I used an inner join on my core tables where I am confident I will always have a record to join to. On those that I wanted to left join to, I just excluded them. Then I perform a lookup for those that would have been left joined. In most cases I can get away with this. In the cases where it is possible I miss something due to really bad data, I setup an exception report in webi that looks just for those records.

In those cases you really, really need a left join across databases due to ten’s of millions of records in several tables, then I have to use a sql_transform.

Thanks

hellmanj (BOB member since 2007-09-14)

system · March 11, 2010, 8:15pm

Before using a SQL Transform and assuming there is no other way, please consider using a database view that does the join instead. Logically the same as a SQL Transform but more maintainable.

Werner Daehn (BOB member since 2004-12-17)