I’m sure this has been discussed extensively, but I each post I try seems to lead me into another problem. So, here it goes.
Job: Create daily flat file of Surgery Logs based on Surgery date and Posting Date
Query 1: Inner Join query on Logs and Audit trail. Returns ~100 logs IDs. Log table 1,000,000. Audit table 10,000,000.
Query 2: Outer Join query on Log table, providers, diagnosis, providers, etc where I would ideally like to limit to just the 100 logs identified in Query 1.
Query 1 is efficient. I cannot store the results of Query 1 in the same datastore as Query 2 because of existing busines policies. If I could, I would just include the Query1_Results table in Query 2 as the first table and everything would be fine.
So instead I chained query 1 into Query 2 in the same dataflow where Query2.LogID = Query1.LogID. When I do this, Query 2 decides that it is going to query every table individually and perform the joins in memory. This causes major performance problems due to the size of the other tables in Query 2.
All of the tables feeding Query1 and Query2 are listed as “Cached” = No. Join Rankings don’t seem to be helping.
In a different enviornment I would normally make an array with the 100 IDs and include them in a sub-select; or nest the Query1 in a subselect; or combine Inner/Outer joines in a single query. I don’t believe these are options in DI.
Query 1 returns the Primary Key of Log. I am attempting to use this PK to drive Query 2. Any suggestions? Should I be able to do this with join ranks?
I could push down the entire Query 2 WHERE clause, but that is less than optimal.
BTW, Query2 always generated a nice query until I tried to add the results of Query 1. Query 1 was a late modification to the selection critiera.
Redirects to prior posts are always welecomed.
Thanks
hellmanj (BOB member since 2007-09-14)
(BOB member since 2005-09-16)
(BOB member since 2004-12-17)