Table Comparison Performance

I have an issue with one of our table comparisons: it hangs after processing about 8,000 rows of data, which complete in approximately 40 seconds.

The process is comparing 35,000 rows against 2.5 million rows in the target table.

The source and target tables are in the same database (forced via a data transfer).

The comparison method is Row-by-row select, and the comparison key has been indexed on the target table.

Checking the database where the process is running, all we have is a select statement from the source, and


FETCH API_CURSOR000000000000BDBA

is the other query that is running (the characters at the end differ between runs).
The waits on the queries are all ASYNC_NETWORK_IO, but these seem to clear quickly enough.
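
For reference, this is roughly how I have been looking at what is running - these DMVs are standard in SQL Server 2005, so anyone seeing the same symptom can use something like:

```sql
-- Find the statement text behind the FETCH API_CURSOR... handles
SELECT c.session_id, c.name, c.properties, t.text
FROM sys.dm_exec_cursors(0) AS c
CROSS APPLY sys.dm_exec_sql_text(c.sql_handle) AS t;

-- Current waits for user sessions
SELECT session_id, status, wait_type, wait_time, last_wait_type
FROM sys.dm_exec_requests
WHERE session_id > 50;  -- skip system sessions
```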

I should note that the database is SQL Server 2005 and I am running Data Services 12.2.3.3.
I should also note that up to two days ago this process worked without issue (if a little slowly), but several other processes depend on this job completing, so they cannot be started until it does.

I have also tried the Sorted input method, but to no avail; the primary key on the source is a varchar, and I believe this could be the issue with the Sorted input option.

Any help would be greatly appreciated.


davc4 :uk: (BOB member since 2009-07-03)

Hi,
What problems are you facing with the sorted input option? This should work with a varchar PK.

If you view the optimised SQL, you should see the ORDER BY clauses for both the source query and the table comparison query - are these the same?

Regards,
Ehtesham


Ehtesham (BOB member since 2012-09-14)

The sorted input option often doesn’t like varchar columns in the ORDER BY. At least on Oracle it will change the sort to use a binary version of the column, and since that is a function, Oracle won’t use the index to perform the sort.
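
As an illustration (Oracle; the names are made up), the generated sort can end up looking like this, and the function wrapper is what stops the index being used:

```sql
-- Illustrative Oracle example: the function-wrapped sort below cannot be
-- satisfied by a plain index on cust_key (table and column names are made up)
SELECT cust_key, cust_name
FROM   target_dim
ORDER  BY NLSSORT(cust_key, 'NLS_SORT = BINARY');
```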

I would try to figure out the row-by-row select issue first. That compare method may be your best bet since you have a target table with many more rows than you have in your input, though 2.5 million isn’t that many rows.

Did an index get dropped?
Are your database statistics up to date?
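
You can check both quickly with something along these lines ('dbo.TargetTable' is a placeholder for your target table):

```sql
-- Indexes still present on the target table
SELECT name, type_desc
FROM   sys.indexes
WHERE  object_id = OBJECT_ID('dbo.TargetTable');

-- When each statistic on that table was last updated
SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
FROM   sys.stats
WHERE  object_id = OBJECT_ID('dbo.TargetTable');
```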


eganjp :us: (BOB member since 2007-09-12)

All the indexes are still there; that was my first thought as well.
I checked for index fragmentation, and I have just checked the statistics, which were last computed on the 9th of this month.

I will run a script to update those, probably tomorrow.
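
Probably nothing fancier than this (table name is a placeholder):

```sql
-- Refresh statistics on the target table with a full scan
UPDATE STATISTICS dbo.TargetTable WITH FULLSCAN;

-- Or refresh stats across the whole database
EXEC sp_updatestats;
```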

We were having some virtual memory issues on one of the boxes, but these have now been sorted. In the short term I added a temporary fix to run an update statement on the old rows and to just insert the new rows as current.

I will be investigating this again shortly, but I have other work that needs to be completed first.

And I agree, 2.5 million is not too many rows; we have some tables which are at least 20 times that size, and these complete without too much of an issue.

I may try creating a view on just the current rows for use within the table comparison and see if this progresses any better.
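
Something along these lines is what I have in mind (the flag column is just illustrative):

```sql
-- Restrict the comparison target to current rows only
CREATE VIEW dbo.v_TargetCurrent
AS
SELECT *
FROM   dbo.TargetTable
WHERE  current_flag = 'Y';  -- hypothetical current-row indicator
```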

But again, thanks for the help, guys.
Dave


davc4 :uk: (BOB member since 2009-07-03)

I’ve had good luck using views in the Table Comparison, but only when I used the Sorted Input compare method. The view won’t do you much good for the Row-by-row Select compare method.


eganjp :us: (BOB member since 2007-09-12)

Jim, just out of interest - if you create a view to limit the target data set to only the current records, would row-by-row not also benefit from that?

As far as I know, and I could be wrong, row-by-row also looks at all the records in the target table, not just the current ones?


ErikR :new_zealand: (BOB member since 2007-01-10)

I have definitely found that it helps, especially with data that changes frequently.

I usually just try to judge whether performing the view query first is faster than the individual queries; some of the rows in this particular flow have had over 30 updates in the past few months. To be honest, I do this by trial and error.

Now I am beginning to wonder if the issue is with the data transfer table which I created in the flow to use as a source for the table comparison. I do this so that the source and the target can be in the same database on the same machine, which usually leads to better table comparison performance, and because I do not want to store staging tables directly on the data warehouse. Then again, I haven’t had any issues with this approach for my other dimensions.


davc4 :uk: (BOB member since 2009-07-03)

I was thinking about the performance of the view from a different perspective, but the way you’re trying to make it work might not be so bad. Using row-by-row and no view, the TC would generate a result set with multiple rows in it; if the ORDER BY sorts the surrogate key in descending order, the first row is the one you want to compare against, and there is no need to fetch the cursor past the first row. Using the view, there would only be a single row in the result set, but I suspect the TC would still either do an internal sort or submit an ORDER BY with the SELECT statement. (I don’t recall whether the row-by-row SELECT shows up in Display Optimized SQL. That would tell you if an ORDER BY is used or not - or trace the SQL.)

Use a view and you produce a smaller result set. Skip the view and you produce a larger result set, but you may gain a very small amount of time because the database optimizer doesn’t have to expand the view for each input row.
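
To make that concrete, I’d expect the per-input-row lookup to look something like this (names are illustrative; trace the SQL to confirm what DS actually submits):

```sql
-- Hypothetical shape of the row-by-row lookup against the target
SELECT surrogate_key, attr1, attr2
FROM   dbo.TargetTable           -- or dbo.v_TargetCurrent when using the view
WHERE  natural_key = ?           -- bound once per input row
ORDER  BY surrogate_key DESC;    -- newest version first; only the first fetch matters
```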


eganjp :us: (BOB member since 2007-09-12)

Another possibility would be to use a SQL Server EXCEPT (equivalent to an Oracle MINUS) to identify the changed rows, and then use a MERGE (DS can generate this via the auto-correct load option on a target table) to apply the updates. It does require a DB view or a SQL transform to do the EXCEPT, though. I have seen this work very well in certain situations in Oracle, but I’ve never tried it in SQL Server.
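
As a rough sketch (table and column names are placeholders):

```sql
-- Changed or new rows: source rows with no identical counterpart in the target
SELECT s.key_col, s.attr1, s.attr2
FROM   dbo.StagingTable AS s
EXCEPT
SELECT t.key_col, t.attr1, t.attr2
FROM   dbo.TargetTable  AS t;
```

One caveat for this thread: if I remember correctly, MERGE only appeared in SQL Server 2008, so on 2005 the auto-correct load would have to fall back to separate update/insert statements.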


dastocks (BOB member since 2006-12-11)