Table Comparison, detect deleted rows causing insert issue

I am relatively new to BODI and am using version 11.5.

I have built several DFs with Query and Table Comparison transforms without issue. All the Table Comparisons have ‘Detect deleted rows’ checked so that rows deleted from the source are also deleted from the target table.

However, a recent DF fails to insert all rows into an empty table when ‘Detect deleted rows’ is checked. Debug shows that the rows have an op-code of ‘I’, but the rows do not exist in the table upon job completion. If I uncheck ‘Detect deleted rows’, all rows load into the table. The Table Comparison is followed by a Key_Generation, if that matters at all. Only about 50% of the rows are inserted when ‘Detect deleted rows’ is checked, and there is no pattern as to why certain rows are not inserted.

The Table Comparison has a primary key of three columns, and three other columns are designated for column comparison.

If an initial load of a table requires that ‘Detect deleted rows’ be unchecked, that is fine, but I would like an explanation as to why, since logically (to me) deleted records should not be detected during an initial load of a table.

Thanks much for any help on this.


ssmith (BOB member since 2011-03-04)

Hi Smith,

Please check the post below; it may help you with how to insert into the target when you are using Table Comparison.

Thanks & Regards,
Ramana.


Ramana :india: (BOB member since 2009-04-30)

Thanks for the link, but I do not think it applies to my scenario.

My DF is simple: Query, Table Compare, Key Generation, Target.

I am using the same table for the Table Compare, Key Generation, and final target steps. I am not doing any kind of Map Operation or history preserving. I have no special column value updates to do after the Table Compare. There are no duplicate records going into the Table Compare, and I have no fancy primary key or column compares within it. All my DFs are like this, and all work for the initial load except the latest one. If I uncheck ‘Detect deleted rows’, everything initially loads (inserts) fine. Then if I check ‘Detect deleted rows’ and run the data flow again, after updating, adding, and deleting rows in the source table, all updates work just fine. I only seem to have trouble with the DF if ‘Detect deleted rows’ is checked on the initial load of the table.

I can certainly change the flow to be more complex, like the one suggested in the link, but it seems to be a lot of extra work and processing time just to handle what the Table Comparison is supposed to do, and appears to do successfully, most of the time.

I was hoping for a logical reason for the table comparison’s odd behavior, but perhaps it is just a bug or glitch that randomly occurs.

I will gladly adjust the DF, if someone provides a reason why such changes are necessary.


ssmith (BOB member since 2011-03-04)

Can we go through this case very carefully? I have no idea what could cause that but maybe something obvious pops up once we compare the settings…

  1. What are the PKs of the source table?
  2. What comparison method are you using in TC?
  3. What is the input-primary-key list of the TC?
  4. What is the comparison table in TC?
  5. When drilling into the target table, what column is marked as PK?

Werner Daehn :de: (BOB member since 2004-12-17)

Thanks for working with me on figuring this out—I appreciate it.

  1. There is no PK on the source table for the TC.

  2. I am using the Row-by-row select comparison method.

  3. The input-primary-key list of the TC is: re_id, cd, freq_cd

  4. The comparison table for the TC is: nm, last_lgcy_updt_ts, srt_ordr

  5. The cust_sctn_defn_id column is the primary key on the target table. The value for this field is generated using a Key_Generation right after the TC.

I have attached a screen print of the table compare and one of the entire DF as well.
DI.doc (213.0 KB)


ssmith (BOB member since 2011-03-04)

Still, I fail to understand why detect-deletes has the effect you are describing, but given your settings I believe we are on the right track.

I suspect the three columns you defined as input-primary-key are actually not unique. So I would add another template table right before the Table Comparison and then execute:

select re_id, cd, freq_cd, count(*) from template_table group by re_id, cd, freq_cd having count(*) > 1

If these three columns really are unique, the above query should return no rows. But if I am right, and you haven’t set detect-duplicates in the TC transform, this together with the row-by-row setting can result in unstable results.

Detect-deletes works by taking the complete target table data MINUS the source data at the end. What is left after this operation are the rows that exist in the target but no longer in the source, hence they must have been deleted from the source. But without a PK…
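In SQL terms (just a rough sketch to illustrate the idea, with hypothetical table names target_table and source_data), that pass boils down to something like:

    select t.re_id, t.cd, t.freq_cd
    from   target_table t
    left join source_data s
           on  s.re_id   = t.re_id
           and s.cd      = t.cd
           and s.freq_cd = t.freq_cd
    where  s.re_id is null

Whatever this returns is flagged with the ‘D’ op-code. If the input-primary-key columns are not unique, or the comparison data is incomplete, perfectly valid rows can end up on that list.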


Werner Daehn :de: (BOB member since 2004-12-17)

The query returned no rows… the three fields I have for the primary key are a unique combination. I was 99% sure of it, but checked anyway, for you and for myself. I am now 100% sure.

But maybe this additional information I have found will shed some light: the target table has a key of cust_sctn_defn_id, and this column is serial in the DB with a sequence. The sequence value for this column is always 1 no matter how I load the table, and I understand why: DI is not really using the sequence on the column when inserting rows, and that is fine for now. However, when I check ‘Detect deleted rows’, the starting sequence value used for the inserted rows is always 1201 and the number of rows inserted is 388. When I uncheck the option, the starting value is 1 and 1588 rows are inserted. 1588 records is correct.

Any clue where the starting sequence of 1200 may be coming from when ‘Detect deleted rows’ is checked? This is the first table that has had a sequence associated with the primary key column; perhaps that has something to do with it? The target table is an EnterpriseDB table (PostgreSQL), if that sheds any more light.
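For anyone who wants to check the Postgres side, something along these lines shows whether the backing sequence is actually being advanced (the table and sequence names below are only illustrative, not my exact objects):

    -- find the sequence that backs the serial column
    select pg_get_serial_sequence('my_target_table', 'cust_sctn_defn_id');

    -- compare the sequence position with the highest key actually loaded
    select last_value from cust_sctn_defn_id_seq;
    select max(cust_sctn_defn_id) from my_target_table;

In my case the sequence never moves, which confirms that DI generates the key values itself rather than pulling them from the sequence.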


ssmith (BOB member since 2011-03-04)

Sorry, the previous reply should have stated ‘where is the starting sequence of 1201 coming from’. Sorry for confusing the matter.


ssmith (BOB member since 2011-03-04)

Well, eventually subsequent loads on the table also failed to insert all new rows, and the sequence value was jumping around on those that were inserted. I definitely suspect the sequence on the table is the issue. So I am posting this reply to help some other poor soul who runs into the same issue.
I finally used the suggestion given earlier, of splitting the inserts from the updates and deletes with a Map Operation. After the MO for the inserts, I used a Query to obtain the next value from the sequence that is on the table. I created a function in the database to obtain the next sequence value, but I had to use an SQL select to call it. I could import the function just fine, but I could not call it from the mapping on an output schema field, nor could I call it from the output schema using ‘New function call’. Both produced the same error upon execution: ‘parameters of 2 do not match requested parameters of 0’ (something like that).
Calling the function directly is probably faster, but I could not get it to work at this time.
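For anyone trying the same workaround, the database function is nothing more than a wrapper around nextval(); roughly along these lines (the object names are illustrative, not my exact ones):

    create or replace function get_next_cust_sctn_defn_id()
    returns bigint as $$
        select nextval('cust_sctn_defn_id_seq');  -- sequence behind the serial key column
    $$ language sql;

The Query then pulls the value back with a plain select against the target datastore, e.g. something like sql('Target_DS', 'select get_next_cust_sctn_defn_id()'), with the datastore name being whatever yours is called.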


ssmith (BOB member since 2011-03-04)

Hi,

I noticed that when you use an MO after a TC with the “Detect deleted rows” option, the “delete” operation code is still correct after the MO.
But once you add a Query transform after the MO, your “delete” rows become “normal” rows… that may explain an “insert” error at the target…

My aim was to update rows in the target that have been deleted in the source (i.e. closing a validity period). I used a specific DF branch to manage only the deleted records and added another MO (normal -> update) just before the target.
This solution is similar to the one described in topic 171576: https://bobj-board.org/t/171576

I guess the delete operation is special because you get the data from the target (comparison table) instead of from the source.
I use BODS 4.0.

Regards,
Jerome


jd.pro :fr: (BOB member since 2010-11-15)

Thanks for the reply, which reminded me to post a possible final solution to this problem.
I still had intermittent problems with the incorrect deletion of rows. All the rows would be inserted correctly, but then rows would subsequently be deleted for no reason. The number of deleted rows was always an even thousand (2000, 3000, 4000, etc.), leaving an odd number of rows remaining in the table. And I was having this issue on different tables at different times. This led me to believe it was a commit issue of some sort, because it was always exactly thousands of rows that were deleted.

So I divided the logic into two different types of data flows. One does the TC just for adds/updates; the second does the TC just for deletes, and it runs immediately after the data flow for adds/updates. When I went through all my data flows to make this change, I found that some were already built this way due to the logic involved. The flows I ended up changing were the ones whose tables were intermittently losing thousands of rows.

I had tried everything else and could not get a 100% dependable solution. These two flows do take longer, but the approach seems to be working correctly and consistently at this writing. I will post again if this was not the final solution.


ssmith (BOB member since 2011-03-04)