Hi again,
today I was thinking about the most correct approach to incremental data updates in our dataflows.
Our source database tables contain fields called “date_added” and “date_updated”.
When our workflow starts, it takes the current time into a variable called “$newMaxDateUpdated”, and it reads the last time the specific table was loaded into
a variable called “$maxDateUpdated”. When the dataflow starts, it queries the source data with a condition like this: “where t.date_updated > {$maxDateUpdated} or t.date_added > {$maxDateUpdated}”.
These dataflows use Table Comparison, and some of them also use History Preserving.
After a specific dataflow completes, the value of the variable $newMaxDateUpdated is recorded in our table called “table_update_log”.
The next time, the variable $maxDateUpdated gets its value from that table again, and so on…
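The cycle above can be sketched roughly like this. This is only an illustration of the watermark logic, not Data Services code: I use an in-memory SQLite database as a stand-in for the source and repository, and the table/column names (other than table_update_log, date_added, date_updated, which come from the post) are made up.

```python
import sqlite3
from datetime import datetime

# Stand-in source table and update log (names other than table_update_log,
# date_added and date_updated are illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, date_added TEXT, date_updated TEXT);
    CREATE TABLE table_update_log (table_name TEXT, max_date_updated TEXT);
    INSERT INTO table_update_log VALUES ('src', '2009-01-01 00:00:00');
    INSERT INTO src VALUES (1, '2008-12-15 10:00:00', '2008-12-20 09:00:00');
    INSERT INTO src VALUES (2, '2009-01-02 08:30:00', '2009-01-02 08:30:00');
""")

def incremental_load(conn, table):
    # 1. Capture the new high-water mark ($newMaxDateUpdated) at workflow start.
    new_max = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # 2. Read the previous high-water mark ($maxDateUpdated) from the log.
    (old_max,) = conn.execute(
        "SELECT max_date_updated FROM table_update_log WHERE table_name = ?",
        (table,)).fetchone()
    # 3. Pull the delta: rows added or updated since the last run.
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE date_updated > ? OR date_added > ?",
        (old_max, old_max)).fetchall()
    # ... Table Comparison / History Preserving would process `rows` here ...
    # 4. On success, persist the new high-water mark for the next run.
    conn.execute(
        "UPDATE table_update_log SET max_date_updated = ? WHERE table_name = ?",
        (new_max, table))
    conn.commit()
    return rows

delta = incremental_load(conn, "src")
print(len(delta))  # only row 2 lies after the stored watermark
```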
I was thinking about the time comparison: the gap between capturing $newMaxDateUpdated and running the query may be quite short, but there is still a possibility that somebody updates data in these tables at the same time. Such a row gets extracted now, yet its date_updated already lies past the new watermark, so it gets extracted again in the next run. So there may be double rows afterwards in the future runs…
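The overlap I mean can be shown with hand-picked timestamps (my own illustration, not anything from our system): a row committed just after $newMaxDateUpdated is captured, but before the delta query actually executes, satisfies the open-ended condition in both this run and the next one.

```python
from datetime import datetime

def in_delta(row_updated, max_date_updated):
    # Same open-ended condition as in the dataflow's WHERE clause.
    return row_updated > max_date_updated

old_max  = datetime(2009, 1, 10, 2, 0, 0)   # watermark from the previous run
new_max  = datetime(2009, 1, 11, 2, 0, 0)   # $newMaxDateUpdated, taken at run-1 start
late_row = datetime(2009, 1, 11, 2, 0, 5)   # committed 5 s later, before the
                                            # run-1 delta query has executed

run1 = in_delta(late_row, old_max)  # True: the run-1 query sees the fresh commit
run2 = in_delta(late_row, new_max)  # True again: timestamp > the new watermark
print(run1, run2)                   # the same row is extracted twice
```

Since the dataflows go through Table Comparison, I suppose the double extraction costs redundant work rather than wrong target data, but I am not sure this is acceptable.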
What do you think about this solution? Is anybody facing the same issues, and if so, what solutions are you using?
bobmeediake (BOB member since 2008-12-01)