Hi,
Our source system has 3 tables we need to build the customer dimension; customer (basic info like name and DoB), customer_address (address and postcode) and customer_contact (phone numbers and email addresses).
Each table has modified dates which we can use for delta loads to pick up new and changed rows.
My question is what is the best practice for implementing the delta logic using Data Services. My initial load will extract all data into my warehouse source tables. The thinking was to have another set of source tables (suffixed “_DELTA”) that contain the changed rows populated in the subsequent delta loads. The workflows to then build the customer dimension would use these delta tables and not the full source tables. However the problem comes when a customer changes email address for example, their record in the customer table is not altered so won’t be extracted into the DELTA table and therefore this won’t flow through into the end dimension.
Do I maybe do away with the delta tables, use the modified dates to update the full source tables in the warehouse, and then use the modified dates again on the full source tables to build the customer dimension (only processing new and changed rows). But I would need an OR statement on the dates of all 3 tables to look for changes in either of them. I think this might be slower that having the exact data I need in their own DELTA tables.
Any input or experience welcome.
Thanks,
b/
Bisquite (BOB member since 2009-07-15)