Delta load strategy for customer dimension

Hi,

Our source system has 3 tables we need to build the customer dimension; customer (basic info like name and DoB), customer_address (address and postcode) and customer_contact (phone numbers and email addresses).
Each table has modified dates which we can use for delta loads to pick up new and changed rows.

My question is what is the best practice for implementing the delta logic using Data Services. My initial load will extract all data into my warehouse source tables. The thinking was to have another set of source tables (suffixed “_DELTA”) that contain the changed rows populated in the subsequent delta loads. The workflows to then build the customer dimension would use these delta tables and not the full source tables. However the problem comes when a customer changes email address for example, their record in the customer table is not altered so won’t be extracted into the DELTA table and therefore this won’t flow through into the end dimension.

Do I maybe do away with the delta tables, use the modified dates to update the full source tables in the warehouse, and then use the modified dates again on the full source tables to build the customer dimension (only processing new and changed rows). But I would need an OR statement on the dates of all 3 tables to look for changes in either of them. I think this might be slower that having the exact data I need in their own DELTA tables.

Any input or experience welcome.

Thanks,

b/


Bisquite (BOB member since 2009-07-15)

How are you going to get the delta …?

Is it CDC captured enabled database … or Previous day’s Database backup Vs Current day’s Database backup. !!

one way is to create a view to get the full delta row from your 3 source tables and use it as source in Data Services.


BO_Chief :us: (BOB member since 2004-06-06)

We are just going to select data from each table based on the modified date being greater than the last load date. Our standard practice is to keep copies of the source tables in the warehouse. So we would update these with new rows, then I think use your view method (or just a normal DI query) on the warehouse’s source copies to process the new rows into the dimension.


Bisquite (BOB member since 2009-07-15)