incrmental load and delta load mechnasim.

hi,

i need the way how the incremental load and delta load works in rapidmarts.

if any one can get the wiki links for the this issue it will be help full to me.

Thanks. :mrgreen:


imrankhan (BOB member since 2012-02-08)

Best will be to look at the delta dataflows yourself and try to understand them. It is different for every table as the way to identify changes is different in the source for each.
In other words, there is no generic method in SAP, hence not in the RapidMart.


Werner Daehn :de: (BOB member since 2004-12-17)

Hi Werner,

but in the rapidmarts we will follow the functionality of using first and delta loads,
so based on the table AW_JOB EXECUTION it will work.but here how does it perform this actions based on that table.

can u give me a brief idea abt how this will load data into first and delta.


imrankhan (BOB member since 2012-02-08)

Via the status table the global variable $G_SDATE and $G_EDATE are set and it is the responsibility of the delta dataflow to load all changes that happen in this window.

Many dataflows, especially the one based on smaller tables, simply read all the source data and feed it into a Table_Comparison transform to figure out all changes - including the ones that happened in that delta window.

Others might right CDHDR table, others have a timestamp column you can trust and no deletes will happen in, others…


Werner Daehn :de: (BOB member since 2004-12-17)

Control Tables
AW_JOBEXECUTION
AW_FIRST_RUN_DATE

Global Variables
$G_SDATE
$G_EDATE
$G_LOAD_TYPE

Job Level functions that read/write to control tables
AW_StartJob
AW_EndJob
RM_PutFirstRunDate

Job Level Scripts that call the functions
InitializeJob calls AW_StartJob
EndJob calls AW_EndJob

‘FIRST’ run example
$G_SDATE = ‘2000.01.01’
$G_EDATE = sysdate()
$G_LOAD_TYPE = ‘FIRST’

  1. The InitializeJob script calls the RM_PutFirstRunDate function and passes job name and global variables
  2. The RM_PutFirstRunDate function creates/updates a record in the AW_FIRST_RUN_DATE table with the following values
    [list]JOB_NAME = job_name()
    JOB_LOAD_TYPE = $G_LOAD_TYPE
    JOB_START_DATE = $G_SDATE
    JOB_END_DATE = $G_EDATE
    [/list]3. The InitializeJob script calls the AW_StartJob function and passes job name and global variables
  3. The AW_StartJob function creates/updates a record in the AW_JOB EXECUTION table with the following values
    [list]NAME = Job Name
    EXTRACTLOW = $G_SDATE
    EXTRACTHIGH = $G_EDATE
    STATUS = ‘started’[/list]
  4. The ETL Job resumes and performs encapsulated WFs and DFs
  5. If no errors the EndJob script calls the AW_EndJob function and passes job name
  6. AW_EndJob updates the specific job record in the AW_JOB EXECUTION table and sets the status column = ‘done’
  7. If a job failure occurs, the AW_EndJob function is never called hence the AW_JOB EXECUTION table is never updated with a status of ‘done’

‘DELTA’ run example
$G_SDATE = based on last successful completion of the ETL at the job level
$G_EDATE = sysdate()
$G_LOAD_TYPE = ‘DELTA’

  1. The InitializeJob script calls the RM_PutFirstRunDate function and passes the job name to validate that a ‘FIRST’ load has been run
  2. If a ‘FIRST’ load has not been run – you will receive a fatal error ‘Error - First Run date for Job in table AW_FIRST_RUN_DATE is NULL ’
  3. If a ‘FIRST’ run record is found, the InitializeJob script resumes and calls the AW_StartJob function and passes job name and global variables
  4. The AW_StartJob function determines if previous job ran to completion by interrogating the STATUS column of the AW_JOB EXECUTION table
  5. Based on last successful run the following values are used
    [list]NAME = Job Name
    EXTRACTLOW
    ----set to EXTRACTHIGH of the previous job if it completed (status=’done’)
    ----set to EXTRACTLOW of previous job if it did not complete (status=’’started’)
    EXTRACTHIGH = $G_EDATE
    STATUS = ‘started’[/list]
  6. The ETL Job resumes and performs encapsulated WFs and DFs
  7. If no errors the EndJob script calls the AW_EndJob function and passes job name
  8. AW_EndJob updates the specific job record in the AW_JOB EXECUTION table and sets the status column = ‘done’
  9. If a job failure occurs, the AW_EndJob function is never called hence the AW_JOB EXECUTION table is never updated with a status of ‘done’

***NOTE – if your ‘FIRST’ run resulted in an error it is best to re-run in ‘FIRST’ mode – otherwise a ‘DELTA’ would ensue, using the start date of the failed ‘FIRST’ – this could cause slow performance as the data would be subject to ‘DELTA’ logic (i.e. Target Based comparisons)


SalH :mexico: (BOB member since 2010-11-09)

Thanks.

This is great and helpfull to me.


imrankhan (BOB member since 2012-02-08)