insert into SCD_SRC values (1, 'Adobe software', 'Sun City', '2006-12-20');
insert into SCD_SRC values (2001, 'Garnier soft serv', 'Los Angeles', '2006-12-20');
insert into SCD_SRC values (4001, 'Requisite Online', 'San Francisco', '2006-12-20');
My Job Design
Source Tab --> Query --> TC --> HP --> KG --> Target Tab
But when I try to run the job, it gets stuck for some reason, even though I have followed all the steps mentioned in the above link. When I try the same thing in an Oracle DB it runs fine, but not in Teradata.
It is probably a lock between the updates and the reads of the target table. That should not happen in a database, but many take the shortcut of applying locks for reads as well, to make sure the reads are consistent.
So either you use the TableComparison in cached mode, or you switch the datastore via the session settings to uncommitted reads, however that is done in Teradata.
Make sure the fields listed in the TC's input-primary-keys area are indexed. The transform will execute a "select … from comparison table where key1 = X and key2 = Y", with the key columns being these input primary keys. Without that index, the above SQL will take a while for each input row.
Make sure the target table has a primary key and that the key has an index, so that existing rows are found quickly for the updates.
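On Teradata, such an index could be a plain secondary index on the comparison key. A minimal sketch, assuming the target/comparison table is the DUTLsb.SCD_TRG posted later in this thread and customer_id is the TC input primary key:

-- Hedged sketch: a secondary index on the TC input primary key, so the
-- per-row comparison select can avoid a full-table scan.
CREATE INDEX (customer_id) ON DUTLsb.SCD_TRG;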
I have implemented exactly what you mentioned, but it has the same issue. Please find the job log below.
(12.2) 10-12-10 05:52:37 (938160:0001) JOB: Reading job <81c7abf6_3ba0_4912_a871_3ccfa2df050e> from the repository; Server version is <12.2.2.1>; Repository version is <12.2.2.0000>.
(12.2) 10-12-10 05:52:37 (938160:0001) JOB: Current directory of job <81c7abf6_3ba0_4912_a871_3ccfa2df050e> is </BODI/dev/DS/dataservices/bin>.
(12.2) 10-12-10 05:52:37 (938160:0001) JOB: Starting job on job server host , port <3555>.
(12.2) 10-12-10 05:52:38 (938160:0001) JOB: Job <SCD_Chk_Jb> of runid <20101012055237938160001> is initiated by user .
(12.2) 10-12-10 05:52:38 (938160:0001) JOB: Processing job <SCD_Chk_Jb>.
(12.2) 10-12-10 05:52:38 (938160:0001) JOB: Optimizing job <SCD_Chk_Jb>.
(12.2) 10-12-10 05:52:38 (938160:0001) JOB: Job <SCD_Chk_Jb> is started.
(12.2) 10-12-10 05:52:39 (999540:0001) DATAFLOW: Process to execute data flow <Copy_1_SCD_Check> is started.
(12.2) 10-12-10 05:52:40 (999540:0001) DATAFLOW: The specified locale <eng_us.iso-8859-1> has been coerced to <eng_us.cp1252> for data flow <Copy_1_SCD_Check> because all datastores obtain data in same locale <eng_us.cp1252>.
(12.2) 10-12-10 05:52:40 (999540:0001) DATAFLOW: Data flow <Copy_1_SCD_Check> is started.
(12.2) 10-12-10 05:52:40 (999540:0001) DATAFLOW: Cache statistics determined that data flow <Copy_1_SCD_Check> uses <0> caches with a total size of <0> bytes. This is less than (or equal to) the virtual memory <3757047808> bytes available for caches. Statistics is switching the cache type to IN MEMORY.
(12.2) 10-12-10 05:52:40 (999540:0001) DATAFLOW: Data flow <Copy_1_SCD_Check> using IN MEMORY Cache.
The job hangs for a long time at this point in the log.
But when I implemented the same thing on an Oracle DB, it took 1 to 2 seconds to load the data into the target table.
Is there any Teradata setting that needs to be updated?
Please suggest! Waiting for your reply.
Please execute the same job again, this time with the trace options (found in the execution dialog, the trace tab) set to trace_sql_reader=yes, trace_sql_loader=yes, trace_sql_transforms=yes.
This will show all the SQL statements we execute. Then execute them manually using a query tool and check if they respond within a few ms. If they don't, there is an index missing; if they do, it has to be a lock of some kind, a read lock Teradata needs and Oracle does not.
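For example, the per-row comparison select could be tested by hand like this (a sketch using the table and key column from this thread; Teradata's EXPLAIN shows both the access path and the lock that will be requested):

-- If the plan shows an all-rows scan, an index is missing; the plan also
-- names the lock, e.g. "we lock DUTLsb.SCD_TRG for read".
EXPLAIN
SELECT * FROM DUTLsb.SCD_TRG WHERE customer_id = 2001;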
I did some reading around locking in Teradata for the last 10 minutes. Yes, they do take locks on reads; dirty reads have to be requested explicitly (called LOCKING FOR ACCESS). But I cannot find a hint about row- or page-level locking. http://www.coffingdw.com/sql/tdsqlutp/locking.htm
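A minimal sketch of such a dirty read against this thread's target table (the ACCESS lock lets the select proceed even while another session holds a write lock on the same rows):

-- Request an ACCESS lock instead of the default READ lock.
LOCKING ROW FOR ACCESS
SELECT * FROM DUTLsb.SCD_TRG WHERE customer_id = 2001;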
Please find the details of the job design below, with all information.
Job design: Src Tab --> Query --> TC --> HP --> KG --> Target Tab
Source table structure:
CREATE SET TABLE DUTLsb.SCD_SRC ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
customer_id INTEGER NOT NULL,
customer_name VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
city VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
valid_from DATE FORMAT 'YYYY-MM-DD')
UNIQUE PRIMARY INDEX ( customer_id );
Records in Source Table
Custid  Cust_nm            City           Valid_from
1       Adobe software     Sun City       12/20/2006
2001    Garnier soft serv  Los Angeles    12/20/2006
4001    Requisite Online   San Francisco  12/20/2006
Target table structure:
CREATE SET TABLE DUTLsb.SCD_TRG ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
key_id INTEGER NOT NULL,
customer_id INTEGER,
customer_name VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
city VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
valid_from DATE FORMAT 'YYYY-MM-DD',
valid_to DATE FORMAT 'YYYY-MM-DD',
current_ind CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( key_id )
INDEX ( customer_id ,customer_name ,city ,valid_from );
Records in Target Table

Key_id  Custid  Cust_nm            City      Valid_from  Valid_to    Cur_ind
1       2001    Garnier soft serv  San Jose  1/17/2006   12/31/9999  Y

When I run the job, the target table should be populated with all 3 records from the source table, and it should finally have 4 records, with Type 2 history captured.
In the Table Comparison transform I have set cust_id as the "Input primary key columns" and cust_nm, city, valid_from as the "Compare columns", and the comparison method is "Row-by-row select". In HP the same list of columns is specified as compare columns.
In the target load properties I have defined Normal Load.
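For reference, the expected result after the run, assuming the Key_Generation transform continues from the current maximum key (1) and History_Preserving closes the old record at the new record's effective date (the key values and the closed record's end date are assumptions and depend on the KG and HP settings):

Key_id  Custid  Cust_nm            City           Valid_from  Valid_to    Cur_ind
1       2001    Garnier soft serv  San Jose       1/17/2006   12/20/2006  N
2       1       Adobe software     Sun City       12/20/2006  12/31/9999  Y
3       2001    Garnier soft serv  Los Angeles    12/20/2006  12/31/9999  Y
4       4001    Requisite Online   San Francisco  12/20/2006  12/31/9999  Y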
I am not sure whether it was a locking issue or not, but I did observe a read lock on the database table. So I created a view on top of the target table and used that view for the TC. Then it ran fine.
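That matches the usual Teradata pattern: a view with an ACCESS-lock modifier on top of the target table. A sketch of such a view (the view name is hypothetical):

-- Hypothetical view name; the ACCESS lock lets the TC's selects run
-- even while the loader session is updating the same table.
REPLACE VIEW DUTLsb.SCD_TRG_V AS
LOCKING ROW FOR ACCESS
SELECT key_id, customer_id, customer_name, city, valid_from, valid_to, current_ind
FROM DUTLsb.SCD_TRG;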
But there is one issue. As I already mentioned, I was using the following design:
SrcTab --> Query --> TC --> HP --> KG --> TrgTab
The issue is: if I do not use the KG transform and the surrogate key & flag in the target table, the records are inconsistent. After a few updates to the source table, the target records are generated with incorrect END_DATE values.
But if I use KG and a surrogate key (the flag is not necessary), the target table records are generated correctly.
So my question is: can we implement SCD Type 2 with the following design, without a surrogate key & flag?
SrcTab --> Query --> TC --> HP --> TrgTab
Please get back to me if you have any suggestions.
Thanks, Werner, for your reply.
The following design covers CDC (with initial load & delta loading) and SCD Type 2 with history capture:
Src Tab --> Query --> TC --> HP --> KG --> Trg Tab
But when the data volume is huge, will the TC give us any performance issues?
That is, while we load the delta data from the source and compare it against the target table, if the target table already has 1 billion or more records, will that hamper performance?
It depends on what comparison method you use in TC.
cached_comparison_table means you are loading 1 billion rows, all the rows of the target, into memory. That will take a while and usually does not make sense for a CDC dataflow. Why should we cache 1 billion rows just to compare 100,000 input rows with the target?
sorted_mode means that the source data is read sorted by primary key, the comparison table is read sorted as well, and then both are kept in sync. This works particularly well if the primary key is a value that keeps increasing, e.g. a new sales order number is the previous sales order number plus 1, and the changes happen mostly at the higher end; it is very unlikely that sales order number 00000000001 is changed every day, it is more likely the recent (high) numbers change. In that case the TC will not read all rows of the target table, only those with primary key >= the lowest primary key from the source.
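As a sketch of that comparison-table read in sorted mode, assuming customer_id is the input primary key (the exact SQL DS generates may differ):

-- Read only target rows at or above the lowest key in the source batch,
-- in key order, so they can be merged with the sorted source stream.
SELECT *
FROM DUTLsb.SCD_TRG
WHERE customer_id >= 1   -- lowest primary key in the current source data
ORDER BY customer_id;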
row_by_row means that for each input row we execute a select * from the target table where primary key = primary key. So no overhead for caching or sorting is needed; we just issue a select * where primary key = ?, once per input row, 100,000 times in total. Assuming there is an index supporting the where clause, this will take about 10 seconds for the 100,000 lookups.
And if that is not fast enough, you can set the dataflow DoP to run multiple parallel instances of the TC transform.