insert into SCD_SRC values (1, 'Adobe software', 'Sun City', '2006-12-20');
insert into SCD_SRC values (2001, 'Garnier soft serv', 'Los Angeles', '2006-12-20');
insert into SCD_SRC values (4001, 'Requisite Online', 'San Francisco', '2006-12-20');
My Job Design
Source Tab --> Query --> TC --> HP --> KG --> Target Tab
But when I try to run the job, it gets stuck for some reason, even though I have followed all the steps mentioned in the above link. When I try the same thing in an Oracle DB it runs fine, but not in Teradata.
It is probably a lock between the updates and the reads of the target table. That should not happen in a database, but many take the shortcut of applying locks for reads as well, to make sure the reads are consistent.
So either you use the TableComparison in cached mode, or you switch the datastore via the session settings to uncommitted reads, however that is done in Teradata.
Make sure the fields listed in the TC's input-primary-keys area are indexed. The transform will execute a "select … from comparison table where key1 = X and key2 = Y", with the key columns being these input primary keys. Without that index, the above SQL will take a while for each input row.
Make sure the target table has a primary key and that the key has an index, so that existing rows are found quickly for the updates.
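On Teradata, such an index could be a plain secondary index on the comparison key. A minimal sketch, assuming the target/comparison table is the DUTLsb.SCD_TRG posted later in this thread and customer_id is the TC input primary key:

-- Hedged sketch: a secondary index on the TC input primary key, so the
-- per-row comparison select can avoid a full-table scan.
CREATE INDEX (customer_id) ON DUTLsb.SCD_TRG;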
I have implemented exactly what you mentioned, but it has the same issue. Please find the job log below.
(12.2) 10-12-10 05:52:37 (938160:0001) JOB: Reading job <81c7abf6_3ba0_4912_a871_3ccfa2df050e> from the repository; Server version is <12.2.2.1>; Repository version is <12.2.2.0000>.
(12.2) 10-12-10 05:52:37 (938160:0001) JOB: Current directory of job <81c7abf6_3ba0_4912_a871_3ccfa2df050e> is </BODI/dev/DS/dataservices/bin>.
(12.2) 10-12-10 05:52:37 (938160:0001) JOB: Starting job on job server host , port <3555>.
(12.2) 10-12-10 05:52:38 (938160:0001) JOB: Job <SCD_Chk_Jb> of runid <20101012055237938160001> is initiated by user .
(12.2) 10-12-10 05:52:38 (938160:0001) JOB: Processing job <SCD_Chk_Jb>.
(12.2) 10-12-10 05:52:38 (938160:0001) JOB: Optimizing job <SCD_Chk_Jb>.
(12.2) 10-12-10 05:52:38 (938160:0001) JOB: Job <SCD_Chk_Jb> is started.
(12.2) 10-12-10 05:52:39 (999540:0001) DATAFLOW: Process to execute data flow <Copy_1_SCD_Check> is started.
(12.2) 10-12-10 05:52:40 (999540:0001) DATAFLOW: The specified locale <eng_us.iso-8859-1> has been coerced to <eng_us.cp1252> for data flow <Copy_1_SCD_Check> because all datastores obtain data in same locale <eng_us.cp1252>.
(12.2) 10-12-10 05:52:40 (999540:0001) DATAFLOW: Data flow <Copy_1_SCD_Check> is started.
(12.2) 10-12-10 05:52:40 (999540:0001) DATAFLOW: Cache statistics determined that data flow <Copy_1_SCD_Check> uses <0> caches with a total size of <0> bytes. This is less than (or equal to) the virtual memory <3757047808> bytes available for caches. Statistics is switching the cache type to IN MEMORY.
(12.2) 10-12-10 05:52:40 (999540:0001) DATAFLOW: Data flow <Copy_1_SCD_Check> using IN MEMORY Cache.
The job hangs for a long time at this point in the log.
But when I implemented the same thing on an Oracle DB, it took 1 to 2 seconds to load the data into the target table.
Is there any Teradata setting that needs to be updated?
Please suggest! Waiting for your reply.
Please execute the same job again, this time with the trace options (found in the execution dialog, the trace tab) set to trace_sql_reader=yes, trace_sql_loader=yes, trace_sql_transforms=yes.
This will show all the SQL statements we execute. Then execute them manually using a query tool and check if they respond within a few ms. If they don't, there is an index missing; if they do, it has to be a lock of some kind, a read lock Teradata needs and Oracle does not.
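For example, the per-row comparison select could be tested by hand like this (a sketch using the table and key column from this thread; Teradata's EXPLAIN shows both the access path and the lock that will be requested):

-- If the plan shows an all-rows scan, an index is missing; the plan also
-- names the lock, e.g. "we lock DUTLsb.SCD_TRG for read".
EXPLAIN
SELECT * FROM DUTLsb.SCD_TRG WHERE customer_id = 2001;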
I did some reading around locking in Teradata for the last 10 minutes. Yes, they do take locks on reads; dirty reads have to be requested explicitly (called LOCKING FOR ACCESS). But I cannot find a hint about row- or page-level locking. http://www.coffingdw.com/sql/tdsqlutp/locking.htm
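A minimal sketch of such a dirty read against this thread's target table (the ACCESS lock lets the select proceed even while another session holds a write lock on the same rows):

-- Request an ACCESS lock instead of the default READ lock.
LOCKING ROW FOR ACCESS
SELECT * FROM DUTLsb.SCD_TRG WHERE customer_id = 2001;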
Please find the details of the job design below, with all information.
Job design: Src Tab --> Query --> TC --> HP --> KG --> Target Tab
Source table structure:
CREATE SET TABLE DUTLsb.SCD_SRC ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
customer_id INTEGER NOT NULL,
customer_name VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
city VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
valid_from DATE FORMAT 'YYYY-MM-DD')
UNIQUE PRIMARY INDEX ( customer_id );
Records in Source Table
Custid  Cust_nm            City           Valid_from
1       Adobe software     Sun City       12/20/2006
2001    Garnier soft serv  Los Angeles    12/20/2006
4001    Requisite Online   San Francisco  12/20/2006
Target table structure:
CREATE SET TABLE DUTLsb.SCD_TRG ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
key_id INTEGER NOT NULL,
customer_id INTEGER,
customer_name VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
city VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
valid_from DATE FORMAT 'YYYY-MM-DD',
valid_to DATE FORMAT 'YYYY-MM-DD',
current_ind CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( key_id )
INDEX ( customer_id ,customer_name ,city ,valid_from );
Records in Target Table

Key_id  Custid  Cust_nm            City      Valid_from  Valid_to    Cur_ind
1       2001    Garnier soft serv  San Jose  1/17/2006   12/31/9999  Y

When I run the job, the target table should be populated with all 3 records from the source table, and it should finally have 4 records, with Type 2 history captured.
In the Table Comparison transform I have set cust_id as the "Input primary key columns" and cust_nm, city, valid_from as the "Compare columns", and the comparison method is "Row-by-row select". In HP the same list of columns is specified as compare columns.
In the target load properties I have defined Normal Load.
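For reference, the expected result after the run, assuming the Key_Generation transform continues from the current maximum key (1) and History_Preserving closes the old record at the new record's effective date (the key values and the closed record's end date are assumptions and depend on the KG and HP settings):

Key_id  Custid  Cust_nm            City           Valid_from  Valid_to    Cur_ind
1       2001    Garnier soft serv  San Jose       1/17/2006   12/20/2006  N
2       1       Adobe software     Sun City       12/20/2006  12/31/9999  Y
3       2001    Garnier soft serv  Los Angeles    12/20/2006  12/31/9999  Y
4       4001    Requisite Online   San Francisco  12/20/2006  12/31/9999  Y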
I am not sure whether it was a locking issue or not, but I did observe a read lock on the database table. So I created a view on top of the target table and used that view for the TC. Then it ran fine.
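That matches the usual Teradata pattern: a view with an ACCESS-lock modifier on top of the target table. A sketch of such a view (the view name is hypothetical):

-- Hypothetical view name; the ACCESS lock lets the TC's selects run
-- even while the loader session is updating the same table.
REPLACE VIEW DUTLsb.SCD_TRG_V AS
LOCKING ROW FOR ACCESS
SELECT key_id, customer_id, customer_name, city, valid_from, valid_to, current_ind
FROM DUTLsb.SCD_TRG;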
But there is one issue. As I already mentioned, I was using the following design:
SrcTab --> Query --> TC --> HP --> KG --> TrgTab
The issue is: if I do not use the KG transform and the surrogate key & flag in the target table, the records are inconsistent. After a few updates to the source table, the target records are generated with incorrect END_DATE values.
But if I use KG and a surrogate key (the flag is not necessary), the target table records are generated correctly.
So my question is: can we implement SCD Type 2 with the following design, without a surrogate key & flag?
SrcTab --> Query --> TC --> HP --> TrgTab
Please get back to me if you have any suggestions.
Thanks, Werner, for your reply.
The following design covers CDC (with initial load & delta loading) and SCD Type 2 with history capture:
Src Tab --> Query --> TC --> HP --> KG --> Trg Tab
But when the data volume is huge, will the TC give us any performance issues?
That is, while we load the delta data from the source and compare it against the target table, if the target table already has 1 billion or more records, will that hamper performance?
It depends on what comparison method you use in TC.
cached_comparison_table means you are loading 1 billion rows, all the rows of the target, into memory. That will take a while and usually does not make sense for a CDC dataflow. Why should we cache 1 billion rows just to compare 100,000 input rows with the target?
sorted_mode means that the source data is read sorted by primary key, the comparison table is read sorted as well, and then both are kept in sync. This works particularly well if the primary key is a value that keeps increasing, e.g. a new sales order number is the previous sales order number plus 1, and the changes happen mostly at the higher end; it is very unlikely that sales order number 00000000001 is changed every day, it is more likely the recent (high) numbers change. In that case the TC will not read all rows of the target table, only those with primary key >= the lowest primary key from the source.
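As a sketch of that comparison-table read in sorted mode, assuming customer_id is the input primary key (the exact SQL DS generates may differ):

-- Read only target rows at or above the lowest key in the source batch,
-- in key order, so they can be merged with the sorted source stream.
SELECT *
FROM DUTLsb.SCD_TRG
WHERE customer_id >= 1   -- lowest primary key in the current source data
ORDER BY customer_id;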
row_by_row means that for each input row we execute a select * from the target table where primary key = primary key. So no overhead for caching or sorting is needed; we just issue a select * where primary key = ?, once per input row, 100,000 times in total. Assuming there is an index supporting the where clause, this will take about 10 seconds for the 100,000 lookups.
And if that is not fast enough, you can set the dataflow DoP to run multiple parallel instances of the TC transform.