Edit - Eileen moved to General Discussion]
This is way off topic for BusinessObjects but is related to data warehousing. Anybody have an example of PL/SQL code that calculates a CRC? I have some C++ code but was wondering if anyone tried coding a PL/SQL function or database trigger.
I moved this over to General Discussion. A lot of people never look at the true “Off Topic” stuff! So even though this is a little off the beaten path, it is certainly Business Objects related…
And just cause I don’t know…What is a CRC calculation?
Cyclic Redundancy Checksum. A CRC can be used to compare rows that have many columns and see if there is a difference. If there is a difference in the rows, a further comparison of each of the column values can be done. CRC can help speed up a load process for wide dimensions by allowing you to pull rows that need type 1,2 or 3 change.
For example we have a dimension with about 80 columns and it can take a lot of processing to compare each of the 80 columns between each row. If you just compare CRC values, you can quickly ignore the rows that match (have no change) and concentrate on the rows the are different between the last load and the current load.
I wonder whether you could make use of DBMS_UTILITY.GET_HASH_VALUE() to do this, with a concatanation of the columns being the input. Here’s the 8i documentation …
GET_HASH_VALUE Function
This function computes a hash value for the given string.
Syntax
DBMS_UTILITY.GET_HASH_VALUE (
name VARCHAR2,
base NUMBER,
hash_size NUMBER)
RETURN NUMBER;
Parameters
Pragmas
pragma restrict_references(get_hash_value, WNDS, RNDS, WNPS, RNPS);
Returns
A hash value based on the input string.
For example, to get a hash value on a string
where the hash value should be between 1000 and 3047, use 1000 as the base value
and 2048 as the hash_size value. Using a power of 2 for the hash_size
parameter works best.
Table 5620 GET_HASH_VALUE Function Parameters
Parameter Description
name String to be hashed.
base Base value for the returned hash value to start at.
hash_size Desired size of the hash table.
One point to note is that Oracle keep changing the hash algorithm, so if you upgrade a version of Oracle you might like to prepare a script to update all of your hash values before running your load.
… and that also means that if you source your data from Oracle 8.1.7, and load to 9.1, then you should make sure you are not calculating your hash value for the target in the source database – the two would not match
We are going to source data from an OLTP Oracle database. We will do this on a daily basis, keeping backup copies of the mirror tables for each day and eventally/possibily export really old backup mirror tables periodically.
According to Kimball design tips, the mirror table (in our Oracle database) will have an extra column called CRC. At the time we pull data from the source system (possibly using a database link and materialize view with a CRC pl/sql function call), the CRC will be calculated at that time (using our database version of get_hash_value). So we never have to worry about the source system database version. Only when we upgrade our database.
The tricky thing for us to think about in the load design is when we want to do data cleansing. Will we do CRC calculation before or after copying from source? Or calculate CRC when copying from source AND recalculate CRC after data cleansing? We still thinking about this design and “when” the CRC will be calculated. It depends on further data analysis and how clean the data will actually be coming into our warehouse.
No matter what, we are going to do CRC, so to make it as flexible as possible, we may make or use a generic function like GET_HASH_VALUE and call it from table triggers and or materialzed views (if that’s possible). I’ll be looking to see if there are limitations to the GET_HASH_VALUE and other features.
Anyway, thanks again for your help. GET_HASH_VALUE sounds like the way to go rather than recoding something from scratch.