MD5 Hash Function

Hi all,

I have already searched and found a reply on the topic of creating MD5 hashs … but it wasn’t quite what I wanted.

I’m working on a DW build where there is a requirement to create MD5 hash of keys, and separately staging data in order to speed up the Table Compare process in the load. The TC is very poorly performing without hashs.

I have originally solved the problem using the python functionality exposed in Data Services DQ components (User-Defined). It worked fine, and very fast, but I was then told that the licence key in use was only a temp one; when the real one turned up, it disabled all my custom md5 functions :flush:

I wondered if anyone:

  • Knows how I can continue to use the python code … or
  • Has written a DI custom function to generate an MD5 hash … or
  • Knows how I can implement a new piece of Java/whatever code as a DI transform

I have also written a small generator in VB.NET, and it works but the performance is rubbish.

Any suggestions are very welcome!

Regards, Spencer


easte72106 (BOB member since 2006-02-13)

We have an hash function but do not expose it to the outside. But before we check options here, may I ask why the Table Comparison transform is slow? Depending on its mode I see throughput of up to 80’000rows per second and thread.

What is the setting you use for TC?

Cache

row_by_row

sorted


Werner Daehn :de: (BOB member since 2004-12-17)

Hi Werner,

TC is (currently) using Sorted Input. Given the size of the target table, I’ve tried all the options!

Input table has 4 key fields and about 65 compare fields (about 67 million rows)
Comparison table is about 700mil rows, and is properly indexed. I can generate an ATL tomorrow and post it to this thread.

Cheers, Spencer


easte72106 (BOB member since 2006-02-13)

We did a major improvement on sorted input in 12.1. Are you on that version by any chance?

The comparison is not the issue it is finding the matching row. So if you try the MD5 function route, I would state you change anything except making it even slower. Would be interestng to see what you did in python though.

A completely different approach could be to build a very simple dataflow like

source -> query -> target

and set the target to autocorrect load. Goal should be to get a full pushdown into an Oracle Merge statement.

btw, what database are we talking about? ATL would certainly help.


Werner Daehn :de: (BOB member since 2004-12-17)

Sorry … took me a while to get back onto this :slight_smile:

Yes, customer is on 12.1 …

Re Python, I’ve attached an ATL for you to look at; this is not the problem table for me since it’s reasonably small, and is at the import step where I generate the MD5 (prior to ODS load)

I will give the simpler flow a try and let you know the results …

Database is SQL2008

Regards, Spencer
MD5.zip (4.0 KB)


easte72106 (BOB member since 2006-02-13)

With SQL Server we do not yet support a SQL Merge statement.


Werner Daehn :de: (BOB member since 2004-12-17)

I’d love to see an MD5 hash function added as an enhancement request for the next version of Data Services… :smiley:

Cheers


josh.fletcher :australia: (BOB member since 2007-01-08)

ADAPT01186400 filed


Werner Daehn :de: (BOB member since 2004-12-17)

Hi Werner, I know it’s an old post but I’m looking for a hash MD5 function in DS 4.0

Can you please help me :blush:

Thanks


Abadir :canada: (BOB member since 2013-02-05)

hi Werner

I am also looking for this functionality, do you know if this has been enabled. If so it is available in which version ?

We are on 4.2 SP5.

Poonam


phemrajani (BOB member since 2012-06-02)

FYI, I had a recent conversation with the Director of EIM Product Management and this Adapt was closed with no action taken. As far as he was aware there is no current initiative to provide MD5 capability in Data Services. If this is something that has community support I would recommend putting it into Idea Place so it can get some exposure and feedback.

Ernie Phelps
ASUG DM SIG Chair


eepjr24 :us: (BOB member since 2005-09-16)

We too are in need of MD5/hash functionality.

Mitch


mitchg00 (BOB member since 2008-03-17)