Depending on the source and what your expectations are, you may be in for some work if you have to start monkeying with the cleansing package (the out-of-the-box cleansing package has some issues).
Do you have any other data you can match on (address, phone, email, shoe size)?
I don’t know what you’re doing, but for me, matching on a name alone doesn’t usually buy much. It ends up being just a fancy way to sort your data, which you’re still going to have to look at.
So from a high level, what is it you’re trying to accomplish?
I have two tables: Table1 contains those who are travelling, and Table2 contains all the people in a certain department, which includes those travelling, those not travelling, dependents, etc. The tables are in separate databases managed by two different departments, hence they have different person_ids. By "managed by two departments" I mean the details are keyed in by two different sets of users, so the names and details entered do not always match across the two tables. Now I have a requirement to join these tables to eliminate people based on a certain code. Since there is no shared unique key (the person_ids are different), I have to join the tables on first name and last name. In some cases, like the examples given in the first post, the names don’t match for the reasons explained above, but I still need to extract the correct record from both tables in the output. Right now I am doing an outer join and getting the names that have issues, but this doesn’t solve the problem permanently. I am looking to clean the data using some logic and then match the names.
Other data I could match on are date of birth (null in some rows) and SSN (null in some rows, meaning it was not entered).
You would want to merge the two tables. Depending on the size of these tables, you’re going to need a break key for the match, so that only records sharing the key are compared. It could be the first byte of the first name and last name. Then feed it into a name match.
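The break-key idea can be sketched outside of Data Services in plain Python. This is only an illustration, not how the Match transform works internally: the record layout (`first`/`last` keys), the sample names, and the helper names `break_key`, `candidate_pairs`, and `name_similarity` are all assumptions made up for the example, and `difflib.SequenceMatcher` stands in for a real name-match engine.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def break_key(first, last):
    """Break key: first character of first name + first character of last name."""
    return (first.strip()[:1] + last.strip()[:1]).upper()


def candidate_pairs(table1, table2):
    """Yield only those (table1, table2) record pairs that share a break key."""
    buckets = defaultdict(list)
    for rec in table2:
        buckets[break_key(rec["first"], rec["last"])].append(rec)
    for rec in table1:
        for other in buckets.get(break_key(rec["first"], rec["last"]), []):
            yield rec, other


def name_similarity(a, b):
    """Similarity in [0, 1] between the full names of two records."""
    full_a = f'{a["first"]} {a["last"]}'.lower()
    full_b = f'{b["first"]} {b["last"]}'.lower()
    return SequenceMatcher(None, full_a, full_b).ratio()


# Hypothetical sample rows, not taken from the real tables.
travellers = [{"first": "Jon", "last": "Smith"}]
department = [
    {"first": "John", "last": "Smith"},
    {"first": "Jane", "last": "Smyth"},
    {"first": "Mary", "last": "Jones"},
]

for left, right in candidate_pairs(travellers, department):
    print(left, right, round(name_similarity(left, right), 2))
```

Only the two "JS" rows are compared against "Jon Smith"; "Mary Jones" is never scored. The trade-off is that a typo in the first letter of either name puts a record in a different bucket, so it will never be considered as a candidate.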
The more data you have for a match the better, and it doesn’t always have to be populated (empty values don’t hurt anything). In the end, someone is going to have to sift through the results, which is what I try to steer clear of.
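One way to picture "more data is better, but it doesn’t have to be there" is a score that uses whatever fields are populated on both sides and skips the rest. The sketch below is an assumption-laden illustration, not the Match transform’s scoring: the field names (`name`, `dob`, `ssn`), the weights, and the function `match_score` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical weights; tune them to how much you trust each field.
WEIGHTS = {"name": 0.5, "dob": 0.2, "ssn": 0.3}


def _similarity(a, b):
    """Fuzzy similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b):
    """Weighted score over the fields that are populated in both records.

    A field that is None or empty on either side is skipped, and the
    remaining weights are renormalised so missing data does not count
    against the pair.
    """
    total = 0.0
    weight_used = 0.0
    for field, weight in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # missing on one side: ignore this field
        if field == "name":
            total += weight * _similarity(a, b)  # names are compared fuzzily
        else:
            total += weight * (1.0 if a == b else 0.0)  # exact compare
        weight_used += weight
    return total / weight_used if weight_used else 0.0


# Hypothetical records: the SSN is missing on one side, so only name and
# date of birth contribute to the score.
a = {"name": "Jon Smith", "dob": "1980-01-01", "ssn": None}
b = {"name": "John Smith", "dob": "1980-01-01", "ssn": "000-00-0001"}
print(round(match_score(a, b), 2))
```

Renormalising keeps sparse records from being penalised, but it also means a pair matched on name alone can score as high as a pair matched on three fields, so the number of fields actually compared is worth reporting alongside the score for whoever reviews the results.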
I’m surprised you have a license for match but not for data cleanse.
There is a plethora of Python modules out there to help perform some basic fuzzy matching. You would still need the logic to apply it across the two sets of data, and analyze the results.
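As a rough idea of what that logic might look like, here is a minimal sketch using only `difflib` from the Python standard library. It does not show how to run Python inside DI, and everything specific in it is an assumption for illustration: the record layout, the sample names, the 0.85 cutoff, and the helper names `normalise` and `match_names`.

```python
from difflib import get_close_matches


def normalise(first, last):
    """Lower-case the full name and collapse stray whitespace."""
    return " ".join(f"{first} {last}".lower().split())


def match_names(table1, table2, cutoff=0.85):
    """Pair each table1 record with its closest table2 name, if close enough.

    Returns (matched, unmatched): matched is a list of
    (table1_record, [table2_records]) and unmatched is the list of table1
    records with no name at or above the cutoff.
    """
    lookup = {}
    for rec in table2:
        lookup.setdefault(normalise(rec["first"], rec["last"]), []).append(rec)
    names = list(lookup)

    matched, unmatched = [], []
    for rec in table1:
        key = normalise(rec["first"], rec["last"])
        hits = get_close_matches(key, names, n=1, cutoff=cutoff)
        if hits:
            matched.append((rec, lookup[hits[0]]))
        else:
            unmatched.append(rec)  # left for manual review
    return matched, unmatched


# Hypothetical sample rows, not taken from the real tables.
travellers = [{"first": "Jon", "last": "Smith"}, {"first": "Alice", "last": "Brown"}]
department = [{"first": "John", "last": "Smith"}, {"first": "Mary", "last": "Jones"}]

matched, unmatched = match_names(travellers, department)
print(len(matched), "matched;", len(unmatched), "left for review")
```

The cutoff decides how much lands in the review pile: lower it and more pairs match automatically, at the cost of more false matches that someone still has to catch.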
You’re not going to find a simple cut & paste solution to this one.
I have to figure out a way to do it; Python seems to be much more complicated to implement in DI than I thought. Maybe I am wrong.
One question: for the Match transform, do I still need a directory/rule dictionary to work with? I do not have any directory or rule set up in my environment, so basically I am looking for something like the Match transform that doesn’t need any external package.
I see DQM enabled in my environment, but I do not have a USPS address license, a name directory, etc.
You should not need a cleansing package to run a match.
Your results may not be sufficient to satisfy your business requirements, which are often difficult to nail down.
In situations like this, it seems like anything you can provide for them is going to be better than what they had before. We have some matching that we do which only matches at a 50-60% match rate, but still saves our staff hundreds of hours of manual lookups.