Match Logic

system · July 17, 2013, 1:31pm

Hey guys,

I have a job with 2 files, A and B. I want to use the Match transform to remove rows in file A that are also in file B. In other words, I want to purge file A of any file B records. So I merged the two files and set file A’s priority to 2 and file B’s priority to 1. Then I run the result through a Match transform and a Case transform. My Case logic looks something like this:


NameAddress_MatchBatch.NameAddr_Individual_GROUP_NUMBER IS NULL OR (NameAddress_MatchBatch.PRIORITY = 2 AND NameAddress_MatchBatch.NameAddr_Individual_IndGroupStats_GROUP_RANK = 'M')

Anything for which that statement is true I consider “GOOD” data, and everything else goes into a “PURGED” table so I can see what got purged out. Looking at the “PURGED” table, I see that a lot of file B records are matching each other, and I’m worried that records in file A are doing the same thing. I don’t want to “de-dupe” file A; records having the same name and address are fine. I only want to purge file A of all file B records. So what else can I do to make sure de-duping won’t happen, or am I already OK with what I have?
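
To make the concern concrete, here is a minimal Python sketch of that Case expression applied to a few made-up rows. It is not Data Services code: the row layout, the IDs, and the `is_good` helper are hypothetical, and only the three fields (group number, priority, group rank) come from the expression above.

```python
from typing import Optional

def is_good(group_number: Optional[int], priority: int, group_rank: Optional[str]) -> bool:
    """Mirror the Case expression: unmatched rows, or file A rows ranked as master."""
    return group_number is None or (priority == 2 and group_rank == "M")

# Hypothetical match output. priority 2 = file A, priority 1 = file B.
rows = [
    {"id": "A1", "group_number": None, "priority": 2, "group_rank": None},  # no match
    {"id": "A2", "group_number": 1, "priority": 2, "group_rank": "M"},      # A matched A
    {"id": "A3", "group_number": 1, "priority": 2, "group_rank": "S"},      # A matched A
    {"id": "B1", "group_number": 2, "priority": 1, "group_rank": "M"},      # B matched A
    {"id": "A4", "group_number": 2, "priority": 2, "group_rank": "S"},      # A matched B
]

good = [r["id"] for r in rows
        if is_good(r["group_number"], r["priority"], r["group_rank"])]
purged = [r["id"] for r in rows
          if not is_good(r["group_number"], r["priority"], r["group_rank"])]
```

In this made-up data, A3 lands in the purged list even though its group contains only file A rows, which is the unwanted de-dupe case.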

Thanks,
Daniel Hosler

DanHosler (BOB member since 2013-06-19)

system · July 17, 2013, 9:02pm

add input sources, with a mapping of your DSID values to promotion/suppression.

you’ll have to tell it in a query which is which (or you can just hard code your key in the source).

then in post match processing, add the suppress flag.

this will also allow you to add in a compare table to ignore source to source matching.

the match itself doesn’t do any deduping whatsoever, but you can make that determination yourself based on the results of the match.

jlynn73 (BOB member since 2009-10-27)

system · July 17, 2013, 9:31pm

Cool thanks. In the Input Sources there is a source type field with NORMAL, SUPPRESS, and SPECIAL options. Could you tell me what those mean?

Edit: I’m going to guess that I should have my purge file (file B) be a SUPPRESS type and use the suppress match flag. Or am I doing it wrong?

DanHosler (BOB member since 2013-06-19)

system · July 18, 2013, 1:27pm

yes. If you have a suppression/pander file, set that as suppress.

in the qualification table you tell it not to compare suppression to suppression or promotion to promotion, since you don’t care about identifying intra-source matches. You need promotion to suppression and suppression to promotion. (your priority should stick your supps at the bottom of your break groups)
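
As a rough illustration of what that table expresses, the Python sketch below only lets records from different source types be compared. This is plain Python, not Data Services syntax; the source labels, the record IDs, and the `should_compare` helper are made up for the example.

```python
from itertools import combinations

# Hypothetical compare table: only cross-source pairs are allowed.
ALLOWED = {
    ("promotion", "suppression"),
    ("suppression", "promotion"),
}

def should_compare(source_a: str, source_b: str) -> bool:
    """Return True when the two source types may be compared."""
    return (source_a, source_b) in ALLOWED

# (record id, source type) for one made-up break group.
records = [
    ("A1", "promotion"),
    ("A2", "promotion"),
    ("B1", "suppression"),
    ("B2", "suppression"),
]

# Candidate pairs that survive the compare table.
pairs = [(x[0], y[0]) for x, y in combinations(records, 2)
         if should_compare(x[1], y[1])]
```

With these four records, A1/A2 and B1/B2 are never compared, so neither file can match against itself.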

make sure you output the suppress flag, and check the output.

If you’re suppressing dead/dma … it’s going to take a bit more work.

jlynn73 (BOB member since 2009-10-27)

system · July 18, 2013, 2:23pm

Where is this qualification table? Right now I’m using this logic to grab my “good” records:

Address_MatchBatch.Addr_Address_GROUP_NUMBER IS NULL OR (Address_MatchBatch.PRIORITY = 2 AND (Address_MatchBatch.Addr_Address_OUTPUT_FLAG_SELECT_RECORD = 'Y' OR Address_MatchBatch.Addr_Address_OUTPUT_FLAG2_SELECT_RECORD = 'Y'))

where OUTPUT_FLAG is looking for single source masters and OUTPUT_FLAG2 is looking for single source subordinates. I think I’m successfully not de-duping file A now.

Right now I’m trying to figure out why SAP DS 4.1 is purging out more records (about 20,000 more) than the legacy tools. Does DS 4.1 match better than the legacy tools, or is it more likely that I still don’t know what I’m doing?

DanHosler (BOB member since 2013-06-19)

system · July 18, 2013, 8:51pm

sorry, they changed it to the compare table. Under the match level1 tab you can add in the options for the compare table. There you can assign your sources.

on output there should be a column added when you enable the suppress flag. Default would be Set1_Level1_Suppress_SELECT_RECORD. You should be able to select your 2’s with a suppress flag of N.
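
A minimal Python sketch of that selection, assuming the flag holds 'Y' or 'N' and that file A rows carry PRIORITY 2 as set up earlier in the thread. The sample rows are invented; only the column names come from the posts above.

```python
FLAG = "Set1_Level1_Suppress_SELECT_RECORD"  # default output column name quoted above

# Hypothetical match output rows.
rows = [
    {"id": "A1", "PRIORITY": 2, FLAG: "N"},  # file A, not suppressed: keep
    {"id": "A2", "PRIORITY": 2, FLAG: "Y"},  # file A, matched a suppression record
    {"id": "B1", "PRIORITY": 1, FLAG: "N"},  # file B row, never kept
]

# Keep file A rows (priority 2) whose suppress flag is 'N'.
keep = [r["id"] for r in rows if r["PRIORITY"] == 2 and r[FLAG] == "N"]
```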

as for why you’re getting more matches: if this is your first match with R4.1, you probably want to kick out a match-all file (old + new), compare record to record, and see how they were coded differently.

if you haven’t started working on the dictionary/parsing rules … I’ll let that one be a surprise.

jlynn73 (BOB member since 2009-10-27)

system · July 18, 2013, 9:16pm

Yeah that’s what I’m currently doing. Thanks for all those tips.

DanHosler (BOB member since 2013-06-19)

system · September 27, 2013, 2:11pm

Hello Experts,

I have a similar requirement, but with a few twists and turns.

We need to perform de-duplication on the source using the combinations below:

First Name, Last Name and Address
First Name and email
First Name, Last Name and Mobile Number

I also have to add the target rows to the above, to know whether a row is already present in our system. Once that is done, I update or insert rows in the target depending on the match score.

I can’t really understand suppression/promotion from the technical manuals.

I’d appreciate any help.

Thanks

BODSDW (BOB member since 2011-01-19)

system · September 27, 2013, 3:24pm

It doesn’t really sound like you have any suppression sources.

All of the post match flags can be derived by interrogating your dupe group sources post match. It just makes it a lot easier if you can use the built in flags. We have certain jobs that use some pretty complex key to key “group coding” which the match just can’t accommodate (requiring custom post match code).

since you’re talking about using multiple matches, you’re probably going to want to use the associative match transform. It’s extremely easy to set up.
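
To show the general idea of combining several match results into one set of groups, here is a hedged Python sketch. It is not how the Data Services transform is implemented: it swaps in exact, case-insensitive key equality for real fuzzy matching and uses a simple union-find to merge groups. The field names, sample records, and the `associate` helper are all hypothetical.

```python
from collections import defaultdict

def find(parent, x):
    """Return the root of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    """Merge the groups of a and b, keeping the smaller id as root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def associate(records, key_sets):
    """Group records that agree on ANY of the key sets (exact match only)."""
    parent = {r["id"]: r["id"] for r in records}
    for keys in key_sets:
        buckets = defaultdict(list)
        for r in records:
            values = tuple((r.get(k) or "").strip().lower() for k in keys)
            if all(values):  # skip records missing any field of this key set
                buckets[values].append(r["id"])
        for ids in buckets.values():
            for other in ids[1:]:
                union(parent, ids[0], other)
    groups = defaultdict(list)
    for r in records:
        groups[find(parent, r["id"])].append(r["id"])
    return sorted(sorted(g) for g in groups.values())

# The three combinations from the question, with made-up field names.
KEY_SETS = [
    ("first", "last", "address"),
    ("first", "email"),
    ("first", "last", "mobile"),
]

records = [
    {"id": 1, "first": "Ann", "last": "Lee", "address": "1 Main St",
     "email": "ann@example.com", "mobile": "555-0001"},
    {"id": 2, "first": "Ann", "last": "Lee", "address": "1 Main St",
     "email": None, "mobile": None},
    {"id": 3, "first": "Ann", "last": "Li", "address": "9 Oak Ave",
     "email": "ann@example.com", "mobile": None},
    {"id": 4, "first": "Bob", "last": "Ray", "address": "2 Elm St",
     "email": "bob@example.com", "mobile": "555-0002"},
]

groups = associate(records, KEY_SETS)
```

Here records 1 and 2 agree on name and address, and records 1 and 3 agree on first name and email, so all three end up in one group even though 2 and 3 never match each other directly.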

jlynn73 (BOB member since 2009-10-27)

system · September 27, 2013, 7:29pm

Here is what I’m currently using to decide if the record is considered a ‘purge’ or a ‘dupe’:

Purge:


Single_Source_Subordinate='N' AND Group_Rank='S'

Dupe:


Single_Source_Subordinate='Y' AND Group_Rank='S'

This assumes that you have input sources defined.
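
The same two rules as a small Python sketch, for anyone who wants to test the logic outside the job. The `classify` helper and the "keep" label for everything else are made up for the example; the flag values are the ones quoted above.

```python
def classify(single_source_subordinate: str, group_rank: str) -> str:
    """Label a match output row as 'purge', 'dupe', or 'keep' (hypothetical label)."""
    if group_rank == "S" and single_source_subordinate == "N":
        return "purge"  # subordinate in a group that spans more than one source
    if group_rank == "S" and single_source_subordinate == "Y":
        return "dupe"   # subordinate in a group from a single source
    return "keep"       # masters and anything else

labels = [classify("N", "S"), classify("Y", "S"), classify("N", "M")]
```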

DanHosler (BOB member since 2013-06-19)

system · October 3, 2013, 2:08pm

Thank you. I hope it really is that easy to set up.

BODSDW (BOB member since 2011-01-19)