A Quick Look at Fuzzy Matching Programming Techniques Using SAS® Software

Kirk Paul Lafler¹ and Stephen Sloan²
¹Software Intelligence Corporation, ²Accenture


Abstract

Data comes in all forms, shapes, sizes, and complexities. SAS® users across industries know all too well that data stored in files and data sets can be, and often is, plagued with a variety of issues. When unique and reliable identifiers, referred to as keys, are available, users can routinely match records from two or more data sets using merge, join, and/or hash programming techniques. But when a unique and reliable identifier is not available or does not exist, one or more fuzzy matching programming techniques must be used. Topics include an introduction to fuzzy matching, along with examples of the SOUNDEX algorithm (for phonetic matching) and the SPEDIS, COMPLEV, and COMPGED functions, used to resolve key identifier issues and to successfully merge, join, and match less-than-perfect or messy data.