"The requestor data was a challenge because we had a feature in our system where users could more or less update the requestor catalog on the fly. And it happened to create duplicates because there weren’t any enforced data entry standards. There was one field for the name, and users could enter a first name first, last name first, just initials, however. … With the Pervasive tool, we’ve been able to find these duplicates and clean them up."

Laurie Green
Director of Product Management
FAMIS Software

Mike Hoskins on Data MatchMerge

How to handle tough duplicate data, fuzzy matching, foreign languages, Unicode (3:48)



Improving Data Quality

Easily Identify, Merge and Purge Duplicate Data


John Smith, Johnny Smith, Jonathan Smith, Johnnie Smith?

Will the real John Smith PLEASE step forward? Only in game shows. So how do we determine whether they are one and the same? Address? E-mail address? Phone number? What if John Smith bought Jonathan Smith's home? Then what? That's where Pervasive steps in.

Pervasive Data MatchMerge™ provides a comprehensive solution for inaccurate, inconsistent and duplicate data. Built to complement Pervasive Data Integrator™, Pervasive Data MatchMerge adds advanced, tunable algorithms to identify potential duplicate data, and an easy-to-use user interface to make de-duping a breeze. Confidence scores can be set up for automated record merging as well.

Pervasive Data MatchMerge can easily incorporate third-party data cleansing and standardization applications through standard APIs. This powerful combination yields an intuitive, highly accurate and extremely fast solution for identifying and resolving duplicate data problems.

Comprehensive Match & De-duping Solution:

  • High-performance and fast response times that will scale for large data sets
  • Ability to match on any combination of fields in data sets
  • Multiple algorithms and encoding methods for better matching results
  • Easy property configuration management to fine-tune matching parameters
  • Intuitive and easy-to-use user interface for quick resolution
  • Output includes the merged and matched records in clusters with correlation IDs to facilitate downstream processing
  • Straightforward extensibility options for third-party components
  • Matching engine can be automated with Pervasive Data Integrator™'s API
  • Easy execution of Pervasive Data Integrator upstream and downstream processes
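
To make the idea of matched clusters with correlation IDs concrete, here is a minimal Python sketch. It is not Pervasive's implementation or API: the function names, the record layout and the choice of exact matching on normalized fields are all assumptions made for illustration. Records that agree on every chosen field are linked, and each resulting cluster receives one shared ID.

```python
from itertools import combinations


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def cluster_records(records, match_fields):
    """Return {record index: correlation ID} for records agreeing on all match_fields.

    Hypothetical helper: uses a small union-find so that A~B and B~C
    land in the same cluster even if A and C were never compared as equal.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in combinations(range(len(records)), 2):
        if all(normalize(records[a][f]) == normalize(records[b][f])
               for f in match_fields):
            parent[find(a)] = find(b)

    # Number clusters 1, 2, 3, ... in order of first appearance.
    ids = {}
    return {i: ids.setdefault(find(i), len(ids) + 1)
            for i in range(len(records))}


if __name__ == "__main__":
    people = [
        {"name": "John Smith", "email": "jsmith@example.com"},
        {"name": "Johnny Smith", "email": "JSmith@example.com "},
        {"name": "Jane Doe", "email": "jdoe@example.com"},
    ]
    print(cluster_records(people, ["email"]))  # {0: 1, 1: 1, 2: 2}
```

Passing a different field list, for example both name and email, changes which records are treated as duplicates, which is the sense in which matching can run on any combination of fields.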

Benefits of a robust Data MatchMerge Solution

Pervasive Data MatchMerge allows you to identify data redundancies that cause inaccuracies in reporting and analytics. It can be used for identity resolution to reduce costly overruns for items such as mailing campaigns and other marketing efforts.

Identity resolution, address cleansing and matching on other key data points can help to combat fraud by correlating like records for easier analysis. The solution can quickly incorporate third-party data cleansing and standardization applications by using standard APIs.

The tunable fuzzy matching algorithms allow analysts to quickly iterate through multiple combinations of matching and scoring properties until they reach the level of accuracy your organization needs.
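
As an illustration of what tuning matching and scoring properties can look like, the Python sketch below combines per-field similarities into one confidence score and compares it with two thresholds. Everything here is an assumption for illustration: the weights, the threshold values and the function names are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whichever matching algorithm is actually configured.

```python
from difflib import SequenceMatcher

# Hypothetical tuning properties: per-field weights and two decision thresholds.
WEIGHTS = {"name": 0.5, "email": 0.3, "phone": 0.2}
AUTO_MERGE = 0.90   # at or above this score, merge without review
REVIEW = 0.70       # at or above this score, queue for a human decision


def field_similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher is a stand-in for a fuzzy matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confidence(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of the field similarities."""
    total = sum(weights.values())
    score = sum(w * field_similarity(rec_a[f], rec_b[f])
                for f, w in weights.items())
    return score / total


def decide(score, auto_merge=AUTO_MERGE, review=REVIEW):
    """Map a confidence score to an action."""
    if score >= auto_merge:
        return "merge"
    if score >= review:
        return "review"
    return "distinct"


if __name__ == "__main__":
    a = {"name": "John Smith", "email": "jsmith@example.com", "phone": "555-0100"}
    b = {"name": "Johnny Smith", "email": "jsmith@example.com", "phone": "555-0100"}
    score = confidence(a, b)
    print(round(score, 3), decide(score))
```

Raising `AUTO_MERGE` sends more pairs to review instead of merging them automatically, while shifting weight toward a field such as email makes that field dominate the decision. Iterating on values like these is the kind of tuning loop the paragraph above describes.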

Business cases in which data matching can help:

  • Marketing Campaign Management
  • Fraud Detection and Loss Prevention
  • Corporate Mergers and Acquisitions
  • Business Intelligence

Algorithms & Encodings

Data fields may be compared using any of the following leading-edge algorithms for fuzzy matching:

  • Levenshtein Edit Distance
  • Jaro
  • Jaro-Winkler
  • Jaro-Hef
  • Damerau-Levenshtein
  • Q-gram
  • Positional Q-gram
  • Shorthand
  • Exact match
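
For readers unfamiliar with these measures, the Python sketch below shows textbook versions of two of them: Levenshtein edit distance and a q-gram similarity. These are generic formulations, not the product's implementations. The function names, the `#` padding character and the use of a Dice coefficient over sets of q-grams are assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes or substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Scale the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def qgrams(s, q=2):
    """Overlapping substrings of length q, padded so string edges count too."""
    padded = "#" * (q - 1) + s + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def qgram_similarity(a, b, q=2):
    """Dice coefficient over the sets of q-grams of a and b."""
    ga, gb = set(qgrams(a, q)), set(qgrams(b, q))
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))   # 3
    print(levenshtein("Johnny", "Johnnie"))   # 2
    print(qgrams("ab"))                       # ['#a', 'ab', 'b#']
```

Edit distance is sensitive to typos and small spelling variants, while q-gram measures tolerate reordered or partially overlapping text, which is one reason a matching tool might offer several algorithms side by side.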

The following phonetic encoding methods are also used:

  • Soundex
  • Refined Soundex
  • Metaphone
  • Double Metaphone
  • Substring
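
Phonetic encodings reduce a name to a short code so that spellings that sound alike compare as equal. As a rough illustration, here is a Python sketch of classic American Soundex. It is a generic textbook version rather than the product's encoder, the function name is hypothetical, and it assumes ASCII letters only (other characters are ignored).

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character American Soundex code, or "" if no letters."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                # h and w do not separate equal codes
        code = CODES.get(ch, "")    # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code                 # a vowel resets prev, so a repeat is kept
    return (result + "000")[:4]     # pad or truncate to four characters


if __name__ == "__main__":
    print(soundex("Smith"), soundex("Smyth"))    # S530 S530
    print(soundex("Robert"), soundex("Rupert"))  # R163 R163
    print(soundex("Ashcraft"))                   # A261
```

Because the code keeps only the first letter and up to three consonant classes, it is coarse: it groups Smith with Smyth, but it is usually paired with a distance measure on the full string rather than used alone.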

The runtime engine supports:

  • Windows
  • Linux