Entity Resolution & MDM: Interchangeable?
After reading a recent post from Henrik Sørensen, "Entity Revolution vs. Entity Evolution," I was a little confused, perhaps because of the way it mixed Entity Resolution and Master Data Management concepts.
The premise of the post is that not all organizations and business processes require full Entity Resolution. While the premise is right, it needs to be broken down a little to explain why.
Gartner released a report in November entitled "Top 10 Technology Trends Impacting Information Infrastructure, 2011." Two of the top ten trends were "Entity Resolution and Analysis" and "Master Data Management." They were listed separately with the following definitions:
Master Data Management
"MDM is a technology-enabled business discipline in which business and IT staff work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of an enterprise's official shared master data assets. MDM is a critical data-sharing program that enables other business and IT investments and programs, which depend on master data quality and consistency, to yield the expected benefits to the business."
Entity Resolution and Analysis
"Entity resolution and analysis (ER&A) is the capability to resolve multiple labels for individuals, products or other "noun classes" of data into a single resolved entity when pseudonyms, aliases or other synonymic references exist. Multiple references may result from intentional or unintentional duplication of information in a company's internal systems, due to years of siloed development, mergers and acquisitions, and other normal business activities. It is intentional falsification of information and identities, however, that has brought this previously obscure technology to the fore. It is used to identify the use of false identities and networks of individuals who are trying to hide their relationships to each other. The same technologies or analyses are used in the detection of fraud networks, racketeering and money-laundering."
So in the first case you have an accurate, common view of data assets, and in the second case you are looking for people who are trying to hide within your data. Those really are two different use cases, and quite frankly they require different technology approaches. In the first case you are trying to answer "What is common?" while in the second case you are trying to answer "What could be happening?"
In the first (MDM) case, you have business processes and systems that depend on a "single view" of an asset. Generally, when you look at your data, you're pretty confident you have the same person or thing represented differently in multiple areas, and you know your business processes will function better if you can take those multiple representations and make them one.
You have to be very confident that the one representation is accurate, because changing your mind about a representation could have negative consequences. Of course, it greatly benefits you if the one representation you create is based on who is asking (see "Uniqueness is in the Eye of the Beholder").
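The post does not spell out how a representation can depend on who is asking. One possible reading, assumed here rather than taken from the post, is that the rule for deciding "same entity" differs by consumer: a marketing team might treat a household as one entity, while billing needs one entity per individual. The records, consumer names and grouping keys (`KEYS`, `entities_for`) below are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Illustrative source records; names and addresses are made up.
RECORDS: List[Record] = [
    {"id": "A1", "first": "Jonathan", "last": "McDonald", "address": "12 High St"},
    {"id": "B7", "first": "Mary", "last": "McDonald", "address": "12 High St"},
    {"id": "C3", "first": "Jonathan", "last": "McDonald", "address": "12 High St"},
]

# Hypothetical consumer-specific keys: marketing mails one piece per household,
# billing needs one entry per individual.
KEYS: Dict[str, Callable[[Record], Tuple[str, ...]]] = {
    "marketing": lambda r: (r["last"].lower(), r["address"].lower()),
    "billing": lambda r: (r["first"].lower(), r["last"].lower(), r["address"].lower()),
}


def entities_for(consumer: str, records: List[Record]) -> List[List[str]]:
    """Group record ids into entities using the given consumer's notion of 'same'."""
    key = KEYS[consumer]
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for r in records:
        groups[key(r)].append(r["id"])
    return sorted(groups.values())


if __name__ == "__main__":
    print(entities_for("marketing", RECORDS))  # all three records form one household
    print(entities_for("billing", RECORDS))    # two individuals
```

The same three records yield one entity for marketing and two for billing, which is one way to make "uniqueness" a property of the consumer rather than of the data alone.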
In the case of entity resolution and analysis, you have a clue, or an indication that something could be wrong, and you need to go figure it out. Generally, when you look at your data, even if you think you are already doing a good job of mastering it, you're pretty sure you have entities that shouldn't be there.
You may pull in other data from outside your organization to help figure it out. You need different techniques: ones that focus on finding weak links and that can tell you when something suspicious turns up.
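One concrete reading of "finding weak links," assumed here rather than stated in the post, is to look for identifiers such as a phone number or an address that are shared by records carrying different names. A minimal sketch using an inverted index follows; the records and the choice of `SHARED_KEYS` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]

# Illustrative records; in practice these might mix internal and external data.
RECORDS: List[Record] = [
    {"id": "1", "name": "Ann Lee", "phone": "555-0101", "address": "4 Elm Rd"},
    {"id": "2", "name": "Bob Stone", "phone": "555-0101", "address": "9 Oak Ave"},
    {"id": "3", "name": "Carl Diaz", "phone": "555-0199", "address": "9 Oak Ave"},
    {"id": "4", "name": "Dana Wu", "phone": "555-0150", "address": "1 Pine Ct"},
]

# Hypothetical choice of attributes that should rarely be shared across people.
SHARED_KEYS = ("phone", "address")


def weak_links(records: List[Record]) -> List[Set[str]]:
    """Return groups of record ids that share a phone or address but differ in name."""
    index: Dict[Tuple[str, str], List[Record]] = defaultdict(list)
    for r in records:
        for k in SHARED_KEYS:
            index[(k, r[k])].append(r)
    flagged: List[Set[str]] = []
    for recs in index.values():
        names = {r["name"] for r in recs}
        if len(names) > 1:  # same identifier, different names: worth a closer look
            flagged.append({r["id"] for r in recs})
    return flagged


if __name__ == "__main__":
    print(weak_links(RECORDS))
```

Records 1 and 2 share a phone number and records 2 and 3 share an address, so together the flags point at a possible network of records 1, 2 and 3, even though 1 and 3 share nothing directly.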
You aren't looking just for the confident answers but are looking for the "I'm pretty sure but not entirely sure" answers. Those are the ones you focus on. Having the ability to change your mind on a resolution is required.
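The contrast between "confident answers" and "pretty sure but not entirely sure" answers maps naturally onto the two-threshold scheme used in classic record linkage: scores above an upper threshold are treated as matches, scores below a lower one as non-matches, and everything in between goes to review. The threshold values and names below are hypothetical; this is a sketch, not a specific product's behavior.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds; real values would be tuned against reviewed pairs.
MATCH_AT = 0.90
REVIEW_AT = 0.70


@dataclass
class Decision:
    pair: Tuple[str, str]
    score: float
    label: str  # "match", "possible", or "non-match"


def classify(pair: Tuple[str, str], score: float) -> Decision:
    """Place a scored candidate pair into one of three bands."""
    if score >= MATCH_AT:
        label = "match"
    elif score >= REVIEW_AT:
        label = "possible"  # the "pretty sure but not entirely sure" band
    else:
        label = "non-match"
    return Decision(pair, score, label)


def review_queue(scored: List[Tuple[Tuple[str, str], float]]) -> List[Decision]:
    """Keep only the uncertain pairs, highest score first, for an analyst to examine."""
    decisions = [classify(p, s) for p, s in scored]
    possible = [d for d in decisions if d.label == "possible"]
    return sorted(possible, key=lambda d: d.score, reverse=True)


if __name__ == "__main__":
    scored = [(("a", "b"), 0.95), (("a", "c"), 0.72), (("b", "d"), 0.85), (("c", "d"), 0.40)]
    for d in review_queue(scored):
        print(d)
```

In the terms of this post, an MDM process would typically act only on the "match" band, while entity resolution and analysis spends most of its attention on the "possible" band. Keeping decisions as separate records, instead of overwriting the source data, is one way to preserve the ability to change your mind later.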
So if you accept these definitions, then I do agree that organizations that are mastering their enterprise data may not need entity resolution. Conversely, organizations that want entity resolution may not need to master their data. If the delineation makes sense, the next logical question for us to tackle is: when might an organization need both? I’d love to hear your thoughts and ideas on that answer.
Responses
Jeff, thanks a lot for following up on the subject.
It’s a good question: do we have to differentiate between master data management and entity resolution? I sense a close relation to the two ways of having good data quality: either your data are fit for the purpose of use, or they reflect the real world. Or both. Probably best, now or later, if both.
Jeff - great post!
When prospects and customers ask me about the difference between MDM and Entity Resolution, I usually give a high-level explanation: MDM is more focused on a common and customizable data "view" that depends on the consumer (finance analyst versus executive versus intelligence analyst, etc.), with the capability to CRUD (Create, Read, Update, Delete) both consuming and contributing source records as required. I go on to explain that entity resolution is the piece that disambiguates and "collapses" the duplicate and overlapping records on which the common view is often based.
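One common way to "collapse" duplicates, assumed here rather than described in the comment, is to treat each accepted match as an edge between two records and group records into connected components with a union-find (disjoint-set) structure. The record ids and matches below are illustrative.

```python
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set structure used to collapse pairwise matches into entities."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def collapse(record_ids: Iterable[str], matches: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group record ids so that any chain of matched pairs ends up in one entity."""
    uf = UnionFind()
    for r in record_ids:
        uf.find(r)  # register every record, including unmatched singletons
    for a, b in matches:
        uf.union(a, b)
    clusters: Dict[str, List[str]] = {}
    for r in list(uf.parent):
        clusters.setdefault(uf.find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    print(collapse(["A1", "B7", "C3", "D2"], [("A1", "C3"), ("C3", "D2")]))
```

Note the transitivity: A1 and D2 land in the same entity without ever being compared directly, which is why a single bad match can merge unrelated records. Union-find also cannot undo a merge, so a system that needs to change its mind would recompute the clusters from the current set of accepted matches.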
As you know, entity resolution is such a strong component of MDM because we must ensure that common data views, reports, golden records, etc. are clean and accurate, and that they include all relevant records across the enterprise that may appear on the surface to be different entities but are in fact the same. For example, an entity's address attribute in source A may hold an old address, while the same entity's address attribute in source B is up to date.
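Once records are known to describe the same entity, the address example raises the question of which value "survives" into the golden record. The sketch below assumes one simple survivorship rule, most recent non-empty value wins; the records, field names and the rule itself are illustrative and not taken from the comment.

```python
from datetime import date
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

# Illustrative source records already resolved to one entity.
SOURCE_RECORDS: List[Record] = [
    {"source": "A", "updated": date(2009, 3, 1),
     "name": "Jonathan McDonald", "address": "7 Old Lane", "phone": "555-0101"},
    {"source": "B", "updated": date(2010, 11, 15),
     "name": "Jonathan McDonald", "address": "12 High St", "phone": None},
]


def golden_record(records: List[Record], fields: List[str]) -> Dict[str, Optional[Any]]:
    """For each field, keep the most recently updated non-empty value (None if none)."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    return {
        f: next((r[f] for r in newest_first if r.get(f)), None)
        for f in fields
    }


if __name__ == "__main__":
    print(golden_record(SOURCE_RECORDS, ["name", "address", "phone"]))
```

Here the address comes from source B because it is newer, while the phone number falls back to source A because B has none. Recency is only one possible rule; sources can also be ranked by trust or completeness, and the right rule may differ per attribute.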
Many data quality issues can also be overcome via entity resolution. Probabilistic algorithms, for example, could score "John Macdonald" fairly high when compared to "Jonathan McDonald," depending on the "edit distance" calculations, whereas exact matching would say these are not the same entity, even if the spelling deviations are simply the result of a data entry error. When applied correctly, entity resolution could tell you whether these are the same entity by looking at other attributes associated with the two names and comparing commonalities and deviations to develop a score that justifies a decision on whether "John Macdonald" and "Jonathan McDonald" are in fact the same or different entities.
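The comment does not name a specific edit-distance measure. The sketch below assumes Levenshtein distance, normalized by the length of the longer string, and then combines per-attribute similarities with hypothetical weights (`WEIGHTS`) to mirror the idea of scoring other attributes alongside the name. It is a simplified weighted average, not a full probabilistic record-linkage model.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters are equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance into a 0..1 score, ignoring case."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical attribute weights; real systems would tune these per data set.
WEIGHTS: Dict[str, float] = {"name": 0.4, "birth_date": 0.4, "postcode": 0.2}


def match_score(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Weighted average of per-attribute similarities."""
    total = sum(WEIGHTS.values())
    return sum(w * similarity(r1[f], r2[f]) for f, w in WEIGHTS.items()) / total


if __name__ == "__main__":
    a = {"name": "John Macdonald", "birth_date": "1970-05-02", "postcode": "AB1 2CD"}
    b = {"name": "Jonathan McDonald", "birth_date": "1970-05-02", "postcode": "AB1 2CD"}
    print(round(similarity(a["name"], b["name"]), 2))
    print(round(match_score(a, b), 2))
```

On the names alone the score is well short of an exact match, but agreeing birth dates and postcodes pull the overall score up, which is the kind of supporting evidence the comment describes. The resulting score could then feed a thresholding step that separates confident matches from ones needing review.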
So I would argue that most organizations need both when you look at what a solid long-term data strategy should be. I would also argue that organizations often start with entity resolution and analysis in hopes of discovering what their data looks like, de-duping and disambiguating their data, etc., and eventually transition to MDM (with entity resolution still running under the covers) as they develop a need for common views and a need (based on trust) to synchronize contributing and consuming sources with the ground truth.
I look forward to hearing others comment as well.