
Object Matching (Entity Resolution)

Object matching (entity resolution) is a critical data integration task that aims to identify semantically corresponding objects (records, instances) in one or several data sources. A typical example is the redundant and heterogeneous representation of customers in different enterprise databases. Finding corresponding customer representations is a key task, e.g., for customer relationship management or, more generally, master data management. On the web, finding matching objects is typically even more challenging due to the higher degree of heterogeneity (less structured data with many text attributes, more sources, more data quality problems, etc.).
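As a minimal illustration of the customer example, the following Python sketch compares the name attribute of records from two sources and keeps pairs whose string similarity reaches a threshold. The record layout, the source names (crm, billing), the function names, and the threshold of 0.8 are assumptions made for this example and are not taken from the prototypes described below.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(source_a, source_b, attribute, threshold=0.8):
    """Return (id_a, id_b, similarity) for all cross-source pairs
    whose attribute similarity is at or above the threshold."""
    result = []
    for id_a, rec_a in source_a.items():
        for id_b, rec_b in source_b.items():
            sim = similarity(rec_a[attribute], rec_b[attribute])
            if sim >= threshold:
                result.append((id_a, id_b, round(sim, 2)))
    return result


# Hypothetical customer records held in two enterprise databases.
crm = {"c1": {"name": "Jon Smith"}, "c2": {"name": "Maria Garcia"}}
billing = {"b1": {"name": "John Smith"}, "b2": {"name": "Peter Miller"}}

correspondences = match(crm, billing, "name")
print(correspondences)
```

On this toy input only the pair (c1, b1) passes the threshold. Comparing every record with every other record is quadratic in the input size, which is one reason real match workflows restrict the candidate pairs first.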

MOMA, STEM, FEVER

We have been developing comprehensive prototypes for object matching since 2006. A key idea is to support the combination of several match techniques (matchers) to improve the overall effectiveness in terms of precision and recall. The first prototype, MOMA, supports the construction of flexible workflows for object matching and the reuse of previous match results, which are represented as instance mappings. Furthermore, MOMA not only uses the similarity of attribute values but also incorporates a powerful context matcher called the neighborhood matcher.
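The idea of combining matchers and measuring the effect on precision and recall can be sketched as follows. This is not MOMA's actual interface: the representation of a mapping as a dictionary from object pairs to similarity values, the functions merge, select, and precision_recall, and all similarity values are assumptions chosen for illustration.

```python
def merge(mappings, weights):
    """Combine mappings by a weighted average of their similarities.
    A pair missing from a mapping contributes a similarity of 0."""
    total = sum(weights)
    pairs = set().union(*mappings)
    return {
        pair: sum(w * m.get(pair, 0.0) for m, w in zip(mappings, weights)) / total
        for pair in pairs
    }


def select(mapping, threshold):
    """Keep only the pairs whose similarity reaches the threshold."""
    return {pair for pair, sim in mapping.items() if sim >= threshold}


def precision_recall(predicted, gold):
    """Precision and recall of a predicted match result against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical results of two attribute matchers (instance mappings).
title_matcher = {("a1", "b1"): 0.9, ("a2", "b2"): 0.6, ("a3", "b4"): 0.8}
author_matcher = {("a1", "b1"): 0.8, ("a2", "b2"): 0.9, ("a3", "b4"): 0.2}
gold = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

combined = merge([title_matcher, author_matcher], weights=[1, 1])
predicted = select(combined, threshold=0.7)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

With these values, the title matcher alone at threshold 0.7 accepts the wrong pair (a3, b4) and misses (a2, b2), whereas the averaged mapping accepts exactly the two correct pairs it has seen. The third gold pair is found by neither matcher, so recall stays below 1.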

The more recent frameworks STEM and FEVER support blocking and matching as well as the use of machine learning techniques. The machine learning approaches utilize a limited amount of training data (manually labeled correspondences) to semi-automatically find effective combinations of matchers. FEVER also supports the comparative evaluation of different match approaches for a given match task.
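The two steps named above, blocking and learning a matcher combination from labeled training data, can be sketched in a few lines. The sketch assumes key-based blocking and a plain grid search over one weight and one threshold that maximizes F1 on the training pairs. It is a stand-in for illustration only and not the learners used in STEM or FEVER; all records, similarity values, and function names are hypothetical.

```python
from itertools import combinations, product


def blocking(records, key):
    """Group record ids by a blocking key and return only the pairs
    that share a block, instead of all possible pairs."""
    blocks = {}
    for rec_id, rec in records.items():
        blocks.setdefault(key(rec), []).append(rec_id)
    return [pair for ids in blocks.values() for pair in combinations(sorted(ids), 2)]


def f1(predicted, gold):
    """Harmonic mean of precision and recall."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(predicted), tp / len(gold)
    return 2 * p * r / (p + r)


def learn_combination(features, labeled_matches, steps=10):
    """Grid search for a weight w and threshold t such that the rule
    w * s1 + (1 - w) * s2 >= t maximizes F1 on the training pairs.
    Returns (best_f1, w, t)."""
    grid = [i / steps for i in range(steps + 1)]
    best = (-1.0, None, None)
    for w, t in product(grid, grid):
        predicted = {
            pair for pair, (s1, s2) in features.items()
            if w * s1 + (1 - w) * s2 >= t
        }
        score = f1(predicted, labeled_matches)
        if score > best[0]:
            best = (score, w, t)
    return best


# Hypothetical product offers; the brand serves as blocking key.
offers = {
    "p1": {"brand": "canon"}, "p2": {"brand": "canon"},
    "p3": {"brand": "nikon"}, "p4": {"brand": "nikon"},
    "p5": {"brand": "canon"},
}
candidates = blocking(offers, key=lambda rec: rec["brand"])

# Assumed similarity values of two matchers for each candidate pair,
# plus manually labeled correspondences as training data.
features = {
    ("p1", "p2"): (0.9, 0.4),
    ("p1", "p5"): (0.5, 0.3),
    ("p2", "p5"): (0.6, 0.5),
    ("p3", "p4"): (0.4, 0.95),
}
labeled_matches = {("p1", "p2"), ("p3", "p4")}

best_f1, weight, threshold = learn_combination(features, labeled_matches)
print(len(candidates), best_f1, weight, threshold)
```

Blocking reduces the ten possible pairs among five offers to four candidates. On this toy training set neither matcher separates matches from non-matches on its own, so the search has to pick a weight strictly between 0 and 1 to reach a perfect F1.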

Specific entity resolution approaches have been developed to categorize and match product offers and product descriptions of web shops.

Benchmarking existing entity resolution approaches

In a VLDB 2010 paper, we used FEVER to comparatively evaluate existing entity resolution implementations. The datasets used in the evaluation can be downloaded here.

See also our recent work on cloud-based entity resolution and load balancing.

Project Members

Publications

Kolb, L.; Thor, A.; Rahm, E.
Don't Match Twice: Redundancy-free Similarity Computation with MapReduce
Proc. 2nd Intl. Workshop on Data Analytics in the Cloud (DanaC), 2013
2013-06

Ngonga Ngomo, A.-C.; Kolb, L.; Heino, N.; Hartung, M.; Auer, S.; Rahm, E.
When to Reach for the Cloud: Using Parallel Hardware for Link Discovery
Proc. 10th Intl. Extended Semantic Web Conference (ESWC), 2013
2013-05

Kolb, L.; Rahm, E.
Parallel Entity Resolution with Dedoop
Datenbank-Spektrum 13 (1), 2013
2013-02-23

Kolb, L.; Thor, A.; Rahm, E.
Dedoop: Efficient Deduplication with Hadoop
Proc. 38th Intl. Conference on Very Large Databases (VLDB) / Proc. of the VLDB Endowment 5(12), 2012
2012-08

Kolb, L.; Thor, A.; Rahm, E.
Load Balancing for MapReduce-based Entity Resolution
Proc. 28th Intl. Conference on Data Engineering (ICDE), 2012
2012-04

Köpcke, H.; Thor, A.; Thomas, S.; Rahm, E.
Tailoring entity resolution for matching product offers
Proc. 15th Intl. Conf. on Extending Database Technology (EDBT), 2012, pp. 545-550
2012-03

Kolb, L.; Thor, A.; Rahm, E.
Multi-pass Sorted Neighborhood Blocking with MapReduce
Computer Science - Research and Development 27(1), 2012
2012-02

Kolb, L.; Köpcke, H.; Thor, A.; Rahm, E.
Learning-based Entity Resolution with MapReduce
Proc. 3rd Intl. Workshop on Cloud Data Management (CloudDB), 2011
2011-10

Kolb, L.; Thor, A.; Rahm, E.
Block-based Load Balancing for Entity Resolution with MapReduce
Proc. 20th Intl. Conference on Information and Knowledge Management (CIKM), 2011
2011-10

Kolb, L.; Thor, A.; Rahm, E.
Parallel Sorted Neighborhood Blocking with MapReduce
Proc. 14th GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2011
2011-03

Köpcke, H.; Thor, A.; Rahm, E.
Evaluation of entity resolution approaches on real-world match problems
Proc. 36th Intl. Conference on Very Large Databases (VLDB) / Proceedings of the VLDB Endowment 3(1), 2010
2010-09

Kirsten, T.; Kolb, L.; Hartung, M.; Groß, A.; Köpcke, H.; Rahm, E.
Data Partitioning for Parallel Entity Matching
Proc. 8th Intl. Workshop on Quality in Databases (QDB), 2010
2010-09

Thor, A.
Toward an adaptive String Similarity Measure for Matching Product Offers
Proc. GI-Workshop - Informationsintegration in Service-Architekturen, 2010
2010-09

Köpcke, H.; Thor, A.; Rahm, E.
Learning-based approaches for matching web data entities
IEEE Internet Computing 14(4), 2010
2010-07

Köpcke, H.; Rahm, E.
Frameworks for entity matching: A comparison
Data & Knowledge Engineering
2010-01

Köpcke, H.; Thor, A.; Rahm, E.
Comparative evaluation of entity resolution approaches with FEVER
Proc. 35th Intl. Conference on Very Large Databases (VLDB), 2009 (demo)
2009-08

Köpcke, H.; Rahm, E.
Training Selection for Tuning Entity Matching
6th International Workshop on Quality in Databases and Management of Uncertain Data (QDB/MUD 2008)
2008-08

Thor, A.; Kirsten, T.; Rahm, E.
Instance-based matching of hierarchical ontologies
Proc. 12th GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2007
2007-03

Köpcke, H.; Rahm, E.
Analyse von Zitierungshäufigkeiten für die Datenbankkonferenz BTW
Datenbank-Spektrum 7 (20), 2007
2007-02

Thor, A.; Rahm, E.
MOMA - A Mapping-based Object Matching System
Proc. 3rd Conference on Innovative Data Systems Research (CIDR), 2007
2007-01