COMA 3.0

Schema and Ontology Matching with COMA 3.0

This page gives an overview of the schema matching system COMA 3.0, developed at the University of Leipzig. Please consult the relevant papers (below) for a more detailed discussion of our approach. You can obtain the Community Edition of COMA 3.0 on SourceForge.net.

Figure 1. User Interface of COMA++ (Version 2009)

Introduction

Schema and ontology matching aim at identifying semantic correspondences between metadata structures or models such as database schemas, XML message formats, and ontologies. Solving such match problems is of key importance to service interoperability and data integration in numerous application domains. The goal of automatic matching is to keep the manual effort low.

COMA 3.0 is a schema and ontology matching tool. It extends our previous prototypes COMA and COMA++ with enhanced workflow management and additional features such as ontology merging. Furthermore, it offers a comprehensive infrastructure to solve large real-world match problems. The graphical interface offers a variety of interactions, allowing the user to influence the match process in many ways. COMA 3.0 functionality is used within the new QuickMig prototype, which focuses on the generation of executable mappings for data migration.

The new COMA 3.0 also integrates the ATOM prototype for automatic schema and ontology merging.

Architecture

As depicted below, COMA 3.0 is divided into four modules. The three modules Storage, Match Execution and Mapping Processing roughly follow the Input-Processing-Output pattern, while the User Connection module provides different ways to access the program (front end). The Storage module consists of the Importers, which load schemas, ontologies, existing mappings and auxiliary information (such as instance data and dictionaries) into the Repository, where they are persistently stored. The stored models can then be used directly to carry out matching tasks.

The Match Execution module is the core of COMA. It takes two schemas or ontologies as input, runs several matching algorithms and calculates the match result. In this module, the Execution Engine determines the relevant schema components for matching, applies multiple matching strategies and finally combines the partial results into the final match result. The obtained mapping can be used as input to the next iteration for further refinement. Each iteration can be configured individually, i.e., the types of components to be considered, the matchers for similarity computation, and the strategies for similarity combination. The Match Library is a large collection of schema matching strategies that can be combined into extensive workflows. Finally, the Configuration Engine allows schema matching workflows to be defined automatically or manually.
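The following Python sketch illustrates the general idea of combining several matchers into one match result. It is not COMA code: the two matchers, the averaging step, the threshold value and all function names are assumptions chosen for illustration only.

```python
from itertools import product


def trigram_similarity(a: str, b: str) -> float:
    """Hypothetical matcher: Dice coefficient over character trigrams."""
    def grams(s: str) -> set:
        s = s.lower()
        # Strings shorter than three characters yield one (short) gram.
        return {s[i:i + 3] for i in range(max(len(s) - 2, 1))}

    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def exact_name_similarity(a: str, b: str) -> float:
    """Hypothetical matcher: 1.0 for case-insensitive equality, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def match(source, target, matchers, threshold=0.5):
    """Run every matcher on every element pair, average the similarities
    (assumed combination strategy) and keep pairs at or above the threshold
    (assumed selection strategy)."""
    result = {}
    for s, t in product(source, target):
        scores = [m(s, t) for m in matchers]
        combined = sum(scores) / len(scores)
        if combined >= threshold:
            result[(s, t)] = combined
    return result


# Two tiny element lists standing in for the components of two schemas.
result = match(
    ["Name", "City"],
    ["name", "Town"],
    matchers=[trigram_similarity, exact_name_similarity],
)
```

Here `result` contains a single correspondence between `Name` and `name`. Swapping the averaging step for a maximum or a weighted sum, or changing the threshold, corresponds to choosing a different combination or selection strategy for an iteration.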

The Mapping Processing module is used to carry out further tasks once the match result has been calculated. It allows mappings to be enriched automatically (e.g., to detect complex correspondences), models to be merged (ontology merging), or data to be transformed, either directly or by generating a query script. This module is part of the COMA 3.0 Business version.

The User Connection module consists of a full-fledged GUI, a SaaS solution and an API. For most users, the GUI is likely the most convenient way to use COMA.

Figure 2. System Architecture of COMA 3.0

Figure 3. Match Processing in COMA 3.0

Model Support

Using a generic data representation, COMA 3.0 uniformly supports schemas and ontologies, e.g. the powerful standard languages W3C XML Schema (XSD) and Web Ontology Language (OWL). Further formats supported by COMA 3.0 include XML Data Reduced (XDR) and relational schemas.

XSD Support: COMA 3.0 supports very large schemas that are distributed over a multitude of XSD documents and that span various namespaces.
OWL Support: COMA 3.0 currently supports matching between ontologies written in W3C OWL-Lite. OWL class hierarchies and relationship types are read in via the OWL API and mapped to the generic model representation based on directed acyclic graphs.
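As a rough illustration of a generic model representation based on directed acyclic graphs, the sketch below stores a small schema as adjacency lists and enumerates all paths from the root. The schema, its element names and the function `paths` are hypothetical and are not taken from COMA; the point is only that an element shared by several parents is stored once but is reachable through several paths.

```python
def paths(graph, node, prefix=()):
    """Yield every path that starts at the root and ends at `node`
    or at one of its descendants; `graph` maps a node to its children."""
    current = prefix + (node,)
    yield current
    for child in graph.get(node, []):
        yield from paths(graph, child, current)


# Hypothetical schema: Address is a shared element with two parents.
schema = {
    "PurchaseOrder": ["ShipTo", "BillTo"],
    "ShipTo": ["Address"],
    "BillTo": ["Address"],
    "Address": ["Street", "City"],
}

all_paths = list(paths(schema, "PurchaseOrder"))

# The shared element occurs once as a node but in two different contexts.
address_contexts = [p for p in all_paths if p[-1] == "Address"]
```

In this example `Address` is a single node, yet it appears under both `ShipTo` and `BillTo`, so a matcher working on paths sees it in two distinct contexts.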

Matchers and Match Strategies

COMA 3.0 supports a comprehensive and extensible library of individual matchers, which can be selected to perform a match operation. Using the GUI, it is easy to construct new, more powerful matchers by combining existing ones. Moreover, match strategies can be specified as workflows of multiple match steps, which makes it possible to divide complex match tasks and solve them successively in multiple stages. Thanks to this flexibility in configuring matchers and match strategies, COMA 3.0 can be used not only to solve match problems, but also to comparatively evaluate the effectiveness of different match algorithms.

Using the flexible infrastructure for combining and refining match results, match processing is supported as a workflow of several match steps. We implemented specific workflows (i.e., strategies) for context-dependent, fragment-based, and reuse-oriented matching:

Context-dependent Matching. We address the problem of context-dependent matching, which is necessary for schemas with shared elements. Although required by many applications, such as the transformation of XML messages, the identification of context-dependent correspondences has mostly been ignored by previous work. COMA 3.0 supports several strategies for obtaining context-dependent match results, which also scale to large schemas.
Fragment-based Matching. To cope with large schemas, COMA 3.0 implements a fragment-based match processing approach. Following the divide-and-conquer idea, it decomposes a large match problem into smaller subproblems by matching at the level of schema fragments. With the reduced problem size, we aim not only at better execution time, but also at better match quality compared to schema-level matching.
Reuse-oriented Matching. We pursue the reuse of previously determined match results. The main mechanism for our approach is a MatchCompose operation, which performs a join-like operation on a mapping path consisting of two or more mappings, such as A-B, B-C, and C-D, successively sharing a common schema, to derive a new mapping between A and D.
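The join-like behaviour of a MatchCompose operation can be sketched in Python as follows. This is an illustrative sketch only, not the COMA implementation: mappings are assumed to be dictionaries from element pairs to similarity values, the composed similarity is assumed to be the average of the two joined values, and the function names `match_compose` and `compose_path` are hypothetical.

```python
from functools import reduce


def match_compose(first, second):
    """Join mapping X-Y with mapping Y-Z on the shared Y elements
    and return the derived mapping X-Z."""
    composed = {}
    for (x, y), sim_xy in first.items():
        for (y2, z), sim_yz in second.items():
            if y == y2:
                # Assumption: average the two similarities; if several
                # join partners lead to the same pair, keep the best one.
                sim = (sim_xy + sim_yz) / 2
                composed[(x, z)] = max(sim, composed.get((x, z), 0.0))
    return composed


def compose_path(mappings):
    """Compose a mapping path such as [A-B, B-C, C-D] from left to right."""
    return reduce(match_compose, mappings)


# Hypothetical previously determined match results.
a_b = {("a.name", "b.fullName"): 0.8, ("a.zip", "b.postcode"): 0.9}
b_c = {("b.fullName", "c.label"): 1.0}
c_d = {("c.label", "d.title"): 0.6}

a_d = compose_path([a_b, b_c, c_d])
```

In this example `a_d` holds one derived correspondence, between `a.name` and `d.title`. The pair starting at `a.zip` is lost because `b.postcode` has no partner in `b_c`, which shows that a composed mapping can only be as complete as the mappings along the path.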

COMA Evolution

The COMA project has existed for 10 years and has been gradually extended and improved over that decade. Key milestones are:

2002: First release of the schema matcher COMA. COMA stands for combined matching and offers a suite of several matching strategies.

2005: Release of COMA++, which adds ontology matching and fragment-based matching. COMA++ also comes with a GUI that makes schema matching much more convenient.

2008: Release of the 2008 version of COMA++, which supports instance matching. In addition, a web edition containing the main features of COMA++ is developed.

2011: Release of COMA 3.0, which is a redesigned version of COMA++. It offers ontology merging and an enhanced workflow management.

2012: The Community Edition of COMA 3.0 becomes an open source project under the AGPL license.

Benchmark

In order to compare schema and ontology matchers with COMA++, a couple of mapping scenarios can be downloaded here.

What others say about COMA, COMA++

“COMA++ is a generic, composite matcher with very effective match results.” [Duchateau et al., OTM 2008]

“COMA++ is one of the best available schema matchers that enjoys from combining several available methods for schema matching” [Nezhad et al., WWW 2007]

“The best recall and the best F-measure were achieved by COMA++.” [Kappel et al., BTW workshop 2007]

“…the COMA system … was the first to clearly articulate and embody the multi-component architecture…” [Lee et al., VLDB Journal 2007]

“The most complete tool”. [Manakanatas et al., DISWEB 2006]

“COMA is the first work to address engineering issues of a schema matching system.” [Bernstein et al., Sigmod Record 2004]

“COMA with the NamePath+Leaves matcher combination is the fastest prototype in our evaluation.” [Yatskevich, Technical Report 2003]

Publications

Arnold, Patrick; Rahm, Erhard
Enriching Ontology Mappings with Semantic Relations
Data and Knowledge Engineering, Volume 93, September 2014, Pages 1–18, Best DKE paper of 2014
2014-09

Arnold, P.; Rahm, E.
Extracting Semantic Concept Relations from Wikipedia
Proc. 4th Int. Conf. Web Intelligence, Mining and Semantics (WIMS)
2014-06

Arnold, Patrick; Rahm, Erhard
Semantic Enrichment of Ontology Mappings: A Linguistic-based Approach
Proc. 17th ADBIS Conference. LNCS 8133, pp. 42-55
2013-09

Arnold, Patrick
Semantic Enrichment of Ontology Mappings: Detecting Relation Types and Complex Correspondences
25. GI-Workshop Grundlagen von Datenbanken
2013

Raunich, S.; Rahm, E.
Towards a Benchmark for Ontology Merging
Proc. 7th OTM Workshop on Enterprise Integration, Interoperability and Networking (EI2N’2012), Springer LNCS
2012-09

Massmann, Sabine; Raunich, Salvatore; Aumueller, David; Arnold, Patrick; Rahm, Erhard
Evolution of the COMA Match System
OM-2011 (The Sixth International Workshop on Ontology Matching, October 24th, 2011, Bonn, Germany)
2011-10

Algergawy, A.; Massmann, S.; Rahm, E.
A Clustering-based Approach For Large-scale Ontology Matching
Proc. ADBIS Conf. 2011, LNCS 6909, pp. 415-428
2011-09

Groß, A.; Hartung, M.; Kirsten, T.; Rahm, E.
Mapping Composition for Matching Large Life Science Ontologies
2nd International Conference on Biomedical Ontology (ICBO 2011)
2011-07

Peukert, E.; Eberius, J.; Rahm, E.
AMC – A Framework for Modelling and Comparing Matching Systems as Matching Processes
Proc. Int. Conf. on Data Engineering (Demo paper), 2011
2011-04

Raunich, Salvatore; Rahm, Erhard
ATOM: Automatic Target-driven Ontology Merging
Proc. Int. Conf. on Data Engineering (Demo paper), 2011
2011-04

Peukert, Eric; Rahm, Erhard
Restricting the Overlap of Top-N Sets in Schema Matching
Proc. EDBT workshop on New Trends in Similarity Search (NTSS 2011)
2011-03

Groß, A.; Hartung, M.; Kirsten, T.; Rahm, E.
On Matching Large Life Science Ontologies in Parallel
7th International Conference on Data Integration in the Life Sciences (DILS 2010)
2010-08

Peukert, Eric; Berthold, Henrike; Rahm, Erhard
Rewrite Techniques for Performance Optimization of Schema Matching Processes
13th International Conference on Extending Database Technology, EDBT 2010
2010-03

Massmann, S. ; Rahm, E.
Evaluating Instance-based Matching of Web Directories
11th International Workshop on the Web and Databases (WebDB 2008)
2008-06

Drumm, C.; Schmitt, M.; Do, H.-H.; Rahm, E.
QuickMig - Automatic Schema Matching for Data Migration Projects
Proc. ACM CIKM, Lisbon, Nov. 2007
2007-11

Engmann, D.; Massmann, S.
Instance Matching with COMA++
BTW 2007 Workshop: Model Management und Metadaten-Verwaltung
2007-03

Do, H.-H.; Rahm, E.
Matching Large Schemas: Approaches and Evaluation
Information Systems, Volume 32, Issue 6, September 2007, Pages 857-885
2007

Massmann, S.; Engmann, D.; Rahm, E.
COMA++: Results for the Ontology Alignment Contest OAEI 2006
International Workshop on Ontology Matching, collocated with the 5th ISWC-2006; Athens, Georgia, USA
2006-11

Do, Hai Hong
Schema Matching and Mapping-based Data Integration
Dissertation. Published by Verlag Dr. Müller (VDM), ISBN 3-86550-997-5
2006

Aumueller, D.; Do, H.H.; Massmann, S.; Rahm, E.
Schema and ontology matching with COMA++
SIGMOD Conference
2005-06

Rahm, E.; Do, H.H.; Massmann, S.
Matching Large XML Schemas
Sigmod Record 33(4)
2004-12

Do, H.H.; Melnik, S.; Rahm, E.
Comparison of Schema Matching Evaluations
Proc. Workshop Web and Databases, LNCS 2593, 2003
2003

Do, H.H.; Rahm, E.
COMA - A System for Flexible Combination of Schema Matching Approaches
Proc. 28th Intl. Conference on Very Large Databases (VLDB), Hong Kong, Aug. 2002
2002

Contact/Project Members

Prof. Dr. Erhard Rahm
Patrick Arnold
Hong-Hai Do
David Aumüller
