Description
Schema matching is a critical step in many applications,
such as XML message mapping, data warehouse
loading, and schema integration. In this paper, we
investigate algorithms for generic schema matching,
independent of any particular data model or application. We
first present a taxonomy of past solutions, showing
that a rich range of techniques is available. We then
propose a new algorithm, Cupid, that discovers mappings
between schema elements based on their names,
data types, constraints, and schema structure, using a
broader set of techniques than past approaches. Some
of our innovations are the integrated use of linguistic
and structural matching, context-dependent matching
of shared types, and a bias toward leaf structure, where
much of the schema content resides. After describing
our algorithm, we present experimental results that
compare Cupid to two other schema matching systems.
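
To make the idea of integrated linguistic and structural matching concrete, the following is a minimal illustrative sketch, not the Cupid algorithm itself. It assumes a toy schema representation (a dict mapping each element name to a dict of leaf names and data types), uses a plain character-level string similarity as a stand-in for linguistic matching, and combines the two scores with a hypothetical weight `w_struct`. All function names, the weight, and the threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Stand-in linguistic score: character-level similarity of two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def leaf_pair_score(leaf_a, leaf_b) -> float:
    """Score two (name, data_type) leaves by name and by exact type equality."""
    (name_a, type_a), (name_b, type_b) = leaf_a, leaf_b
    type_score = 1.0 if type_a == type_b else 0.0
    return 0.5 * name_similarity(name_a, name_b) + 0.5 * type_score


def structural_similarity(leaves_a: dict, leaves_b: dict) -> float:
    """Leaf-biased structural score: mean best-match score over the leaves of a."""
    if not leaves_a or not leaves_b:
        return 0.0
    best = [
        max(leaf_pair_score(la, lb) for lb in leaves_b.items())
        for la in leaves_a.items()
    ]
    return sum(best) / len(best)


def match_schemas(source: dict, target: dict, w_struct: float = 0.5,
                  threshold: float = 0.3) -> list:
    """Return (source_element, target_element, score) for the best target of
    each source element whose weighted score reaches the threshold."""
    mappings = []
    if not target:
        return mappings
    for s_name, s_leaves in source.items():
        scored = [
            (
                w_struct * structural_similarity(s_leaves, t_leaves)
                + (1.0 - w_struct) * name_similarity(s_name, t_name),
                t_name,
            )
            for t_name, t_leaves in target.items()
        ]
        score, best_target = max(scored)
        if score >= threshold:
            mappings.append((s_name, best_target, round(score, 3)))
    return mappings
```

In this sketch, an element such as `PurchaseOrder` can still be paired with a tersely named counterpart such as `PO` because their leaves agree in name and data type, which is the kind of case where leaf structure carries more information than the element names alone.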