iFuice — information Fusion utilizing instance correspondences and peer mappings
iFuice is an approach to information fusion of web data that has been developed in our group since 2005. It is instance-driven and can utilize peer mappings (e.g., instance correspondences) between independent data sources. Such correspondences are already available between many sources, e.g. in the form of web links, and thus support high-quality data fusion. Arbitrary sources can be incorporated into iFuice by merely specifying the available data object types and defining mappings from such a type to an object type of another data source. Once the mappings between object types of the various data sources are interconnected, the information space can be accessed and queried via a script language or adaptively explored in an interactive session. Generic, declarative operators are available to execute mappings and to manipulate their results, e.g. for result fusion (aggregation). Mappings and operators are executable on sets of objects and are highly composable, which supports the aggregation of information over several sources. Script programs implement data integration workflows, which are more flexible than the mere use of queries as in query mediators or federated databases. An extension of iFuice is being developed for mashup-like data integration.
Source and mapping semantics are reflected in a domain model, which is at a higher (ontological) abstraction level than a global schema and is easier to construct. The domain model consists of object types (e.g., author, publication) and mapping types (e.g., AuthorOfPublication). The available sources and mappings of the various types of the domain model are reflected in a source-mapping model. While physical data sources (PDS) refer to real-world sources, e.g. DBLP, each logical data source (LDS) refers to a particular object type of the domain model. So-called same-mappings interconnect LDS of the same type and associate corresponding instances, which may thus be aggregated to fuse their information. The mappings of the source-mapping model are executable, e.g. implemented by a query or a web service. iFuice allows for explorative data fusion by browsing along these mappings. The execution of several mappings and the manipulation of their results can be specified within scripts to allow repeated execution for different input objects.
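To make the terminology concrete, the following is a minimal Python sketch of how logical data sources and executable mappings could be represented. It is not the iFuice implementation: the class names `LDS` and `Mapping`, the `execute` method, and all instance ids are assumptions chosen for illustration, and an in-memory dictionary stands in for the query or web service that would implement a real mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LDS:
    """Logical data source: one object type within one physical data source."""
    pds: str          # physical data source, e.g. "DBLP"
    object_type: str  # object type of the domain model, e.g. "Publication"


@dataclass
class Mapping:
    """Executable mapping between two LDS (hypothetical structure)."""
    name: str
    mapping_type: str      # mapping type of the domain model, or "same"
    source: LDS
    target: LDS
    correspondences: dict  # instance id -> set of instance ids (toy stand-in)

    def execute(self, ids):
        """Apply the mapping to a set of input ids and return a set of output ids."""
        result = set()
        for instance_id in ids:
            result |= self.correspondences.get(instance_id, set())
        return result


# Two LDS of the same object type in different physical sources.
dblp_pub = LDS("DBLP", "Publication")
gs_pub = LDS("GoogleScholar", "Publication")

# A same-mapping associates corresponding instances (ids are invented).
same_pub = Mapping(
    name="DBLPGSPubSame",
    mapping_type="same",
    source=dblp_pub,
    target=gs_pub,
    correspondences={"p1": {"g1"}, "p2": {"g2", "g3"}},
)

matches = same_pub.execute({"p1", "p2", "p9"})
```

In this sketch an input object without a correspondence (`p9`) simply contributes nothing to the result, so a mapping always takes a set of objects and returns a set of objects.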
Use case
iFuice has been used in different domains, in particular in bioinformatics (BioFuice) and for citation analysis.
For example, we may want a script that determines the most frequently referenced papers of a given conference X, for instance to identify candidates for a 10-year best paper award. An iFuice representation of such a script could be as follows.
$SIGMODPubs := queryTraverse (LDS=DBLP.Conf, {Name="SIGMOD 1995"}, DBLPConfPubs)
$CombinedConfPub := aggregateSame ($SIGMODPubs, GoogleScholar)
$CleanedPubs := fuseAttributes ($CombinedConfPub)
$Result := sort ($CleanedPubs, "NoOfCitings")
Informally, the script locates conference X in DBLP, executes the mapping from the conference to its publications (DBLPConfPubs) to get all publications of that conference, uses the same-mapping to Google Scholar to get the corresponding publications together with an attribute indicating the number of citations, sorts the publications by the number of citations, and returns the top-most publications. The example shows that mappings need to be executable on a set of input objects and return a set of output objects.
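The four steps of the script can be imitated in plain Python to show how set-oriented operators compose. This is only a sketch under stated assumptions: the function names mirror the script operators but are hypothetical re-implementations, the records are invented toy data rather than real DBLP or Google Scholar content, the conflict policy in `fuse_attributes` (the first source wins) is an assumption, and so is the descending sort order.

```python
# Invented toy data; not real DBLP or Google Scholar content.
DBLP_CONF = {"c1": {"Name": "SIGMOD 1995"}, "c2": {"Name": "VLDB 1995"}}
DBLP_CONF_PUBS = {"c1": {"p1", "p2"}, "c2": {"p3"}}           # conference -> publications
DBLP_PUB = {"p1": {"Title": "Paper A"}, "p2": {"Title": "Paper B"},
            "p3": {"Title": "Paper C"}}
SAME_DBLP_GS = {"p1": {"g1"}, "p2": {"g2"}}                   # same-mapping
GS_PUB = {"g1": {"Title": "Paper A", "NoOfCitings": 12},
          "g2": {"Title": "paper b", "NoOfCitings": 40}}


def query_traverse(source, condition, mapping):
    """Select objects matching the condition, then follow the mapping."""
    selected = {i for i, attrs in source.items()
                if all(attrs.get(k) == v for k, v in condition.items())}
    result = set()
    for i in selected:
        result |= mapping.get(i, set())
    return result


def aggregate_same(ids, left, same, right):
    """Group each input object with its correspondences in the other source."""
    return [(left[i], [right[j] for j in sorted(same.get(i, set()))])
            for i in sorted(ids)]


def fuse_attributes(aggregated):
    """Merge each group into one record; the first source wins on conflicts."""
    fused = []
    for primary, others in aggregated:
        record = {}
        for other in others:
            record.update(other)
        record.update(primary)  # assumed policy: primary values override
        fused.append(record)
    return fused


def sort_desc(records, attribute):
    """Sort records by an attribute, highest value first (assumed order)."""
    return sorted(records, key=lambda r: r.get(attribute, 0), reverse=True)


sigmod_pubs = query_traverse(DBLP_CONF, {"Name": "SIGMOD 1995"}, DBLP_CONF_PUBS)
combined = aggregate_same(sigmod_pubs, DBLP_PUB, SAME_DBLP_GS, GS_PUB)
cleaned = fuse_attributes(combined)
result = sort_desc(cleaned, "NoOfCitings")
```

Each function consumes the result set of the previous one, which is the property the script relies on: every step works on a whole set of objects rather than on a single instance.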
Publications