Don't Match Twice: Redundancy-free Similarity Computation with MapReduce

Kolb, L.; Thor, A.; Rahm, E.
Don't Match Twice: Redundancy-free Similarity Computation with MapReduce
Proc. 2nd Intl. Workshop on Data Analytics in the Cloud (DanaC), 2013
2013-06

Description

To improve the effectiveness of pair-wise similarity computation, state-of-the-art approaches assign objects to multiple overlapping clusters. This introduces redundant pair comparisons when similar objects share more than one cluster. We propose an approach that eliminates such redundant comparisons and that can be easily integrated into existing MapReduce implementations. We evaluate the approach on a real cloud infrastructure and show its effectiveness for all degrees of redundancy.
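The idea of skipping redundant comparisons can be illustrated with a small in-memory sketch. It simulates the map, shuffle, and reduce steps in plain Python and applies one common rule: a pair is compared only in the smallest cluster key that both objects share. This rule is an assumption chosen for illustration and is not necessarily the exact method of the paper; the function names (`map_phase`, `shuffle`, `reduce_phase`, `candidate_pairs`) and the token-based clustering in the usage note are hypothetical as well.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(objects, cluster_keys):
    """Emit (cluster key, (object id, all cluster keys of that object))."""
    for obj_id, value in objects.items():
        keys = frozenset(cluster_keys(value))
        for key in keys:
            # Each record carries the full key set so the reducer can
            # decide locally whether it is responsible for a pair.
            yield key, (obj_id, keys)


def shuffle(pairs):
    """Group mapper output by cluster key, as the framework would."""
    groups = defaultdict(list)
    for key, record in pairs:
        groups[key].append(record)
    return groups


def reduce_phase(key, records):
    """Yield each pair only in the smallest cluster the two objects share."""
    for (id_a, keys_a), (id_b, keys_b) in combinations(records, 2):
        # Assumed rule: the smallest common key "owns" the pair.
        if min(keys_a & keys_b) == key:
            yield tuple(sorted((id_a, id_b)))


def candidate_pairs(objects, cluster_keys):
    """Run the simulated job and collect the pairs to be compared."""
    result = []
    for key, records in shuffle(map_phase(objects, cluster_keys)).items():
        result.extend(reduce_phase(key, records))
    return result
```

For example, with `objects = {"a": "map reduce", "b": "map reduce hadoop", "c": "hadoop cloud"}` and `cluster_keys = str.split`, objects `a` and `b` share the clusters `map` and `reduce`, yet `candidate_pairs` emits the pair `("a", "b")` only once. Because the check uses only information attached to the records themselves, it needs no extra job or global state, which is what makes this kind of rule easy to add to an existing reduce function.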

Keywords

MapReduce
Hadoop
Pairwise similarity computation
Redundancy
Overlapping clustering

BibTeX

@inproceedings{Kolb:2013:DMT:2486767.2486768,
 author = {Kolb, Lars and Thor, Andreas and Rahm, Erhard},
 title = {{Don't Match Twice: Redundancy-free Similarity Computation with MapReduce}},
 booktitle = {Proceedings of the Second Workshop on Data Analytics in the Cloud},
 series = {DanaC '13},
 year = {2013},
 pages = {1--5},
 url = {http://doi.acm.org/10.1145/2486767.2486768}
}

Submitted by Lars Kolb | 29 October 2012, 20:03

Abteilung Datenbanken Leipzig