
Don't Match Twice: Redundancy-free Similarity Computation with MapReduce

Kolb, L.; Thor, A.; Rahm, E.
Don't Match Twice: Redundancy-free Similarity Computation with MapReduce
Proc. 2nd Intl. Workshop on Data Analytics in the Cloud (DanaC), 2013
2013-06

Description

To improve the effectiveness of pairwise similarity computation, state-of-the-art approaches assign objects to multiple overlapping clusters. This introduces redundant pair comparisons when similar objects share more than one cluster. We propose an approach that eliminates such redundant comparisons and can be easily integrated into existing MapReduce implementations. We evaluate the approach on a real cloud infrastructure and show its effectiveness for all degrees of redundancy.
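The redundancy problem described above can be illustrated with a small local simulation. The sketch below is an assumption made for illustration and not necessarily the exact method of the paper: each object carries its full set of cluster keys, and a reduce group compares a pair only if the group's key is the smallest key the two objects share. All names (`map_phase`, `shuffle`, `reduce_phase`, `candidate_pairs`) and the sample data are hypothetical, and no Hadoop API is used.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(objects):
    """Emit (cluster_key, (object_id, all_cluster_keys)) once per assignment."""
    for obj_id, keys in objects.items():
        for key in keys:
            yield key, (obj_id, frozenset(keys))


def shuffle(pairs):
    """Group map output by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values, deduplicate=True):
    """Yield the pairs of one cluster that should actually be compared."""
    ordered = sorted(values, key=lambda v: v[0])
    for (a, keys_a), (b, keys_b) in combinations(ordered, 2):
        # Assumed rule: only the smallest shared cluster key is
        # responsible for comparing this pair; other clusters skip it.
        if deduplicate and key != min(keys_a & keys_b):
            continue
        yield (a, b)


def candidate_pairs(objects, deduplicate=True):
    """Run map, shuffle, and reduce locally and collect all emitted pairs."""
    groups = shuffle(map_phase(objects))
    result = []
    for key in sorted(groups):
        result.extend(reduce_phase(key, groups[key], deduplicate))
    return result


# Hypothetical sample: objects "a" and "b" share two clusters.
objects = {"a": {"k1", "k2"}, "b": {"k1", "k2"}, "c": {"k2"}}

naive = candidate_pairs(objects, deduplicate=False)  # 4 pairs, (a, b) twice
unique = candidate_pairs(objects)                    # 3 pairs, each once
```

Because every reduce group can decide locally whether it is responsible for a pair, this kind of check needs no coordination between reduce tasks. The cost is that each object is shipped together with its complete key set.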


Keywords

  • MapReduce
  • Hadoop
  • Pairwise similarity computation
  • Redundancy
  • Overlapping clustering

BibTeX

@inproceedings{Kolb:2013:DMT:2486767.2486768,
 author = {Kolb, Lars and Thor, Andreas and Rahm, Erhard},
 title = {{Don't Match Twice: Redundancy-free Similarity Computation with MapReduce}},
 booktitle = {Proceedings of the Second Workshop on Data Analytics in the Cloud},
 series = {DanaC '13},
 year = {2013},
 pages = {1--5},
 url = {http://doi.acm.org/10.1145/2486767.2486768}
}