Distributed Privacy-Preserving Record Linkage using Pivot-based Filter Techniques

Gladbach, Marcel; Sehili, Ziad; Kudraß, Thomas; Christen, Peter; Rahm, Erhard
Distributed Privacy-Preserving Record Linkage using Pivot-based Filter Techniques
Proceedings of the 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW), pp. 33-38, 2018
2018-04

Description

Privacy-preserving record linkage (PPRL) aims to link person-related records from different data sources while protecting privacy. It is applied in medical research to link health data without revealing sensitive personal data. We propose and evaluate a new parallel PPRL approach based on Apache Flink that targets high performance and scalability to large datasets. The approach supports a pivot-based filtering method for metric distance functions that saves many similarity computations. We describe our distributed approaches for determining pivots and for pivot-based linkage. We also demonstrate the high efficiency of the approach for different datasets and configurations.
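
The idea behind pivot-based filtering can be illustrated with a minimal single-machine sketch. It is not the authors' Flink implementation. It assumes that records are encoded as bit vectors (held here as Python integers) and compared with the Hamming distance, which is a metric; the function names `hamming`, `build_index`, and `range_query` are hypothetical. For a metric d, the triangle inequality gives |d(q, p) - d(x, p)| <= d(q, x) for any pivot p, so a record x can be skipped without computing d(q, x) whenever that lower bound already exceeds the search radius.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors stored as integers."""
    return bin(a ^ b).count("1")


def build_index(records: List[int], pivots: List[int]) -> List[Tuple[int, List[int]]]:
    """Precompute the distance from every record to every pivot."""
    return [(rec, [hamming(rec, p) for p in pivots]) for rec in records]


def range_query(
    query: int,
    radius: int,
    index: List[Tuple[int, List[int]]],
    pivots: List[int],
) -> Tuple[List[int], int]:
    """Return all records within `radius` of `query`, plus the number of
    record-to-query distance computations that were actually performed."""
    query_dists = [hamming(query, p) for p in pivots]
    matches = []
    computed = 0
    for rec, rec_dists in index:
        # Triangle inequality: |d(q, p) - d(x, p)| is a lower bound on d(q, x).
        # If any pivot yields a bound above the radius, x cannot match.
        if any(abs(qd - rd) > radius for qd, rd in zip(query_dists, rec_dists)):
            continue
        computed += 1
        if hamming(query, rec) <= radius:
            matches.append(rec)
    return matches, computed
```

With the four records `0b00000000`, `0b00000011`, `0b11111111`, and `0b11110000`, the single pivot `0b00000000`, the query `0b00000001`, and radius 1, the filter discards the last two records from their pivot distances alone, so only two of the four candidate comparisons are computed. How pivots are chosen and how the work is partitioned across workers are separate questions that this sketch leaves out.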