Database Group Leipzig (https://old.dbs.uni-leipzig.de)

GRADOOP: Scalable Graph Data Management and Analytics with Hadoop

Processing highly connected data as graphs is becoming increasingly important in many domains. Prominent examples are social networks such as Facebook and Twitter, information networks like the World Wide Web, and biological networks. These domain-specific datasets share an inherent graph structure, which makes them suitable for analysis with graph algorithms. They have two further characteristics in common: they are huge, making it hard or even impossible to process them on a single machine, and they grow over time, which classifies them as dynamic graphs. To analyze these large-scale, dynamic datasets, we started developing a framework called “Gradoop” (Graph Analytics on Hadoop®) with the following three main objectives:

  1. developing a graph data model, including operators for defining analytical pipelines,
  2. integrating data from heterogeneous source systems into a single graph, and
  3. distributing and replicating data efficiently to optimize the execution of distributed graph operators.

Our prototype is built on top of the distributed dataflow framework Apache Flink™ [1]. The data model has been designed and the operators have been implemented. A first use case is the BIIIG [2] project for graph analytics in business information networks. In our ongoing work, we will investigate different methods of operator tuning, depending on the underlying dataflow system.

Source code

GitHub [9]

Cooperation


Competence Center for Scalable Data Services and Solutions (ScaDS) [10]

Selected Talks

Date | Talk | Event | Language
Aug 2017 | Scalable Graph Analytics [11] | 3rd Int. ScaDS Big Data Summer School [12] | en
Feb 2017 | Skalierbare Graph-basierte Analyse und Business Intelligence [13] | bitkom Big Data Summit 2017 [14] | de
Feb 2017 | (Cypher)-[:ON]->(ApacheFlink)<-[:USING]-(Gradoop) [19] | FOSDEM 2017 Graph Devroom [20] | en
Feb 2017 | From Shopping Baskets to Structural Patterns [21] | FOSDEM 2017 Graph Devroom [22] | en
Nov 2016 | Scalable Graph Data Analytics with GRADOOP [23] | BBDC Symposium [24] | en
Oct 2016 | Gut vernetzt: Skalierbares Graph Mining für Business Intelligence [25] | data2day 2016 [26] | de
Jul 2016 | Distributed Graph Analytics with GRADOOP [27] | Let’s talk about Graph Databases [28] | en
Mar 2016 | GRADOOP - Scalable Graph Analytics with Apache Flink [29] | Graph Fun with Apache Flink & Neo4j [30] | en
Oct 2015 | GRADOOP - Scalable Graph Analytics with Apache Flink [34] | FlinkForward 2015 [35] | en
Jul 2015 | Scalable Graph Analytics with GRADOOP and BIIIG [36] | Graph Sync Meeting @ScaDS Dresden | en
May 2015 | Scalable Graph Analytics with GRADOOP [39] | Keynote GvDB-Workshop | en

Publications

[43]Rost, C. [44]; Gomez, K. [45]; Taeschner, M. [46]; Fritzsche, P. [47]; Schons, L. [48]; Christ, L. [49]; Adameit, T. [50]; Junghanns, M. [51]; Rahm, E. [52]
Distributed temporal graph analytics with GRADOOP [53]
VLDB Journal 2021 Special Issue Paper
2021-05 [54]
PDF [55]
further information [56]
Google Scholar [57]
[58]Gomez, K. [59]; Taeschner, M. [60]; Rostami, M. Ali [61]; Rost, C. [62]; Rahm, E. [63]
Graph Sampling with Distributed In-Memory Dataflow Systems [64]
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2021
2021-03 [65]
PDF [66]

Google Scholar [67]
[68]Rost, C. [69]; Gomez, K. [70]; Fritzsche, P. [71]; Thor, A. [72]; Rahm, E. [73]
Exploration and Analysis of Temporal Property Graphs [74]
24th International Conference on Extending Database Technology (EDBT)
2021-03 [75]
PDF [76]
further information [77]
Google Scholar [78]
[79]Rost, C. [80]; Thor, A. [81]; Rahm, E. [82]
Analyzing Temporal Graphs with Gradoop [83]
Datenbank-Spektrum 19(3)
2019-11 [84]
PDF [85]
further information [86]
Google Scholar [87]
[88]Gomez, K. [89]; Taeschner, M. [90]; Rostami, M. Ali [91]; Rost, C. [92]; Rahm, E. [93]
Distributed Graph Sampling with In-Memory Dataflow Systems [94]
Techn. Report, Univ. of Leipzig, arXiv:1910.04493, Oct 2019
2019-10 [95]
PDF [96]

Google Scholar [97]
[98]Rost, C. [99]; Thor, A. [100]; Fritzsche, P. [101]; Gomez, K. [102]; Rahm, E. [103]
Evolution Analysis of Large Graphs with Gradoop [104]
Proc. of Intl. Workshop on Advances in managing and mining large evolving graphs (LEG@ECML-PKDD)
2019-09 [105]
PDF [106]

Google Scholar [107]
[108]Kricke, M. [109]; Peukert, E. [110]; Rahm, E. [111]
Graph data transformations in GRADOOP [112]
Proc. BTW, March 2019
2019-03 [113]
PDF [114]

Google Scholar [115]
[116]Rost, Christopher [117]; Thor, Andreas [118]; Rahm, Erhard [119]
Temporal Graph Analysis using Gradoop [120]
Proc. BTW workshops, LNI
2019-03 [121]
PDF [122]

Google Scholar [123]
[124]Rostami, M.A. [125]; Kricke, M. [126]; Peukert, E. [127]; Kühne, S. [128]; Wilke, M. [129]; Dienst, S. [130]; Rahm, E. [131]
BIGGR: Bringing Gradoop to Applications [132]
Datenbank-Spektrum
2019-03 [133]

further information [134]
Google Scholar [135]
[136]Petermann, A. [137]
On Pattern Mining in Graph Data to Support Decision-Making [138]
Dissertation, Univ. Leipzig
2019 [139]
PDF [140]

Google Scholar [141]
[142]Nentwig, Markus [143]; Rahm, Erhard [144]
Incremental Clustering on Linked Data [145]
Proc. IEEE International Conference on Data Mining Workshop, ICDMW 2018, Singapore
2018-11 [146]
PDF [147]
further information [148]
Google Scholar [149]
[150]Saeedi, Alieh [151]; Nentwig, Markus [152]; Peukert, Eric [153]; Rahm, Erhard [154]
Scalable Matching and Clustering of Entities with FAMER [155]
Complex Systems Informatics and Modeling Quarterly (CSIMQ), Issue 16, Sep./Oct. 2018, pp 61–83
2018-11 [156]

PDF [157]
Google Scholar [158]
[159]Junghanns, Martin [160]; Kießling, Max [161]; Teichmann, Niklas [162]; Gomez, Kevin [163]; Petermann, Andre [164]; Rahm, Erhard [165]
Declarative and distributed graph analytics with GRADOOP [166]
PVLDB
2018-08 [167]


Google Scholar [168]
[169]Bergami, G. [170]; Petermann, A. [171]; Montesi, D. [172]
THoSP: an Algorithm for Nesting Property Graphs [173]
Proc. ACM SIGMOD Workshop on Graph Data Management Experiences & Systems and Network Data Analytics (GRADES-NDA)
2018-06 [174]
PDF [175]

Google Scholar [176]
[177]Saeedi, Alieh [178]; Peukert, Eric [179]; Rahm, Erhard [180]
Using Link Features for Entity Clustering in Knowledge Graphs [181]
Proc. ESWC 2018 (Best research paper award)
2018-06 [182]
PDF [183]

Google Scholar [184]
[185]Petermann, A. [186]; Junghanns, M. [187]; Rahm, E. [188]
DIMSpan - Transactional Frequent Subgraph Mining with Distributed In-Memory Dataflow Systems [189]
Proc. Int. Conf. on Big Data Computing, Applications and Technologies (BDCAT) 2017, pp 237-246
2017-12 [190]
PDF [191]

Google Scholar [192]
[193]Petermann, A. [194]; Micale, G. [195]; Bergami, G. [196]; Pulvirenti, A. [197]; Rahm, E. [198]
Mining and Ranking of Generalized Multi-Dimensional Frequent Subgraphs [199]
Proc. International Conference on Digital Information Management (ICDIM) 2017
2017-09 [200]
PDF [201]

Google Scholar [202]
[203]Saeedi, Alieh [204]; Peukert, Eric [205]; Rahm, Erhard [206]
Comparative Evaluation of Distributed Clustering Schemes for Multi-source Entity Resolution [207]
Proc. ADBIS, LNCS 10509, pp 278-293
2017-09 [208]
PDF [209]

Google Scholar [210]
[211]Junghanns, M. [212]; Kießling, M. [213]; Averbuch, A. [214]; Petermann, A. [215]; Rahm, E. [216]
Cypher-based Graph Pattern Matching in Gradoop [217]
Proc. ACM SIGMOD workshop on Graph Data Management Experiences and Systems (GRADES)
2017-05 [218]
PDF [219]

Google Scholar [220]
[221]Junghanns, M. [222]; Petermann, A. [223]; Rahm, E. [224]
Distributed Grouping of Property Graphs with GRADOOP [225]
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017
2017-03 [226]
PDF [227]

Google Scholar [228]
[229]Junghanns, M. [230]; Petermann, A. [231]; Teichmann, N. [232]; Rahm, E. [233]
The Big Picture: Understanding large-scale graphs using Graph Grouping with GRADOOP [234]
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017 (Demo paper)
2017-03 [235]

PDF [236]
Google Scholar [237]
[238]Kemper, S. [239]; Petermann, A. [240]; Junghanns, M. [241]
Distributed FoodBroker: Skalierbare Generierung graphbasierter Geschäftsprozessdaten. [242]
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017 (Workshops)
2017-03 [243]
PDF [244]

Google Scholar [245]
[246]Petermann, A. [247]; Junghanns, M. [248]; Kemper, S. [249]; Gomez, K. [250]; Teichmann, N. [251]; Rahm, E. [252]
Graph Mining for Complex Data Analytics [253]
Proc. ICDM 2016 (Demo paper)
2016-12 [254]

PDF [255]
Google Scholar [256]
[257]Junghanns, M. [258]; Petermann, A. [259]
Verteilte Graphanalyse mit Gradoop [260]
JavaSPEKTRUM 05/2016
2016-10 [261]
PDF [262]

Google Scholar [263]
[264]Petermann, A. [265]; Junghanns, M. [266]
Scalable Business Intelligence with Graph Collections [267]
it - Information Technology, Special Issue: Big Data Analytics, Vol. 58 (4), 2016, pp. 166–175
2016-08 [268]
PDF [269]

Google Scholar [270]
[271]Junghanns, M. [272]; Petermann, A. [273]; Teichmann, N. [274]; Gomez, K. [275]; Rahm, E. [276]
Analyzing Extended Property Graphs with Apache Flink [277]
Proc. Int. SIGMOD workshop on Network Data Analytics (NDA)
2016-07 [278]
PDF [279]

Google Scholar [280]
[281]Junghanns, M. [282]; Petermann, A. [283]; Gomez, K. [284]; Rahm, E. [285]
GRADOOP: Scalable Graph Data Management and Analytics with Hadoop [286]
Techn. Report, Univ. of Leipzig, arXiv:1506.00548, June 2015
2015-06 [287]
PDF [288]

Google Scholar [289]
[290]Rahm, Erhard [291]
Scalable graph analytics with GRADOOP [292]
Proc. GI-Workshop Grundlagen von Datenbanksystemen (GvDB), Gommern, May 2015 (Invited Talk)
2015-05 [293]
PDF [294]

Google Scholar [295]
[296]Petermann, A. [297]; Junghanns, M. [298]; Müller, R. [299]; Rahm, E. [300]
Graph-based Data Integration and Business Intelligence with BIIIG [301]
Proc. VLDB Conf., 2014 (Demo paper)
2014-09 [302]
PDF [303]

Google Scholar [304]
[305]Petermann, A. [306]; Junghanns, M. [307]; Müller, R. [308]; Rahm, E. [309]
FoodBroker - Generating Synthetic Datasets for Graph-Based Business Analytics [310]
5th Workshop on Big Data Benchmarking (WBDB 2014), LNCS 8991, 2015
2014-08 [311]
PDF [312]

Google Scholar [313]
[314]Petermann, A. [315]; Junghanns, M. [316]; Müller, R. [317]; Rahm, E. [318]
BIIIG: Enabling Business Intelligence with Integrated Instance Graphs [319]
5th International Workshop on Graph Data Management (GDM 2014)
2014-03 [320]

Disclaimer

Apache®, Hadoop®, Apache Flink™ and Apache HBase™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


URL:
https://old.dbs.uni-leipzig.de/de/research/projects/gradoop