GRADOOP: Scalable Graph Data Management and Analytics with Hadoop

Processing highly connected data as graphs is becoming increasingly important in many domains. Prominent examples are social networks such as Facebook and Twitter, information networks like the World Wide Web, and biological networks. What these domain-specific datasets have in common is an inherent graph structure, which makes them suitable for analysis with graph algorithms. Beyond that, they share two further properties: they are huge, making it hard or even impossible to process them on a single machine, and they grow over time, which classifies them as dynamic graphs. With the objective of analyzing such large-scale, dynamic datasets, we started developing a framework called “Gradoop” (Graph Analytics on Hadoop®) with the following three main objectives:

developing a graph data model, including operators for the definition of analytical pipelines,
integrating data from heterogeneous source systems into a single integrated graph, and
distributing and replicating data efficiently to optimize the execution of distributed graph operators.

Our prototype is built on top of the distributed dataflow framework Apache Flink™. The data model has been designed and the operators have been implemented. A first use case is the BIIIG project for graph analytics in business information networks. In our ongoing work, we are investigating different methods of operator tuning depending on the underlying dataflow system.

People

Research

Students

Philip Fritzsche
Timo Adameit
Lucas Schons

Awards

Best Demo Award, BTW 2017

Source code

GitHub

Cooperation

Competence Center for Scalable Data Services and Solutions (ScaDS)

Selected Talks

Date	Talk	Event	Language
Aug 2017	Scalable Graph Analytics	3rd Int. ScaDS Big Data Summer School	en
Feb 2017	Skalierbare Graph-basierte Analyse und Business Intelligence (Scalable Graph-based Analysis and Business Intelligence)	bitkom Big Data Summit 2017	de
Feb 2017	(Cypher)-[:ON]->(ApacheFlink)<-[:USING]-(Gradoop)	FOSDEM 2017 Graph Devroom	en
Feb 2017	From Shopping Baskets to Structural Patterns	FOSDEM 2017 Graph Devroom	en
Nov 2016	Scalable Graph Data Analytics with GRADOOP	BBDC Symposium	en
Oct 2016	Gut vernetzt: Skalierbares Graph Mining für Business Intelligence (Well Connected: Scalable Graph Mining for Business Intelligence)	data2day 2016	de
Jul 2016	Distributed Graph Analytics with GRADOOP	Let’s talk about Graph Databases	en
Mar 2016	GRADOOP - Scalable Graph Analytics with Apache Flink	Graph Fun with Apache Flink & Neo4j	en
Oct 2015	GRADOOP - Scalable Graph Analytics with Apache Flink	FlinkForward 2015	en
Jul 2015	Scalable Graph Analytics with GRADOOP and BIIIG	Graph Sync Meeting @ScaDS Dresden	en
May 2015	Scalable Graph Analytics with GRADOOP	Keynote GvDB-Workshop	en

Publications

Rost, C.; Gomez, K.; Taeschner, M.; Fritzsche, P.; Schons, L.; Christ, L.; Adameit, T.; Junghanns, M.; Rahm, E.
Distributed temporal graph analytics with GRADOOP
VLDB Journal 2021 Special Issue Paper
2021-05

Gomez, K.; Taeschner, M.; Rostami, M. A.; Rost, C.; Rahm, E.
Graph Sampling with Distributed In-Memory Dataflow Systems
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2021
2021-03

Rost, C.; Gomez, K.; Fritzsche, P.; Thor, A.; Rahm, E.
Exploration and Analysis of Temporal Property Graphs
24th International Conference on Extending Database Technology (EDBT)
2021-03

Rost, C.; Thor, A.; Rahm, E.
Analyzing Temporal Graphs with Gradoop
Datenbank-Spektrum 19(3)
2019-11

Gomez, K.; Taeschner, M.; Rostami, M. A.; Rost, C.; Rahm, E.
Distributed Graph Sampling with In-Memory Dataflow Systems
Techn. Report, Univ. of Leipzig, arXiv:1910.04493, Oct 2019
2019-10

Rost, C.; Thor, A.; Fritzsche, P.; Gomez, K.; Rahm, E.
Evolution Analysis of Large Graphs with Gradoop
Proc. of Intl. Workshop on Advances in Managing and Mining Large Evolving Graphs (LEG@ECML-PKDD)
2019-09

Kricke, M.; Peukert, E.; Rahm, E.
Graph data transformations in GRADOOP
Proc. BTW, March 2019
2019-03

Rost, C.; Thor, A.; Rahm, E.
Temporal Graph Analysis using Gradoop
Proc. BTW workshops, LNI
2019-03

Rostami, M.A.; Kricke, M.; Peukert, E.; Kühne, S.; Wilke, M.; Dienst, S.; Rahm, E.
BIGGR: Bringing Gradoop to Applications
Datenbank-Spektrum
2019-03

Petermann, A.
On Pattern Mining in Graph Data to Support Decision-Making
Dissertation, Univ. Leipzig
2019

Nentwig, M.; Rahm, E.
Incremental Clustering on Linked Data
Proc. IEEE International Conference on Data Mining Workshop, ICDMW 2018, Singapore
2018-11

Saeedi, A.; Nentwig, M.; Peukert, E.; Rahm, E.
Scalable Matching and Clustering of Entities with FAMER
Complex Systems Informatics and Modeling Quarterly (CSIMQ), Issue 16, Sep./Oct. 2018, pp 61–83
2018-11

Junghanns, M.; Kießling, M.; Teichmann, N.; Gomez, K.; Petermann, A.; Rahm, E.
Declarative and distributed graph analytics with GRADOOP
PVLDB
2018-08

Bergami, G.; Petermann, A.; Montesi, D.
THoSP: an Algorithm for Nesting Property Graphs
Proc. ACM SIGMOD Workshop on Graph Data Management Experiences & Systems and Network Data Analytics (GRADES-NDA)
2018-06

Saeedi, A.; Peukert, E.; Rahm, E.
Using Link Features for Entity Clustering in Knowledge Graphs
Proc. ESWC 2018 (Best research paper award)
2018-06

Petermann, A.; Junghanns, M.; Rahm, E.
DIMSpan - Transactional Frequent Subgraph Mining with Distributed In-Memory Dataflow Systems
Proc. Int. Conf. on Big Data Computing, Applications and Technologies (BDCAT) 2017, pp 237-246
2017-12

Petermann, A.; Micale, G.; Bergami, G.; Pulvirenti, A.; Rahm, E.
Mining and Ranking of Generalized Multi-Dimensional Frequent Subgraphs
Proc. International Conference on Digital Information Management (ICDIM) 2017
2017-09

Saeedi, A.; Peukert, E.; Rahm, E.
Comparative Evaluation of Distributed Clustering Schemes for Multi-source Entity Resolution
Proc. ADBIS, LNCS 10509, pp 278-293
2017-09

Junghanns, M.; Kießling, M.; Averbuch, A.; Petermann, A.; Rahm, E.
Cypher-based Graph Pattern Matching in Gradoop
Proc. ACM SIGMOD workshop on Graph Data Management Experiences and Systems (GRADES)
2017-05

Junghanns, M.; Petermann, A.; Rahm, E.
Distributed Grouping of Property Graphs with GRADOOP
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017
2017-03

Junghanns, M.; Petermann, A.; Teichmann, N.; Rahm, E.
The Big Picture: Understanding large-scale graphs using Graph Grouping with GRADOOP
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017 (Demo paper)
2017-03

Kemper, S.; Petermann, A.; Junghanns, M.
Distributed FoodBroker: Skalierbare Generierung graphbasierter Geschäftsprozessdaten (Distributed FoodBroker: Scalable Generation of Graph-based Business Process Data)
Proc. Datenbanksysteme für Business, Technologie und Web (BTW) 2017 (Workshops)
2017-03

Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E.
Graph Mining for Complex Data Analytics
Proc. ICDM 2016 (Demo paper)
2016-12

Junghanns, M.; Petermann, A.
Verteilte Graphanalyse mit Gradoop (Distributed Graph Analysis with Gradoop)
JavaSPEKTRUM 05/2016
2016-10

Petermann, A.; Junghanns, M.
Scalable Business Intelligence with Graph Collections
it - Information Technology, Special Issue: Big Data Analytics, Vol. 58 (4), 2016, pp. 166–175
2016-08

Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E.
Analyzing Extended Property Graphs with Apache Flink
Proc. Int. SIGMOD workshop on Network Data Analytics (NDA)
2016-07

Junghanns, M.; Petermann, A.; Gomez, K.; Rahm, E.
GRADOOP: Scalable Graph Data Management and Analytics with Hadoop
Techn. Report, Univ. of Leipzig, arXiv:1506.00548, June 2015
2015-06

Rahm, Erhard
Scalable graph analytics with GRADOOP
Proc. GI-Workshop Grundlagen von Datenbanksystemen (GvDB), Gommern, May 2015 (Invited Talk)
2015-05

Petermann, A.; Junghanns, M.; Müller, R.; Rahm, E.
Graph-based Data Integration and Business Intelligence with BIIIG
Proc. VLDB Conf., 2014 (Demo paper)
2014-09

Petermann, A.; Junghanns, M.; Müller, R.; Rahm, E.
FoodBroker - Generating Synthetic Datasets for Graph-Based Business Analytics
5th Workshop on Big Data Benchmarking (WBDB 2014), LNCS 8991, 2015
2014-08

Petermann, A.; Junghanns, M.; Müller, R.; Rahm, E.
BIIIG : Enabling Business Intelligence with Integrated Instance Graphs
5th International Workshop on Graph Data Management (GDM 2014)
2014-03

Disclaimer

Apache®, Hadoop®, Apache Flink™ and Apache HBase™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


Database Group Leipzig
