GRADOOP: Scalable Graph Data Management and Analytics with Hadoop
Processing highly connected data as graphs is becoming increasingly important in many domains. Prominent examples are social networks such as Facebook and Twitter, information networks like the World Wide Web, and biological networks. What these domain-specific datasets have in common is their inherent graph structure, which makes them suitable for analysis with graph algorithms. They share two further properties: they are huge, making it hard or even impossible to process them on a single machine, and they grow over time, which makes them dynamic graphs. To analyze these large-scale, dynamic datasets, we started developing a framework called “Gradoop” (Graph Analytics on Hadoop®) with the following three main objectives:
- developing a graph data model, including operators for the definition of analytical pipelines,
- integrating data from heterogeneous source systems into a single graph, and
- efficient data distribution / replication to optimize the execution of distributed graph operators.
Our prototype is built on top of the distributed dataflow framework Apache Flink™. The data model has been designed and the operators have been implemented. A first use case is the BIIIG project for graph analytics in business information networks. In our ongoing work, we will look into different methods of operator tuning depending on the underlying dataflow system.
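To illustrate the first objective, a graph data model whose operators compose into analytical pipelines, here is a minimal single-machine sketch. It is not Gradoop's API: the names `Vertex`, `Edge`, `Graph`, `subgraph` and `aggregate` are hypothetical and chosen only for this example, and nothing here is distributed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Vertex:
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    """Hypothetical property graph: labeled vertices and edges plus graph-level properties."""
    vertices: List[Vertex]
    edges: List[Edge]
    properties: Dict[str, Any] = field(default_factory=dict)

    def subgraph(
        self,
        vertex_pred: Callable[[Vertex], bool],
        edge_pred: Optional[Callable[[Edge], bool]] = None,
    ) -> "Graph":
        """Keep matching vertices, and matching edges whose endpoints both survive."""
        kept = [v for v in self.vertices if vertex_pred(v)]
        ids = {v.id for v in kept}
        edges = [
            e for e in self.edges
            if e.source in ids and e.target in ids
            and (edge_pred is None or edge_pred(e))
        ]
        return Graph(kept, edges, dict(self.properties))

    def aggregate(self, key: str, fn: Callable[["Graph"], Any]) -> "Graph":
        """Return a new graph with fn(graph) stored as a graph-level property."""
        return Graph(self.vertices, self.edges, {**self.properties, key: fn(self)})


# A tiny social network (made-up example data).
social = Graph(
    vertices=[
        Vertex(1, "Person", {"name": "Alice"}),
        Vertex(2, "Person", {"name": "Bob"}),
        Vertex(3, "Forum", {"title": "Graphs"}),
    ],
    edges=[
        Edge(1, 2, "knows"),
        Edge(1, 3, "memberOf"),
        Edge(2, 3, "memberOf"),
    ],
)

# Pipeline: restrict to the friendship network, then compute graph-level counts.
result = (
    social
    .subgraph(lambda v: v.label == "Person", lambda e: e.label == "knows")
    .aggregate("vertexCount", lambda g: len(g.vertices))
    .aggregate("edgeCount", lambda g: len(g.edges))
)
```

Each operator returns a new graph instead of mutating its input, which is what lets the calls chain into a pipeline. A distributed implementation would express the same steps as transformations on partitioned datasets.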
People
Research
Students
- Philip Fritzsche
- Timo Adameit
- Lucas Schons
Awards
Source code
GitHub
Cooperation
Competence Center for Scalable Data Services and Solutions (ScaDS)
Selected Talks
Date | Talk | Event | Language |
---|---|---|---|
Aug 2017 | Scalable Graph Analytics | 3rd Int. ScaDS Big Data Summer School | en |
Feb 2017 | Skalierbare Graph-basierte Analyse und Business Intelligence | bitkom Big Data Summit 2017 | de |
Feb 2017 | (Cypher)-[:ON]->(ApacheFlink)<-[:USING]-(Gradoop) | FOSDEM 2017 Graph Devroom | en |
Feb 2017 | From Shopping Baskets to Structural Patterns | FOSDEM 2017 Graph Devroom | en |
Nov 2016 | Scalable Graph Data Analytics with GRADOOP | BBDC Symposium | en |
Oct 2016 | Gut vernetzt: Skalierbares Graph Mining für Business Intelligence | data2day 2016 | de |
Jul 2016 | Distributed Graph Analytics with GRADOOP | Let’s talk about Graph Databases | en |
Mar 2016 | GRADOOP - Scalable Graph Analytics with Apache Flink | Graph Fun with Apache Flink & Neo4j | en |
Oct 2015 | GRADOOP - Scalable Graph Analytics with Apache Flink | FlinkForward 2015 | en |
Jul 2015 | Scalable Graph Analytics with GRADOOP and BIIIG | Graph Sync Meeting @ScaDS Dresden | en |
May 2015 | Scalable Graph Analytics with GRADOOP | Keynote GvDB-Workshop | en |
Publications
Disclaimer
Apache®, Hadoop®, Apache Flink™ and Apache HBase™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.