A paper “Split4Blank: Maintaining consistency while improving efficiency of loading RDF data with blank nodes” was published in PLOS ONE.
2019. 06. 13
Dr. Atsuko Yamaguchi and Yasunori Yamamoto from DBCLS contributed to this work.
As datasets grow extremely large, loading them into a triple store becomes time-consuming. To make this task more efficient, parallel loading is recommended for several stores. With parallel loading, however, dataset consistency must be taken into account if the dataset contains blank nodes, because occurrences of the same blank node that end up in different files may no longer be treated as one node. This paper proposes a method that splits a dataset into multiple files under the condition that identical blank nodes are never separated. The method uses connected-component and multiprocessor scheduling algorithms and is proved to preserve dataset consistency with respect to blank nodes. The tool Split4Blank, developed on the basis of this method, is available under the MIT license.
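The idea summarized above can be illustrated with a minimal Python sketch. This is not the Split4Blank implementation and is not taken from the paper. It assumes N-Triples-like input with one triple per line, uses a simplified regular expression for blank node labels (a label-like string inside a literal would also match), finds connected components with union-find, and balances the output with a greedy largest-first heuristic as a stand-in for multiprocessor scheduling. The names `split_triples`, `find`, and `BNODE` are hypothetical.

```python
import heapq
import re
from collections import defaultdict

# Simplified blank node label pattern (an assumption, not the full N-Triples grammar).
BNODE = re.compile(r"_:[A-Za-z0-9_]+")


def find(parent, x):
    """Return the union-find root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def split_triples(lines, k):
    """Split triples into k lists so that no blank node label spans two lists."""
    parent = {}
    # Step 1: blank nodes that occur in the same triple belong to the same component.
    for line in lines:
        labels = BNODE.findall(line)
        for label in labels:
            parent.setdefault(label, label)
        for label in labels[1:]:
            root_a, root_b = find(parent, labels[0]), find(parent, label)
            if root_a != root_b:
                parent[root_b] = root_a

    # Step 2: group triples by component; a triple without blank nodes is its own group.
    groups = defaultdict(list)
    for i, line in enumerate(lines):
        labels = BNODE.findall(line)
        key = ("bnode", find(parent, labels[0])) if labels else ("plain", i)
        groups[key].append(line)

    # Step 3: greedy scheduling, largest group first, always into the lightest file.
    heap = [(0, i) for i in range(k)]
    files = [[] for _ in range(k)]
    for group in sorted(groups.values(), key=len, reverse=True):
        size, idx = heapq.heappop(heap)
        files[idx].extend(group)
        heapq.heappush(heap, (size + len(group), idx))
    return files
```

Calling `split_triples(lines, 4)` on a list of triple lines returns four lists that could each be written to a separate file and loaded in parallel. The largest-first greedy step is a simple heuristic chosen for this sketch; the balancing strategy in the actual tool may differ.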
The open-access full paper can be found online at: https://doi.org/10.1371/journal.pone.0217852
PLOS ONE. 2019 DOI: 10.1371/journal.pone.0217852