TY - GEN
T1 - Characterization of semi-synthetic dataset for big-data semantic analysis
AU - Techentin, Robert
AU - Foti, Daniel
AU - Al-Saffar, Sinan
AU - Li, Peter
AU - Daniel, Erik
AU - Gilbert, Barry
AU - Holmes, David
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/2/11
Y1 - 2014/2/11
N2 - Over the past decade, the use of semantic databases has served as the basis for storing and analyzing complex, heterogeneous, and irregular data. While there are similarities with traditional relational database systems, semantic data stores provide a rich platform for conducting non-traditional analyses of data. In support of new graph analytic algorithms and specialized graph analytic hardware, we have developed a large semi-synthetic, semantically rich dataset. The construction of this dataset mimics the real-world scenario of using relational databases as the basis for semantic data construction. In order to achieve real-world variable distributions and variable dependencies, data.gov data was used as the basis for developing an approach to build arbitrarily large semi-synthetic datasets. The intent of the semi-synthetic dataset is to serve as a testbed for new semantic graph analyses and computational software/hardware platforms. The construction process and basic data characterization are described. All code related to the data collection, consolidation, and augmentation is available for distribution.
AB - Over the past decade, the use of semantic databases has served as the basis for storing and analyzing complex, heterogeneous, and irregular data. While there are similarities with traditional relational database systems, semantic data stores provide a rich platform for conducting non-traditional analyses of data. In support of new graph analytic algorithms and specialized graph analytic hardware, we have developed a large semi-synthetic, semantically rich dataset. The construction of this dataset mimics the real-world scenario of using relational databases as the basis for semantic data construction. In order to achieve real-world variable distributions and variable dependencies, data.gov data was used as the basis for developing an approach to build arbitrarily large semi-synthetic datasets. The intent of the semi-synthetic dataset is to serve as a testbed for new semantic graph analyses and computational software/hardware platforms. The construction process and basic data characterization are described. All code related to the data collection, consolidation, and augmentation is available for distribution.
KW - RDF
KW - big data
KW - data.gov
KW - graph computing
KW - semantic representation
UR - http://www.scopus.com/inward/record.url?scp=84946689387&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946689387&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2014.7040994
DO - 10.1109/HPEC.2014.7040994
M3 - Conference contribution
AN - SCOPUS:84946689387
T3 - 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
BT - 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
Y2 - 9 September 2014 through 11 September 2014
ER -