Characterization of semi-synthetic dataset for big-data semantic analysis

Robert Techentin; Daniel Foti; Sinan Al-Saffar; Peter Li; Erik Daniel; Barry Gilbert; David Holmes

doi:10.1109/HPEC.2014.7040994

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Over the past decade, the use of semantic databases has served as the basis for storing and analyzing complex, heterogeneous, and irregular data. While there are similarities with traditional relational database systems, semantic data stores provide a rich platform for conducting non-traditional analyses of data. In support of new graph analytic algorithms and specialized graph analytic hardware, we have developed a large semi-synthetic, semantically rich dataset. The construction of this dataset mimics the real-world scenario of using relational databases as the basis for semantic data construction. In order to achieve real-world variable distributions and variable dependencies, data.gov data was used as the basis for developing an approach to build arbitrarily large semi-synthetic datasets. The intent of the semi-synthetic dataset is to serve as a testbed for new semantic graph analyses and computational software/hardware platforms. The construction process and basic data characterization are described. All code related to the data collection, consolidation, and augmentation is available for distribution.
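The abstract's idea of using relational data as the basis for semantic data can be illustrated with a minimal sketch that turns the rows of one table into RDF statements in N-Triples syntax. This is not the authors' code: the `http://example.org/` namespace, the `facility` table, its columns, and the function names are all hypothetical, and the mapping (one subject per row, one predicate per column) is only one simple convention among many.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace; the paper does not specify one.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(table_name, key_column, rows):
    """Map each relational row to one subject and each non-empty column to a triple."""
    triples = []
    for row in rows:
        subject = f"<{BASE}{quote(table_name)}/{quote(str(row[key_column]))}>"
        for column, value in row.items():
            # The key column names the subject; empty cells produce no statement.
            if column == key_column or value in ("", None):
                continue
            predicate = f"<{BASE}{quote(table_name)}#{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return triples


# Invented sample rows standing in for a relational table.
SAMPLE_CSV = "id,name,state\n1,Clinic A,MN\n2,Clinic B,\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
triples = rows_to_ntriples("facility", "id", rows)
for triple in triples:
    print(triple)
```

Empty cells are skipped rather than emitted as empty literals, which is one reason graphs built this way tend to be irregular compared with the tables they come from.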

Original language	English (US)
Title of host publication	2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781479962334
DOIs	https://doi.org/10.1109/HPEC.2014.7040994
State	Published - Feb 11 2014
Event	2014 IEEE High Performance Extreme Computing Conference, HPEC 2014 - Waltham, United States Duration: Sep 9 2014 → Sep 11 2014

Publication series

Name	2014 IEEE High Performance Extreme Computing Conference, HPEC 2014

Other

Other	2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
Country/Territory	United States
City	Waltham
Period	9/9/14 → 9/11/14

Keywords

RDF
big data
data.gov
graph computing
semantic representation

ASJC Scopus subject areas

Software

Cite this

Techentin, R., Foti, D., Al-Saffar, S., Li, P., Daniel, E., Gilbert, B., & Holmes, D. (2014). Characterization of semi-synthetic dataset for big-data semantic analysis. In 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014 Article 7040994 (2014 IEEE High Performance Extreme Computing Conference, HPEC 2014). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/HPEC.2014.7040994

Characterization of semi-synthetic dataset for big-data semantic analysis. / Techentin, Robert; Foti, Daniel; Al-Saffar, Sinan et al.
2014 IEEE High Performance Extreme Computing Conference, HPEC 2014. Institute of Electrical and Electronics Engineers Inc., 2014. 7040994 (2014 IEEE High Performance Extreme Computing Conference, HPEC 2014).

Techentin, R, Foti, D, Al-Saffar, S, Li, P, Daniel, E, Gilbert, B & Holmes, D 2014, Characterization of semi-synthetic dataset for big-data semantic analysis. in 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014., 7040994, 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014, Institute of Electrical and Electronics Engineers Inc., 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014, Waltham, United States, 9/9/14. https://doi.org/10.1109/HPEC.2014.7040994

Techentin R, Foti D, Al-Saffar S, Li P, Daniel E, Gilbert B et al. Characterization of semi-synthetic dataset for big-data semantic analysis. In 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014. Institute of Electrical and Electronics Engineers Inc. 2014. 7040994. (2014 IEEE High Performance Extreme Computing Conference, HPEC 2014). doi: 10.1109/HPEC.2014.7040994

@inproceedings{5d63e6270a224cd599d780e78e6edaa5,

title = "Characterization of semi-synthetic dataset for big-data semantic analysis",

abstract = "Over the past decade, the use of semantic databases has served as the basis for storing and analyzing complex, heterogeneous, and irregular data. While there are similarities with traditional relational database systems, semantic data stores provide a rich platform for conducting non-traditional analyses of data. In support of new graph analytic algorithms and specialized graph analytic hardware, we have developed a large semi-synthetic, semantically rich dataset. The construction of this dataset mimics the real-world scenario of using relational databases as the basis for semantic data construction. In order to achieve real-world variable distributions and variable dependencies, data.gov data was used as the basis for developing an approach to build arbitrarily large semi-synthetic datasets. The intent of the semi-synthetic dataset is to serve as a testbed for new semantic graph analyses and computational software/hardware platforms. The construction process and basic data characterization are described. All code related to the data collection, consolidation, and augmentation is available for distribution.",

keywords = "RDF, big data, data.gov, graph computing, semantic representation",

author = "Robert Techentin and Daniel Foti and Sinan Al-Saffar and Peter Li and Erik Daniel and Barry Gilbert and David Holmes",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.; 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014 ; Conference date: 09-09-2014 Through 11-09-2014",

year = "2014",

month = feb,

day = "11",

doi = "10.1109/HPEC.2014.7040994",

language = "English (US)",

series = "2014 IEEE High Performance Extreme Computing Conference, HPEC 2014",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2014 IEEE High Performance Extreme Computing Conference, HPEC 2014",

}

TY - GEN

T1 - Characterization of semi-synthetic dataset for big-data semantic analysis

AU - Techentin, Robert

AU - Foti, Daniel

AU - Al-Saffar, Sinan

AU - Li, Peter

AU - Daniel, Erik

AU - Gilbert, Barry

AU - Holmes, David

PY - 2014/2/11

Y1 - 2014/2/11

N2 - Over the past decade, the use of semantic databases has served as the basis for storing and analyzing complex, heterogeneous, and irregular data. While there are similarities with traditional relational database systems, semantic data stores provide a rich platform for conducting non-traditional analyses of data. In support of new graph analytic algorithms and specialized graph analytic hardware, we have developed a large semi-synthetic, semantically rich dataset. The construction of this dataset mimics the real-world scenario of using relational databases as the basis for semantic data construction. In order to achieve real-world variable distributions and variable dependencies, data.gov data was used as the basis for developing an approach to build arbitrarily large semi-synthetic datasets. The intent of the semi-synthetic dataset is to serve as a testbed for new semantic graph analyses and computational software/hardware platforms. The construction process and basic data characterization are described. All code related to the data collection, consolidation, and augmentation is available for distribution.

AB - Over the past decade, the use of semantic databases has served as the basis for storing and analyzing complex, heterogeneous, and irregular data. While there are similarities with traditional relational database systems, semantic data stores provide a rich platform for conducting non-traditional analyses of data. In support of new graph analytic algorithms and specialized graph analytic hardware, we have developed a large semi-synthetic, semantically rich dataset. The construction of this dataset mimics the real-world scenario of using relational databases as the basis for semantic data construction. In order to achieve real-world variable distributions and variable dependencies, data.gov data was used as the basis for developing an approach to build arbitrarily large semi-synthetic datasets. The intent of the semi-synthetic dataset is to serve as a testbed for new semantic graph analyses and computational software/hardware platforms. The construction process and basic data characterization are described. All code related to the data collection, consolidation, and augmentation is available for distribution.

KW - RDF

KW - big data

KW - data.gov

KW - graph computing

KW - semantic representation

UR - http://www.scopus.com/inward/record.url?scp=84946689387&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946689387&partnerID=8YFLogxK

U2 - 10.1109/HPEC.2014.7040994

DO - 10.1109/HPEC.2014.7040994

M3 - Conference contribution

AN - SCOPUS:84946689387

T3 - 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014

BT - 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014

Y2 - 9 September 2014 through 11 September 2014

ER -
