Characterization of semi-synthetic dataset for big-data semantic analysis

Robert Techentin, Daniel Foti, Sinan Al-Saffar, Peter Li, Erik Daniel, Barry Gilbert, David Holmes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Over the past decade, semantic databases have served as the basis for storing and analyzing complex, heterogeneous, and irregular data. Although they share similarities with traditional relational database systems, semantic data stores provide a rich platform for conducting non-traditional analyses of data. In support of new graph analytic algorithms and specialized graph analytic hardware, we have developed a large semi-synthetic, semantically rich dataset. The construction of this dataset mimics the real-world scenario of using relational databases as the basis for semantic data construction. To achieve real-world variable distributions and variable dependencies, data.gov data was used as the basis for developing an approach to build arbitrarily large semi-synthetic datasets. The semi-synthetic dataset is intended to serve as a testbed for new semantic graph analyses and computational software/hardware platforms. The construction process and basic data characterization are described. All code related to the data collection, consolidation, and augmentation is available for distribution.

Original language: English (US)
Title of host publication: 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479962334
State: Published - Feb 11 2014
Event: 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014 - Waltham, United States
Duration: Sep 9 2014 to Sep 11 2014

Publication series

Name: 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014

Other

Other: 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
Country/Territory: United States
City: Waltham
Period: 9/9/14 to 9/11/14

Keywords

  • RDF
  • big data
  • data.gov
  • graph computing
  • semantic representation

ASJC Scopus subject areas

  • Software
