Development of a semi-synthetic dataset as a testbed for big-data semantic analytics

Robert Techentin, Daniel Foti, Peter Li, Erik Daniel, Barry Gilbert, David Holmes, Sinan Al-Saffar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

We have developed a large semi-synthetic, semantically rich dataset modeled after the medical record of a large medical institution. Using the highly diverse data.gov data repository and a multivariate data augmentation strategy, we can generate arbitrarily large semi-synthetic datasets that can be used to test new algorithms and computational platforms. The construction process and basic data characterization are described. The databases, as well as code for data collection, consolidation, and augmentation, are available for distribution.
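The abstract does not detail the authors' augmentation strategy, but the general idea of multivariate augmentation can be sketched as follows: fit the joint distribution of the numeric fields in a set of real records, then sample arbitrarily many synthetic rows that preserve the cross-field correlations. This is an illustrative sketch only (a simple Gaussian model; the field names and values below are hypothetical), not the method published in the paper.

```python
# Illustrative sketch, NOT the authors' published method: fit a Gaussian
# to the numeric fields of real records and sample semi-synthetic rows.
import numpy as np

def augment(real_rows: np.ndarray, n_synthetic: int, seed: int = 0) -> np.ndarray:
    """Sample synthetic rows from a Gaussian fitted to the real data.

    real_rows: (n_real, n_fields) array of numeric record fields.
    Returns an (n_synthetic, n_fields) array of semi-synthetic rows.
    """
    rng = np.random.default_rng(seed)
    mean = real_rows.mean(axis=0)
    # rowvar=False treats each column as a variable, preserving
    # the cross-field correlations of the original records.
    cov = np.cov(real_rows, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_synthetic)

# Hypothetical example: 5 real records with 3 numeric fields,
# expanded to 1000 semi-synthetic records.
real = np.array([[62, 140.0, 5.1],
                 [55, 130.5, 4.8],
                 [71, 150.2, 5.9],
                 [48, 125.0, 4.5],
                 [66, 145.7, 5.5]])
synthetic = augment(real, n_synthetic=1000)
```

Because the synthetic rows are drawn from a fitted distribution rather than copied, the output can be made arbitrarily large, which is the property the testbed relies on.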

Original language: English (US)
Title of host publication: Proceedings - 2014 IEEE International Conference on Semantic Computing, ICSC 2014
Publisher: IEEE Computer Society
Pages: 252-253
Number of pages: 2
ISBN (Print): 9781479940028
DOIs
State: Published - 2014
Event: 8th IEEE International Conference on Semantic Computing, ICSC 2014 - Newport Beach, CA, United States
Duration: Jun 16 2014 - Jun 18 2014

Publication series

Name: Proceedings - 2014 IEEE International Conference on Semantic Computing, ICSC 2014

Other

Other: 8th IEEE International Conference on Semantic Computing, ICSC 2014
Country/Territory: United States
City: Newport Beach, CA
Period: 6/16/14 - 6/18/14

Keywords

  • RDF
  • big data
  • data.gov
  • graph computing
  • semantic representation

ASJC Scopus subject areas

  • Software
