Algorithms and software for collaborative discovery from autonomous, semantically heterogeneous, distributed information sources

Doina Caragea, Jun Zhang, Jie Bao, Jyotishman Pathak, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

The development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in explosive growth in the number, size, and diversity of potentially useful information sources. This has created unprecedented opportunities for data-driven knowledge acquisition and decision-making in a number of emerging, increasingly data-rich application domains such as bioinformatics, environmental informatics, enterprise informatics, and social informatics (among others). However, the massive size, semantic heterogeneity, autonomy, and distributed nature of the data repositories present significant hurdles in acquiring useful knowledge from the available data. This paper introduces some of the algorithmic and statistical problems that arise in such a setting and describes algorithms for learning classifiers from distributed data that offer rigorous performance guarantees (relative to their centralized or batch counterparts). It also describes how this approach can be extended to work with autonomous, and hence inevitably semantically heterogeneous, data sources by making explicit the ontologies (attributes and relationships between attributes) associated with the data sources and reconciling the semantic differences among the data sources from a user's point of view. This allows user- or context-dependent exploration of semantically heterogeneous data sources. The resulting algorithms have been implemented in INDUS, an open source software package for collaborative discovery from autonomous, semantically heterogeneous, distributed data sources.
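The abstract does not spell out the algorithms, so the following is only a minimal sketch of one way the idea could work, not the INDUS implementation. It assumes a classifier that depends on the data only through class-conditional counts (as naive Bayes does), and it assumes each site supplies a mapping from its local attribute values to the user's vocabulary. The site names, mappings, and data below are all hypothetical. Because counts are additive, merging per-site counts gives the same statistics as pooling the data centrally, which is the sense in which a distributed learner can match its centralized counterpart.

```python
from collections import Counter

# Hypothetical mappings from each site's local terms to the user's ontology.
SITE_A_MAP = {"grad": "student", "undergrad": "student", "prof": "faculty"}
SITE_B_MAP = {"student": "student", "lecturer": "faculty"}


def local_counts(rows, mapping):
    """Count (mapped attribute value, class label) pairs at one site.

    Only these counts, never the raw rows, would need to leave the site.
    """
    counts = Counter()
    for value, label in rows:
        counts[(mapping[value], label)] += 1
    return counts


def merge(counts_list):
    """Add up the per-site counts into global counts."""
    total = Counter()
    for counts in counts_list:
        total.update(counts)  # Counter.update adds counts together
    return total


# Hypothetical data held at two autonomous sites: (attribute value, label).
site_a = [("grad", "yes"), ("prof", "no"), ("undergrad", "yes")]
site_b = [("lecturer", "no"), ("student", "no")]

distributed = merge([
    local_counts(site_a, SITE_A_MAP),
    local_counts(site_b, SITE_B_MAP),
])

# Centralized baseline: pool the (already mapped) rows in one place.
pooled = [(SITE_A_MAP[v], y) for v, y in site_a]
pooled += [(SITE_B_MAP[v], y) for v, y in site_b]
identity = {term: term for term in ("student", "faculty")}
centralized = local_counts(pooled, identity)

# The merged statistics are identical to the centralized ones.
assert distributed == centralized
```

Changing the mappings changes the counts, which illustrates how the same sources can be explored from different users' points of view.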

Original language: English (US)
Title of host publication: Discovery Science - 8th International Conference, DS 2005, Proceedings
Pages: 14
Number of pages: 1
DOIs: https://doi.org/10.1007/11563983_2
State: Published - 2005
Event: 8th International Conference on Discovery Science, DS 2005, Singapore
Duration: Oct 8 2005 to Oct 11 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3735 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th International Conference on Discovery Science, DS 2005
Country: Singapore
Period: 10/8/05 to 10/11/05

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)


  • Cite this

    Caragea, D., Zhang, J., Bao, J., Pathak, J., & Honavar, V. (2005). Algorithms and software for collaborative discovery from autonomous, semantically heterogeneous, distributed information sources. In Discovery Science - 8th International Conference, DS 2005, Proceedings (p. 14). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3735 LNAI). https://doi.org/10.1007/11563983_2