Onto-BioThesaurus: ontological representation of gene/protein names for biomedica

Project: Research project

Project Details


DESCRIPTION (provided by applicant):

The long-term goal of our research is to develop resources and tools for knowledge retrieval and management in the biomedical domain. As the pace of biomedical research accelerates, researchers depend more and more on computers to manage the rapidly growing volume of published biomedical information. The high quality of many databases is maintained by curators who extract and synthesize information stored in the literature or in other databases. It is therefore important to recognize biomedical entity names in text accurately and to map the identified names to the corresponding records in biomedical databases. Usually, a biomedical database provides a list of names either entered by curators or extracted from other databases. Those names can be used to retrieve records from databases, or by NLP systems to map names in text to database records. However, three characteristics of biomedical entity names make both tasks very daunting: synonymy (i.e., different names refer to the same database entry), ambiguity (i.e., one name is associated with different entries), and novelty (i.e., names or entities are not present in databases or knowledge bases). Additionally, biomedical entities can appear in text as short forms (SFs) abbreviated from their long forms (LFs). Because SFs are highly ambiguous, their prevalent use to represent biomedical entities poses a further challenge to end users and NLP applications.
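The three naming problems described above can be illustrated with a minimal sketch. It is not part of the project's methods: the record identifiers, the names, and the helper functions `build_name_index` and `classify` are all invented for illustration, and real name matching would need normalization well beyond lowercasing.

```python
from collections import defaultdict

# Toy thesaurus: record IDs and names are hypothetical, for illustration only.
RECORD_NAMES = {
    "REC1": ["alpha kinase 1", "AK1"],         # two names, one record: synonymy
    "REC2": ["beta kinase 1", "BK1", "AK1"],   # "AK1" also names REC1: ambiguity
}


def build_name_index(record_names):
    """Invert record -> names into a case-insensitive name -> record IDs index."""
    index = defaultdict(set)
    for record_id, names in record_names.items():
        for name in names:
            index[name.lower()].add(record_id)
    return index


def classify(name, index):
    """Label a name as 'novel', 'ambiguous', or 'unique' and list its records."""
    records = sorted(index.get(name.lower(), ()))
    if not records:
        return "novel", records        # name not present in the thesaurus
    if len(records) > 1:
        return "ambiguous", records    # one name, several records
    return "unique", records


NAME_INDEX = build_name_index(RECORD_NAMES)

for query in ["alpha kinase 1", "AK1", "gamma kinase 2"]:
    print(query, classify(query, NAME_INDEX))
```

In this toy data the short form "AK1" resolves to two records, while both "alpha kinase 1" and "AK1" reach REC1, which mirrors why short forms are singled out as a particular source of ambiguity.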

Ontology-based knowledge management has recently become increasingly popular, since ontologies provide formal, machine-processable, and human-interpretable representations of biomedical entities and their relations. We hypothesize that biomedical ontologies can be used to reduce the difficulty of retrieving records using names and of mapping names in text to database records. The specific aims and corresponding hypotheses are: i) develop onto-BioThesaurus by enriching BioThesaurus with gene/protein-related ontologies (Hypothesis: aligning gene/protein names to gene/protein-related ontologies can reduce the complexity associated with gene/protein names); ii) harvest synonyms for gene/protein classes and entities from online resources and text (Hypothesis: harvesting synonyms, especially gene/protein SFs, is critical since SFs are frequently used to represent gene/protein entities); iii) build a web user interface for searching and querying gene/protein names and entries through the ontology-enabled onto-BioThesaurus (Hypothesis: enhancing BioThesaurus with gene/protein-related ontologies would enable us to build heuristic rules that support machine reasoning); and iv) evaluate and distribute research methods and outcomes (Hypothesis: evaluating and distributing research methods and outcomes is critical to advancing both basic and applied biomedical science).
Effective start/end date: 9/1/09 to 9/29/13


  • U.S. National Library of Medicine: $614,000.00
  • U.S. National Library of Medicine: $608,650.00


  • Medicine(all)
  • Health Professions(all)

