CAREER: Natural Language Processing for Biological Knowledge Management

Project: Research project

Project Details


(This award is funded through the American Recovery and Reinvestment Act of 2009: Public Law 111-5).

This is a CAREER award to support the research of Dr. Hongfang Liu in the Department of Biostatistics, Bioinformatics and Biomathematics at Georgetown University. Dr. Liu is a second-year, tenure-track Assistant Professor. Natural language processing (NLP) is a field of computer science and linguistics that develops algorithms to locate concepts in free text; ontologies (i.e., common, defined vocabularies) must be created to capture the meaning of that text. The investigator's research focuses on the use of NLP for biological knowledge management. Specifically, she will build NLP systems for protein form curation. These systems will be used to retrieve articles, highlight sentences, and extract events and relationships related to protein forms, providing a basis for curating proteins. Since one gene can produce multiple protein forms that differ in sequence, chemistry, and function, a systematic analysis of proteomics data is needed for accurate annotation of genes and their corresponding protein forms. An NLP system can be constructed by drawing on knowledge from existing NLP systems and from the system's intended end users. The project is engaging various communities, such as molecular database developers, NLP researchers, and basic biology scientists, as the expert knowledge base for tool development. All of the NLP tools for protein curation will be posted on the Liu lab website.

As part of her CAREER plan, Dr. Liu is providing research-oriented educational experiences for students and young researchers, especially in NLP and in ontology-based knowledge management in biology. Several web-based mini-courses are being developed to provide a biological, domain-specific introduction to ontologies, NLP, and ontology-based tools. The courses will be distributed publicly. The research team includes a postdoctoral associate, a doctoral student, and interns from related degree programs at Georgetown and from nearby universities, including several historically minority schools such as Howard University and the University of the District of Columbia. These collaborations will increase the participation of women, minorities, and others underrepresented in science and technology.

Effective start/end date: 8/1/09 to 7/31/14


  • National Science Foundation: $843,662.00

