Automated discovery of drug treatment patterns for endocrine therapy of breast cancer within an electronic medical record

Guergana K. Savova, Janet E. Olson, Sean P. Murphy, Victoria L. Cafourek, Fergus J. Couch, Matthew P. Goetz, James N. Ingle, Vera J. Suman, Christopher G. Chute, Richard M. Weinshilboum

Research output: Contribution to journal › Article › peer-review

19 Scopus citations


Objective: To develop an algorithm for the discovery of drug treatment patterns for endocrine breast cancer therapy within an electronic medical record and to test the hypothesis that information extracted using it is comparable to the information found by traditional methods. Materials: The electronic medical charts of 1507 patients diagnosed with histologically confirmed primary invasive breast cancer. Methods: The automatic drug treatment classification tool consisted of components for: (1) extraction of drug treatment-relevant information from clinical narratives using natural language processing (clinical Text Analysis and Knowledge Extraction System); (2) extraction of drug treatment data from an electronic prescribing system; (3) merging information to create a patient treatment timeline; and (4) final classification logic. Results: Agreement between results from the algorithm and from a nurse abstractor is measured for categories: (0) no tamoxifen or aromatase inhibitor (AI) treatment; (1) tamoxifen only; (2) AI only; (3) tamoxifen before AI; (4) AI before tamoxifen; (5) multiple AIs and tamoxifen cycles in no specific order; and (6) no specific treatment dates. Specificity (all categories): 96.14%-100%; sensitivity (categories (0)-(4)): 90.27%-99.83%; sensitivity (categories (5)-(6)): 0%-23.53%; positive predictive values: 80%-97.38%; negative predictive values: 96.91%-99.93%. Discussion: Our approach illustrates a secondary use of the electronic medical record. The main challenge is event temporality. Conclusion: We present an algorithm for automated treatment classification within an electronic medical record to combine information extracted through natural language processing with that extracted from structured databases. The algorithm has high specificity for all categories, high sensitivity for five categories, and low sensitivity for two categories.
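As an illustration of methods steps (3) and (4), the following is a minimal Python sketch of how mentions from the two sources might be merged into a timeline and mapped onto categories (0)-(6). It is not the authors' implementation: the abstract does not give the classification rules, so the `DrugMention` record, the function names, and in particular the rule used for category (6) (both drug classes are present but at least one has no dated mention) are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

TAMOXIFEN = "tamoxifen"
AI = "ai"  # aromatase inhibitor


@dataclass(frozen=True)
class DrugMention:
    """Hypothetical record for one extracted drug treatment event."""
    drug_class: str        # TAMOXIFEN or AI
    start: Optional[date]  # None when no treatment date could be resolved
    source: str            # "nlp" (clinical narrative) or "eprescribing"


def merge_timeline(nlp_mentions: List[DrugMention],
                   rx_mentions: List[DrugMention]) -> List[DrugMention]:
    """Combine both sources; dated mentions first in date order, undated last."""
    merged = list(nlp_mentions) + list(rx_mentions)
    return sorted(merged, key=lambda m: (m.start is None, m.start or date.min))


def classify(timeline: List[DrugMention]) -> int:
    """Map a patient timeline to a category 0-6 (rules are assumptions)."""
    classes = {m.drug_class for m in timeline}
    if not classes:
        return 0  # no tamoxifen or AI treatment
    if classes == {TAMOXIFEN}:
        return 1  # tamoxifen only
    if classes == {AI}:
        return 2  # AI only

    # Both classes are present, so their order matters and dates are needed.
    dated = sorted((m for m in timeline if m.start is not None),
                   key=lambda m: m.start)
    if {m.drug_class for m in dated} != classes:
        return 6  # assumed rule: order cannot be established without dates

    # Collapse consecutive mentions of the same class into one treatment cycle.
    cycles: List[str] = []
    for m in dated:
        if not cycles or cycles[-1] != m.drug_class:
            cycles.append(m.drug_class)

    if cycles == [TAMOXIFEN, AI]:
        return 3  # tamoxifen before AI
    if cycles == [AI, TAMOXIFEN]:
        return 4  # AI before tamoxifen
    return 5      # multiple cycles in no specific order
```

For example, a tamoxifen mention dated 2003 from the clinical notes merged with an AI prescription dated 2008 yields category (3). The dependence of categories (3)-(6) on resolved dates in this sketch mirrors the abstract's remark that event temporality is the main challenge.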

Original language: English (US)
Pages (from-to): e83-e89
Journal: Journal of the American Medical Informatics Association
Issue number: E1
State: Published - Jun 2012

ASJC Scopus subject areas

  • Health Informatics

