Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient-Centered Outcomes

Mohammed Ali Al-Garadi, Yuan Chi Yang, Sahithi Lakamana, Jie Lin, Sabrina Li, Angel Xie, Whitney Hogg-Bremer, Mylin Torres, Imon Banerjee, Abeed Sarker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Breast cancer patients often discontinue their long-term treatments, such as hormone therapy, increasing the risk of cancer recurrence. These discontinuations may be caused by adverse patient-centered outcomes (PCOs) due to hormonal drug side effects or other factors. PCOs are not detectable through laboratory tests and are sparsely documented in electronic health records. Thus, there is a need to explore complementary sources of information for PCOs associated with breast cancer treatments. Social media is a promising resource, but extracting true PCOs from it first requires the accurate detection of real breast cancer patients. We describe a natural language processing (NLP) pipeline for automatically detecting breast cancer patients from Twitter based on their self-reports. The pipeline uses breast cancer-related keywords to collect streaming data from Twitter, applies NLP patterns to filter out noisy posts, and then employs a machine learning classifier trained on manually annotated data (n = 5,019) to distinguish firsthand self-reports of breast cancer from other tweets. A classifier based on bidirectional encoder representations from transformers (BERT) showed human-like performance, achieving an F-score of 0.857 for the positive class (inter-annotator agreement: 0.845, Cohen’s kappa) and considerably outperforming the next best classifier, a recurrent neural network with bidirectional long short-term memory (F-score: 0.670). Qualitative analyses of posts from automatically detected users revealed discussions about side effects, non-adherence, and mental health conditions, illustrating the feasibility of our social media-based approach for studying breast cancer-related PCOs in a large population.
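The first two stages of the pipeline described in the abstract (keyword-based collection, then pattern-based noise filtering) can be illustrated with a minimal sketch. The keyword list, the regular expression, and the function names below are assumptions made for illustration; the abstract does not give the authors' actual keywords or patterns, and the classifier stage is not implemented here.

```python
import re

# Hypothetical keyword list; the actual keywords are not given in the abstract.
KEYWORDS = ("breast cancer", "tamoxifen", "mastectomy")

# Hypothetical self-report pattern: a first-person pronoun within the same
# sentence as a "breast cancer" mention, in either order.
SELF_REPORT = re.compile(
    r"\b(?:i|my|me)\b[^.!?]{0,60}\bbreast cancer\b"
    r"|\bbreast cancer\b[^.!?]{0,60}\b(?:i|my|me)\b",
    re.IGNORECASE,
)


def matches_keywords(text):
    """Stage 1: keep posts that mention any collection keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def looks_like_self_report(text):
    """Stage 2: keep posts that match the first-person pattern."""
    return SELF_REPORT.search(text) is not None


def candidate_posts(posts):
    """Return posts passing both filters; these would go on to a classifier."""
    return [p for p in posts if matches_keywords(p) and looks_like_self_report(p)]


posts = [
    "I was diagnosed with breast cancer last year and started tamoxifen.",
    "Breast cancer awareness month starts today. Wear pink!",
    "Great game last night.",
]

for post in candidate_posts(posts):
    print(post)
```

In this sketch only the first example post survives both filters. A rule-based filter of this kind only narrows the candidate set; per the abstract, the final decision on whether a post is a firsthand self-report is made by a trained classifier.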

Original language: English (US)
Title of host publication: Artificial Intelligence in Medicine - 18th International Conference on Artificial Intelligence in Medicine, AIME 2020, Proceedings
Editors: Martin Michalowski, Robert Moskovitch
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 100-110
Number of pages: 11
ISBN (Print): 9783030591366
State: Published - 2020
Event: 18th International Conference on Artificial Intelligence in Medicine, AIME 2020 - Minneapolis, United States
Duration: Aug 25 2020 to Aug 28 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12299 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Artificial Intelligence in Medicine, AIME 2020
Country/Territory: United States
City: Minneapolis
Period: 8/25/20 to 8/28/20

Keywords

  • Breast cancer
  • Natural language processing
  • Social media

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
