A feasibility study on identifying drinking-related contents in Facebook through mining heterogeneous data

Omar ElTayeby, Todd Eaglin, Malak Abdullah, David Burlinson, Wenwen Dou, Lixia Yao

Research output: Contribution to journal (article)

3 Scopus citations


Binge drinking is a severe health problem at many US colleges and universities. College students often post drinking-related text and images on social media, portraying their alcohol use as socially desirable. In this project, we investigated the feasibility of mining heterogeneous data (e.g. text, images, and videos) on Facebook to identify drinking-related content. We manually annotated 4266 posts made between 21 October 2011 and 3 November 2014 in the “I’m Shmacked” group on Facebook, of which 511 were drinking-related. Our machine learning models show that combining heterogeneous data types identifies drinking-related posts with an F1-score of 0.81. Prediction models built on text data were more reliable than those built on image and video data. As a first step in this direction, this feasibility study shows the promise of mining social media to identify students who binge drink.
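For readers unfamiliar with the metric, the F1-score quoted above is the harmonic mean of precision and recall. The minimal Python sketch below shows how it is computed from confusion counts and derives the share of drinking-related posts from the two figures given in the abstract. The function name `f1_score` and the example confusion counts are hypothetical and chosen only for illustration; they are not the study's code or results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract.
TOTAL_POSTS = 4266
DRINKING_POSTS = 511
positive_rate = DRINKING_POSTS / TOTAL_POSTS  # fraction of posts labeled drinking-related

# Hypothetical confusion counts, for illustration only (NOT results from the study).
example_f1 = f1_score(tp=80, fp=20, fn=20)

if __name__ == "__main__":
    print(f"Drinking-related share of annotated posts: {positive_rate:.1%}")
    print(f"F1 for the hypothetical counts: {example_f1:.2f}")
```

Because only about one post in eight is drinking-related, plain accuracy would look high even for a model that never flags a post, which is one common reason to report F1 for data like this.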

Original language: English (US)
Journal: Health Informatics Journal
State: Accepted/In press - Jan 1 2018

Keywords

  • binge drinking
  • image classification
  • machine learning
  • social media
  • text mining
  • video classification

ASJC Scopus subject areas

  • Health Informatics
