A feasibility study on identifying drinking-related contents in Facebook through mining heterogeneous data

Omar ElTayeby, Todd Eaglin, Malak Abdullah, David Burlinson, Wenwen Dou, Lixia Yao

Research output: Contribution to journal (Article)

Abstract

Binge drinking is a severe health problem faced by many US colleges and universities. College students often post drinking-related text and images on social media, portraying their alcohol use as socially desirable. In this project, we investigated the feasibility of mining the heterogeneous data (e.g. text, images, and videos) on Facebook to identify drinking-related contents. We manually annotated 4266 posts made between 21 October 2011 and 3 November 2014 in the “I’m Shmacked” group on Facebook, of which 511 were drinking-related. Our machine learning models show that, by combining heterogeneous data types, we were able to identify drinking-related posts with an F1-score of 0.81. Prediction models built on text data were more reliable than those built on image and video data for predicting drinking-related contents. As the first step of our efforts in this direction, this feasibility study showed promise toward unleashing the potential of mining social media to identify students who binge drink.
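For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score is computed from a classifier's true positives, false positives, and false negatives. The confusion counts used in the example are hypothetical and chosen only for illustration; they are not results from the study. The only figures taken from the abstract are the 4266 annotated posts and the 511 drinking-related ones, which are used to show how small the positive class is.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, NOT taken from the study.
example_f1 = f1_score(tp=80, fp=20, fn=20)

# Class balance from the abstract: 511 drinking-related posts out of 4266.
positive_rate = 511 / 4266

print(f"example F1: {example_f1:.2f}")
print(f"share of drinking-related posts: {positive_rate:.2f}")
```

Because only about 12% of the annotated posts are drinking-related, a model that labels every post as unrelated would be right most of the time while finding nothing, which is one reason a metric such as F1 is more informative here than plain accuracy.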

Original language: English (US)
Journal: Health Informatics Journal
DOI: 10.1177/1460458218798084
State: Accepted/In press - Jan 1 2018

Fingerprint

Data Mining
Feasibility Studies
Drinking
Social Media
Students
Binge Drinking
Alcohols
Health

Keywords

  • binge drinking
  • image classification
  • machine learning
  • social media
  • text mining
  • video classification

ASJC Scopus subject areas

  • Health Informatics

Cite this

A feasibility study on identifying drinking-related contents in Facebook through mining heterogeneous data. / ElTayeby, Omar; Eaglin, Todd; Abdullah, Malak; Burlinson, David; Dou, Wenwen; Yao, Lixia.

In: Health Informatics Journal, 01.01.2018.

ElTayeby, Omar ; Eaglin, Todd ; Abdullah, Malak ; Burlinson, David ; Dou, Wenwen ; Yao, Lixia. / A feasibility study on identifying drinking-related contents in Facebook through mining heterogeneous data. In: Health Informatics Journal. 2018.
@article{7dbd2aba220a474c9ff6a6c4c5fb7893,
title = "A feasibility study on identifying drinking-related contents in Facebook through mining heterogeneous data",
abstract = "Binge drinking is a severe health problem faced by many US colleges and universities. College students often post drinking-related text and images on social media, portraying their alcohol use as socially desirable. In this project, we investigated the feasibility of mining the heterogeneous data (e.g. text, images, and videos) on Facebook to identify drinking-related contents. We manually annotated 4266 posts made between 21 October 2011 and 3 November 2014 in the “I’m Shmacked” group on Facebook, of which 511 were drinking-related. Our machine learning models show that, by combining heterogeneous data types, we were able to identify drinking-related posts with an F1-score of 0.81. Prediction models built on text data were more reliable than those built on image and video data for predicting drinking-related contents. As the first step of our efforts in this direction, this feasibility study showed promise toward unleashing the potential of mining social media to identify students who binge drink.",
keywords = "binge drinking, image classification, machine learning, social media, text mining, video classification",
author = "Omar ElTayeby and Todd Eaglin and Malak Abdullah and David Burlinson and Wenwen Dou and Lixia Yao",
year = "2018",
month = "1",
day = "1",
doi = "10.1177/1460458218798084",
language = "English (US)",
journal = "Health Informatics Journal",
issn = "1460-4582",
publisher = "SAGE Publications Ltd",

}

TY - JOUR

T1 - A feasibility study on identifying drinking-related contents in Facebook through mining heterogeneous data

AU - ElTayeby, Omar

AU - Eaglin, Todd

AU - Abdullah, Malak

AU - Burlinson, David

AU - Dou, Wenwen

AU - Yao, Lixia

PY - 2018/1/1

Y1 - 2018/1/1

AB - Binge drinking is a severe health problem faced by many US colleges and universities. College students often post drinking-related text and images on social media, portraying their alcohol use as socially desirable. In this project, we investigated the feasibility of mining the heterogeneous data (e.g. text, images, and videos) on Facebook to identify drinking-related contents. We manually annotated 4266 posts made between 21 October 2011 and 3 November 2014 in the “I’m Shmacked” group on Facebook, of which 511 were drinking-related. Our machine learning models show that, by combining heterogeneous data types, we were able to identify drinking-related posts with an F1-score of 0.81. Prediction models built on text data were more reliable than those built on image and video data for predicting drinking-related contents. As the first step of our efforts in this direction, this feasibility study showed promise toward unleashing the potential of mining social media to identify students who binge drink.

KW - binge drinking

KW - image classification

KW - machine learning

KW - social media

KW - text mining

KW - video classification

UR - http://www.scopus.com/inward/record.url?scp=85058928984&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058928984&partnerID=8YFLogxK

U2 - 10.1177/1460458218798084

DO - 10.1177/1460458218798084

M3 - Article

C2 - 30230403

AN - SCOPUS:85058928984

JO - Health Informatics Journal

JF - Health Informatics Journal

SN - 1460-4582

ER -