Graph-based malware detection using dynamic analysis

Blake Anderson, Daniel Quist, Joshua Neil, Curtis Storlie, Terran Lane

Research output: Contribution to journal › Article

105 Citations (Scopus)

Abstract

We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. These graphs represent Markov chains, where the vertices are the instructions and the transition probabilities are estimated by the data contained in the trace. We use a combination of graph kernels to create a similarity matrix between the instruction trace graphs. The resulting graph kernel measures similarity between graphs on both local and global levels. Finally, the similarity matrix is sent to a support vector machine to perform classification. Our method is particularly appealing because we do not base our classifications on the raw n-gram data, but rather use our data representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine learning-based detection methods.
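
The first steps of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each trace is a plain list of instruction mnemonics and that the instruction vocabulary is fixed in advance. A single Gaussian kernel on the transition matrices stands in for the combination of graph kernels, which the abstract does not name. The function names (transition_matrix, gaussian_kernel, similarity_matrix) and the sigma parameter are hypothetical.

```python
import math


def transition_matrix(trace, vocab):
    """Estimate Markov-chain transition probabilities from one instruction trace.

    Vertices are the instructions in `vocab`; entry [i][j] is the estimated
    probability that instruction j follows instruction i in the trace.
    """
    index = {ins: i for i, ins in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    # Count every observed pair of consecutive instructions.
    for a, b in zip(trace, trace[1:]):
        counts[index[a]][index[b]] += 1.0
    # Row-normalise the counts; rows with no outgoing transitions stay zero.
    for row in counts:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return counts


def gaussian_kernel(p, q, sigma=1.0):
    """Gaussian kernel on two transition matrices (an assumed stand-in kernel)."""
    sq_dist = sum(
        (x - y) ** 2
        for row_p, row_q in zip(p, q)
        for x, y in zip(row_p, row_q)
    )
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def similarity_matrix(traces, vocab, sigma=1.0):
    """Pairwise kernel values between the graphs built from each trace."""
    graphs = [transition_matrix(t, vocab) for t in traces]
    return [[gaussian_kernel(g, h, sigma) for h in graphs] for g in graphs]


if __name__ == "__main__":
    vocab = ["mov", "add", "push", "pop"]
    traces = [
        ["mov", "add", "mov", "add"],
        ["mov", "add", "mov", "add"],
        ["push", "pop", "push", "pop"],
    ]
    for row in similarity_matrix(traces, vocab):
        print(row)
```

The resulting matrix is the kind of input the final classification step expects. Any support vector machine that accepts a precomputed kernel (for example scikit-learn's SVC with kernel="precomputed") could be trained on it, although that step is left out here to keep the sketch dependency-free.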

Original language: English (US)
Pages (from-to): 247-258
Number of pages: 12
Journal: Journal in Computer Virology
Volume: 7
Issue number: 4
DOI: 10.1007/s11416-011-0152-x
State: Published, Nov 2011
Externally published: Yes

Fingerprint

Dynamic analysis
Computer viruses
Packers
Viruses
Markov processes
Support vector machines
Learning systems
Malware

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Hardware and Architecture

Cite this

Graph-based malware detection using dynamic analysis. / Anderson, Blake; Quist, Daniel; Neil, Joshua; Storlie, Curtis; Lane, Terran.

In: Journal in Computer Virology, Vol. 7, No. 4, 11.2011, p. 247-258.

Anderson, Blake ; Quist, Daniel ; Neil, Joshua ; Storlie, Curtis ; Lane, Terran. / Graph-based malware detection using dynamic analysis. In: Journal in Computer Virology. 2011 ; Vol. 7, No. 4. pp. 247-258.
@article{2fbb753fe07044d694b7c1d7196f463a,
title = "Graph-based malware detection using dynamic analysis",
author = "Blake Anderson and Daniel Quist and Joshua Neil and Curtis Storlie and Terran Lane",
year = "2011",
month = "11",
doi = "10.1007/s11416-011-0152-x",
language = "English (US)",
volume = "7",
pages = "247--258",
journal = "Journal of Computer Virology and Hacking Techniques",
issn = "2274-2042",
publisher = "Springer Science + Business Media",
number = "4",

}

TY - JOUR

T1 - Graph-based malware detection using dynamic analysis

AU - Anderson, Blake

AU - Quist, Daniel

AU - Neil, Joshua

AU - Storlie, Curtis

AU - Lane, Terran

PY - 2011/11

Y1 - 2011/11

UR - http://www.scopus.com/inward/record.url?scp=80255137449&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80255137449&partnerID=8YFLogxK

U2 - 10.1007/s11416-011-0152-x

DO - 10.1007/s11416-011-0152-x

M3 - Article

VL - 7

SP - 247

EP - 258

JO - Journal of Computer Virology and Hacking Techniques

JF - Journal of Computer Virology and Hacking Techniques

SN - 2274-2042

IS - 4

ER -