Automating reverse engineering with machine learning techniques

Blake Anderson, Curtis Storlie, Micah Yates, Aaron McPhall

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Malware remains a persistent threat, with millions of unique variants created every year. Unlike the majority of this malware, Advanced Persistent Threat (APT) malware is created to target a specific network or set of networks and has a precise objective, e.g., exfiltrating sensitive data. While 0-day malware detectors are a good start, they do not help reverse engineers better understand the threats attacking their networks. Understanding the behavior of malware is often a time-sensitive task and can take anywhere from several hours to several weeks. Our goal is to aid reverse engineers by automating the task of identifying the general function of the subroutines in the program's function call graph. Two approaches to modeling the subroutine labels are investigated: a multiclass Gaussian process and a multiclass support vector machine. The output of these methods is the probability that a subroutine belongs to a certain class of functionality (e.g., file I/O or exploit). Promising initial results, illustrating the efficacy of this method, are presented on a sample of 201 subroutines taken from two malicious families.

Original language: English (US)
Pages (from-to): 103-112
Number of pages: 10
Journal: Unknown Journal
Volume: 2014-November
Issue number: November
DOI: https://doi.org/10.1145/2666652.2666665
State: Published - Nov 7 2014
Externally published: Yes

Fingerprint

Reverse engineering
Subroutines
Learning systems
Engineers
Support vector machines
Labels
Malware
Machine Learning
Detectors
Support Vector Machine

Keywords

  • Computer security
  • Gaussian processes
  • Machine learning
  • Malware
  • Multiple kernel learning
  • Support vector machines

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Cite this

Automating reverse engineering with machine learning techniques. / Anderson, Blake; Storlie, Curtis; Yates, Micah; McPhall, Aaron.

In: Unknown Journal, Vol. 2014-November, No. November, 07.11.2014, p. 103-112.

Anderson, B, Storlie, C, Yates, M & McPhall, A 2014, 'Automating reverse engineering with machine learning techniques', Unknown Journal, vol. 2014-November, no. November, pp. 103-112. https://doi.org/10.1145/2666652.2666665
@article{1f4380b71b294fe5bdfff991dcfd835b,
title = "Automating reverse engineering with machine learning techniques",
keywords = "Computer security, Gaussian processes, Machine learning, Malware, Multiple kernel learning, Support vector machines",
author = "Blake Anderson and Curtis Storlie and Micah Yates and Aaron McPhall",
year = "2014",
month = "11",
day = "7",
doi = "10.1145/2666652.2666665",
language = "English (US)",
volume = "2014-November",
pages = "103--112",
journal = "Unknown Journal",
number = "November",
}

TY - JOUR

T1 - Automating reverse engineering with machine learning techniques

AU - Anderson, Blake

AU - Storlie, Curtis

AU - Yates, Micah

AU - McPhall, Aaron

PY - 2014/11/7

Y1 - 2014/11/7

KW - Computer security

KW - Gaussian processes

KW - Machine learning

KW - Malware

KW - Multiple kernel learning

KW - Support vector machines

UR - http://www.scopus.com/inward/record.url?scp=84937691153&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937691153&partnerID=8YFLogxK

U2 - 10.1145/2666652.2666665

DO - 10.1145/2666652.2666665

M3 - Article

AN - SCOPUS:84937691153

VL - 2014-November

SP - 103

EP - 112

JO - Unknown Journal

JF - Unknown Journal

IS - November

ER -