TY - GEN
T1 - Automating reverse engineering with machine learning techniques
AU - Anderson, Blake
AU - Storlie, Curtis
AU - Yates, Micah
AU - McPhall, Aaron
N1 - Publisher Copyright: © 2014 by the Association for Computing Machinery, Inc. (ACM).
PY - 2014/11/7
Y1 - 2014/11/7
N2 - Malware continues to be an ongoing threat, with millions of unique variants created every year. Unlike the majority of this malware, Advanced Persistent Threat (APT) malware is created to target a specific network or set of networks and has a precise objective, e.g., exfiltrating sensitive data. While 0-day malware detectors are a good start, they do not help reverse engineers better understand the threats attacking their networks. Understanding the behavior of malware is often a time-sensitive task and can take anywhere from several hours to several weeks. Our goal is to automate the task of identifying the general function of the subroutines in the function call graph of the program to aid reverse engineers. Two approaches to modeling the subroutine labels are investigated: a multiclass Gaussian process and a multiclass support vector machine. The output of these methods is the probability that the subroutine belongs to a certain class of functionality (e.g., file I/O or exploit). Promising initial results, illustrating the efficacy of this method, are presented on a sample of 201 subroutines taken from two malicious families.
AB - Malware continues to be an ongoing threat, with millions of unique variants created every year. Unlike the majority of this malware, Advanced Persistent Threat (APT) malware is created to target a specific network or set of networks and has a precise objective, e.g., exfiltrating sensitive data. While 0-day malware detectors are a good start, they do not help reverse engineers better understand the threats attacking their networks. Understanding the behavior of malware is often a time-sensitive task and can take anywhere from several hours to several weeks. Our goal is to automate the task of identifying the general function of the subroutines in the function call graph of the program to aid reverse engineers. Two approaches to modeling the subroutine labels are investigated: a multiclass Gaussian process and a multiclass support vector machine. The output of these methods is the probability that the subroutine belongs to a certain class of functionality (e.g., file I/O or exploit). Promising initial results, illustrating the efficacy of this method, are presented on a sample of 201 subroutines taken from two malicious families.
KW - Computer security
KW - Gaussian processes
KW - Machine learning
KW - Malware
KW - Multiple kernel learning
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=84937691153&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937691153&partnerID=8YFLogxK
U2 - 10.1145/2666652.2666665
DO - 10.1145/2666652.2666665
M3 - Conference contribution
AN - SCOPUS:84937691153
SN - 9781450331531
T3 - Proceedings of the ACM Conference on Computer and Communications Security
SP - 103
EP - 112
BT - AISec 2014 - Proceedings of the 2014 ACM Artificial Intelligent and Security Workshop, Co-located with CCS 2014
PB - Association for Computing Machinery
T2 - 2014 7th ACM Workshop on Artificial Intelligence and Security, AISec 2014, Co-located with the ACM Conference on Computer and Communications Security, CCS 2014
Y2 - 7 November 2014
ER -