Scan statistics for the online detection of locally anomalous subgraphs

Joshua Neil, Curtis Hash, Alexander Brugh, Mike Fisk, Curtis Storlie

Research output: Contribution to journal › Article

57 Citations (Scopus)

Abstract

We introduce a computationally scalable method for detecting small anomalous areas in a large, time-dependent computer network, motivated by the challenge of identifying intruders operating inside enterprise-sized computer networks. Time-series of communications between computers are used to detect anomalies, and are modeled using Markov models that capture the bursty, often human-caused behavior that dominates a large subset of the time-series. Anomalies in these time-series are common, and the network intrusions we seek involve coincident anomalies over multiple connected pairs of computers. We show empirically that each time-series is nearly always independent of the time-series of other pairs of communicating computers. This independence is used to build models of normal activity in local areas from the models of the individual time-series, and these local areas are designed to detect the types of intrusions we are interested in. We define a locality statistic calculated by testing for deviations from historic behavior in each local area, and then define a scan statistic as the maximum deviation score over all local areas. We show that identifying these local anomalies is sufficient to correctly identify anomalies of various relevant shapes in the network. Supplementary material, including additional details and simulation code, is provided online.
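The pipeline described in the abstract (per-edge Markov models of normal behavior, a locality statistic for each local area, and a scan statistic taken as the maximum over all areas) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The encoding of each edge as a 0/1 activity indicator per time bin, the two-state Markov chain, the likelihood-ratio edge score, summing edge scores as the locality statistic, and restricting local areas to out-stars and two-edge paths are all assumptions chosen for brevity; the function names (`edge_score`, `local_areas`, `scan_statistic`) and the toy data are hypothetical.

```python
import math
from itertools import product

# Assumption: each directed edge (u, v) is a 0/1 sequence, 1 if u contacted v
# in a time bin, and a two-state Markov chain models its normal on/off behavior.

def transition_counts(seq):
    """Count transitions prev -> cur in a 0/1 activity sequence."""
    counts = {(a, b): 0 for a, b in product((0, 1), repeat=2)}
    for prev, cur in zip(seq, seq[1:]):
        counts[(prev, cur)] += 1
    return counts

def transition_probs(counts, pseudo=0.0):
    """Estimate P(next = 1 | prev = a) for a in {0, 1}; pseudo adds smoothing."""
    probs = {}
    for a in (0, 1):
        n1 = counts[(a, 1)] + pseudo
        n0 = counts[(a, 0)] + pseudo
        probs[a] = n1 / (n0 + n1) if n0 + n1 > 0 else 0.5
    return probs

def log_lik(counts, probs):
    """Log-likelihood of the observed transitions under the given probabilities."""
    total = 0.0
    for (a, b), n in counts.items():
        if n:  # zero-count terms are skipped, so MLE probabilities of 0 or 1 are safe
            p = probs[a] if b == 1 else 1.0 - probs[a]
            total += n * math.log(p)
    return total

def edge_score(history, window):
    """Likelihood-ratio deviation of the window from the historical model (>= 0)."""
    hist = transition_probs(transition_counts(history), pseudo=1.0)  # strictly in (0, 1)
    win_counts = transition_counts(window)
    mle = transition_probs(win_counts)  # unsmoothed maximum-likelihood fit to the window
    return 2.0 * (log_lik(win_counts, mle) - log_lik(win_counts, hist))

def local_areas(edges):
    """Hypothetical local areas: out-stars of each node and directed 2-paths u->v->w."""
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append((u, v))
    areas = [tuple(star) for star in out.values()]
    for u, v in edges:
        for _, w in out.get(v, []):
            if w != u:
                areas.append(((u, v), (v, w)))
    return areas

def scan_statistic(history, window):
    """Maximum over local areas of summed edge scores.

    Under independent edges, the joint log-likelihood ratio of an area is the
    sum of its edges' log-likelihood ratios, which is what the sum computes.
    """
    scores = {e: edge_score(history[e], window[e]) for e in window}
    best_area, best = None, float("-inf")
    for area in local_areas(list(window)):
        s = sum(scores[e] for e in area)
        if s > best:
            best_area, best = area, s
    return best_area, best

# Toy data (hypothetical): a->b and b->c switch on together in the current window.
history = {
    ("a", "b"): [0, 1, 0, 0, 1, 0, 0, 0, 1, 0] * 5,
    ("b", "c"): [0, 0, 1, 0, 0, 0, 0, 1, 0, 0] * 5,
    ("a", "d"): [1, 0, 0, 0, 0, 1, 0, 0, 0, 0] * 5,
}
window = {
    ("a", "b"): [1] * 10,
    ("b", "c"): [1] * 10,
    ("a", "d"): [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
}
area, score = scan_statistic(history, window)
print(area, round(score, 2))  # the a->b->c path has the largest summed score
```

Because raw sums grow with the number of edges in an area, a practical version would likely need to calibrate scores across areas of different sizes (for example by converting them to p-values) before taking the maximum; the sketch omits that step.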

Original language: English (US)
Pages (from-to): 403-414
Number of pages: 12
Journal: Technometrics
Volume: 55
Issue number: 4
DOI: 10.1080/00401706.2013.822830
State: Published - Nov 1 2013
Externally published: Yes

Fingerprint

Scan Statistic
Anomalous
Time series
Subgraph
Anomaly
Statistics
Computer Networks
Computer networks
Deviation
Coincident
Human Behavior
Locality
Markov Model
Statistic
Sufficient
Testing
Subset
Communication
Model
Industry

Keywords

  • Anomaly detection
  • Dynamic graph
  • Network intrusion detection
  • Path
  • Star

ASJC Scopus subject areas

  • Statistics and Probability
  • Modeling and Simulation
  • Applied Mathematics

Cite this

Scan statistics for the online detection of locally anomalous subgraphs. / Neil, Joshua; Hash, Curtis; Brugh, Alexander; Fisk, Mike; Storlie, Curtis.

In: Technometrics, Vol. 55, No. 4, 01.11.2013, p. 403-414.


Neil, Joshua ; Hash, Curtis ; Brugh, Alexander ; Fisk, Mike ; Storlie, Curtis. / Scan statistics for the online detection of locally anomalous subgraphs. In: Technometrics. 2013 ; Vol. 55, No. 4. pp. 403-414.
@article{95876c0f62ba40fab7a7d2ffb1c7bddf,
title = "Scan statistics for the online detection of locally anomalous subgraphs",
abstract = "We introduce a computationally scalable method for detecting small anomalous areas in a large, time-dependent computer network, motivated by the challenge of identifying intruders operating inside enterprise-sized computer networks. Time-series of communications between computers are used to detect anomalies, and are modeled using Markov models that capture the bursty, often human-caused behavior that dominates a large subset of the time-series. Anomalies in these time-series are common, and the network intrusions we seek involve coincident anomalies over multiple connected pairs of computers. We show empirically that each time-series is nearly always independent of the time-series of other pairs of communicating computers. This independence is used to build models of normal activity in local areas from the models of the individual time-series, and these local areas are designed to detect the types of intrusions we are interested in. We define a locality statistic calculated by testing for deviations from historic behavior in each local area, and then define a scan statistic as the maximum deviation score over all local areas. We show that identifying these local anomalies is sufficient to correctly identify anomalies of various relevant shapes in the network. Supplementary material, including additional details and simulation code, are provided online.",
keywords = "Anomaly detection, Dynamic graph, Network intrusion detection, Path, Star",
author = "Joshua Neil and Curtis Hash and Alexander Brugh and Mike Fisk and Curtis Storlie",
year = "2013",
month = "11",
day = "1",
doi = "10.1080/00401706.2013.822830",
language = "English (US)",
volume = "55",
pages = "403--414",
journal = "Technometrics",
issn = "0040-1706",
publisher = "American Statistical Association",
number = "4",

}

TY  - JOUR
T1  - Scan statistics for the online detection of locally anomalous subgraphs
AU  - Neil, Joshua
AU  - Hash, Curtis
AU  - Brugh, Alexander
AU  - Fisk, Mike
AU  - Storlie, Curtis
PY  - 2013/11/1
Y1  - 2013/11/1
N2  - We introduce a computationally scalable method for detecting small anomalous areas in a large, time-dependent computer network, motivated by the challenge of identifying intruders operating inside enterprise-sized computer networks. Time-series of communications between computers are used to detect anomalies, and are modeled using Markov models that capture the bursty, often human-caused behavior that dominates a large subset of the time-series. Anomalies in these time-series are common, and the network intrusions we seek involve coincident anomalies over multiple connected pairs of computers. We show empirically that each time-series is nearly always independent of the time-series of other pairs of communicating computers. This independence is used to build models of normal activity in local areas from the models of the individual time-series, and these local areas are designed to detect the types of intrusions we are interested in. We define a locality statistic calculated by testing for deviations from historic behavior in each local area, and then define a scan statistic as the maximum deviation score over all local areas. We show that identifying these local anomalies is sufficient to correctly identify anomalies of various relevant shapes in the network. Supplementary material, including additional details and simulation code, are provided online.
AB  - We introduce a computationally scalable method for detecting small anomalous areas in a large, time-dependent computer network, motivated by the challenge of identifying intruders operating inside enterprise-sized computer networks. Time-series of communications between computers are used to detect anomalies, and are modeled using Markov models that capture the bursty, often human-caused behavior that dominates a large subset of the time-series. Anomalies in these time-series are common, and the network intrusions we seek involve coincident anomalies over multiple connected pairs of computers. We show empirically that each time-series is nearly always independent of the time-series of other pairs of communicating computers. This independence is used to build models of normal activity in local areas from the models of the individual time-series, and these local areas are designed to detect the types of intrusions we are interested in. We define a locality statistic calculated by testing for deviations from historic behavior in each local area, and then define a scan statistic as the maximum deviation score over all local areas. We show that identifying these local anomalies is sufficient to correctly identify anomalies of various relevant shapes in the network. Supplementary material, including additional details and simulation code, are provided online.
KW  - Anomaly detection
KW  - Dynamic graph
KW  - Network intrusion detection
KW  - Path
KW  - Star
UR  - http://www.scopus.com/inward/record.url?scp=84890034467&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84890034467&partnerID=8YFLogxK
U2  - 10.1080/00401706.2013.822830
DO  - 10.1080/00401706.2013.822830
M3  - Article
VL  - 55
SP  - 403
EP  - 414
JO  - Technometrics
JF  - Technometrics
SN  - 0040-1706
IS  - 4
ER  -