Incident Classification Patterns: Introduction

  • The times they are a-changin’

    Remember 2014? Uptown was funky, Pharrell Williams was happy, and if you had a problem, you could shake it off. The DBIR first introduced the Incident Classification Patterns in 2014, as a useful shorthand for the sometimes complex combinations of VERIS Actors, Actions, Assets and Attributes that would frequently occur. The threat landscape has changed a bit since then, and we are now happy to introduce a refresh of the DBIR patterns.

    As you can imagine, this was a very hard decision for the team, but we were able to find strength and courage from the leadership shown by big bold refreshing business moves such as the release of New Coke and Crystal Pepsi.

  • Our new patterns explain 99.3% of analyzed breaches and 99.6% of analyzed incidents this year. They also explain 99.5% of quality breaches and 99.8% of quality incidents over all time.38

  • Figure 44
  • Figure 45
  • Of course, not everything has changed. Denial of Service, Basic Web Application Attacks,39 Lost and Stolen Assets, Miscellaneous Errors, Privilege Misuse, and Everything Else renew their contract for another season. Payment Card Skimmers, Crimeware, Cyber-Espionage, and Point of Sale are the MVPs40 retired to make room for a couple of seasoned minor league patterns, ready for the big leagues: Social Engineering and System Intrusion. 

    Now, just because some names haven’t changed does not mean the patterns are the same. What is currently assigned to the 2021 version of Miscellaneous Errors (for example) is not necessarily what was in the 2014 Miscellaneous Errors.

    The original patterns were based on a hierarchical clustering approach that helped derive some simple rules used to assign incidents to patterns. It was a very prescriptive process that worked quite well at the time, but we could see the strain starting to show.41

    The new patterns are based on an elegant machine-learning clustering42 process. Making this decision was a gamble in many ways, as we were committed to trusting the data on this process, and it paid off. The new patterns clearly fell around the same ones that had been prescriptive before, but they also better capture complex interaction rules the old ones were unlikely to handle.

    Figures 46 and 47 give an idea of where incidents and breaches went between the old and new patterns. First, the easy-to-explain changes. Lost and Stolen Assets are still mostly in the Lost and Stolen Assets pattern. The same can be said for Miscellaneous Errors, Privilege Misuse, Basic Web Application Attacks, and Denial of Service. What has changed starts with Payment Card Skimmers, which now falls squarely into Everything Else. It originally had some similarities to the current System Intrusion pattern, since that is where non-webapp payment card breaches ended up. Obviously, skimming isn’t really what we think of when we picture a popped system, so over to Everything Else it goes.

  • Figure 46
  • Of more interest are Point of Sale, Crimeware, Cyber-Espionage and Everything Else. They are now defined by the characteristics of the breach. Was Social Engineering the significant aspect? To the new Social Engineering pattern it goes! Was it a simple attack where the initial intrusion point was the web application? To Basic Web Application Attacks it goes! Or was it more of an elaborate system intrusion where the attacker gained access and poked around, maybe without us even knowing how they got in? System Intrusion is just waiting to welcome those incidents with open arms like an old Journey song. Those big changes weren’t exactly planned (quite frankly, nothing in the DBIR ever is when it comes to what the data is going to tell us).

    Thanks to the re-focused patterns, we can provide better guidance when one of those patterns appears at the top of your industry. Cyber-Espionage and Crimeware could suggest a different complexity of the incidents in most cases, but your controls don’t care if the threat actor has a cushy government job or if they are a free-market enthusiast entrepreneur.

  • Figure 47
  • This is the way

    Coming up with new patterns was not a superficial process. It has been in the works for some time. Clustering DBIR data is not quite as straightforward as it might seem. First, we have almost 2,600 columns in the dataset, leading to almost assured overfitting. Second, our data is mostly logical rather than categorical or continuous, limiting the approaches that are likely to work. Third, we have over 800,000 rows in our dataset, again limiting the approaches that would work. Fourth,43 we were well aware that our clusters would be imbalanced. There would be some clusters with far fewer incidents and breaches than others. Fifth, the results needed to be somewhat explainable, always a fun proposition when it comes to large-scale machine learning. Sixth,44 whatever approach we took would have to provide rules we could use to classify data later, since we shouldn’t be re-clustering things every single year. Seventh, we wanted it to be possible for an incident to fit into two or more patterns in order to better capture the nuance of more elaborate incidents. All of these, plus the importance of getting it right, meant we’ve taken it slow and steady.

    Before we get to what did work, let’s talk about some of the things that didn’t work. We started with hierarchical clustering similar to the 2014 patterns’ original methodology. Unfortunately, it was too unbalanced, finding small, highly similar things instead of bigger trends; kind of like seeing the trees but not the forest. K-means clustering would have been ideal; however, given the size of our data, it’s simply too memory intensive due to the number of all-to-all comparisons needed. Principal Component Analysis didn’t penalize using lots of features enough for our needs. Latent Dirichlet Allocation was slightly better, but still not good enough. Lasso and Ridge Regression didn’t converge well. Association Rules did not differentiate clusters well and would have had to be paired with a predictor. Artificial Neural Networks (ANNs) would have provided prediction but not clustering.45 We even tried Gaussian Finite Mixture Model clustering, but it had the opposite problem of hierarchical clustering in that all the clusters were minor variants on the big themes; kinda like seeing the forest, but not the trees.46

    What we eventually settled on was spherical k-means. It provided us the clustering benefits of k-means (ability to classify new data, find both small and large clusters, handle high dimensionality without overfitting, handle logical data, and be explainable) while also not choking on our rather large dataset. Normal k-means calculates the distance between all the rows in the dataset in the dimensional space of the number of columns. It randomly creates a set number of cluster centers and assigns the points to the closest center. It then recalculates the center of each cluster, and repeats the two steps until there are no significant changes in the cluster memberships. All those distance calculations take a lot of time and memory. Spherical k-means improves on that by calculating cosine distance, and taking advantage of that special structure to avoid calculating full object-to-object distance matrices.47
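
    The assign-and-recenter loop described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the software used for the report: the toy rows, the fixed starting centers (chosen instead of random ones so the run is repeatable) and the function names are all assumptions made for the example.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(rows, init_centers, max_iter=100):
    """Cluster rows by cosine similarity; returns (labels, centers)."""
    points = [normalize(r) for r in rows]
    centers = [normalize(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: on unit vectors the largest dot product is the
        # smallest cosine distance, so only row-to-center products are needed.
        new_labels = [
            max(range(len(centers)), key=lambda j: dot(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Update step: each center becomes the normalized sum of its members.
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centers

# Toy logical (0/1) rows; the columns are invented for illustration only.
rows = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
labels, centers = spherical_kmeans(rows, init_centers=[rows[0], rows[2]])
print(labels)
```

    Each pass compares every row with each center only, which is why no row-to-row distance matrix ever has to be held in memory.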

  • Figure 48
  • Even then, it would be 10 hyperparameter variations until we were sure the approach would work and an additional six cluster versions based on the 2021 DBIR data to finalize the model. We settled on 517 columns to cluster, primarily dealing with the VERIS 4A’s (Action, Actor, Asset, and Attribute), victim, targeted, timeline, and discovery method.

    We wanted to prioritize more recent incidents over older ones, and while we tried just using the last few years of data, we ultimately settled on an exponential weighting function. We used a Lloyd-48Forgy49 style fixed-point algorithm with local improvements via Kernighan-Lin chains.50,51 While we wanted pattern overlap, the spherical k-means fuzziness parameter yielded poor results, so instead we set it to hard partitions and, after clustering, included incidents in multiple clusters if the next closest cluster(s) were almost as close as the main cluster. And voilà. We have some new patterns to play with.
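
    To make the two ideas in this paragraph concrete, here is a small sketch of an exponential recency weight and of the "almost as close" rule for placing an incident in more than one pattern. The half-life, the tolerance and the similarity scores are hypothetical values chosen for illustration; the report does not state the actual parameters.

```python
def recency_weight(incident_year, current_year, half_life_years=3.0):
    """Exponential decay: an incident half_life_years old counts half as much.

    half_life_years is an assumed parameter, not a value from the report.
    """
    age = current_year - incident_year
    return 0.5 ** (age / half_life_years)

def assign_patterns(similarities, tolerance=0.05):
    """Return the closest cluster plus any cluster almost as close.

    similarities maps a cluster name to the incident's cosine similarity
    with that cluster's center; tolerance is an assumed cutoff.
    """
    best = max(similarities.values())
    close = [name for name, s in similarities.items() if best - s <= tolerance]
    return sorted(close, key=lambda name: -similarities[name])

# Hypothetical similarity scores for a single incident.
scores = {
    "System Intrusion": 0.81,
    "Social Engineering": 0.78,
    "Denial of Service": 0.10,
}
patterns = assign_patterns(scores)
weight = recency_weight(2018, 2021)
print(patterns, weight)
```

    The clustering itself stays a hard partition; the overlap is added afterwards by comparing each incident's best score with its runners-up.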

    From experience, we know that incident and breach data can be very different. Breaches are a subset of incidents, but many times more important than incidents for our analysis.52 To ensure that both incidents and data breaches were reflected in the patterns, we ran clustering twice, once for each. To pick the best number of clusters, we calculated the total sum of squares (a measure of success in clustering) for several different numbers of clusters.

    We then manually examined the patterns generated around the “bend” in the lines (around five for incidents and eight for breaches; see Figure 49). Eventually we settled on eight breach clusters and 10 incident clusters. After clustering, the clusters were examined and some were grouped together (five in System Intrusion; three each in Privilege Misuse and Miscellaneous Errors; two each in Basic Web Application Attacks, Social Engineering, and Lost and Stolen Assets; and one in Denial of Service) and then named, forming the new patterns.53
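
    The search for the “bend” can be illustrated with a toy calculation. The sketch below assumes the measure is the within-cluster sum of squares (each point's squared distance to its own cluster mean); the points and labelings are invented for the example and are not DBIR data.

```python
def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its cluster's mean."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for members in clusters.values():
        mean = [sum(col) / len(members) for col in zip(*members)]
        total += sum((x - m) ** 2 for p in members for x, m in zip(p, mean))
    return total

# Two obvious groups of two points each.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
ss_k1 = within_cluster_ss(points, [0, 0, 0, 0])  # one cluster
ss_k2 = within_cluster_ss(points, [0, 0, 1, 1])  # two clusters
ss_k4 = within_cluster_ss(points, [0, 1, 2, 3])  # every point alone
print(ss_k1, ss_k2, ss_k4)
```

    Going from one cluster to two removes almost all of the spread, while adding more clusters gains very little. That flattening is the bend, and the cluster counts around it are the ones worth inspecting by hand.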

  • Table 2 is what we got for all of that work. In some places, nothing has changed. In some places, everything has changed. But, more importantly, the new patterns provide a clear framework for us to explain the threat landscape and for you to bring it to the stakeholders in your organization.

  • Table 2
  • 38 Last but not least, it kills 99.9% of germs on contact! Ok not really.

    39 Which, after going through an incredibly scientific, focus-tested rebranding, briefly became “The pattern formerly known as Web Applications,” but then Legal, Verizon Branding and Communications said we couldn’t do that either. We were bummed, we had a symbol picked out and everything.

    40 Most Valuable Patterns 

    41 Like a three-day holiday visit with your in-laws. 

    42 We will be talking about it in way more detail than necessary in the next part of the section.

    43 Even I thought this list would only be three items long, but man did we have a lot of challenges.

    44 Another one? We get it. It was hard.

    45 We also tried ANNs for clustering, specifically Self-organizing Maps, but that didn’t work either.

    46 You have our permission to read this out loud as many times as you would like on first dates and/or at family get-togethers and Super Bowl® parties.

    47 http://www.stat.cmu.edu/~rnugent/PCMI2016/papers/SphericalKMeans.pdf

    48 Lloyd, Stuart P. (1982)

    49 Forgy, Edward W. (1965)

    50 Dhillon, Guan and Kogan (2002)

    51 Did that make sense to you? We’ll be honest, we didn’t read the papers. We just chose that option on the software.

    52 It is the Data Breach Investigations Report, not the Data Incident Investigations Report, after all.

    53 Astute readers will realize we didn’t try to keep the old patterns at all. The fact that so many still shone through is a testament to the 2014 patterns’ longevity.

    54 Like that container you keep all the cables in for electronics you do not own anymore just in case.
