DBIR Appendices


  • Appendix A: Methodology

  • One of the things readers value most about this report is the level of rigor and integrity we employ when collecting, analyzing and presenting data.

    Knowing our readership cares about such things and consumes this information with a keen eye helps keep us honest. Detailing our methods is an important part of that honesty.

    First, we make mistakes. A column transposed here; a number not updated there. We’re likely to discover a few things to fix. When we do, we’ll list them on our corrections page: verizon.com/business/resources/reports/dbir/2021/report-corrections/.

    Second, we check our work. Just as the data behind the DBIR figures can be found in our GitHub repository,78 our fact check report is published there as well, as it was last year. It’s highly technical, but for those interested, we’ve attempted to test every fact in the report.79

    Third, François Jacob described "day science" and "night science."80 Day science is hypothesis-driven, while night science is creative exploration. The DBIR is squarely night science. As Yanai et al. demonstrate, focusing too much on day science can cause you to miss the gorilla in the data.81 While we may not be perfect, we believe we provide the best obtainable version of the truth82 (to a given level of confidence and under the influence of the biases acknowledged below). However, proving causality is best left to the controlled experiments of day science. The best we can do is correlation. And while correlation is not causation, the two are often related to some extent, and correlation is often useful.

    Non-committal disclaimer

    We would like to reiterate that we make no claim that the findings of this report are representative of all data breaches in all organizations at all times. Even though the combined records from all our contributors more closely reflect reality than any of them in isolation, it is still a sample. And although we believe many of the findings presented in this report to be appropriate for generalization (and our confidence in this grows as we gather more data and compare it to that of others), bias undoubtedly exists.

    The DBIR process

    Our overall process remains intact and largely unchanged from previous years. All incidents included in this report were reviewed and converted (if necessary) into the VERIS framework to create a common, anonymous aggregate data set. If you are unfamiliar with the VERIS framework, it is short for Vocabulary for Event Recording and Incident Sharing, it is free to use, and links to VERIS resources are at the beginning of this report.

    The collection method and conversion techniques differed between contributors. In general, three basic methods (expounded below) were used to accomplish this:

    1. Direct recording of paid external forensic investigations and related intelligence operations conducted by Verizon using the VERIS Webapp
    2. Direct recording by partners using VERIS
    3. Converting partners’ existing schema into VERIS

    All contributors received instruction to omit any information that might identify organizations or individuals involved.

    Some source spreadsheets are converted to our standard spreadsheet format through automated mapping to ensure consistent conversion. Reviewed spreadsheets and VERIS Webapp JavaScript Object Notation (JSON) are ingested by an automated workflow that converts the incidents and breaches within them into the VERIS JSON format as necessary, adds missing enumerations, and then validates the record against business logic and the VERIS schema. The automated workflow subsets the data and analyzes the results. Based on the results of this exploratory analysis, the validation logs from the workflow, and discussions with the partners providing the data, the data is cleaned and re-analyzed. This process runs nightly for roughly two months as data is collected and analyzed.
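    The ingest step described above can be sketched as normalize-then-validate. This is a minimal illustration under loud assumptions: the field names, the `data_breach` flag, the default-filling rule and the single business-logic check are hypothetical simplifications, not the actual VERIS schema or DBIR tooling.

```python
import json

# Hypothetical, simplified stand-in for the VERIS action categories.
ACTION_CATEGORIES = {"hacking", "malware", "social", "misuse",
                     "physical", "error", "environmental", "unknown"}


def normalize(record: dict) -> dict:
    """Add enumerations the source omitted (assumed defaults)."""
    rec = dict(record)
    rec.setdefault("action", {"unknown": {}})
    rec.setdefault("attribute", {})
    return rec


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for category in record["action"]:
        if category not in ACTION_CATEGORIES:
            problems.append(f"unknown action category: {category}")
    # Assumed business-logic rule: a breach must record a confidentiality loss.
    if record.get("data_breach") and "confidentiality" not in record["attribute"]:
        problems.append("breach without confidentiality attribute")
    return problems


def ingest(raw_json: str):
    """Parse one contributed record, fill defaults, and validate it."""
    record = normalize(json.loads(raw_json))
    return record, validate(record)


record, problems = ingest('{"action": {"hacking": {}}}')
print(problems)  # []
```

    In this sketch, a non-empty problem list plays the role of the validation log that feeds the clean-and-re-analyze loop.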

    Incident data

    Our data is non-exclusively multinomial, meaning a single feature, such as “Action”, can have multiple values (e.g., “Social”, “Malware”, and “Hacking”). This means that percentages do not necessarily add up to 100%. For example, if there are 5 botnet breaches, the sample size is 5. However, since each botnet used phishing, installed keyloggers, and used stolen credentials, there would be 5 Social actions, 5 Hacking actions, and 5 Malware actions, adding up to 300%. This is normal, expected and handled correctly in our analysis and tooling.
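    A short sketch of why the percentages exceed 100%: each breach holds a set of action categories, and each category's count is divided by the number of breaches rather than by the number of actions. The five identical botnet breaches mirror the example above; the function name is hypothetical.

```python
from collections import Counter


def category_percentages(records):
    """Percent of records containing each category (multi-valued field)."""
    n = len(records)
    counts = Counter(category for rec in records for category in rec)
    return {category: 100 * c / n for category, c in counts.items()}


# Five botnet breaches, each with Social, Hacking, and Malware actions.
botnet_breaches = [{"Social", "Hacking", "Malware"} for _ in range(5)]
pcts = category_percentages(botnet_breaches)
print(sum(pcts.values()))  # 300.0
```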

    Another important point is that when looking at the findings, “Unknown” is equivalent to “Unmeasured.” That is, if a record (or collection of records) contains elements that have been marked as “Unknown” (whether something as basic as the number of records involved in the incident, or as complex as what specific capabilities a piece of malware contained), we cannot make statements about that particular element as it stands in the record; we cannot measure where we have too little information. Because they are "unmeasured," they are not counted in sample sizes. The enumeration “Other” is, however, counted, as it means the value was known but not part of VERIS. Finally, “Not Applicable” (normally “NA”) may or may not be counted, depending on the claim being analyzed.
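    The denominator rules above can be illustrated with a small helper. This is a sketch, not DBIR tooling: the function name is hypothetical, “Unknown” is always dropped from the sample size, “Other” is kept, and whether “NA” is dropped is left to the caller because it depends on the claim being analyzed.

```python
def share(values, target, exclude_na=False):
    """Return (percent of measured values equal to target, sample size)."""
    # "Unknown" is unmeasured and never counted; "Other" is a known value.
    dropped = {"Unknown", "NA"} if exclude_na else {"Unknown"}
    measured = [v for v in values if v not in dropped]
    if not measured:
        return None, 0
    return 100 * measured.count(target) / len(measured), len(measured)


actors = ["External", "External", "Internal", "Unknown", "Other", "Unknown"]
print(share(actors, "External"))  # (50.0, 4)
```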

    This year we again made use of confidence intervals to allow us to analyze smaller sample sizes. We adopted a few rules to help minimize bias in reading such data. Here we define ‘small sample’ as fewer than 30 samples.

    1. Sample sizes smaller than five are too small to analyze.
    2. We won’t talk about count or percentage for small samples. This goes for figures too and is why some figures lack the dot for the point estimate.
    3. For small samples we may talk about the value being in some range, or values being greater/less than each other. These all follow the confidence interval approaches listed above.
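    The three rules can be written down as a small decision function. The thresholds (5 and 30) come from the text; the return labels, the function names and the non-overlap test for “greater than” are assumptions of this sketch. Non-overlap is a conservative comparison rule, not necessarily the one the DBIR uses.

```python
def reporting_mode(n: int) -> str:
    """Map a sample size to what may be said about it."""
    if n < 5:
        return "too small to analyze"          # rule 1
    if n < 30:
        return "ranges and comparisons only"   # rules 2 and 3
    return "point estimate and interval"


def clearly_greater(ci_a, ci_b) -> bool:
    """True when interval a lies entirely above interval b (assumed rule)."""
    return ci_a[0] > ci_b[1]


print(reporting_mode(12))                           # ranges and comparisons only
print(clearly_greater((0.40, 0.70), (0.10, 0.35)))  # True
```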

    Incident eligibility

    For a potential entry to be eligible for the incident/breach corpus, a couple of requirements must be met. The entry must be a confirmed security incident, defined as a loss of confidentiality, integrity, or availability. In addition to meeting the baseline definition of "security incident," the entry is assessed for quality. We create a subset of incidents (more on subsets later) that pass our quality filter. The details of what is a "quality" incident are:

    1. The incident must have at least seven enumerations (e.g., threat actor variety, threat action category, variety of integrity loss, etc.) across 34 fields OR be a DDoS attack. Exceptions are given to confirmed data breaches with fewer than seven enumerations.
    2. The incident must have at least one known VERIS threat action category (Hacking, Malware, etc.)

    In addition to having the level of detail necessary to pass the quality filter, the incident must be within the timeframe of analysis (November 1, 2019, to October 31, 2020, for this report). The 2020 caseload is the primary analytical focus of the report, but the entire range of data is referenced throughout, notably in trending graphs. We also exclude incidents and breaches affecting individuals that cannot be tied to an organizational attribute loss. If your friend’s laptop was hit with Trickbot, it would not be included in this report.
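    The eligibility rules can be sketched as a filter over incident records. The field names (`confirmed_incident`, `org_attribute_loss`, `enumeration_count`, `is_ddos`, `confirmed_breach`, `actions`, `date`) are hypothetical and invented for illustration. The thresholds and dates come from the text.

```python
from datetime import date

KNOWN_ACTIONS = {"Hacking", "Malware", "Social", "Misuse",
                 "Physical", "Error", "Environmental"}
WINDOW = (date(2019, 11, 1), date(2020, 10, 31))  # this report's timeframe


def passes_quality(inc: dict) -> bool:
    # Rule 1: seven or more enumerations, or a DDoS, or a confirmed breach.
    enough_detail = (inc["enumeration_count"] >= 7
                     or inc.get("is_ddos", False)
                     or inc.get("confirmed_breach", False))
    # Rule 2: at least one known threat action category.
    known_action = bool(KNOWN_ACTIONS & set(inc["actions"]))
    return enough_detail and known_action


def eligible(inc: dict) -> bool:
    return bool(inc["confirmed_incident"]
                and inc["org_attribute_loss"]  # not just an individual's laptop
                and WINDOW[0] <= inc["date"] <= WINDOW[1]
                and passes_quality(inc))


example = {"confirmed_incident": True, "org_attribute_loss": True,
           "date": date(2020, 3, 1), "enumeration_count": 9,
           "actions": ["Hacking"]}
print(eligible(example))  # True
```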

    Lastly, for something to be eligible for inclusion in the DBIR, we have to know about it, which brings us to several potential biases we will discuss on the next page.

  • Figure 133
  • Figure 134
  • Figure 135
  • Figure 136
  • Acknowledgement and analysis of bias

    Many breaches go unreported (though our sample does contain many of those). Many more are as yet unknown by the victim (and thereby unknown to us). Therefore, until we (or someone) can conduct an exhaustive census of every breach that happens in the entire world each year (our study population), we must use sampling.83 Unfortunately, this process introduces bias.

    The first type of bias is random bias introduced by sampling. This year, our narrowest margin of error is +/- 0.6%84 for incidents and +/- 1.5% for breaches, which is related to our sample size. Any subset with a smaller sample size is going to have a wider confidence margin. We’ve expressed this confidence in the conditional probability bar charts (the “slanted” bar charts) we have been using since the 2019 report.
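    To show how the margin depends on sample size, here is a percentile bootstrap for a single proportion. Bootstrap simulation is one of the methods the report names for its intervals, but this is only a sketch: the counts are hypothetical, the replicate count and seed are arbitrary choices, and the DBIR’s own tooling may differ.

```python
import random


def bootstrap_ci(successes: int, n: int, reps: int = 2000,
                 level: float = 0.95, seed: int = 0):
    """Percentile bootstrap confidence interval for a proportion."""
    rng = random.Random(seed)
    sample = [1] * successes + [0] * (n - successes)
    # Resample with replacement and record each replicate's proportion.
    stats = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(reps))
    lo = stats[round((1 - level) / 2 * reps)]
    hi = stats[round((1 + level) / 2 * reps) - 1]
    return lo, hi


small = bootstrap_ci(10, 20)
large = bootstrap_ci(500, 1000)
print(large[1] - large[0] < small[1] - small[0])  # True
```

    For a proportion near 50%, the interval half-width shrinks roughly with 1/√n, which is why any subset of the data carries a wider margin than the full corpus.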

    The second source of bias is sampling bias. We clearly conduct biased sampling. For instance, some breaches, such as those publicly disclosed, are more likely to enter our corpus, while others, such as classified breaches, are less likely.

    Figures 133, 134, 135 and 136 are an attempt to visualize potential sampling bias. Each radial axis is a VERIS enumeration, and we have ribbon charts representing our data contributors. Ideally, we want the distribution of sources to be roughly equal along all axes. Axes represented by only a single source are more likely to be biased. However, contributions are inherently thick-tailed, with a few contributors providing a lot of data and many contributors providing a few records within a certain area. Still, we see that most axes have multiple large contributors, with small contributors adding appreciably to the total incidents along each axis.

    You’ll notice rather large contributions on many of the axes. While we’d generally be concerned about this, they represent aggregations of several other sources rather than single contributions. This pattern also occurs along most axes, limiting the bias introduced by that grouping of indirect contributors.
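    A rough numeric counterpart to the figures: for each enumeration axis, count the distinct contributors and the share held by the largest one. This is a sketch under assumptions: the input is a hypothetical list of (axis, contributor) pairs, and the single-source flag is an illustrative heuristic, not the report’s method.

```python
from collections import Counter, defaultdict


def axis_concentration(rows):
    """rows: (axis, contributor) pairs, one per incident on that axis."""
    per_axis = defaultdict(Counter)
    for axis, contributor in rows:
        per_axis[axis][contributor] += 1
    report = {}
    for axis, counts in per_axis.items():
        total = sum(counts.values())
        report[axis] = {
            "sources": len(counts),
            "top_share": max(counts.values()) / total,
            "single_source": len(counts) == 1,  # more likely to be biased
        }
    return report


rows = [("Hacking", "A")] * 8 + [("Hacking", "B")] * 2 + [("Error", "C")] * 3
print(axis_concentration(rows)["Hacking"])
# {'sources': 2, 'top_share': 0.8, 'single_source': False}
```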

    The third source of bias is confirmation bias. Because we use our entire dataset for exploratory analysis (night science), we do not test specific hypotheses (day science). Until we develop a good collection method for data breaches or incidents from Earth-616 or any of the other Earths in the multiverse, this is probably the best that can be done.

    As stated, we attempt to mitigate these biases by collecting data from diverse contributors. We follow a consistent multiple-review process and when we hear hooves, we think horse, not zebra.

    Data subsets

    We already mentioned the subset of incidents that passed our quality requirements, but as part of our analysis there are other instances where we define subsets of data. These subsets consist of legitimate incidents that would eclipse smaller trends if left in. These are removed and analyzed separately (as called out in the relevant sections). This year we have two subsets of legitimate incidents that are not analyzed as part of the overall corpus:

    1. We separately analyzed a subset of web servers that were identified as secondary targets (such as taking over a website to spread malware).
    2. We separately analyzed botnet-related incidents.

    Finally, we create some subsets to help further our analysis. In particular, a single subset is used for all analysis within the DBIR unless otherwise stated. It includes only quality incidents as described above and excludes the aforementioned two subsets.
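    Building the default analysis subset amounts to one filter. The flags below (`quality`, `secondary_web`, `botnet`) are hypothetical names standing in for the outcome of the quality filter and membership in the two separately analyzed subsets.

```python
def analysis_subset(incidents):
    """Quality incidents, minus secondary web-server and botnet incidents."""
    return [inc for inc in incidents
            if inc["quality"]
            and not inc.get("secondary_web", False)
            and not inc.get("botnet", False)]


incidents = [
    {"id": 1, "quality": True},
    {"id": 2, "quality": True, "botnet": True},
    {"id": 3, "quality": True, "secondary_web": True},
    {"id": 4, "quality": False},
]
print([inc["id"] for inc in analysis_subset(incidents)])  # [1]
```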

    Non-incident data

    Since the 2015 issue, the DBIR has included data that requires analysis but does not fit into our usual categories of “incident” or “breach.” Examples of non-incident data include malware, patching, phishing, DDoS, and other types of data. The sample sizes for non-incident data tend to be much larger than the incident data, but from fewer sources. We make every effort to normalize the data (for example, weighting records by the number contributed from the organization so that all organizations are represented equally). We also attempt to combine multiple contributors with similar data to conduct the analysis wherever possible. Once analysis is complete, we try to discuss our findings with the relevant contributor or contributors to validate them against their knowledge of the data.
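    The normalization example in the parenthetical can be sketched as inverse-count weighting: each record gets weight 1 divided by the number of records its organization contributed, so every organization’s records sum to a total weight of 1. The `contributor` and `clicked` fields and the toy data are hypothetical.

```python
from collections import Counter


def weighted_rate(records, flag):
    """Share of records with `flag`, counting each organization equally."""
    per_org = Counter(r["contributor"] for r in records)
    weights = [1 / per_org[r["contributor"]] for r in records]
    hits = sum(w for w, r in zip(weights, records) if r[flag])
    return hits / sum(weights)


records = ([{"contributor": "A", "clicked": True}] * 3
           + [{"contributor": "B", "clicked": False}])
print(weighted_rate(records, "clicked"))  # 0.5 (unweighted it would be 0.75)
```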

  • Appendix B: Controls

    1. Inventory and Control of Enterprise Assets
    2. Inventory and Control of Software Assets
    3. Data Protection
    4. Secure Configuration of Enterprise Assets and Software
    5. Account Management
    6. Access Control Management
    7. Continuous Vulnerability Management
    8. Audit Log Management
    9. Email and Web Browser Protections
    10. Malware Defenses
    11. Data Recovery
    12. Network Infrastructure Management
    13. Network Monitoring and Defenses
    14. Security Awareness and Skills Training
    15. Service Provider Management
    16. Application Software Security
    17. Incident Response Management
    18. Penetration Testing

    Hopefully you didn’t think we had forgotten about this section and were planning to end the report on a doom and gloom note?

    Never fear, back by popular demand from auditors, CISOs and control freaks in general, we’re updating our mapping with the community-built CIS Controls. If you haven’t heard, they have gone through a major update for their eighth iteration, much like our patterns have this year, and have been creatively named CIS Controls85. Fortunately, there’s no “should’ve had a V8” of the Controls mapping to VERIS, because we’ve got you covered.

    The CIS Controls are a community-built, maintained and supported series of best practices targeted at helping organizations prioritize their defenses based on what attackers are doing, the so-called “Offense informs Defense” approach to best practices. The DBIR is but one resource of attacker knowledge at the macro level. Nevertheless, we were fortunate enough to be in a position to provide feedback and suggest input into their community process. Whether you are presenting your NIST Cybersecurity Framework (CSF) strategic roadmap at the Board level or defending an individual funding request for a new security program initiative, our goal is to allow you to easily tie our findings and data to your organization’s efforts. We are thrilled to witness the evolution of the best practices due to the hard work of the individuals who donated their valuable time to help. Here is an overview of what has changed:

    • Incorporating technologies such as cloud and mobile
    • In recognition of “borderless” networks and tighter coordination between network/system administrators, the Controls are organized by activity, resulting in reducing the number of Controls from 20 to 18
    • Reordering of Controls to show the importance of Data Protection (formerly 13, now 3)
    • Addition of a “Service Provider Management” Control to address how organizations should manage cloud services

    One of the more helpful components that the CIS community has decided to continue from v7 is the Implementation Groups (IGs), which help organizations further prioritize their implementation of the Controls based on their resources, risk and other factors. The idea is that while every organization needs security, the giant, international leader in ethical pharmaceutical practices, Umbrella Corp, probably needs a larger and different set of Controls to protect its research facilities in Raccoon City than does the local pet hotel. The IGs build on each other: Implementation Group 1 is the starting point, where a smaller subset of the safeguards (approximately 36%) is implemented, building all the way up to Implementation Group 3, where all 153 safeguards are implemented.

    Figure 137 breaks out the mapping into more granular detail and shows the relationships between the patterns and the overlap with the CIS Control for each Implementation Group.

    In the report, you have hopefully noticed the addition of the Top Protective Implementation Group 1 Controls listed for each industry. By using the combination of the mappings to patterns, implementation groups, and security functions of the Controls, we identified the core set of Controls that every organization should consider implementing regardless of size and budget:

    Control 4: Secure Configuration of Enterprise Assets and Software.

    This control is not only a mouthful, but it also contains safeguards focused on engineering solutions that are secure from the outset, rather than tacking them on later. In this Control you will see substantial benefit toward reducing Error-based breaches like Misconfiguration and Loss of assets through enforcing remote wipe abilities on portable devices.

    Control 5: Account Management.

    While this is technically a new Control in version 8, it should be extremely familiar as the safeguards are really just a centralization of the previous account management practices that were found in a few previous Controls, like Boundary Protect and Account Monitoring and Control. This control is very much targeted toward helping organizations manage the access to accounts and is useful against brute forcing and credential stuffing attacks.

    Control 6: Access Control Management.

    This is Control 5’s little cousin: instead of simply looking at user accounts and managing access to those, you’re managing rights and privileges and, lastly, enforcing multifactor authentication on key components of the environment, a useful tactic against Use of stolen credentials.

    Control 14: Security Awareness and Skills Training.

    This control is a classic and hopefully doesn’t need a whole lot of explanation. Considering the high prevalence of Errors and Social Engineering, awareness and technical training are a smart place to put some dollars to help support your team against a world full of cognitive hazards.

  • Figure 137
  • Appendix C: U.S. Secret Service

  • David Smith
    Special Agent in Charge Criminal Investigative Division U.S. Secret Service

    Bernard Wilson
    Network Intrusion Response Program Manager Criminal Investigative Division U.S. Secret Service

    The year 2020 will be remembered as the year of the COVID-19 global pandemic, with its short and long-term impacts. The pandemic began with lockdowns and a rapid transition to remote work, and continued with economic slowdowns and associated relief efforts. The pandemic affected all aspects of life and was particularly conducive to cybercrime.

    In a matter of weeks, organizations had to transition to remote work, where possible. The reliance on a vastly expanded remote workforce resulted in a surge in the number and severity of attacks related to the weaknesses in underlying Internet and information technology infrastructure. This led to an increase in the number of incidents associated with the telework portion of the Business Continuity Plan (BCP) for many organizations. BCPs generally contain provisions for remote access to services available on an organization’s network, a proliferation in email traffic for internal communications, and an increased reliance on enterprise video and audio communications. With this shift came an increase in malware and social engineering attacks, consistent with the exploitation of general communications.

    Organizations that neglected to implement multi-factor authentication, along with virtual private networks (VPN), represented a significant percentage of victims targeted during the pandemic. The zero-trust model for access quickly became a fundamental security requirement rather than a future ideal. Nonrepudiation via Personal Identity Verification (PIV), Fast Identity Online (FIDO), or similar solutions became essential in zero-trust architectures. Security postures and principles, such as proper network segmentation, the prevention of lateral movement, least privilege, and “never trust, always verify” have proven to be strong indicators of an organization’s ability to prevent or recover from unauthorized presence in its network environment.

    In 2020, in the midst of the pandemic, cyber actors increased malware attacks against U.S. victims, including the healthcare and public health sector. The U.S. Secret Service noted a marked uptick in the number of ransomware attacks, ranging from small dollar to multi-million dollar ransom demands. While most organizations had adequate data backup solutions to mitigate these attacks, cyber actors shifted their focus to the exfiltration of sensitive data. These cyber actors, often organized criminal groups, proceeded to monetize the theft by threatening to publicize the data unless additional ransom was paid. The monetization of proceeds was typically enabled by cryptocurrency, in an attempt to obfuscate the destination of proceeds and hamper the ability of law enforcement to locate and apprehend those responsible for the crime.

    One of the primary responsibilities of the Secret Service is to protect the financial infrastructure of the United States. The pandemic required an unprecedented response from the Federal government. Legislators approved the release of $2.6 trillion of taxpayer funds to address the economic effects of the pandemic on the nation. The release of federal funding attracted the attention of organized criminal groups and individuals attempting to exploit pandemic relief programs. As a result, preventing and deterring pandemic relief fraud became the focus of the Secret Service and other law enforcement agencies, particularly focused on Federal funding allocated to states for unemployment benefit programs. The Secret Service worked with law enforcement partners at the U.S. Department of Labor to prevent criminal activity and arrest those responsible for exploiting the programs. This effort prevented more than $1.5 billion from reaching criminals and ensured that hundreds of millions of dollars intended to provide support to affected communities was returned to the states and the intended recipients.

    Yet in spite of these efforts, criminals continued attempting to divert pandemic relief funds from different programs, to include $697.3 billion in loans intended to support businesses. The Secret Service and partner law enforcement agencies have expanded our efforts to prevent and mitigate these crimes, and ultimately locate and arrest those responsible.

    The year 2020 demonstrated, once again, the enduring threat posed by organized cyber-criminal groups. Whether the crime involves a hospital ransomware attack, the sale of exfiltrated customer data, ATM cash-out attacks, or the theft of pandemic relief funds, the common indicator is the prevalence of organized crime. Criminals can be either formally or informally organized, at times in partnership with nation-state malicious actors, based on a common interest in illicit profit. Cyber actors quickly shift their activity based on emerging opportunities to steal and launder funds using any tactics, techniques, and procedures available to them. Collaboration between domestic and foreign law enforcement partners to combat cybercriminal groups and their schemes is key to dismantling organized crime and apprehending cyber actors.

    To address this continued shift of criminality, the Secret Service operates a network of Cyber Fraud Task Forces (CFTF), a partnership of federal, state, local, and foreign law enforcement agencies, prosecutors, the private sector, and academia. Outreach is at the core of the Secret Service CFTFs, as it fosters trusted relationships and information sharing, which are important tools in mitigating cybercrimes. While apprehending criminals is, and will continue to be, the ultimate goal of the Secret Service, prevention and mitigation are equally critical in the protection of the U.S. financial infrastructure.

  • Appendix D: Contributing organizations

  • A
    Akamai Technologies
    Apura Cybersecurity Intelligence
    Arics Cooper
    Atos (Paladion)

  • B
    Bad Packets
    Bit Discovery
    BlackBerry Cylance

  • C
    Center for Internet Security
    CERT European Union
    CERT National Insider Threat Center
    CERT Polska
    Checkpoint Software Technologies Ltd.
    Cisco Talos Incident Response
    Computer Incident Response Center Luxembourg (CIRCL)
    Cybersecurity and Infrastructure Security Agency (CISA)
    CyberSecurity Malaysia, an agency under the Ministry of Communications and Multimedia (KKMM)
    Cybir (formerly DFDR Forensics)

  • D
    Digital Shadows
    Dragos, Inc

  • E
    Elevate Security
    Emergence Insurance

  • F
    Farsight Security
    Federal Bureau of Investigation - Internet Crime Complaint Center (FBI IC3)

  • G
    Global Resilience Federation
    Government of Telangana, ITE&C Dept., Secretariat 
    Government of Victoria, Australia - Department of Premier and Cabinet (VIC)
    Grey Noise

  • H
    Hasso-Plattner Institut
    Homeland Security Solutions B.V. (HLSS)

  • I
    ICSA Labs
    Irish Reporting and Information Security Service (IRISS-CERT)

  • L
    Lares Consulting
    Legal Services - ISAO
    LMG Security

  • M
    Malicious Streams
    Maritime Transportation System ISAC (MTS-ISAC)
    Micro Focus
    Mishcon de Reya

  • N
    National Cybersecurity & Communications Integration Center (NCCIC)

  • P
    ParaFlare Pty Ltd

  • R
    Recorded Future

  • S
    Shadowserver Foundation
    SISAP - Sistemas Aplicativos

  • T
    Tetra Defense

  • U
    United States Computer Emergency Readiness Team (US-CERT)
    U.S. Secret Service

  • V
    VERIS Community Database
    Verizon Cyber Risk Programs
    Verizon DDoS Shield
    Verizon Digital Media Services
    Verizon Managed Security Services - Analytics (MSS-A)
    Verizon Network Operations and Engineering
    Verizon Professional Services
    Verizon Threat Research Advisory Center (VTRAC)
    Vestige Digital Investigations

  • W
    WatchGuard Technologies

  • 2021 Data Breach Investigations Report contributing organizations
  • 78 https://github.com/vz-risk/dbir/tree/gh-pages

    79 Interested in how we test them? Check out Chapter 9, Hypothesis Testing, of ModernDive: https://moderndive.com/9-hypothesis-testing.html

    80 Jacob F. The Statue Within: An Autobiography. CSHL Press; 1995. By way of “Selective attention in hypothesis-driven data analysis,” Itai Yanai, Martin Lercher, bioRxiv 2020.07.30.228916.

    81 Really. They constructed the data so that plotting it drew a gorilla, and people trying to test hypotheses completely missed it.

    82 Eric Black, “Carl Bernstein Makes the Case for ‘the Best Obtainable Version of the Truth,’” by way of Alberto Cairo, “How Charts Lie” (a good book you should probably read regardless).

    83 Interested in sampling? Check out Chapter 7, Sampling, of ModernDive: https://moderndive.com/7-sampling.html

    84 This and all confidence intervals are 95% confidence intervals determined through bootstrap simulation or Markov Chain Monte Carlo. Read more in Chapter 8, Bootstrapping and Confidence Intervals, of ModernDive: https://moderndive.com/8-confidence-intervals.html
