Vulnerability IDs, Fast and Slow
The CERT/CC Vulnerability Analysis team has been engaged in a number of community-based efforts surrounding Coordinated Vulnerability Disclosure lately. I've written previously about our involvement in the NTIA Multistakeholder Process for Cybersecurity Vulnerabilities. Today I'll highlight our ongoing work in the Forum of Incident Response and Security Teams (FIRST). We are currently active in two vulnerability-related working groups within the FIRST organization: the Vulnerability Coordination SIG (recently merged with the NTIA Multiparty Disclosure working group), and the Vulnerability Reporting and Data eXchange SIG (VRDX-SIG). At the CERT Vendor Meeting on February 29, I presented some of our current work within the VRDX-SIG. Given a number of developments in the intervening week, I'm introducing that work to a broader audience in this post.
Over the past few years, I have been studying the coordinated vulnerability disclosure (CVD) process as we have implemented it in the CERT/CC. In that time, I've made a few observations that started my team thinking about what the CVD process might look like in a few years' time.
Different Things, Same Identifier
It began with the realization that we have used the VU# namespace as an identifier for three related yet distinct things. From our perspective, there are
- the vulnerability disclosure cases we handle
- the documents we publish about vulnerabilities (i.e., CERT Vulnerability Notes)
- the vulnerabilities those cases and documents describe
Now this isn't really a problem as long as one case describes one vulnerability, resulting in one document. And while that's true for most of the cases we handle, the reality is that the bulk of our coordination effort goes into the exceptions--cases with multiple vendors or multiple vulnerabilities.
We started to get a hint that a problem was brewing when we began developing our Basic Fuzzing Framework and almost immediately found more exploitable or probably exploitable test cases than we could possibly coordinate individually. Then in late 2014, Will Dormann's Android SSL work (original blog post, RSA 2015 slides) made this distinction unmistakable.
While we published only a single Vulnerability Note (VU#582497), in the end it covered 23,667 vulnerable apps. We had published multiple vulnerabilities in a single document before, but due to its sheer scale, this one case forced us to rethink many aspects of our coordination and publication processes.
We, of course, were not alone in discovering the effectiveness of fuzzing and other kinds of automated testing to find new vulnerabilities. The end result was that, from our perspective, the volume and arrival rate of incoming cases were unconstrained, while our capacity to produce documents could only grow at a much slower rate (hiring and training wind up being on the critical path).
In the midst of all this, we realized that we needed to start thinking about how to handle cases containing many vulnerabilities and producing documents that might split on boundaries other than cases. This realization seemed to be driving us toward a multi-tracked process in which case IDs, document IDs, and vulnerability IDs could all run at different rates based on the demands of each process.
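To make the multi-tracked idea concrete, here is a minimal sketch in Python. It is not how the CERT/CC systems work; the `CASE`, `DOC`, and `VUL` prefixes and the `Tracker` and `Case` classes are hypothetical names invented for illustration. The only point is that each kind of ID draws from its own counter, so one case can carry many vulnerability IDs and still yield a single document.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Tracker:
    """Issues IDs from three independent counters (hypothetical prefixes)."""
    _counters: dict = field(default_factory=lambda: {
        "CASE": count(1), "DOC": count(1), "VUL": count(1)})

    def issue(self, kind: str) -> str:
        # Each kind advances only its own counter, so the tracks run at different rates.
        return f"{kind}-{next(self._counters[kind])}"


@dataclass
class Case:
    case_id: str
    vul_ids: list = field(default_factory=list)  # vulnerabilities handled in this case
    doc_ids: list = field(default_factory=list)  # documents published for this case


tracker = Tracker()
case = Case(tracker.issue("CASE"))
# One case can accumulate many vulnerabilities...
case.vul_ids = [tracker.issue("VUL") for _ in range(5)]
# ...while still producing a single published document.
case.doc_ids.append(tracker.issue("DOC"))
```

With the three tracks separated, a document could also reference vulnerabilities from several cases, which is the "split on boundaries other than cases" situation described above.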
Every Vulnerability Database Makes Choices
Perhaps not coincidentally, as the CERT/CC's efforts expand into vulnerability coordination for non-traditional computing products (mobile, vehicles, medical devices, IoT, etc.), we've also begun to hit up against another set of issues affecting vulnerability identities and compatibility across vulnerability databases (VDBs): namely, bias.
Steve Christey Coley and Brian Martin talked about this issue in their BlackHat 2013 talk "Buying Into the Bias: Why Vulnerability Statistics Suck" (paper, slides, video). They mention a number of biases that affect all VDBs:
- Selection bias. Not all products receive equal scrutiny. Not all vul reports are included in VDBs.
- Publication bias. Not all results get published. Some vuls are found but never reported to anyone.
- Abstraction bias. This bias is an artifact of the process that VDBs use to assign identifiers to vulnerabilities. (Is it 1 vul or 3? 23,667 or 1?)
- Measurement bias. This bias encompasses errors in how a vulnerability is analyzed, verified, and catalogued.
In an ideal scientific world, bias can be factored into analytical results based on the data collected. But VDBs don't exist solely in the service of scientific purity.
I'll take it further and assert that every vulnerability database or catalog makes choices that are usually driven by the business requirements and organizational environments in which those VDBs operate. These choices include
- Sources of vulnerability information monitored. Monitoring all the potential sources of vulnerability information is unrealistic for resource-constrained VDBs; to date we have found none that are not so constrained. This choice is one source of selection bias.
- Inclusion and exclusion criteria. Rules that define what subset of records from the sources monitored will be included (or not) in the VDB must be decided. What kind of vulnerabilities does the VDB track? Is it platform specific? Is it just a single vendor collecting reports in its own products? Is it focused on a particular business sector? This choice is another source of selection bias.
- Content detail. How much (and what kind of) detail goes into each record in a VDB is something that must be decided. For example, whether to include exploit information, workarounds, detection criteria, etc.
- Abstraction. What is a "unit" vulnerability? Does this report represent one vul or many? That choice depends on what purpose the VDB serves. Christey and Martin cover this issue in their list of biases, describing it as "the most prevalent source of problems for analysis."
- Uncertainty tolerance. How certain is the information included in the record? Is the goal of the VDB to be authoritative on first publication? Or can it tolerate being wrong sometimes in favor of getting things out more quickly?
- Latency tolerance. How quickly do new records need to be placed in the VDB following the initial disclosure? This choice is a distinct tradeoff with uncertainty: consider the differences between breaking news coverage and a history book.
- Capacity constraints. As I mentioned above, incoming vul report volume is unconstrained while the capacity of a VDB to consume and process those reports into database records is decidedly not (especially with humans in the loop, and as of this writing they still are).
- Users and consumers of the data. Ultimately, a VDB must serve some useful purpose to some audience in order for it to continue to exist. There is a wide variety of uses for the information contained in VDBs (vulnerability scanning, vulnerability management systems, long-term trend analysis, academic research, quality improvement efforts, supporting acquisition or purchasing decisions, evaluating vendor process effectiveness, etc.), so it shouldn't be surprising that user requirements can drive many of the other choices the VDB operators have to make.
It's important to note that even if two vulnerability databases agree on the first items in the list above (sources to watch, inclusion criteria, content detail, and abstraction), over time it's easy to wind up with completely distinct data sets due to the latter items (uncertainty tolerance, latency tolerance, capacity constraints, and user needs).
Where We Are vs. Where We Need To Be
The vulnerability databases you are probably most familiar with, such as the National Vulnerability Database (NVD), Common Vulnerabilities and Exposures (CVE), OSVDB (currently offline; I don't know whether that's permanent), and the CERT Vulnerability Notes Database, have historically focused on vulnerabilities affecting traditional computing platforms (Windows, Linux, OS X, and other Unix-derived operating systems), with only a smattering of coverage for vulnerabilities in other platforms like mobile or embedded systems, websites, and cloud services.
In the case of websites and cloud services this gap may be acceptable since most such services are effectively single instances of a system and therefore only the service provider needs to apply a fix. In those cases there might not be a need for a common identifier since nobody is trying to coordinate efforts across responsible parties. But in the mobile and embedded spaces, we definitely see the need for identifiers to serve the needs of both disclosure coordination and patch deployment. (For more on the challenges of vulnerability analysis and disclosure coordination in embedded systems, see my earlier post What's Different About Vulnerability Analysis and Discovery in Emerging Networked Systems?)
Furthermore, there is a strong English language and English-speaking country bias in the major US-based VDBs (hopefully this isn't terribly surprising). But did you know that China has not one but two major VDBs: China National Vulnerability Database of Information Security (CNNVD) and China National Vulnerability Database (CNVD)? We have been working with CSIRTs around the world (e.g., JPCERT/CC and NCSC-FI) to coordinate vulnerability response for years and realize the importance of international cooperation and interoperability in vulnerability response.
Given all the above, and in the context of the surging prevalence of bug bounty programs, it seems likely that in the coming years there will be more, not fewer, VDBs around the world than there are today. We anticipate those VDBs will cover more products, sectors, languages, countries, and platforms than VDBs have in the past.
Coordinating vulnerability response at local, national, and global scales requires that we have the means to relate vulnerability reports to each other, regardless of the process that originated them. Furthermore, whether they are driven by national, commercial, or sector-specific interests, there will be a need for interoperability across all those coordination processes and the VDBs they feed into.
Vulnerability IDs, Fast and Slow
Over time, it has become clear that the days of the "One Vulnerability ID to Rule Them All" are coming to a close and we need to start planning for that change. As I've covered above, one of the key observations we've made has been the growing need for multiple vulnerability identifiers and databases that serve different audiences, support diverse business practices, and operate at different characteristic rates.
The title of this post borrows from Daniel Kahneman's book Thinking, Fast and Slow, which describes two modes of thought:
- System 1: Fast, automatic, frequent, emotional, stereotypic, subconscious
- System 2: Slow, effortful, infrequent, logical, calculating, conscious
Making the analogy to CVD processes, the thing I notice is that historically there has been a need for slower, consistently high-quality, authoritative vulnerability records, trading off higher latency for lower noise. Deconfliction of duplicate records happens before issuance, and reconciliation of errors can be difficult. To date, this practice is the ideal that many VDBs have strived for. Those VDBs remain a valuable resource in the defense of systems and networks around the globe.
Yet there is a different ideal, just as valid: one in which vulnerability IDs are assigned quickly, possibly non-authoritatively, and based on reports of variable quality. This process looks more like "issue, then deconflict". For this new process to work well, post-hoc reconciliation needs to be easier.
If you're familiar with the gitflow process in software development you might recognize this distinction as analogous to the one between the develop and master branches of a software project. The bulk of the work happens in and around the develop branch, and only when things have settled out does the master branch get updated. (And merge conflicts are as inevitable as death and taxes.)
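One way to picture the "issue, then deconflict" process described above is a registry in which IDs are handed out immediately and merged into alias groups only later, once someone determines that two of them describe the same vulnerability. The sketch below is an assumption-laden illustration, not any VDB's actual mechanism: the `AliasRegistry` class and the `FAST-`/`SLOW-` ID formats are hypothetical, and it handles only the simplest outcome (two IDs turn out to be equal).

```python
class AliasRegistry:
    """Hypothetical registry: issue IDs first, reconcile duplicates afterwards."""

    def __init__(self):
        self._parent = {}  # maps each ID to another ID in its alias group

    def issue(self, vul_id):
        # Issuance is cheap and needs no coordination: a new ID starts as its own group.
        self._parent.setdefault(vul_id, vul_id)

    def find(self, vul_id):
        # Follow links until reaching the group's representative ID.
        while self._parent[vul_id] != vul_id:
            vul_id = self._parent[vul_id]
        return vul_id

    def merge(self, a, b):
        # Post-hoc deconfliction: record that a and b describe the same vulnerability.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Arbitrary but deterministic choice of representative.
            self._parent[max(ra, rb)] = min(ra, rb)

    def same(self, a, b):
        return self.find(a) == self.find(b)


registry = AliasRegistry()
for vul_id in ("FAST-1", "FAST-2", "SLOW-1"):
    registry.issue(vul_id)

# Later analysis concludes that FAST-1 and SLOW-1 are the same vulnerability.
registry.merge("FAST-1", "SLOW-1")
```

After the merge, `registry.same("FAST-1", "SLOW-1")` is true while `FAST-2` remains its own group. Nothing had to be decided at issuance time, which is what keeps the fast path fast; the cost is paid later, at reconciliation.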
A Path Toward VDB Interoperability
And so it was that last fall I found myself presenting an idea to the FIRST VRDX-SIG that I had originally sketched out in early 2013 but shelved due to other work. That idea was for a vulnerability cross-reference scheme that would allow widely distributed vulnerability ID assignments and VDBs to run at whatever rate is necessary, while still allowing them to be reconciled later once the dust clears:
- When necessary, the CVD process could operate in System 1 for quick response, and clean up any resulting confusion afterwards.
- A tactical response-focused VDB might be able to tolerate more uncertainty in trade for lower latency.
- A VDB with more academic leanings could do a deep-dive analysis on root causes in exchange for having fewer records and higher latency.
The main idea was that VDB records can be related to each other in one of the following ways:
- Equality and Inequality (two records describe the same vulnerability or vulnerabilities, or they refer to different ones)
- Superset and Subset (one record is more abstract than the other)
- Overlap (related but not fully contained)
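The relationships listed above map naturally onto set comparisons. The following Python sketch assumes, purely for illustration, that each VDB record can be reduced to the set of underlying flaws it covers; in practice no such ground-truth set exists, and the `Relation` enum and `relate` function are hypothetical names, not part of the VRDX-SIG data model. "Inequality" is interpreted here as the two records sharing nothing.

```python
from enum import Enum


class Relation(Enum):
    EQUAL = "equal"
    SUPERSET = "superset"
    SUBSET = "subset"
    OVERLAP = "overlap"
    DISJOINT = "disjoint"  # "inequality": the records refer to different vulnerabilities


def relate(a: frozenset, b: frozenset) -> Relation:
    """Classify how record a relates to record b, given the flaws each one covers."""
    if a == b:
        return Relation.EQUAL
    if a > b:   # a strictly contains b: a is the more abstract record
        return Relation.SUPERSET
    if a < b:   # b strictly contains a
        return Relation.SUBSET
    if a & b:   # some shared flaws, but neither contains the other
        return Relation.OVERLAP
    return Relation.DISJOINT


# A single note covering several apps versus a record for just one of them.
broad_record = frozenset({"app-1", "app-2", "app-3"})
narrow_record = frozenset({"app-2"})
other_record = frozenset({"app-3", "app-4"})

print(relate(broad_record, narrow_record))  # Relation.SUPERSET
print(relate(broad_record, other_record))   # Relation.OVERLAP
```

The superset case is the abstraction choice from earlier in this post made explicit: one database's single record can correspond to many records in another.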
But there is nothing new under the sun. Harold Booth and Karen Scarfone's October 2013 IETF Draft Vulnerability Data Model contained a vulnerabilityAliasEnumType that was conceptually very similar. However, while it'd be great if we could eventually get to a unified data model like the IETF draft for vulnerability information exchange, for now the simplest thing that could possibly work seemed to be a way to relate records within or between vulnerability databases that explicitly addresses the choices and biases described above. The unified data model might be a longer way off, and we were anticipating the need to reconcile VDBs much sooner.
On March 3, 2016, the FIRST VRDX-SIG met in San Francisco (generously hosted by HackerOne) to refine some of the technical details of this cross-referencing scheme. We made significant progress on the simplified cross-reference data model and will be working to implement a pilot of it over the coming months.
That Was Last Week
Alas, the world doesn't slow down just because you're already working on something. Even more recently there have been a number of developments in the vulnerability ID space.
Just this week I've learned about not one, not two, but three distinct efforts to address perceived slowness in the CVE assignment process:
- Distributed Weakness Filing (DWF) - announcement, homepage
- OVE - announcement, details, homepage
- Open Vulnerability ID (OVI) - homepage
Synchronicity? Conspiracy? I have no idea. Maybe it's just an idea whose time has come.
But clearly, if that many folks have a problem they need to solve badly enough that they're starting to fork the idea of CVE, all in rapid succession, that seems to be strong evidence that the need for a cross-referencing mechanism is even greater than we originally thought.
Everything I've discussed in this post is work in progress, and some things are changing rapidly on a number of related fronts. Nevertheless, while it's hard to say how we'll get there, it seems inevitable that we'll eventually reach a point where vulnerability IDs can be issued (and deconflicted) at the speed necessary to improve coordinated global vulnerability response while maintaining our ability to have high-quality, trusted sources of vulnerability information.
Here in the CERT/CC Vulnerability Analysis team, we recognize the need for slower, "correct-then-issue" vul IDs as well as faster-moving "issue-then-correct" vul IDs. We believe that there is room for both (and in fact many points in between). Our goal in participating in the VRDX-SIG is to enable interoperability between any willing VDBs. So we look forward to working with DWF, OVE, OVI, and others to integrate their processes and lessons learned.
We intend to continue our efforts to build a better way forward that suits everyone who shares our interest in seeing that vulnerabilities get coordinated and disclosed, and that patches are created and deployed, all with an eye toward minimizing societal harm in the process.
The amount of interest and the number of parallel efforts that have recently emerged in this space encourage me to believe that there is a willing community out there working to make it better. I, for one, am excited to be a part of it.