A vulnerability management (VM) program is one of the basic pillars of any security program (ID.RA-01 in NIST CSF). That doesn’t mean it’s easy, though. In fact, I think it can cause a lot of friction with the asset owners/managers who are ultimately responsible for patching their systems.

The problem is that as your infrastructure and application landscape grows, including its software dependencies, the number of reported CVEs grows as well. This applies to host machine packages, container level packages and application level libraries. When you have hundreds of CVEs, or even more, how do you prioritize what should be patched first?

This post presents a solution that helps with prioritizing a large number of CVEs. It is also possible to filter out findings based on certain criteria. This is particularly useful when dealing with large infrastructure or application environments.

Overall VM Process

Everything starts with collecting VM data. This is a well known problem and solution space:

  • Scan host machines or container images for installed packages

  • Identify your application level dependencies/libraries

  • You could also perform network-based scans (a black-box approach)

Once you have a list of CVEs for each asset, the next steps are:

  • Automatically create tickets for new findings and assign them to the asset owners

  • Owners review findings, potentially with support from security

  • Issues should then be remediated (i.e. the vulnerable dependency is patched)

  • In case remediation is not possible, an extension or exception has to be requested and approved

How to set up the overall process is beyond the scope of this post. Luckily there are already templates out there that can help you with this, like the Vulnerability Management Program Pack v1.2.

The remainder of this post will focus on the following aspect of the overall process: filtering and prioritizing the identified CVEs that are relevant and/or have high impact.

Filtering & Prioritizing

Let’s assume we have 500 CVEs spread across host, container and application libraries. We don’t want to overwhelm the owners of these assets by flooding them with this list of issues, but we want to ensure that the really important issues are fixed nevertheless. We first have to define what "important" means, and we will do this in a staged approach that can be automated.

This is primarily about prioritizing CVEs, but some of the criteria can also be used to drop certain CVEs entirely.

Package/Code in Use

This is primarily about false positive filtering: is the package or the code path that is affected by the CVE actually used by your systems? Some examples:

  1. CVE reported for the Debian grep package for one of your host machines, but the grep command is actually not used on that system

  2. Similar situation for a package installed inside a container image that is never used

  3. Your application depends on a specific library that has a CVE. But the CVE only affects one specific function that is not being called by your application

In all these situations the CVE cannot be exploited. You can therefore use this as a filtering criterion and completely drop those findings as noise. In general, it is a good idea to remove all packages that are not needed. This strategy of hardening your base images (container level or VM level) should be applied independently of whether you actually have package in-use detection available.

The challenge lies in actually having this in-use detection. For host/container level package CVEs, this requires a runtime agent on the system that a) monitors which files are being accessed and b) correlates this data with package manager information to identify the packages those files belong to. For code level CVEs, this feature is usually not part of the language ecosystem itself, though some commercial tools support it. A notable exception is govulncheck, the Go vulnerability scanner, which only reports vulnerabilities in code paths that your application actually reaches.

CVSS Score

The Common Vulnerability Scoring System (CVSS) is probably the most basic metric that everyone is familiar with. A single risk score is calculated for a vulnerability based on several metric groups that include characteristics such as impact, complexity of the attack, attack vector (e.g. exploitable over a network connection) and others.

The result is a score between 0.0 (severity None) and 10.0 (severity Critical).

This can be used as both a prioritization and a filtering criterion:

  • Higher score means higher priority

  • Filtering: e.g. drop all CVEs that have a CVSS score below 7.0 (severity High). This is a sensible first threshold when you are starting your VM program and can be refined later.
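A minimal sketch of such a severity filter, assuming findings are represented as simple dictionaries (the CVE IDs and scores below are made up for illustration):

```python
# Hypothetical findings list; in practice this comes from your scanner output.
findings = [
    {"cve": "CVE-2024-0001", "cvss": 9.8},
    {"cve": "CVE-2024-0002", "cvss": 5.3},
    {"cve": "CVE-2024-0003", "cvss": 7.5},
]

CVSS_THRESHOLD = 7.0  # keep only severity High and Critical

# Drop everything below the threshold.
relevant = [f for f in findings if f["cvss"] >= CVSS_THRESHOLD]
print([f["cve"] for f in relevant])  # → ['CVE-2024-0001', 'CVE-2024-0003']
```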

EPSS + KEV

The next question is: how likely is it that a CVE will actually be exploited? This is captured by the Exploit Prediction Scoring System (EPSS). This model produces a probability score between 0 and 1 (i.e. 0% to 100%). The higher the score, the greater the probability that the vulnerability will be exploited within the next 30 days.

We can directly use this score for calculating priority.

Another important and related indicator is the CISA Known Exploited Vulnerabilities Catalog (KEV). It provides a list of vulnerabilities that have already been exploited in the wild. A CVE that is on this list effectively implies an EPSS score of 100%.

Our criteria for this section are:

  • Higher EPSS score means higher priority. If a CVE is listed in KEV, then use a score of 100%

  • Filtering: you can also filter CVEs based on whether they are already on the KEV list, or whether their EPSS score is above a certain threshold
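The combination of the two signals can be sketched as follows, treating KEV membership as an exploit probability of 100% (the KEV subset and scores here are illustrative; in practice you would load the CISA KEV catalog and EPSS feed):

```python
# Illustrative subset of the CISA KEV catalog (Log4Shell really is listed).
kev_listed = {"CVE-2021-44228"}

def exploit_likelihood(cve_id: str, epss: float) -> float:
    """Return the EPSS probability, overridden to 1.0 if the CVE is on KEV."""
    return 1.0 if cve_id in kev_listed else epss

print(exploit_likelihood("CVE-2021-44228", 0.2))  # → 1.0 (on the KEV list)
print(exploit_likelihood("CVE-2024-0001", 0.2))   # → 0.2
```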

Internet Exposed

Systems or applications that are directly exposed to the Internet have a higher risk compared to systems that are only available inside internal networks.

This has to be considered with care though: e.g. the Log4j vulnerability (CVE-2021-44228) can also affect systems located in private networks, as long as a weaponized log string is forwarded to those vulnerable backend systems.

We will use this as a prioritization criterion with a value of either 0 (not exposed) or 1 (Internet exposed).

Asset Classification

The next question is: how important is the asset or the data stored/processed on the asset that is affected by the CVE?

How you determine this value and what criteria to use are beyond the scope of this post. Some examples:

  • Impact from an operational, reputational, financial, etc. perspective

  • Data sensitivity (e.g. PII/PHI information)

The result should be an asset value between 0 (lowest value) and 10 (highest value).

Formula

Let’s put everything together. The priority rating of a CVE, with a range between 0.0 and 10.0, can be calculated as follows: \[ \mathrm{Prio} = \mathrm{CVSS} \cdot \max(\mathrm{EPSS},\mathrm{KEV}) \cdot \frac{\mathrm{AC}}{10} \cdot \frac{1 + \mathrm{IE} \cdot w_\mathrm{IE}}{1+w_\mathrm{IE}} \]

Where the variables are defined as follows:

  • CVSS: the CVSS base score between 0.0 and 10.0

  • EPSS: the exploit probability between 0.0 and 1.0

  • KEV: 1.0 if the CVE is listed in the KEV catalog, otherwise 0.0

  • AC: the asset classification between 0 and 10

  • IE: Internet exposed (0.0 or 1.0); w_IE is its weight factor between 0 and 1

Some examples, using an IE weight of 0.333:

CVSS   EPSS   KEV   AC   IE   Prio
10     0.5    1     10   1    10
9      0.7    0     8    1    5.04
9      0.7    0     8    0    3.78
7      0.2    1     8    1    5.60
7      0.5    0     6    0    1.58
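The formula can be implemented directly; this sketch reproduces the table rows above (the default IE weight of 0.333 matches the examples):

```python
def priority(cvss: float, epss: float, kev: float, ac: float,
             ie: float, w_ie: float = 0.333) -> float:
    """Priority rating of a CVE between 0.0 and 10.0.

    cvss: CVSS base score (0-10), epss: exploit probability (0-1),
    kev: 1.0 if on the KEV list else 0.0, ac: asset classification (0-10),
    ie: 1.0 if Internet exposed else 0.0, w_ie: weight of the IE factor (0-1).
    """
    return cvss * max(epss, kev) * (ac / 10) * (1 + ie * w_ie) / (1 + w_ie)

# Reproduce the first rows of the table above:
print(round(priority(10, 0.5, 1, 10, 1), 2))  # → 10.0
print(round(priority(9, 0.7, 0, 8, 1), 2))    # → 5.04
print(round(priority(9, 0.7, 0, 8, 0), 2))    # → 3.78
```

Sorting your findings by this score in descending order then yields the patching order.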

For your patching efforts, you can then decide that e.g. everything with a priority > 5 has to be patched as fast as possible. You can also use this to split your patching backlog into smaller, prioritized chunks.

Summary

This post presented a way to automatically prioritize CVEs based on several criteria. This should help with deciding what to fix first. It is also possible to completely filter out certain CVEs based on these criteria, e.g. using severity thresholds. This is especially helpful when you have to deal with a large number of CVEs.

Additional criteria could be used to further extend the formula: whether the asset has protection mechanisms in place (e.g. a WAF, or network restrictions, depending on the exploit), the remediation effort, and so on.

It is important to mention that those scores have to be regularly recalculated, as certain values might change over time. E.g. the EPSS score provides an exploit probability for the next 30 days and could therefore change anytime.

While doing some research for this post I encountered the SecOps Risk-Based Prioritization Methodology that uses a similar approach called Context Prioritization Rating (CPR Score). It is worth checking out if you are interested in this topic. They attach weights to each parameter for further customization, e.g. you can define how important the CVSS or EPSS score is for the final result.

And finally: the long term goal should be to patch all important CVEs that have been detected on your systems. The presented approach is primarily intended to help with prioritization and to filter out some noise.