In this post we are looking at a more complex security automation example: we will implement an External Attack Surface Monitoring/Management (EASM) solution using a workflow orchestration engine, as described in the first blog post.

What is EASM? Different vendors that offer this as a service, like Tenable, CrowdStrike or Rapid7, provide their own definitions.

My own definition of EASM:

  • Every organization has assets, and some of these assets are exposed to the Internet. E.g. web application servers, load balancers, VPN Gateways, …​

  • These assets might have known vulnerabilities (CVEs)

  • Alternatively, these assets might be misconfigured, like a web application that grants anonymous access or uses a default username/password for admin access

  • Ideally, we want to be able to detect these issues across all assets that are owned by the organization

Simplified, EASM means that we have to collect asset information, scan those assets and then report the findings.

Warning

Before you implement your own EASM workflow: contact your service providers and ask them whether you are allowed to scan your assets hosted in their environment.

Workflow Model

To implement EASM, we first have to collect the public addresses (IPs, DNS names) that are associated with our assets from different data sources. We can then perform a port scan to identify those addresses that expose network service ports to the public Internet [1], then run a vulnerability scan against those open ports and finally publish the findings obtained from these scans.

Modeled as a workflow, we end up with the following structure. Keep in mind that each workflow task generates an artifact that is used in another workflow task.

workflow

Address data could be collected from different sources and we might also be using multiple vulnerability scanners.

This is a slightly more complex workflow than the one we saw in the previous post. We will therefore first take a more detailed look at the concepts in the following sections before diving into the implementation.

Address Collection

The first stage in our workflow is the collection of the public IP addresses and DNS names of our assets. We also want to include some asset metadata that will help us later in identifying potential findings.

For public Clouds (e.g. AWS, Azure, GCP) this is relatively easy. You can use the asset inventory workflow that was introduced in the previous post to collect asset data from your AWS accounts. You will then have an asset database that contains all relevant information, e.g. the public IP addresses associated with your virtual machines. The address information for your cloud environments can then be retrieved from your asset database using SQL queries.

For more "classical" on-premise systems you have to manually put the network CIDR ranges or individual IP addresses into a document that is accessible to your workflow via an API. One way to do this is the Google Sheets API. For a higher-level interface to fetch data from a Google Sheet, you can use something like Steampipe with the Google Sheets plugin.

The collected address data from the different sources is merged into a single, large JSON file (addresses.json in the graph). We define the structure of this JSON file as follows, using an example with only two addresses for brevity:

[
{ 1
  "address"  : "1.2.3.4",
  "resource" : "arn:aws:ec2:eu-west-1:123456789012:instance/i-abcdef0123",
  "type"     : "EC2 instance",
  "csp" : {
    "provider" : "AWS",
    "account"  : "123456789012",
    "org"      : "o-abcdef0123"
  }
},
{ 2
  "address"  : "5.6.7.8",
  "resource" : "bare-metal",
  "type"     : "Data Center",
  "csp" : {
    "provider" : "Hosting Company Ltd.",
    "account"  : "Frankfurt DC",
    "org"      : ""
  }
}
]
1 Address associated with a virtual machine (EC2 instance) in a particular AWS account that is located in a particular AWS organization. Cloud provider organization data is useful in case you own multiple organizations in one provider.
2 Address associated with a machine in a "classical" data center. The organization field is empty on purpose.

While not present in our example, DNS names like machine.example.org are also valid values for the address field.
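The merge of the per-source address files into one addresses.json could be sketched as follows. This is a minimal Python illustration; the file naming pattern and the de-duplication on the address field are assumptions about how one might implement it:

```python
import glob
import json

def merge_address_files(pattern: str) -> list[dict]:
    """Merge per-source address files (each a JSON array in the format
    shown above) into one list, de-duplicating on the address field."""
    seen: set[str] = set()
    merged: list[dict] = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for entry in json.load(f):
                if entry["address"] not in seen:
                    seen.add(entry["address"])
                    merged.append(entry)
    return merged

# The merged list would then be written out as the addresses.json artifact:
# json.dump(merge_address_files("sources/*.json"), open("addresses.json", "w"))
```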

Port Scan

Given an address list, we now have to identify which machines are actually reachable over the public Internet. This requires performing a port scan. [2] This can be done with nmap or Naabu.

The output of this task is a subset of the initial address list. Naabu can generate JSON output natively, whereas for nmap you can generate XML output and convert that to JSON using JSON Convert (jc).

We then convert this intermediary JSON output to a simple text file that lists all addresses, with one entry per line:

1.2.3.4:80
1.2.3.4:443
5.6.7.8:22

We explicitly enumerate the identified open ports because this is required by the vulnerability scanner that we use in the next step. For other scanners this might not be necessary. In that case the list will contain addresses without port numbers.
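Assuming the intermediary scan output is NDJSON with ip and port fields (the actual field names vary per scanner, so treat these as an assumption), the conversion to the text format could be sketched like this:

```python
import json

def to_host_list(ndjson: str) -> str:
    """Convert NDJSON port-scan output (one result object per line)
    into the one-entry-per-line address:port format shown above."""
    lines = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        lines.append(f'{rec["ip"]}:{rec["port"]}')
    return "\n".join(lines)
```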

Vulnerability Scans

The vulnerability scan is the heart of the entire EASM workflow. Given a list of addresses, we launch one or multiple vulnerability scanners to look for known vulnerabilities across our addresses.

Nuclei is an open source vulnerability scanner that can be used to scan (web) applications, cloud infrastructure or networks. The signature files that define the scans to be performed are called templates in Nuclei. There is a repository with community-maintained templates, but you can also write your own. For our purpose, the community library is sufficient. As Nuclei is a CLI tool, it is easy to integrate into our workflow.

The output of this task is a list of findings per IP address.

As we might want to use different scanners that have different output formats, we have to define a uniform JSON format. The Nuclei JSON output can then be transformed to this format using jq. The structure of the output file is as follows, using an example with only one finding for brevity:

{
  "tool": "Nuclei",
  "address": "1.2.3.4",
  "port": "22",
  "time": "2024-12-10T01:02:33.123456789Z",
  "title": "OpenSSH Terrapin Attack - Detection",
  "description": "The SSH transport protocol with certain OpenSSH extensions ...",
  "remediation": "One can address this vulnerability by temporarily disabling ...",
  "severity": "medium",
  "finding_id": "CVE-2023-48795"
}

When using different vulnerability scanners we have to merge findings obtained with each scanner into one large file in this uniform format, again using jq.
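Since each per-scanner file is already in the uniform format, the merge is conceptually just a concatenation of the NDJSON finding lines. A Python sketch of what that merge step does:

```python
def merge_findings_files(paths: list[str]) -> list[str]:
    """Concatenate per-scanner NDJSON findings files (each already in the
    uniform format) into one combined list of finding lines."""
    merged: list[str] = []
    for path in paths:
        with open(path) as f:
            merged.extend(line.strip() for line in f if line.strip())
    return merged
```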

Joining Results

We now have a list of findings, where each finding can be summarized like this:

Address 1.2.3.4 is affected by CVE-2023-48795.

But we want to improve this to also include metadata:

AWS virtual machine i-abcdef0123 with address 1.2.3.4 in AWS account 123456789012, region eu-west-1, is affected by CVE-2023-48795.

In the SQL world, there is the concept of SQL Joins where you combine data from different tables using a common column between them. We are going to use this approach to combine the scan results with the asset data.

As a reminder, the output of the address collection is a JSON file that contains asset metadata for each address. The output of the vulnerability scans is a JSON file that contains findings for each address. The common element between these two files is the address.

Tools like dsq [3] or clickhouse-local can be used to load JSON files into memory and execute SQL statements on this data, for example a SQL join. When we use such a tool to join the results from these two files, our final output file will look like this:

{
  "address"     : "1.2.3.4",
  "port"        : "22",
  "tool"        : "Nuclei",
  "finding_id"  : "CVE-2023-48795",
  "title"       : "OpenSSH Terrapin Attack - Detection",
  "description" : "The SSH transport protocol with certain OpenSSH extensions ...",
  "remediation"  : "One can address this vulnerability by temporarily disabling ...",
  "severity"    : "medium",
  "time"        : "2024-12-10T01:02:33.123456789Z",
  "csp" : {
    "provider"     : "AWS",
    "org"          : "o-abcdef0123",
    "account"      : "123456789012",
    "address_type" : "EC2 instance",
    "resource_id"  : "arn:aws:ec2:eu-west-1:123456789012:instance/i-abcdef0123"
  }
}

The address metadata object has now been merged into the corresponding finding. This is a lot more useful when reporting findings to users.
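Conceptually, the join that the SQL tooling performs for us corresponds to the following in-memory operation (a Python sketch of the concept, not part of the workflow itself):

```python
def join_findings_with_addresses(findings: list[dict],
                                 addresses: list[dict]) -> list[dict]:
    """Left-join findings with address metadata on the shared address
    field, mirroring a SQL LEFT JOIN: findings without matching address
    metadata are kept, with an empty csp object."""
    by_address = {a["address"]: a for a in addresses}
    joined = []
    for finding in findings:
        meta = by_address.get(finding["address"], {})
        joined.append({**finding, "csp": meta.get("csp", {})})
    return joined
```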

Tip

When you have an asset owner database you can go even one step further: perform another SQL join of these findings with the asset owner data. You can then associate findings with the responsible asset owners and notify them directly.

Reporting Findings

Findings have to be reported to somebody so that remediation actions can be taken. There are different options for where to store our findings and how to report them to human users.

Some options:

  • Send them to a data sink, like your SIEM system, where your standard alerting mechanism will be triggered

  • Post findings to a channel in your organization’s messenger system, like Mattermost, Slack, Teams, etc.

  • Send out e-mails

Whatever notification system you are using, you have to think about whether you want to send out one message per finding or one summary message that provides an overview, e.g. how many findings there are per environment.

If you decide to send out messages for individual findings, these messages could look as follows:

New finding: CVE-2023-48795 (tool: Nuclei)
Severity: Medium
Address: 1.2.3.4
Resource type: EC2 instance, resource ID: i-abcdef0123
Environment: AWS o-abcdef0123, account ID 123456789012, region: eu-west-1
Finding: OpenSSH Terrapin Attack - Detection
…​
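Rendering such a message from one joined finding could be sketched as follows. The field names are taken from the uniform format defined above; the exact layout is up to you:

```python
def format_finding_message(finding: dict) -> str:
    """Render one joined finding (uniform JSON format, merged with
    address metadata) as a human-readable notification message."""
    csp = finding.get("csp", {})
    return "\n".join([
        f'New finding: {finding["finding_id"]} (tool: {finding["tool"]})',
        f'Severity: {finding["severity"].capitalize()}',
        f'Address: {finding["address"]}',
        f'Resource type: {csp.get("address_type", "")}, '
        f'resource ID: {csp.get("resource_id", "")}',
        f'Finding: {finding["title"]}',
    ])
```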

So every time our EASM workflow completes, findings will result in notifications/messages being sent out. If there are no findings, we will not hear anything.

Implementation

Now that we have discussed the conceptual approach, let's look into the implementation. We will not describe every individual workflow task, but instead focus on the important parts only. To keep things simple, we will only use one vulnerability scanner instead of two, as shown in the Workflow Model.

As described in the first blog post, we need a container image that contains all the tooling required for EASM. We assume that such an image is available as myrepo/easm-tooling:1.0.

As always with Argo WF, we will have two resource types, the CronWorkflow and the WorkflowTemplate.

CronWorkflow

The CronWorkflow defines the execution schedule and a reference to the actual workflow implementation.

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  generateName: easm-scans-
spec:
  schedule: "2 2 * * 0,3" 1
  timezone: "Etc/GMT"
  workflowSpec:
    workflowTemplateRef:  2
      name: easm-main
1 Execute the workflow every Sunday and Wednesday at 02:02am GMT
2 Reference to the WorkflowTemplate that contains the actual automation logic.

We are executing the workflow twice a week. Depending on how many addresses you have to scan and what scanner you are using, the duration of one EASM workflow execution might range from a few hours up to a full day or longer.

Overall WorkflowTemplate

We are first defining the overall workflow structure consisting of individual tasks. This implements the conceptual model defined in the section Workflow Model.

The following WorkflowTemplate specifies what tasks exist, their dependencies and how data is passed between them. We are using the Argo WF directed-acyclic graph (DAG) feature to specify everything.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: easm-main
spec:
  entrypoint: main
  templates:
  - name: main
    dag: 1
      tasks:
        - name: get-addresses
          templateRef:
            name: easm-get-addresses
            template: main
        - name: port-scan
          templateRef:
            name: easm-port-scan
            template: main
          arguments:
            artifacts:
            - name: addresses 2
              from: "{{tasks.get-addresses.outputs.artifacts.results}}"
          depends: "get-addresses" 3
        - name: nuclei-scan
          templateRef:
            name: easm-nuclei
            template: main
          arguments:
            artifacts:
            - name: host-list
              from: "{{tasks.port-scan.outputs.artifacts.results}}"
          depends: "port-scan"
        - name: join-publish-findings
          templateRef:
            name: easm-join
            template: main
          arguments:
            artifacts:
            - name: nuclei-results
              from: "{{tasks.nuclei-scan.outputs.artifacts.results}}"
            - name: addresses
              from: "{{tasks.get-addresses.outputs.artifacts.results}}"
          depends: "get-addresses && nuclei-scan"
1 The overall workflow is called main and consists of four sub-tasks (get-addresses, port-scan, nuclei-scan, join-publish-findings)
2 The artifacts parameter defines that this task expects an input file. The file is taken from the output of the address collection task. This file contains the list of all collected addresses.
3 The depends parameter ensures that this task will only be executed after the get-addresses task completed.

The actual tasks performing the work are defined in separate templates, each one specified in another file. We could put everything into the template shown above, but that would be one large file and difficult to read. These separate templates implementing the sub-tasks are referenced from the main template with the templateRef argument.

When this workflow is executed, the get-addresses task will be executed first by Argo WF because it has no dependencies. Once it completes, the port-scan task is started, taking the address list from the first task as an input artifact. Continuing in the same way, every task of the workflow is executed once its dependencies have completed. Argo WF retrieves files from completed tasks, as specified in the template, and makes them available to other tasks.

For more information on generating and consuming artifacts, please refer to the Argo WF documentation.

If one task fails, all depending tasks will not be executed and the overall workflow fails.

To not make this post longer than it already is, we will skip the implementation details of the first two tasks and immediately proceed to the vulnerability scanning task. The overall idea of how to implement these two was provided in Address Collection and Port Scan.

Vulnerability Scan with Nuclei

Let’s look at the implementation of the vulnerability scan sub-task, which is defined in the WorkflowTemplate below.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: easm-nuclei
spec:
  entrypoint: main
  templates:
  - name: main
    inputs:
      artifacts: 1
      - name: host-list
        path: /tmp/host-list.txt
    outputs:
      artifacts: 2
        - name: results
          path: /tmp/nuclei-results.json
    script:
      image: myrepo/easm-tooling:1.0 3
      command: [bash]
      source: | 4
        set -e
        echo "Fetching latest Nuclei templates"
        nuclei -ut -silent
        echo "Performing scan over $(cat /tmp/host-list.txt | wc -l) entries"
        touch /tmp/nuclei-results.json
        nuclei -duc -silent -jsonl -dut \
          -s medium,high,critical \
          -l /tmp/host-list.txt \
          >> tmp-results.json
        echo "Transforming results into uniform JSON format"
        jq -c '. |
          ( .host | capture("(?:https?:\/\/)?(?<host>.+?)(?::(?<port>[0-9]*))?$")) as $capture_group |
          {
            "tool"         : "Nuclei",
            "address"      : $capture_group.host,
            "port"         : $capture_group.port,
            "time"         : .timestamp,
            "title"        : .info.name,
            "description"  : .info.description,
            "remediation"  : .info.remediation,
            "severity"     : .info.severity,
            "finding_id"   : ."template-id"
          }' tmp-results.json > /tmp/nuclei-results.json
        echo "Done"
1 The input artifact specifies that the provided input file will be mounted inside the container’s file system upon container startup
2 Once the container execution completed, the output artifact will be retrieved from the container’s file system
3 The container image to be used
4 The bash script that will be executed inside the container

As a reminder, we are using Argo WF and therefore this workflow task is executed inside a container. The actual task logic is a bash script that consists of only a few lines. There are basically three things that happen:

  1. The latest Nuclei scan templates are downloaded

  2. A scan is performed for the provided list of addresses

  3. The raw JSON output provided by Nuclei is then transformed into the uniform JSON data format

After the script execution is completed, Argo WF will take the result file /tmp/nuclei-results.json from the container and make it available for use in other tasks.
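The host/port split performed by the jq capture expression in the script can be reproduced in Python for illustration. It uses essentially the same regular expression (optional scheme, lazy host match, optional numeric port), so the same edge cases apply:

```python
import re

# Essentially the same pattern as the jq capture above: optional scheme,
# lazy host match, optional numeric port.
_HOST_PORT = re.compile(r"^(?:https?://)?(?P<host>.+?)(?::(?P<port>[0-9]*))?$")

def split_host_port(value: str) -> tuple:
    """Split a Nuclei host field like "https://1.2.3.4:443" into
    (host, port); port is None when the value carries no port."""
    match = _HOST_PORT.match(value)
    return match.group("host"), match.group("port")
```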

Joining Results

As mentioned in Joining Results, we want to merge the scan results with the address metadata. With the right tooling, this can be achieved in just a few lines of code:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: easm-join
spec:
  entrypoint: main
  templates:
  - name: main
    inputs:
      artifacts:
        - name: addresses
          path: /tmp/addresses.json
        - name: scan-results
          path: /tmp/scan_results.json
    outputs:
      artifacts:
      - name: results
        path: /tmp/results_final.json
    script:
      image: myrepo/easm-tooling:1.0
      command: [bash]
      source: |
        echo "Joining address metadata with scan results"
        dsq \                    1
          /tmp/scan_results.json \
          /tmp/addresses.json \
          'SELECT
             {0}.*,
             {1}."csp.provider" as cloud_provider,
             {1}."csp.account" as cloud_account,
             {1}."csp.org" as cloud_org,
             {1}.type as address_type,
             {1}.resource as resource
           FROM {0}
           LEFT JOIN {1} ON {0}.address == {1}.address' \
          > merged.json
        echo "Reformatting merged data" 2
        jq -c '.[] |
          {
            "tool"         : .tool,
            "address"      : .address,
            "port"         : .port,
            "time"         : .time,
            "title"        : .title,
            "description"  : .description,
            "remediation"  : .remediation,
            "severity"     : .severity,
            "finding_id"   : .finding_id,
            "csp" : {
              "provider"     : .cloud_provider,
              "account"      : .cloud_account,
              "org"          : .cloud_org,
              "address_type" : .address_type,
              "resource_id"  : .resource
            }
          }' merged.json > /tmp/results_final.json
        # additional code for reporting results
        # ...  3
        echo "Done"
1 This is the most important part: we are using dsq [3] to merge our two JSON files. The first file (index 0) contains the scan results, the second file (index 1) contains the address list. The SQL statement performing the Join is referencing these files.
2 Unfortunately dsq can not generate nested JSON objects, so we have to recreate the csp object using jq.
3 We should also publish the final result file (/tmp/results_final.json) to some system, e.g. a SIEM. This is omitted for brevity.

Using dsq we can perform a SQL join with the data from our two JSON files using the one field they have in common, the address. As a result we have one file that contains everything we need for reporting.

We are skipping the implementation of the publishing of findings, as this depends heavily on your environment (SIEM, messaging system, etc.).

Deploying

This workflow can be deployed as follows, assuming that the Kubernetes namespace $NS already exists and has been configured in Argo WF for workflow execution. We also assume that each task is defined in a dedicated file.

$ argo template create easm_get_addresses.yml -n $NS
$ argo template create easm_port_scan.yml -n $NS
$ argo template create easm_nuclei.yml -n $NS
$ argo template create easm_join.yml -n $NS
$ argo template create easm_main.yml -n $NS
$ argo cron create easm_cron.yml -n $NS --serviceaccount $K8S_SERVICE_ACCOUNT_NAME

The order is important: the main file (easm_main.yml) can only be created after all the sub-task templates have been created, as there is a direct dependency on them. Similarly, the CronWorkflow can only be created once all templates exist.

The workflow will now be executed automatically and regularly, as defined by the cron schedule. An Argo WF web UI screenshot of an executed EASM workflow is shown below.

easm screenshot

The workflow shown in the screenshot is a slightly more complex implementation, but it follows the same basic structure. Address data is collected from four different sources. Once the port scan has completed, three different scanners are launched, and findings are published to two different systems across two steps (first individually for each scan result and then again for the aggregated results).

Summary

That was a long post. To summarize:

  • We are collecting asset address data (IP addresses, DNS names) from different sources

  • After a port scan identified what systems are reachable over the public Internet, we launch vulnerability scans against those addresses

  • Vulnerability findings are merged with the address metadata and published

  • We use JSON files for exchanging data between the different workflow tasks

  • We can split the workflow templates into smaller pieces, using one template per task [4]

Various open source tools can be used to implement EASM: port and vulnerability scanners as well as tools for processing JSON data.

As a final note, the workflow templates for the sub-tasks in this post have been simplified and stripped to the bare minimum. For use in an actual production environment these will have to be extended, e.g. with the Kubernetes secrets that provide required credentials, Kubernetes resource limits, etc.


1. We follow a black box approach: no assumptions are made on what network services might be exposed by an asset. Just because a system has a public IP doesn’t mean that it is actually exposed to the public Internet. There could be firewall rules preventing access. This is why a port scan has to be performed to identify those assets that expose services to the public Internet.
2. Because we are using a black box approach, the source IP address that is used by the scanner should not be on the allowlist of any asset to be scanned.
3. Unfortunately, dsq is no longer maintained :(
4. It is also possible to further nest the templates, e.g. the main template references a scan template that in turn references other templates where each one implements a different scanner.