Marcin Jahn | Dev Notebook
  • Home
  • Programming
  • Technologies
  • Projects
  • About
  • Home
  • Programming
  • Technologies
  • Projects
  • About
  • An icon of the Networking section Networking
    • HTTP Protocol
    • OSI Model
    • TCP Procol
    • UDP Protocol
    • WebSocket
    • HSTS
    • DNS
    • Server Name Indication
    • gRPC
  • An icon of the Security section Security
    • OAuth2
      • Sender Constraint
    • Cryptography
      • Cryptography Basics
    • TPM
      • Overiew
      • TPM Entities
      • TPM Operations
  • An icon of the Linux section Linux
    • Gist of Linux Tooling
    • Unknown
    • SELinux
    • Containers
    • Bash Scripting
    • Linux From Scratch
    • Networking
  • An icon of the Kubernetes section Kubernetes
    • Meaning and Purpose
    • Cluster
    • Dev Environment
    • Kubernetes API
    • Objects
    • Pods
    • Scaling
    • Events
    • Storage
    • Configuration
    • Organizing Objects
    • Services
    • Ingress
    • Helm
  • An icon of the Observability section Observability
    • Tracing
  • An icon of the Databases section Databases
    • ACID
    • Glossary
    • Index
    • B-Tree and B+Tree
    • Partitioning and Sharding
    • Concurrency
    • Database Tips
  • An icon of the SQL Server section SQL Server
    • Overview
    • T-SQL
  • An icon of the MongoDB section MongoDB
    • NoSQL Overview
    • MongoDB Overview
    • CRUD
    • Free Text Search
  • An icon of the Elasticsearch section Elasticsearch
    • Overview
  • An icon of the Git section Git
    • Git
  • An icon of the Ansible section Ansible
    • Ansible
  • An icon of the Azure section Azure
    • Table Storage
    • Microsoft Identity
  • An icon of the Google Cloud section Google Cloud
    • Overview
  • An icon of the Blockchain section Blockchain
    • Overview
    • Smart Contracts
    • Solidity
    • Dapps
  • Home Assistant
    • Home Assistant Tips
  • An icon of the Networking section Networking
    • HTTP Protocol
    • OSI Model
    • TCP Procol
    • UDP Protocol
    • WebSocket
    • HSTS
    • DNS
    • Server Name Indication
    • gRPC
  • An icon of the Security section Security
    • OAuth2
      • Sender Constraint
    • Cryptography
      • Cryptography Basics
    • TPM
      • Overiew
      • TPM Entities
      • TPM Operations
  • An icon of the Linux section Linux
    • Gist of Linux Tooling
    • Unknown
    • SELinux
    • Containers
    • Bash Scripting
    • Linux From Scratch
    • Networking
  • An icon of the Kubernetes section Kubernetes
    • Meaning and Purpose
    • Cluster
    • Dev Environment
    • Kubernetes API
    • Objects
    • Pods
    • Scaling
    • Events
    • Storage
    • Configuration
    • Organizing Objects
    • Services
    • Ingress
    • Helm
  • An icon of the Observability section Observability
    • Tracing
  • An icon of the Databases section Databases
    • ACID
    • Glossary
    • Index
    • B-Tree and B+Tree
    • Partitioning and Sharding
    • Concurrency
    • Database Tips
  • An icon of the SQL Server section SQL Server
    • Overview
    • T-SQL
  • An icon of the MongoDB section MongoDB
    • NoSQL Overview
    • MongoDB Overview
    • CRUD
    • Free Text Search
  • An icon of the Elasticsearch section Elasticsearch
    • Overview
  • An icon of the Git section Git
    • Git
  • An icon of the Ansible section Ansible
    • Ansible
  • An icon of the Azure section Azure
    • Table Storage
    • Microsoft Identity
  • An icon of the Google Cloud section Google Cloud
    • Overview
  • An icon of the Blockchain section Blockchain
    • Overview
    • Smart Contracts
    • Solidity
    • Dapps
  • Home Assistant
    • Home Assistant Tips

Containers

Namespaces

Containers rely on namespaces in Linux. There are 8 kinds of namespaces:

  • cgroups
  • mount
  • process ID
  • network
  • IPC
  • UNIX time-sharing system (UTS) - hostname, domain name
  • users, groups
  • time

Each container is assigned with new namespaces of these types, creating an illusion for the process that it runs on a separate machine.

Diferent processes may also share some namespace types, but not others:

Networking

Here’s how network namespacing works. When a new container is started, it receives a set of interfaces that are placed in new namespace:

cgroups

cgroups are another feature of Linux kernel. It allows to limit system resources assigned to a process (CPU time, CPU cores, RAM, disk, network bandwidth).

When we’re setting restrictions for containers (e.g. --memory="100m"), container engine actualy uses cgroups to limit the process.

Capabilities

Containers should not be able to invoke sys-calls that may break other containers (like changing time, or loading kernel modules).

Docker has --privileged flag that gives special permissions to containers. It’s not ideal, because it gives ALL permissions.

Another option is to use Linux capabilities. There are many of them giivng granular access to specific operations.

Another option is seccomp (Secure Computing Mode). A custom profile (JSON file) can be applied to a containers listing sys-calls that it can make.

Further hardening may be achieved with AppArmor or SELinux (MAC).

Rootless Containers

Podman popularized the idea of using rootless containers. Previously, it was common to run containers with Docker with root privileges, which translated to root access on the host as well (although Docker does support rootless containers as well!).

Rootless containers make use of user namespaces. It gives us access to user mapping and allows us to run the container with any UID, even UID = 0, while on the host system that UID would be mapped to a “normal”, non-root user.

An easy way to test that is to run podman unshare id. It’s going to run id program in a user namespace. It will print “0” as a result. A non-root user on a host (typically 1000) translates to a root in the namespace. We can see the mapping offsets in the /etc/subuid file:

mnj:100000:65536

The output means, that the user 1 in a container will be mapped to user 100000 on a host. User 2 would be mapped to 100001, and so on. A maximum of 65536 users may be mapped (that value may be modified, it’s not a hard limit).

unshare

The podman unshare command is great help in translating host UIDs to container UIDs. If, for example, some container runs with UID 1002, and we want to to give that UID access to some files on the host, we could run podman unshare chown 1002:1002 some-file.

Volumes and SELinux

Mounting volumes in rootles context is different than it is with rootful environments. In the latter case, whatever you mount, it will probably just work without any further tinkering. In rootless podman, you will often experience issues with SELinux (if your system uses SELinux MAC). One of the ways to get around that is to apply the :z or :Z (private, additionally uses MCS) to volume definition. That will apply the right SELinux labeling to the files being shared as a volume (only if we’re attaching host dir, it is not needed when creating a volume entity).

Containers run with the container_t SELinux domain. They are allowed to access the container_file_t and container_ro_file_t typed files. The :z/:Z parameters apply the container_file_t to the mounted files.

It’s important to note that :z and :Z relabel the resource, which could impact some other processes that require the original labels to be there! After finishing the work with containers, it’s probably a good idea to reset the labeling with restorecon.

Another issue could be due to traditional DAC permissions. The user mapping also works for volumes, so a UID 0 in a container will map to UID 1000 on a host. So, a container will be able to access files of UID 1000 on the host.

Disable Labeling

Labeling is a security feature that keeps processes within containers from doing harm to files on the host. When the container is trusted though, we can relax the security by disabling SELinux label separation for containers.

We can do that for a single container while runing it with podamn run by adding --security-opt label=disable.

It’s also possible to disable SELinux for all podman containers via /etc/containers/containers.conf.

Standarization

Docker was the first container platform to make them popular. There’s CRI (Container Runtime Interface) that container platforms adhere to.

References

  • “Kubernetes in Action (Second Edition)” book
  • man namespaces
  • Rootless Containers
  • container_selinux
  • User namespaces with Podman (Red Hat)
←  SELinux
Bash Scripting  →
© 2023 Marcin Jahn | Dev Notebook | All Rights Reserved. | Built with Astro.