Containers on All Levels
When most developers think of containers, they think of Docker. A single command like docker run alpine
spins up an isolated environment, making containerization feel almost magical. But under the hood, there’s an entire stack of tools working together to make this happen. Understanding this stack is crucial for those who want to dive deeper into container internals, troubleshoot issues, or work in environments where Docker isn’t available.
What to Expect From This?
The following is a summary of notes about the responsibilities of each tool. Walking down the stack should help you appreciate the abstractions each layer provides. It is worth noting that this blog only focuses on the way Docker does things. Tools like podman are also in my backlog, so watch out for those! In addition, this tutorial assumes you know what a container is and what an image consists of. If not, this is a good place to start.
Docker
Docker provides a user-friendly way to run containers. It handles image pulling and networking with a simple frontend. Let’s see this in action by running an Nginx web server inside a container.
docker run --rm -p 8080:80 nginx
Once the container is running, you can verify that Nginx is serving pages by making an HTTP request: curl http://localhost:8080. You should see the default Nginx welcome page in the terminal output.
It is worth noting that Docker uses a client-server architecture: the docker binary communicates with the dockerd process running in the background via RESTful API calls. The daemon then checks if the image is available locally and pulls it if necessary. If something like compose or swarm was used, those configurations are handled as well. Afterwards, execution is handed over to containerd.
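Since this is just a REST API exposed over a Unix socket, you can also talk to the daemon directly. A minimal sketch, assuming the default socket path /var/run/docker.sock and that your user may access it (otherwise prepend sudo):
# list running containers straight from dockerd, bypassing the docker CLI
curl --silent --unix-socket /var/run/docker.sock http://localhost/containers/json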
ContainerD
This tool handles cgroups, namespaces, and container life-cycle management; networking is mostly left to higher layers, as we will see shortly. It also assumes an image is already present locally, which means that when working with containerd, one needs to pull the image before running it:
sudo ctr images pull docker.io/library/nginx:latest
sudo ctr run --rm docker.io/library/nginx:latest nginx
The attentive reader might have noticed that the ctr binary is used here instead of containerd. This is due to ctr being a neat CLI wrapper around containerd. You're welcome to try this with "vanilla" containerd, but you will quickly find your options on the CLI to be severely limited.
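Incidentally, ctr can also report the PID of the container's main process, which will come in handy in a moment. Assuming the nginx container from above is still running (your PID will differ):
sudo ctr tasks ls
# --> TASK     PID      STATUS
# --> nginx    63579    RUNNING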
containerd offers no way of providing a port mapping for the container, so an IP will have to be retrieved in a different manner. Since ctr sets up a network namespace for the container, the idea is to look inside this namespace for an IP.
sudo lsns | grep "nginx"
# --> 4026533340 mnt 9 63579 root nginx: master process nginx -g daemon off;
# --> 4026533341 uts 9 63579 root nginx: master process nginx -g daemon off;
# --> 4026533342 ipc 9 63579 root nginx: master process nginx -g daemon off;
# --> 4026533343 pid 9 63579 root nginx: master process nginx -g daemon off;
# --> 4026533344 net 9 63579 root nginx: master process nginx -g daemon off;
sudo nsenter --target 63579 -n ip addr show
# --> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
# --> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# --> inet 127.0.0.1/8 scope host lo
# --> valid_lft forever preferred_lft forever
# --> inet6 ::1/128 scope host proto kernel_lo
# --> valid_lft forever preferred_lft forever
So, the container is running, but it isn't reachable from the outside! The bare minimum configuration that ctr performs when starting a container is creating a new network namespace. Adding network interfaces and configuring them is usually delegated to CNI plugins (e.g., Flannel, Calico) or a higher-level container runtime (e.g., Docker). The former is out of scope and the latter was recently demonstrated 😜
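To convince yourself that only the wiring is missing and the server itself is fine, you can run curl from inside the container's network namespace, reusing the PID found via lsns above (a quick sanity check, assuming the container is still running):
sudo nsenter --target 63579 -n curl -s http://127.0.0.1:80
# --> the HTML of the nginx welcome page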
After preparing the provided image and setting up the necessary namespaces and cgroups, containerd hands execution over to runc.
RunC
This is the low-level tool that actually spins up containers: it is the reference implementation of the OCI runtime specification, originally extracted from Docker.
Practically speaking, a container consists of:
- a tarball of files, and
- one config file gluing it all together
The tarball is the filesystem of the container, complete with all the binaries necessary to run the processes inside it. The config file is a JSON document that dictates how the container is assembled (mounts, namespaces, capabilities, and so on) and what to execute. Thus, in order to run the container, we need those two.
You can get the tarball like so:
mkdir -p container/rootfs
docker pull nginx
# create (but don't start) a throwaway container so its filesystem can be exported
container_id=$(docker create --name mynginx nginx)
# dump the container's filesystem into our rootfs directory
docker export "$container_id" | tar -C container/rootfs -xf -
docker rm "$container_id"
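If the export worked, the rootfs directory should look like an ordinary Linux root tree (the exact contents depend on the nginx image you pulled):
ls container/rootfs
# --> bin  boot  dev  etc  home  lib  ...  usr  var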
Your local folder doesn't have to be called container, but I think it's a nice way of directly knowing what's in there. Next, you can generate the config file by running runc spec inside the container directory:
cd container
runc spec
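The generated config.json is a generic template: by default it would just drop you into a shell. You can check what it is about to execute, for instance with jq (only used here for inspection):
jq .process.args config.json
# --> [ "sh" ]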
You might also want to change the config.json like so:
{
"ociVersion": "1.2.0",
"process": {
"terminal": false,
"user": {
"uid": 1000,
"gid": 1000
},
"args": [
"/usr/sbin/nginx",
"-g",
"daemon off;"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"ambient": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"noNewPrivileges": true
},
"root": {
"path": "rootfs",
"readonly": false
},
"hostname": "runc",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/sys/fs/cgroup",
"type": "cgroup",
"source": "cgroup",
"options": [
"nosuid",
"noexec",
"nodev",
"relatime",
"ro"
]
}
],
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
]
},
"namespaces": [
{
"type": "pid"
},
{
"type": "network",
"path": "/var/run/netns/nginx_netw"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "cgroup"
}
],
"maskedPaths": [
"/proc/acpi",
"/proc/asound",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
Two of the most prominent changes are setting .process.args to the command that will run when the container starts; for nginx, we just start the server in the foreground. The value of terminal is false, since we don't want to spawn an interactive shell within the container. The other important change is the network entry under .linux.namespaces: since we want to reach and talk to the container later, we have to tell it which network namespace it should be deployed into.
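If you prefer not to edit the file by hand, the edits called out above can also be scripted, for example with jq (a sketch matching the config shown; jq is not needed for anything else in this post):
# set the nginx command, disable the terminal and point at the pre-created network namespace
jq '.process.args = ["/usr/sbin/nginx", "-g", "daemon off;"]
    | .process.terminal = false
    | (.linux.namespaces[] | select(.type == "network")).path = "/var/run/netns/nginx_netw"' \
    config.json > config.tmp && mv config.tmp config.json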
Now, let's create the aforementioned namespace. The following block creates the namespace and connects it to the host via a virtual ethernet (veth) pair. The assigned IP serves two purposes: firstly, it makes the namespace reachable from the host under a local IP; secondly, it gives the services inside the namespace an address to use.
sudo ip netns add nginx_netw
sudo ip link add name veth-host type veth peer name veth-alpine
sudo ip link set veth-alpine netns nginx_netw
sudo ip netns exec nginx_netw ip addr add 192.168.10.1/24 dev veth-alpine
sudo ip netns exec nginx_netw ip link set veth-alpine up
sudo ip netns exec nginx_netw ip link set lo up
sudo ip link set veth-host up
sudo ip route add 192.168.10.1/32 dev veth-host
sudo ip netns exec nginx_netw ip route add default via 192.168.10.1 dev veth-alpine
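Before starting the container, it is worth checking that the namespace exists and carries the address assigned above:
ip netns list
# --> nginx_netw
sudo ip netns exec nginx_netw ip addr show veth-alpine
# --> ... inet 192.168.10.1/24 scope global veth-alpine ...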
You can now finally run the container via:
cd container
sudo runc run --bundle . nginx
At this point, you should see the usual nginx output on your terminal. If you open a new one and curl http://192.168.10.1 (port 80 this time, since there is no port mapping involved), you can also make sure you see the welcome page! Browsers like Firefox work here as well!
Summary
This post went through ways of starting containers at different levels of the stack: it started with the familiar docker, continued with containerd, and closed with runc. I hope that after reading through, or after trying out the runc tweaks yourself, you can begin to appreciate and understand Docker a bit more :D