Containers, Jails, and chroots: An introduction

You've probably heard of at least one of these by now. It's a hard-to-avoid piece of jargon when you dabble even the tiniest amount in IT, and everyone seems to be really happy about them.

So what's the fuss all about? First, let's break down some core concepts to make them more understandable:

What is a chroot?

Let's start small. Some of you may already understand this one, but I do think it's important to start here.

What does a chroot do? Well, it's quite simply a pivot of the root filesystem to another directory.

In layman's terms:

  • Suppose you create a folder called /mybeautifulchroot on your root filesystem.
  • You then cp -a /bin/bash /mybeautifulchroot/bin/bash
  • Now you can (theoretically*) execute the command chroot /mybeautifulchroot /bin/bash, and voila! You appear to be running in a fresh system with only bash installed, and a completely empty root (/) filesystem.

(*: in practice you need more files/dependencies, since you'll need to supply the complete root filesystem that userspace applications expect from Linux, but this example demonstrates the idea pretty well)
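To see why bash alone isn't enough: bash is dynamically linked, and `ldd` lists the libraries it expects to find inside the chroot. A quick check (assuming a typical glibc-based distro):

```shell
# Show the shared libraries /bin/bash is linked against; every one of
# these would also have to exist inside the chroot for bash to start.
ldd /bin/bash
# Typical output (exact paths and versions differ per distro):
#   libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
#   /lib64/ld-linux-x86-64.so.2
```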

This is the first of a few pieces that make up a container. It allows you to create a fresh environment with completely different binaries than the host OS, which can be useful for many purposes, including impersonating other Linux distros or running different libraries from the ground up.

If you understand the concept of chroots, containers aren’t that big of a step up.

What is a container?

A running gag I used to hear a lot was "Containers are just spicy chroots", which honestly isn't that bad of a comparison: you're just running applications in a pivoted/virtual filesystem, with virtual /dev/, /sys/, /proc/, and so forth, that pretend to be in an otherwise empty OS.

This allows you, in addition to the different libraries and applications that chroots give you, to pretend Linux is effectively running on a completely different machine. You can spawn virtual network cards with their own IPs separate from the host's, you can mount networked or virtual filesystems into it, you can give it a fresh process list isolated from the processes running on the host, and so on.

These virtual 'instances' of kernel state are called Namespaces. A Namespace is effectively a fresh instance of one of the components of the kernel that make up the userspace environment: the process table, the network stack and its configuration, the list of mounted filesystems, and so forth; each of these can be namespaced independently.

These Namespaces can then be more finely controlled and limited using the CGroups (control groups) APIs, which you may have heard more about in the context of containers. Strictly speaking, CGroups are a separate kernel feature that limits resource usage (CPU time, memory, I/O, and so on) while Namespaces handle the isolation, but people usually say "CGroups" when they mean the combination of both.
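You can actually see these mechanisms directly: every process's namespace memberships and cgroup are exposed under /proc. A quick look (Linux only; the `unshare` line at the end needs root):

```shell
# Each entry here is a namespace this process belongs to; two processes
# in the same namespace show the same inode number in the symlink target.
ls -l /proc/self/ns

# And this shows which cgroup(s) the process is a member of:
cat /proc/self/cgroup

# Creating fresh namespaces by hand (normally needs root):
#   sudo unshare --net --mount --pid --fork /bin/bash
```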

How does Docker/LXC/etc work?

Programs like Docker, LXC, Podman, Kubernetes, and so forth are essentially abstractions over the Namespaces and CGroups APIs, creating a more coherent ecosystem for sysadmins and users.

Docker, for example:

  • runs a central daemon that takes a configuration from e.g. Docker Hub or a Dockerfile
  • Downloads the appropriate referenced components and dependencies (as virtual filesystems)
  • Creates a virtual root filesystem, and layers these downloaded virtual filesystems on top of each other to create one whole root filesystem for the container to use
  • Spawns a virtual network card and connects it to its internal virtual network
  • Uses the CGroups/Namespaces APIs to create a container with that composed root filesystem
  • Manages the container, stores its logs, migrates it to other servers if you’re running Docker Swarm, etc.
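To make the layering step concrete, here's a minimal, hypothetical Dockerfile (the nginx choice and the index.html file are just illustrative):

```dockerfile
# Each instruction below produces one read-only filesystem layer;
# Docker stacks them (plus a writable layer on top) to form the
# container's root filesystem.

# Base layer: a prebuilt Ubuntu root filesystem pulled from Docker Hub
FROM ubuntu:22.04

# A new layer containing the installed nginx package
RUN apt-get update && apt-get install -y nginx

# Another layer with our (hypothetical) page copied in
COPY index.html /var/www/html/

# Metadata only, no layer: the process the container runs at start
CMD ["nginx", "-g", "daemon off;"]
```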

LXC and Kubernetes work much the same way, but with different approaches and more or fewer features.
LXC, for example, requires you to provide the root filesystem yourself.
Kubernetes includes everything needed to scale up to thousands of servers: load balancing, redundancy, backups, automated provisioning, and loads more.

How do I get started with this?

There are many ways! And none of them are really wrong.

You could start by just running some Docker containers; there are many good examples to be found on Docker Hub. Do be aware that you need to specify where a container stores its files: by default the data lives only inside the container, and is deleted along with it when the container is removed. Using something like Docker Compose makes this much easier, as you're forced to define the configuration (including volumes) in advance, and you're usually given a good example file by the developers of the software you wish to run.
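For illustration, a minimal, hypothetical docker-compose.yml for a database container; the named volume is what keeps the data around after the container itself is removed:

```yaml
# Hypothetical example: a PostgreSQL container with persistent storage.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # placeholder -- use a real secret
    volumes:
      - dbdata:/var/lib/postgresql/data   # persist the data directory

# Named volumes survive `docker compose down` (unless you pass -v).
volumes:
  dbdata:
```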

If you'd like to see an example, I wrote a blog post about what I'm using for my server. It explains how to set up a stack that automatically forwards multiple containers to the internet and requests Let's Encrypt certificates for you, and gives an example of how to use this with Jekyll.

If you want to experiment with the basics, you can also try setting up some LXC containers. The simplest way to do this is by installing virt-manager and setting up an 'lxc' connection. You will have to define a folder that it will use as the chroot; don't use your root folder, as that can mess with your host OS. Try making a copy, or download a root filesystem of some other distro, such as Ubuntu or Fedora.

A note on starting out with Kubernetes

Going all out and starting with Kubernetes may seem like an interesting idea, and it definitely is a mighty product to experiment with. But you'll quickly find it really is made for the largest hyperscale datacenter deployments. If you don't have, say, your own public IP subnets or BGP routing for your Autonomous System, and you don't need something like "geo-aware redundancy", then it's perfectly fine to start with something smaller, like Docker for instance.

That said, if you're the kind of person who can deep-dive into documentation and make it work, then by all means: I know people who went from a full virtual machine stack to purely Kubernetes containers within weeks of being introduced to it. Kubernetes does do a lot of thinking for you, and if you e.g. want to do load balancing and centralized configuration management, it could be fun to mess with. Just be aware it's got some overhead for smaller deployments, as it's kinda made for much larger ones.

How to actually do the chroot thing

To get a chroot working, all you need are the right files and libraries, and the permissions to use them.

  • Create a folder where you want to set up your chroot, e.g. /mychroot.
  • Copy the basic requirements into it: cp -a /usr /bin /sbin /mychroot (you can copy less, but make sure to include the dependencies your program requires!)
  • Make sure you have permissions: you need to be root to chroot, and on SELinux systems you may have to turn SELinux off temporarily (setenforce 0) or it can prevent you from doing this.
  • Now you can chroot into your little environment: chroot /mychroot /bin/bash
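Here's a leaner take on the steps above as a script: instead of copying all of /usr, /bin, and /sbin, it stages only bash and the libraries it links against (discovered with ldd). The directory is just an example, and the final chroot still needs root:

```shell
#!/bin/sh
set -e

CHROOT_DIR="$(mktemp -d)"   # or /mychroot, or any path you like

# Copy the shell itself into the chroot's /bin
mkdir -p "$CHROOT_DIR/bin"
cp /bin/bash "$CHROOT_DIR/bin/"

# Copy every shared library bash needs, preserving the paths, so the
# dynamic loader finds them at the same locations inside the chroot.
# (-L dereferences symlinks, so we copy the actual library files.)
for lib in $(ldd /bin/bash | grep -o '/[^ ]*'); do
    mkdir -p "$CHROOT_DIR$(dirname "$lib")"
    cp -L "$lib" "$CHROOT_DIR$lib"
done

echo "Staged chroot at $CHROOT_DIR"
# Entering it needs root (and possibly `setenforce 0`, as noted above):
#   sudo chroot "$CHROOT_DIR" /bin/bash
```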
