Taking KubeSpray on a journey from Docker

Björn Runåker
3 min read · Jan 13, 2021

As we all know, Docker is deprecated as a container runtime in Kubernetes 1.20 and onward. We also know all the good reasons why we should not panic. However, this change brings some interesting challenges for which no resolution was to be found on the Internet.

KubeSpray makes it easy to switch the runtime to containerd or CRI-O. Even adding Kata Containers is just a matter of setting a switch. This text describes how to solve one specific problem when switching to containerd. There may be other write-ups for other container runtimes and for other KubeSpray settings and issues.

How to reset your cluster when you get “failed to destroy network for sandbox”

Getting your cluster to a clean state is important, especially when changing the container runtime. After switching, problems started to occur: the error “failed to destroy network for sandbox” stopped the reset process and left the cluster in an inconsistent state.

The complete process to solve this type of problem is as follows:

First run the usual reset process from KubeSpray:

ansible-playbook -i inventory/intlab/hosts.yaml reset.yml -b -v --private-key=~/.ssh/id_rsa

Note: change the inventory name to match your cluster.

Then you need to run some extra commands on each server. Utilizing the existing Ansible setup makes this quick and easy:

ansible kube-node -i inventory/intlab/hosts.yaml --become --become-user root -a 'sudo apt-get -y --allow-change-held-packages purge containerd.io docker-ce runc'

This removes packages with purge to clean up as much as possible.
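If you want to confirm that the purge actually took before moving on, a quick ad-hoc check can help. This is a hedged sketch assuming the same inventory path as above; empty output for a host means it is clean:

```shell
# List any leftover packages matching the runtimes we just purged.
# "|| true" keeps the task from failing when grep finds nothing,
# which is exactly the outcome we want.
ansible kube-node -i inventory/intlab/hosts.yaml --become -m shell \
  -a 'dpkg -l | grep -E "containerd|docker-ce|runc" || true'
```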

However, the purge is not enough. There are a few directories that must be cleaned as well; if this is not done, the cluster setup will fail.

ansible kube-node -i inventory/intlab/hosts.yaml --become --become-user root -a 'rm -r /etc/cni/net.d'

ansible kube-node -i inventory/intlab/hosts.yaml --become --become-user root -a 'rm -r /var/lib/containerd'
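A hedged variant of the two commands above: with rm -rf, the force flag suppresses the error when a directory is already gone, so the task can be re-run safely on a partially cleaned cluster:

```shell
# -f makes rm succeed even when the path no longer exists, so repeated
# runs are harmless; both directories are removed in one ad-hoc task.
ansible kube-node -i inventory/intlab/hosts.yaml --become --become-user root \
  -a 'rm -rf /etc/cni/net.d /var/lib/containerd'
```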

Now a tougher problem arises. The last crud to remove is the containerd images, but they are mounted on the servers and locked from removal. The number of mounted file systems belonging to containerd is well over a hundred, so this little script solves the problem:

#!/bin/bash

for host in ml01.intlab ml02.intlab ml03.intlab ml04.intlab ml07.intlab ml08.intlab ml09.intlab ml10.intlab ml11.intlab ml12.intlab ml13.intlab ml16.intlab ml17.intlab ml18.intlab ml21.intlab ml22.intlab ml24.intlab ml25.intlab ml26.intlab ml27.intlab ml28.intlab ml29.intlab ml30.intlab ml31.intlab
do
  echo "$host"
  for i in $(ssh "$host" df | grep containerd | awk '{print $6}')
  do
    echo "$i"
    ssh "$host" sudo umount "$i"
  done
done

Note: Your hosts will probably be named differently, so go ahead and change the for loop.
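Since the host list already lives in the Ansible inventory, a hedged alternative (assuming the same inventory path as before, and GNU df on the nodes) is to let an ad-hoc task do the unmounting, instead of maintaining the list in two places:

```shell
# "df --output=target" prints only mount points (GNU coreutils);
# "xargs -r" skips umount entirely when grep matches nothing.
ansible kube-node -i inventory/intlab/hosts.yaml --become -m shell \
  -a 'df --output=target | grep containerd | xargs -r -n1 umount'
```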

Now the files are not locked anymore, so this command will do the rest:

ansible kube-node -i inventory/intlab/hosts.yaml --become --become-user root -a 'rm -r /var/run/containerd/'

Now you have clean servers ready for an installation with KubeSpray (again).

ansible-playbook -i inventory/intlab/hosts.yaml cluster.yml -b -v --private-key=~/.ssh/id_rsa

Warning: your local storage of container images is probably wiped as well. This means the next installation will pull everything again from Docker Hub as an anonymous user. This may trigger the message “ERROR: Too Many Requests”, which means your IP is rate-limited for six hours.
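Docker Hub reports the anonymous pull quota as HTTP headers on a dedicated test repository, so you can check where you stand before retrying. A hedged sketch; it assumes curl and jq are installed and checks the limit for your current public IP:

```shell
# Fetch an anonymous pull token, then read the RateLimit-* headers from
# a HEAD request against Docker Hub's rate-limit test image.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
curl -s --head -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
  | grep -i ratelimit
```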

Running the same cluster setup command again usually pulls the remaining containers well before the six hours are up.
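If you would rather not babysit the retry, the rerun can be scripted. A minimal sketch, assuming the same playbook command as above; the sleep interval is arbitrary:

```shell
# Re-run the cluster playbook until it succeeds; each pass pulls the
# images that made it under the rate limit, so progress accumulates.
until ansible-playbook -i inventory/intlab/hosts.yaml cluster.yml -b -v \
      --private-key=~/.ssh/id_rsa
do
  echo "Playbook failed; retrying in 10 minutes..."
  sleep 600
done
```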

Let me know how this works for you in the comments.

