NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from 10.x.x.x - Error
I get this error after I start the worker node VMs (Kubernetes) from the AWS console. I am using PKS (Pivotal Container Service). The full error is:
network for pod "xxxxx": NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from 10.x.x.x/xx
I suppose that Flannel assigns a subnet lease to the workers in the cluster, which expires after 24 hours; after that, flannel.1 and the cni0 /24 subnet no longer match, which causes this issue.
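One way to confirm this diagnosis on a worker is to compare cni0's address with the current flannel lease. A rough sketch, assuming flanneld writes its lease to /run/flannel/subnet.env (the usual flannel location; on a PKS stemcell the path will likely differ) and that both subnets are /24s:

```shell
#!/bin/sh
# Sketch: detect a flannel.1 / cni0 subnet mismatch on a worker node.

# Do two CIDRs fall in the same /24? (compares the first three octets)
same_slash24() {
  a=${1%%/*}; b=${2%%/*}
  [ "${a%.*}" = "${b%.*}" ]
}

# Only attempt the live check where the interface and lease file exist.
if ip link show cni0 >/dev/null 2>&1 && [ -r /run/flannel/subnet.env ]; then
  flannel_cidr=$(sed -n 's/^FLANNEL_SUBNET=//p' /run/flannel/subnet.env)
  cni0_cidr=$(ip -4 -o addr show cni0 | awk '{print $4}')
  if same_slash24 "$flannel_cidr" "$cni0_cidr"; then
    echo "OK: cni0 ($cni0_cidr) matches the flannel lease ($flannel_cidr)"
  else
    echo "MISMATCH: cni0=$cni0_cidr flannel=$flannel_cidr" >&2
  fi
fi
```

If this reports a mismatch, the bridge was built from a stale lease and you are hitting exactly the error above.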
I also know a workaround:
bosh ssh -d worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
bosh ssh -d worker -c "sudo rm /var/vcap/store/docker/docker/network/files/local-kv.db"
bosh ssh -d worker -c "sudo /var/vcap/bosh/bin/monit restart all"
However is there any permanent fix to this?
Solution 1:[1]
TL;DR - recreate network
$ ip link set cni0 down
$ brctl delbr cni0
Or, as @ws_ suggested in comments - remove interfaces and restart k8s services:
ip link set cni0 down && ip link set flannel.1 down
ip link delete cni0 && ip link delete flannel.1
systemctl restart containerd && systemctl restart kubelet
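After the restart you can sanity-check that the interfaces come back with matching addresses. A small sketch (interface names assumed standard; note cni0 is only recreated when the next pod sandbox is set up, so it can legitimately be absent for a moment):

```shell
#!/bin/sh
# Report the IPv4 address (or absence) of each flannel-related interface.
status=""
for dev in flannel.1 cni0; do
  addr=$(ip -4 -o addr show "$dev" 2>/dev/null | awk '{print $4; exit}')
  if [ -n "$addr" ]; then
    status="$status$dev=$addr "
  else
    status="$status$dev=absent "   # not up (yet), or no IPv4 address assigned
  fi
done
echo "$status"
```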
Community solutions
It is a known issue, and there are some community solutions to fix it.
Solution by filipenv is:
on master and slaves:
$ kubeadm reset
$ systemctl stop kubelet
$ systemctl stop docker
$ rm -rf /var/lib/cni/
$ rm -rf /var/lib/kubelet/*
$ rm -rf /etc/cni/
$ ifconfig cni0 down
$ ifconfig flannel.1 down
$ ifconfig docker0 down
(You may need to manually umount filesystems from /var/lib/kubelet before calling rm on that dir.) After doing that I started docker and kubelet back again and restarted the kubeadm process.
aysark and the kubernetes-handbook (in a recipe for "Pod stuck in Waiting or ContainerCreating") both recommend:
$ ip link set cni0 down
$ brctl delbr cni0
Some workarounds from Flannel's KB article
There is also an article in Flannel's KB: PKS Flannel network gets out of sync with docker bridge network (cni0)
Workaround 1:
Workaround 1 is just like yours:
bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
bosh ssh -d <deployment_name> worker -c "sudo rm /var/vcap/store/docker/docker/network/files/local-kv.db"
bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit restart all"
Workaround 2:
If WA1 didn't help, the KB recommends:
1. bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
2. bosh ssh -d <deployment_name> worker -c "ifconfig | grep -A 1 flannel"
3. On a master node, get access to etcd using the following KB
4. On a master node, run `etcdctlv2 ls /coreos.com/network/subnets/`
5. Remove all the worker subnet leases from etcd by running `etcdctlv2 rm /coreos.com/network/subnets/<worker_subnet>` for each of the worker subnets from point 2 above.
6. bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit restart flanneld"
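The lease removal above (`etcdctlv2 ls` / `etcdctlv2 rm`) can be scripted once you have etcd access on the master. A sketch, assuming the `etcdctlv2` wrapper named in the KB is on PATH; the example subnets are hypothetical placeholders for the ones you read off the workers' flannel interfaces:

```shell
#!/bin/sh
# Map a worker CIDR (from the ifconfig output above) to its etcd lease key.
# flannel stores leases with '-' in place of '/', e.g.
# /coreos.com/network/subnets/10.200.33.0-24
cidr_to_key() {
  printf '/coreos.com/network/subnets/%s\n' "$(printf '%s' "$1" | tr '/' '-')"
}

# Hypothetical stale worker subnets collected from the workers:
for cidr in 10.200.33.0/24 10.200.77.0/24; do
  key=$(cidr_to_key "$cidr")
  echo "removing stale lease $key"
  etcdctlv2 rm "$key" || echo "skipped: run this on a master with etcd access" >&2
done
```

flanneld re-registers a fresh lease for each worker when it is restarted, which is why the final monit restart comes after the removals.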
Solution 2:[2]
I am running docker with Kubernetes. I did the following on all my master and slave nodes and got my cluster working:
sudo su
ip link set cni0 down && ip link set flannel.1 down
ip link delete cni0 && ip link delete flannel.1
systemctl restart docker && systemctl restart kubelet
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 |
Solution 2 | Q. Qiao