'NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from10.x.x.x - Error

I get this error after I start the worker node VMs(Kubernetes) from the AWS console. I am using PKS ( Pivotal Container Service)

network for pod "xxxxx": NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from 10.x.x.x/xx

I supppose that Flannel assigns a subnet lease to the workers in the cluster, which expires after 24 hours - and flannel.1 and cni0 /24 subnet no longer match, which causes this issue.

I also know a workaround:

bosh ssh -d worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld" 
bosh ssh -d worker -c "sudo rm /var/vcap/store/docker/docker/network/files/local-kv.db" 
bosh ssh -d worker -c "sudo /var/vcap/bosh/bin/monit restart all"

However is there any permanent fix to this?



Solution 1:[1]

TL;DR - recreate network

$ ip link set cni0 down
$ brctl delbr cni0  

Or, as @ws_ suggested in comments - remove interfaces and restart k8s services:

ip link set cni0 down && ip link set flannel.1 down 
ip link delete cni0 && ip link delete flannel.1
systemctl restart containerd && systemctl restart kubelet

Community solutions

It is a known issue

And there are some solutions to fix it.

Solution by filipenv is:

on master and slaves:

$ kubeadm reset
$ systemctl stop kubelet
$ systemctl stop docker
$ rm -rf /var/lib/cni/
$ rm -rf /var/lib/kubelet/*
$ rm -rf /etc/cni/
$ ifconfig cni0 down
$ ifconfig flannel.1 down
$ ifconfig docker0 down

you may need to manually umount filesystems from /var/lib/kubelet before calling rm on that dir) After doing that I started docker and kubelet back again and restarted the kubeadm process

aysark: and kubernetes-handbook in a recipe for Pod stuck in Waiting or ContainerCreating both recommend

$ ip link set cni0 down
$ brctl delbr cni0  

Some workarounds from Flannel's KB article

And there is an article in Flannel's KB: PKS Flannel network gets out of sync with docker bridge network (cni0)

Workaround 1:

WA1 is just like yours:

    bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
    bosh ssh -d <deployment_name> worker -c "sudo rm /var/vcap/store/docker/docker/network/files/local-kv.db"
    bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit restart all"

Workaround 2:

If WA1 didn't help, KB recommends:

    bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
    bosh ssh -d <> worker -c "ifconfig | grep -A 1 flannel"
    On a master node get access to etcd using the following KB 
    On a master node run `etcdctlv2 ls /coreos.com/network/subnets/`
    Remove all the worker subnet leases from etcd by running `etcdctlv2 rm /coreos.com/network/subnets/<worker_subnet>;` for each of the worker subnets from point 2 above.
    bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit restart flanneld"

Solution 2:[2]

I am running docker with kubernetes. Did the following on all my master and slave nodes and got my cluster work:

sudo su
ip link set cni0 down && ip link set flannel.1 down 
ip link delete cni0 && ip link delete flannel.1
systemctl restart docker && systemctl restart kubelet

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Q. Qiao