Kubernetes nginx ingress controller upgrade never finishes

We have an issue upgrading our nginx ingress controller:
We have thousands of Ingress objects, all with the same ingress class, which is provided via the kubernetes.io/ingress.class annotation rather than an IngressClass resource (because of the controller version we run, see versions at the end).
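
For context, each of these ingresses looks roughly like the following (a simplified sketch shown with the networking.k8s.io/v1 schema; the name, host, and backend are placeholders, and there is no ingressClassName field):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-app                        # placeholder
      annotations:
        kubernetes.io/ingress.class: nginx     # class provided as an annotation
    spec:
      rules:
        - host: example-app.example.com        # placeholder
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: example-app
                    port:
                      number: 80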

When we run the upgrade, the new ReplicaSet's pods never finish syncing the ingresses: they stay stuck at 0/1 Running and eventually get restarted.
If we do a helm rollback to the previous revision, the pods that come up finish the sync within seconds and reach 1/1 Running.
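
For reference, the upgrade and rollback are plain Helm operations, roughly like this (release name, namespace, and chart version are placeholders, assuming the repo is added under the ingress-nginx alias):

    # upgrade the existing release to a newer chart version
    helm upgrade nginx-ingress-controller ingress-nginx/ingress-nginx \
      --namespace my-namespace --version <target-chart-version> -f values.yaml

    # roll back to the last working revision
    helm rollback nginx-ingress-controller <previous-revision> --namespace my-namespace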

We also tried installing the latest chart as a separate release and having it "listen" on the same ingress class, but the old and new deployments seem to fight each other for control of the ingresses, so the pods get stuck in a restart loop and the rollout never finishes.
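
That attempt looked roughly like this (a sketch; the exact values keys depend on the chart version, with controller.ingressClass being the value that controls which ingress.class annotation the controller watches):

    # second, newer release pointed at the same ingress class annotation value
    helm install nginx-ingress-new ingress-nginx/ingress-nginx \
      --namespace my-namespace \
      --set controller.ingressClass=nginx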

When doing an upgrade (rather than a new chart release install), we also tried scaling the deployment’s old ReplicaSet down to zero (accepting the downtime) to rule out any race-condition loop between old and new pods, but the new pods still never finished starting.
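
The scale-down itself was just the following (the ReplicaSet name is a placeholder):

    kubectl scale replicaset nginx-ingress-controller-<old-hash> --replicas=0 -n my-namespace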

I know it’s a lot of ingresses, but as I said above, when rolling back to the previous revision the pods finish the sync in seconds and there are no "races" or collisions.
I know using the ingress class annotation is deprecated, but in order to migrate to the IngressClass resource we first need to upgrade the ingress controller.
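
For completeness, the migration target would be a cluster-scoped IngressClass roughly like the one below (a minimal sketch; k8s.io/ingress-nginx is the controller value the upstream ingress-nginx controller uses by default), with each Ingress then setting spec.ingressClassName instead of the annotation:

    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: nginx
    spec:
      controller: k8s.io/ingress-nginx   # default controller value for ingress-nginx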

Here's example output of kubectl describe on a failing pod from the latest chart and app versions:

Events:
  Type     Reason     Age    From                      Message
  ----     ------     ----   ----                      -------
  Normal   Scheduled  2m17s  default-scheduler         Successfully assigned my-namespace/nginx-ingress-controller-d58fdfd89-54b2w to ip-10-10-163-229.ec2.internal
  Warning  RELOAD     84s    nginx-ingress-controller  Error reloading NGINX: 
-------------------------------------------------------------------------------
Error: signal: terminated
2022/05/11 09:31:52 [warn] 41#41: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg3797289662:150
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg3797289662:150
2022/05/11 09:31:52 [warn] 41#41: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg3797289662:151
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg3797289662:151
2022/05/11 09:31:52 [warn] 41#41: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg3797289662:152
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg3797289662:152

-------------------------------------------------------------------------------
  Normal   Killing    84s                  kubelet                   Container controller failed liveness probe, will be restarted
  Normal   Pulled     72s (x2 over 2m16s)  kubelet                   Container image "k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d8196e3bc1e72547c5dec66d6556c0ff92a23f6d0919b206be170bc90d5f9185" already present on machine
  Normal   Created    71s (x2 over 2m16s)  kubelet                   Created container controller
  Normal   Started    71s (x2 over 2m16s)  kubelet                   Started container controller
  Warning  Unhealthy  34s (x8 over 2m4s)   kubelet                   Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy  26s (x10 over 2m6s)  kubelet                   Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  RELOAD     14s                  nginx-ingress-controller  Error reloading NGINX: 
-------------------------------------------------------------------------------
Error: signal: terminated
2022/05/11 09:32:57 [warn] 40#40: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg1563251063:150
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg1563251063:150
2022/05/11 09:32:57 [warn] 40#40: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg1563251063:151
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg1563251063:151
2022/05/11 09:32:57 [warn] 40#40: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg1563251063:152
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg1563251063:152

-------------------------------------------------------------------------------

Here are the versions of everything we use:

  • Kubernetes version: 1.19.16 (EKS)
  • Helm version: 3.8.2
  • Helm repo used: https://kubernetes.github.io/ingress-nginx
  • Current ingress controller version: Helm chart 3.7.1, app version 0.40.2
  • We’ve tried upgrading to the latest version and also to chart 3.8.0 (app version 0.41.0); both attempts resulted in the same outcome.

Any ideas what we can do?

