GKE stuck at auto-repairing and unable to add node pools

Recently I ran into an issue with my Kubernetes cluster on GCP. One of the services running in my cluster used more memory than the node had available, and the node crashed. Because of the memory pressure, the node's status changed to NotReady and GKE started auto-repairing it. Meanwhile, I could not add new nodes to the GKE cluster: while the cluster was auto-repairing, the GCP UI disabled every option for adding a new node pool. The cluster is still auto-repairing continuously. I had also already enabled node auto-scaling on the cluster, but it did not add a new node either. What should I do to solve this issue?



Solution 1:[1]

If GKE has disabled the option to add a new node pool, you cannot work around that directly. Instead, you can start by following some best practices.

First, add resource requests and limits to the workload so that it cannot consume too much of the node's resources:

resources:
  requests:
    cpu: "150m"
    memory: "80Mi"
  limits:
    cpu: "1"
    memory: "1024Mi"

Read more: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
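To show where such a `resources` block lives, here is a minimal sketch of a Deployment that applies the same requests and limits to its container. The workload name `example-app` and the image `example.com/example-app:1.0` are hypothetical placeholders. The scheduler uses the requests to decide which node has room for the Pod; if the container exceeds its memory limit, it is OOM-killed, which should generally be less disruptive than letting it exhaust the whole node's memory.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                 # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app            # must match spec.selector.matchLabels
    spec:
      containers:
        - name: example-app
          image: example.com/example-app:1.0   # hypothetical image
          resources:
            requests:               # reserved on the node at scheduling time
              cpu: "150m"
              memory: "80Mi"
            limits:                 # hard caps: CPU is throttled, memory overuse is OOM-killed
              cpu: "1"
              memory: "1024Mi"
```

Tune the values to what the service actually uses; the numbers here simply mirror the snippet above.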

You can also use `nodeSelector` and node affinity to schedule specific Pods on specific nodes.
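As a sketch, the two Pods below are pinned to one GKE node pool, for example to keep a memory-hungry service away from the nodes running everything else. They rely on the `cloud.google.com/gke-nodepool` label that GKE sets on its nodes; the pool name `high-mem-pool`, the Pod names, and the image are hypothetical. The first Pod uses the simpler `nodeSelector`; the second expresses the same constraint with node affinity, which also supports operators such as `In`, `NotIn`, and `Exists`.

```yaml
# Option 1: nodeSelector (exact label match only)
apiVersion: v1
kind: Pod
metadata:
  name: memory-heavy-app-selector     # hypothetical Pod name
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: high-mem-pool   # hypothetical node pool
  containers:
    - name: app
      image: example.com/example-app:1.0           # hypothetical image
---
# Option 2: required node affinity (same constraint, more expressive syntax)
apiVersion: v1
kind: Pod
metadata:
  name: memory-heavy-app-affinity     # hypothetical Pod name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                  - high-mem-pool                  # hypothetical node pool
  containers:
    - name: app
      image: example.com/example-app:1.0           # hypothetical image
```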

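You can also check the Pod's QoS (Quality of Service) class, which affects how the Pod is treated when the node comes under memory pressure.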
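For reference, the requests and limits shown earlier give a Pod the `Burstable` class, because the requests are lower than the limits. A Pod gets the `Guaranteed` class when every container sets CPU and memory limits equal to its requests, as in this sketch (Pod name and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-app                 # hypothetical Pod name
spec:
  containers:
    - name: app
      image: example.com/example-app:1.0   # hypothetical image
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:                        # equal to requests -> Guaranteed QoS class
          cpu: "500m"
          memory: "512Mi"
```

Once the Pod is running, `kubectl get pod guaranteed-app -o jsonpath='{.status.qosClass}'` prints the class Kubernetes assigned.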

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Harsh Manvar