Cloud Run reports that a request is aborted with no available instance but seems to be automatically retried and successful

We have an endpoint hosted on Cloud Run that receives requests to print a receipt for a customer. There is no retry mechanism on our side.

Earlier today, a single request to print a specific receipt was met with the "No available instance" error, and the request was reported as aborted. That alone would be fine for us. The problem is that the request was then replayed multiple times, each time with the same error response, which we did not expect.

The kicker is that these requests, although reported as aborted, actually succeeded: the data was written to our print queue, and the printers ended up printing 100+ duplicates of the same receipt.

Is there a way to prevent or fix this without resorting to a central lock or throttle?

Below is what we can see in our logs:

[Image: screenshot of the logs]



Solution 1:[1]

It appears that your application made requests over and over again without implementing exponential backoff. I counted 20 requests whose timestamps fall within the same second.

Hammering a service that is telling you to wait is not a good strategy.

I would implement a smarter strategy for errors. Since the HTTP error is 500, which here means Cloud Run couldn't manage the rate of traffic, I would hold off on any further requests for 60 seconds, retry once, and then retry again after a longer delay. If that also fails, stop making requests and send a notification (text, email, etc.) so that a human can look into the problem.
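As a rough sketch of that strategy in Python: the function below waits 60 seconds after a 500, retries, doubles the delay, retries once more, and then gives up and notifies someone. The names `send_with_backoff`, `send`, and `notify` are hypothetical, and the doubling factor and retry count are assumptions rather than anything Cloud Run prescribes. `send` is assumed to perform the request and return the HTTP status code.

```python
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], int],
    notify: Callable[[str], None],
    initial_delay: float = 60.0,   # first wait, as suggested above
    factor: float = 2.0,           # assumed growth factor for later waits
    max_retries: int = 2,          # retry once, then once more
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send(); on HTTP 500, wait and retry with growing delays.

    Returns True if a 2xx status was received, False otherwise.
    """
    delay = initial_delay
    status = 0
    for attempt in range(max_retries + 1):
        status = send()
        if status != 500:
            # Only the overload error is retried in this sketch.
            return 200 <= status < 300
        if attempt < max_retries:
            sleep(delay)
            delay *= factor
    # All attempts returned 500: stop and hand the problem to a human.
    notify(f"Giving up after {max_retries + 1} attempts; last status {status}")
    return False
```

The `sleep` parameter exists only so the waits can be replaced in tests. Because the question reports that the "aborted" requests were in fact processed, a retry like this is only safe if the receiving endpoint can tolerate seeing the same request more than once.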

Note: there is most likely an underlying problem or two. There might be a resource availability issue for the region. Your containers might be taking too long on startup. Review earlier logs to analyze what was happening when the errors started. Sometimes these errors are temporary, self-healing problems. The logs should help you determine that.

Implement different strategies based upon the error condition.
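One way to picture this is a small function that maps the response status to an action. The sketch below is illustrative only: `Action` and `choose_action` are hypothetical names, and which status codes count as retryable is an assumption you should adjust to what your logs actually show.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    FAIL_AND_ALERT = "fail_and_alert"


# Assumed set of "service is overloaded or unavailable" statuses.
RETRYABLE_STATUSES = frozenset({429, 500, 503})


def choose_action(status: int) -> Action:
    """Pick an error-handling strategy from the HTTP status code."""
    if 200 <= status < 300:
        return Action.DONE
    if status in RETRYABLE_STATUSES:
        # Back off and try again later instead of retrying immediately.
        return Action.RETRY_WITH_BACKOFF
    # Anything else is unlikely to be fixed by waiting; alert a human.
    return Action.FAIL_AND_ALERT
```

For example, `choose_action(500)` selects the backoff path, while `choose_action(400)` goes straight to alerting.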

Exponential backoff

Troubleshoot Cloud Run issues

Also consider opening a Google Cloud support ticket (paid support). Share the information that you collected with Google support. You might have a deployment that the team can learn from. If this is a Google problem, request a credit.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 John Hanley