How to fix "INVALID_ARGUMENT: Cloud TPU received an invalid argument. The "GuestAttributes" value "" was not found."?

I recently started using TPUv3-8 VMs to train language models and, until now, hadn't had any issues with VMs crashing or the like. However, one of my TPU VMs now seems to have broken out of nowhere, and I am completely lost.

When trying to ssh to the VM, I get the following error message:

ERROR: (gcloud.alpha.compute.tpus.tpu-vm.ssh) INVALID_ARGUMENT: Cloud TPU received an invalid argument. The "GuestAttributes" value "" was not found. [EID: 0xdffd54714f63b861]

I also cannot start or stop (only delete) the VM from https://console.cloud.google.com/compute/tpus because its status is "unknown".

Is there any way I can get the VM running again?



Solution 1:[1]

This issue can be transient and go away if you retry a couple of times. Do you continue to have this issue?
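
If you want to automate the retries, here is a minimal sketch in Python. It assumes the error really is transient and that the `gcloud` CLI is on your PATH. The helper name `retry_command`, the attempt count, and the delay are illustrative choices of mine, not anything gcloud provides or recommends.

```python
import subprocess
import time


def retry_command(cmd, attempts=5, delay=10.0, runner=subprocess.run, sleep=time.sleep):
    """Run `cmd` up to `attempts` times and return True on the first exit code 0.

    `runner` and `sleep` are injectable so the helper can be exercised
    without actually calling gcloud or waiting (hypothetical design choice).
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{attempts} failed with exit code {result.returncode}")
        # Wait before the next try, but not after the final one.
        if attempt < attempts:
            sleep(delay)
    return False
```

For example, `retry_command(["gcloud", "alpha", "compute", "tpus", "tpu-vm", "ssh", "my-tpu-vm", "--zone", "us-central1-a"])` retries the same ssh command that produced the error above; `my-tpu-vm` and `us-central1-a` are placeholders for your own VM name and zone. If every attempt fails, the problem is probably not transient.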

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Milad M