'AWS Databricks Cluster terminated.Reason:Container launch failure

We're developing custom runtime for databricks cluster. We need to version and archive our clusters for client. We made it run successfully in our own environment but we're not able to make it work in client's environment. It's large corporation with many restrictions.

We’re able to start EC2 instance and pull image, but there must be some other blocker. I think ec2 instance is succefully running, but I have error in databricks

Cluster terminated.Reason:Container launch failure

An unexpected error was encountered while launching containers on worker instances for the cluster. Please retry and contact Databricks if the problem persists.

Instance ID: i-0fb50653895453fdf

Internal error message: Failed to launch spark container on instance i-0fb50653895453fdf. Exception: Container setup has timed out

It should be in some settings/permissions inside client's environment.

Here is end of ec2 log

-----END SSH HOST KEY KEYS----- [ 59.876874] cloud-init[1705]: Cloud-init v. 21.4-0ubuntu1~18.04.1 running 'modules:final' at Wed, 09 Mar 2022 15:05:30 +0000. Up 17.38 seconds. [ 59.877016] cloud-init[1705]: Cloud-init v. 21.4-0ubuntu1~18.04.1 finished at Wed, 09 Mar 2022 15:06:13 +0000. Datasource DataSourceEc2Local. Up 59.86 seconds [ 59.819059] audit: kauditd hold queue overflow [
66.068641] audit: kauditd hold queue overflow [ 66.070755] audit: kauditd hold queue overflow [ 66.072833] audit: kauditd hold queue overflow [ 74.733249] audit: kauditd hold queue overflow [
74.735227] audit: kauditd hold queue overflow [ 74.737109] audit: kauditd hold queue overflow [ 79.899966] audit: kauditd hold queue overflow [ 79.903557] audit: kauditd hold queue overflow [
79.907108] audit: kauditd hold queue overflow [ 89.324990] audit: kauditd hold queue overflow [ 89.329193] audit: kauditd hold queue overflow [ 89.333125] audit: kauditd hold queue overflow [ 106.617320] audit: kauditd hold queue overflow [ 106.620980] audit: kauditd hold queue overflow [ 107.464865] audit: kauditd hold queue overflow [ 127.175767] audit: kauditd hold queue overflow [ 127.179897] audit: kauditd hold queue overflow [ 127.215281] audit: kauditd hold queue overflow [ 132.190357] audit: kauditd hold queue overflow [ 132.193968] audit: kauditd hold queue overflow [ 132.197546] audit: kauditd hold queue overflow [ 156.211713] audit: kauditd hold queue overflow [ 156.215388] audit: kauditd hold queue overflow [ 228.558571] audit: kauditd hold queue overflow [ 228.562120] audit: kauditd hold queue overflow [ 228.565629] audit: kauditd hold queue overflow [ 316.405562] audit: kauditd hold queue overflow [ 316.409136] audit: kauditd hold queue overflow



Solution 1:[1]

This is usually caused by slowness in downloading the custom docker image, please check if you can download from the docker repository properly from the network where your VMs are launched.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 user13035930