We constantly get Waiting: ImagePullBackOff during CI upgrades. Does anybody know what's happening? The cluster is Kubernetes 1.6.2, installed via kops. During upgrades we run kubectl set image, and for the last two days we have been seeing the following errors:
Failed to pull image "********.dkr.ecr.eu-west-1.amazonaws.com/backend:da76bb49ec9a": rpc error: code = 2 desc = net/http: request canceled
Error syncing pod, skipping: failed to "StartContainer" for "backend" with ErrImagePull: "rpc error: code = 2 desc = net/http: request canceled"
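For context, the upgrade step is just an image swap on the Deployment, roughly like this (the deployment and container names here are placeholders for our actual manifests):
kubectl set image deployment/backend backend=********.dkr.ecr.eu-west-1.amazonaws.com/backend:da76bb49ec9a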
journalctl -r -u kubelet
Jul 26 09:32:40 ip-10-0-49-227 kubelet[840]: W0726 09:32:40.731903 840 docker_sandbox.go:263] NetworkPlugin kubenet failed on the status hook for pod "backend-1277054742-bb8zm_default": Unexpected command output nsenter: cannot open : No such file or directory
Jul 26 09:32:40 ip-10-0-49-227 kubelet[840]: E0726 09:32:40.724387 840 generic.go:239] PLEG: Ignoring events for pod frontend-1493767179-84rkl/default: rpc error: code = 2 desc = Error: No such container: 2421109e0d1eb31242c5088b547c0f29377816ca068a283b8fe6c2d8e7e5874d
Jul 26 09:32:40 ip-10-0-49-227 kubelet[840]: E0726 09:32:40.724371 840 kuberuntime_manager.go:858] getPodContainerStatuses for pod "frontend-1493767179-84rkl_default(0fff3b22-71c8-11e7-9679-02c1112ca4ec)" failed: rpc error: code = 2 desc = Error: No such container: 2421109e0d1eb31242c5088b547c0f29377816ca068a283b8fe6c2d8e7e5874d
Jul 26 09:32:40 ip-10-0-49-227 kubelet[840]: E0726 09:32:40.724358 840 kuberuntime_container.go:385] ContainerStatus for 2421109e0d1eb31242c5088b547c0f29377816ca068a283b8fe6c2d8e7e5874d error: rpc error: code = 2 desc = Error: No such container: 2421109e0d1eb31242c5088b547c0f29377816ca068a283b8fe6c2d8e7e5874d
Jul 26 09:32:40 ip-10-0-49-227 kubelet[840]: E0726 09:32:40.724329 840 remote_runtime.go:269] ContainerStatus "2421109e0d1eb31242c5088b547c0f29377816ca068a283b8fe6c2d8e7e5874d" from runtime service failed: rpc error: code = 2 desc = Error: No such container: 2421109e0d1eb31242c5088b547c0f29377816ca068a283b8fe6c2d8e7e5874d
Jul 26 09:32:40 ip-10-0-49-227 kubelet[840]: with error: exit status 1
Try running kubectl create configmap -n kube-system kube-dns.
For more context, see the known issues for Kubernetes 1.6 in the kops 1.6.0 release notes: https://github.com/kubernetes/kops/releases/tag/1.6.0
This may be caused by a known Docker bug where shutdown occurs before the content is synced to disk on layer creation. The fix is included in Docker v1.13; the workaround is to remove the empty files and re-pull the image.
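A rough sketch of that workaround on an affected node (the paths are assumptions, since where the empty files end up depends on the storage driver; the image reference is simply the one from the error above):
# check whether the node runs a Docker version older than the 1.13 fix
docker version --format '{{.Server.Version}}'
# look for zero-length files left behind under Docker's data root
sudo find /var/lib/docker -type f -size 0
# remove the broken image and pull it again
docker rmi ********.dkr.ecr.eu-west-1.amazonaws.com/backend:da76bb49ec9a
docker pull ********.dkr.ecr.eu-west-1.amazonaws.com/backend:da76bb49ec9a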
I'm using RabbitMQ 3.8.2 with Erlang 22.2.7 and having a problem while consuming tasks. My setup is Django + Celery + RabbitMQ. While publishing messages to a queue everything goes fine until the queue length reaches 1200 messages. After that point RabbitMQ starts closing AMQP connections with the following errors:
...
2022-11-01 09:35:25.327 [info] <0.20608.9> accepting AMQP connection <0.20608.9> (185.121.83.107:60447 -> 185.121.83.116:5672)
2022-11-01 09:35:25.483 [info] <0.20608.9> connection <0.20608.9> (185.121.83.107:60447 -> 185.121.83.116:5672): user 'rabbit_admin' authenticated and granted access to vhost '/'
...
2022-11-01 09:36:59.129 [warning] <0.19994.9> closing AMQP connection <0.19994.9> (185.121.83.108:36149 -> 185.121.83.116:5672, vhost: '/', user: 'rabbit_admin'):
client unexpectedly closed TCP connection
...
[error] <0.11162.9> closing AMQP connection <0.11162.9> (185.121.83.108:57631 -> 185.121.83.116:5672):
{writer,send_failed,{error,enotconn}}
...
2022-11-01 09:35:48.256 [error] <0.20201.9> closing AMQP connection <0.20201.9> (185.121.83.108:50058 -> 185.121.83.116:5672):
{inet_error,enotconn}
...
Then the django-celery consumer disappears from the queue list, the messages become "Ready", and the Celery pods are unable to ack a message after the job is finished, failing with the following error:
ERROR: [2022-11-01 09:20:23] /usr/src/app/project/celery.py:114 handle_message Error while handling Rabbit task: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/amqp/connection.py", line 514, in channel
return self.channels[channel_id]
KeyError: None
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/project/celery.py", line 76, in handle_message
message.ack()
File "/usr/local/lib/python3.10/site-packages/kombu/message.py", line 125, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/usr/local/lib/python3.10/site-packages/amqp/channel.py", line 1407, in basic_ack
return self.send_method(
File "/usr/local/lib/python3.10/site-packages/amqp/abstract_channel.py", line 70, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/usr/local/lib/python3.10/site-packages/amqp/method_framing.py", line 186, in write_frame
write(buffer_store.view[:offset])
File "/usr/local/lib/python3.10/site-packages/amqp/transport.py", line 347, in write
self._write(s)
ConnectionResetError: [Errno 104] Connection reset by peer
I have noticed that the message size also affects this behavior. In the case above each message is roughly 1000-1500 characters. If I decrease it to 50 characters, the threshold at which RabbitMQ starts closing AMQP connections shifts to 4000-5000 messages.
I suspect that the problem is a lack of resources for RabbitMQ, but I don't know how to find out what exactly is going wrong. If I run htop on the server, I see that the 2 available CPUs are never under high load (less than 20% each) and RAM usage is 400 MB / 3840 MB, so nothing seems to be wrong there. Is there any resource-checking command for RabbitMQ? The tasks do not take long to complete, about 10 seconds each.
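I assume the built-in CLI tools are the place to start, something along these lines, but I'm not sure they show what I need:
rabbitmq-diagnostics status              # memory use, file descriptors, alarms
rabbitmq-diagnostics memory_breakdown    # where the broker's memory goes
rabbitmqctl list_connections user state recv_oct send_oct
rabbitmqctl list_queues name messages consumers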
Maybe there are also some missed heartbeats from the client (I had that problem earlier, but not now; there are currently no error messages about it).
Also if I run sudo journalctl --system | grep rabbitmq, I get the following output:
......
May 24 05:15:49 oms-git.omsystem sshd[809111]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=43.154.63.169 user=rabbitmq
May 24 05:15:51 oms-git.omsystem sshd[809111]: Failed password for rabbitmq from 43.154.63.169 port 37010 ssh2
May 24 05:15:51 oms-git.omsystem sshd[809111]: Disconnected from authenticating user rabbitmq 43.154.63.169 port 37010 [preauth]
May 24 16:12:32 oms-git.omsystem sudo[842182]: ad : TTY=pts/3 ; PWD=/var/log/rabbitmq ; USER=root ; COMMAND=/usr/bin/tail -f -n 1000 rabbit@XXX-git.log
......
Maybe there is another issue with the firewall here, but I don't see any error messages about that in /var/log/rabbitmq/rabbit@XXX.log.
My Celery configuration on client is like:
CELERY_TASK_IGNORE_RESULT = True
CELERY_RESULT_BACKEND = 'django-db'
CELERY_CACHE_BACKEND = 'django-cache'
CELERY_SEND_EVENTS = False
CELERY_BROKER_POOL_LIMIT = 30
CELERY_BROKER_HEARTBEAT = 30
CELERY_BROKER_CONNECTION_TIMEOUT = 600
CELERY_PREFETCH_MULTIPLIER = 1
CELERY_SEND_EVENTS = False
CELERY_WORKER_CONCURRENCY = 1
CELERY_TASK_ACKS_LATE = True
Currently I'm running the pod using the following command:
celery -A project.celery worker -l info -f /var/log/celery/celery.log -Ofair
I have also tried various arguments to limit prefetch or turn off heartbeats, but it didn't help:
celery -A project.celery worker -l info -f /var/log/celery/celery.log --without-heartbeat --without-gossip --without-mingle
celery -A project.celery worker -l info -f /var/log/celery/celery.log --prefetch-multiplier=1 --pool=solo --
I expect there to be no limitation on queue length, and that every Celery pod in my Kubernetes cluster consumes and acks messages without errors.
When trying to SSH to GCE VMs using metadata-based SSH keys I get the following error:
ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
While troubleshooting I can see the keys in the instance metadata, but they are not being added to the user's authorized_keys file:
$ curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/attributes/ssh-keys"
username:ssh-ed25519 AAAAC3NzaC....omitted....
admin:ssh-ed25519 AAAAC3NzaC....omitted....
$ sudo ls -hal /home/**/.ssh/
/home/ubuntu/.ssh/:
total 8.0K
drwx------ 2 ubuntu ubuntu 4.0K Aug 11 23:19 .
drwxr-xr-x 3 ubuntu ubuntu 4.0K Aug 11 23:19 ..
-rw------- 1 ubuntu ubuntu 0 Aug 11 23:19 authorized_keys
# Only result is the default zero-length file for ubuntu user
I also see the following errors in the ssh server auth log and Google Guest Environment services:
$ sudo less /var/log/auth.log
Aug 11 23:28:59 test-vm sshd[2197]: Invalid user admin from 1.2.3.4 port 34570
Aug 11 23:28:59 test-vm sshd[2197]: Connection closed by invalid user admin 1.2.3.4 port 34570 [preauth]
$ sudo journalctl -u google-guest-agent.service
Aug 11 22:24:42 test-vm oslogin_cache_refresh[907]: Refreshing passwd entry cache
Aug 11 22:24:42 test-vm oslogin_cache_refresh[907]: Refreshing group entry cache
Aug 11 22:24:42 test-vm oslogin_cache_refresh[907]: Failure getting groups, quitting
Aug 11 22:24:42 test-vm oslogin_cache_refresh[907]: Failed to get groups, not updating group cache file, removing /etc/oslogin_group.cache.bak.
# or
Aug 11 23:19:37 test-vm GCEGuestAgent[766]: 2022-08-11T23:19:37.6541Z GCEGuestAgent Info: Creating user admin.
Aug 11 23:19:37 test-vm useradd[885]: failed adding user 'admin', data deleted
Aug 11 23:19:37 test-vm GCEGuestAgent[766]: 2022-08-11T23:19:37.6869Z GCEGuestAgent Error non_windows_accounts.go:144:
Error creating user: useradd: group admin exists - if you want to add this user to that group, use -g.
Currently the latest cloud-init and guest-oslogin packages for Ubuntu 20.04.4 LTS (focal) seem to have an issue that causes google-guest-agent.service to exit before completing its task. The issue was fixed and committed but not yet released for focal (and likely other Ubuntu versions).
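To check whether a VM has the affected versions installed, something like this should work (the package names here are assumed from the Ubuntu GCE guest images):
$ dpkg -l cloud-init google-guest-agent google-compute-engine-oslogin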
For now you can try disabling OS Login by setting the instance or project metadata key enable-oslogin=FALSE, for example with the gcloud commands below.
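These are a rough sketch; the instance name and zone are placeholders, not taken from the report above:
$ gcloud compute instances add-metadata test-vm --zone=us-central1-a --metadata enable-oslogin=FALSE
# or at the project level:
$ gcloud compute project-info add-metadata --metadata enable-oslogin=FALSE
After that you should see the expected results and be able to SSH using those keys: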
$ sudo journalctl -u google-guest-agent.service
Aug 11 23:10:33 test-vm GCEGuestAgent[761]: 2022-08-11T23:10:33.0517Z GCEGuestAgent Info: Created google sudoers file
Aug 11 23:10:33 test-vm GCEGuestAgent[761]: 2022-08-11T23:10:33.0522Z GCEGuestAgent Info: Creating user username.
Aug 11 23:10:33 test-vm useradd[881]: new group: name=username, GID=1002
Aug 11 23:10:33 test-vm useradd[881]: new user: name=username, UID=1001, GID=1002, home=/home/username, shell=/bin/bash, from=none
Aug 11 23:10:33 test-vm gpasswd[895]: user username added by root to group ubuntu
Aug 11 23:10:33 test-vm gpasswd[904]: user username added by root to group adm
Aug 11 23:10:33 test-vm gpasswd[983]: user username added by root to group google-sudoers
Aug 11 23:10:33 test-vm GCEGuestAgent[761]: 2022-08-11T23:10:33.7615Z GCEGuestAgent Info: Updating keys for user username.
$ sudo ls -hal /home/username/.ssh/
/home/username/.ssh/:
total 12K
drwx------ 2 username username 4.0K Aug 11 23:19 .
drwxr-xr-x 4 username username 4.0K Aug 11 23:35 ..
-rw------- 1 username username 589 Aug 11 23:19 authorized_keys
The admin user, however, will not work, since it conflicts with an existing Linux group. You should pick a username that does not conflict with any of the group names (the name in each name:x:123: entry) listed by getent group.
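For example, a quick way to list the existing group names to avoid:
$ getent group | cut -d: -f1 | sort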
Elastic Beanstalk is adding and removing instances one after the other. Googling around points to checking the "State transition message", which comes up as "Client.UserInitiatedShutdown: User initiated shutdown". https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/troubleshooting-launch.html#troubleshooting-launch-internal lists some possible reasons, but none of them apply: no one has touched any settings. Any ideas?
UPDATE: Did a bit more digging and found out that app deployment is failing. Relevant log errors are below:
eb-engine.log
2021/08/05 15:46:29.272215 [INFO] Executing instruction: PreBuildEbExtension
2021/08/05 15:46:29.272220 [INFO] Starting executing the config set Infra-EmbeddedPreBuild.
2021/08/05 15:46:29.272235 [INFO] Running command /bin/sh -c /opt/aws/bin/cfn-init -s arn:aws:cloudformation:us-east-1:345470085661:stack/awseb-e-mecfm5qc8z-stack/317924c0-a106-11ea-a8a3-12498e67507f -r AWSEBAutoScalingGroup --region us-east-1 --configsets Infra-EmbeddedPreBuild
2021/08/05 15:50:44.538818 [ERROR] An error occurred during execution of command [app-deploy] - [PreBuildEbExtension]. Stop running the command. Error: EbExtension build failed. Please refer to /var/log/cfn-init.log for more details.
2021/08/05 15:50:44.540438 [INFO] Executing cleanup logic
2021/08/05 15:50:44.581445 [INFO] CommandService Response: {"status":"FAILURE","api_version":"1.0","results":[{"status":"FAILURE","msg":"Engine execution has encountered an error.","returncode":1,"events":[{"msg":"Instance deployment failed. For details, see 'eb-engine.log'.","timestamp":1628178644,"severity":"ERROR"}]}]}
2021/08/05 15:50:44.620394 [INFO] Platform Engine finished execution on command: app-deploy
2021/08/05 15:51:22.196186 [ERROR] An error occurred during execution of command [self-startup] - [PreBuildEbExtension]. Stop running the command. Error: EbExtension build failed. Please refer to /var/log/cfn-init.log for more details.
2021/08/05 15:51:22.196215 [INFO] Executing cleanup logic
eb-cfn-init.log
[2021-08-05T15:42:44.199Z] Completed executing cfn_init.
[2021-08-05T15:42:44.226Z] finished _OnInstanceReboot
+ RESULT=1
+ [[ 1 -ne 0 ]]
+ sleep_delay
+ (( 2 < 3600 ))
+ echo Sleeping 2
Sleeping 2
+ sleep 2
+ SLEEP_TIME=4
+ true
+ curl https://elasticbeanstalk-platform-assets-us-east-1.s3.amazonaws.com/stalks/eb_php74_amazon_linux_2_1.0.1153.0_20210728213922/lib/UserDataScript.sh
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 4627 100 4627 0 0 24098 0 --:--:-- --:--:-- --:--:-- 24098
+ RESULT=0
+ [[ 0 -ne 0 ]]
+ SLEEP_TIME=2
+ /bin/bash /tmp/ebbootstrap.sh 'https://cloudformation-waitcondition-us-east-1.s3.amazonaws.com/arn%3Aaws%3Acloudformation%3Aus-east-1%3A345470085661%3Astack/awseb-e-mecfm5qc8z-stack/317924c0-a106-11ea-a8a3-12498e67507f/AWSEBInstanceLaunchWaitHandle?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200528T171102Z&X-Amz-SignedHeaders=host&X-Amz-Expires=86399&X-Amz-Credential=AKIAIIT3CWAIMJYUTISA%2F20200528%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=57c7da0aec730af1b425d1aff68517c333cf9d5432c984d775419b415cac8513' arn:aws:cloudformation:us-east-1:345470085661:stack/awseb-e-mecfm5qc8z-stack/317924c0-a106-11ea-a8a3-12498e67507f 65c52bb7-0376-4d43-b304-b64890a34c1c https://elasticbeanstalk-health.us-east-1.amazonaws.com '' https://elasticbeanstalk-platform-assets-us-east-1.s3.amazonaws.com/stalks/eb_php74_amazon_linux_2_1.0.1153.0_20210728213922 us-east-1
[2021-08-05T15:46:07.683Z] Started EB Bootstrapping Script.
[2021-08-05T15:46:07.739Z] Received parameters:
TARBALLS =
EB_GEMS =
SIGNAL_URL = https://cloudformation-waitcondition-us-east-1.s3.amazonaws.com/arn%3Aaws%3Acloudformation%3Aus-east-1%3A345470085661%3Astack/awseb-e-mecfm5qc8z-stack/317924c0-a106-11ea-a8a3-12498e67507f/AWSEBInstanceLaunchWaitHandle?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200528T171102Z&X-Amz-SignedHeaders=host&X-Amz-Expires=86399&X-Amz-Credential=AKIAIIT3CWAIMJYUTISA%2F20200528%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=57c7da0aec730af1b425d1aff68517c333cf9d5432c984d775419b415cac8513
STACK_ID = arn:aws:cloudformation:us-east-1:345470085661:stack/awseb-e-mecfm5qc8z-stack/317924c0-a106-11ea-a8a3-12498e67507f
REGION = us-east-1
GUID =
HEALTHD_GROUP_ID = 65c52bb7-0376-4d43-b304-b64890a34c1c
HEALTHD_ENDPOINT = https://elasticbeanstalk-health.us-east-1.amazonaws.com
PROXY_SERVER =
HEALTHD_PROXY_LOG_LOCATION =
PLATFORM_ASSETS_URL = https://elasticbeanstalk-platform-assets-us-east-1.s3.amazonaws.com/stalks/eb_php74_amazon_linux_2_1.0.1153.0_20210728213922
Is this some corrupted AMI?
It turned out that there was a config script in the .ebextensions directory that was misbehaving.
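If you hit something similar, the cfn-init logs on the failing instance (the same file the eb-engine.log error points to) show which .ebextensions step failed, for example:
sudo less /var/log/cfn-init.log
sudo less /var/log/cfn-init-cmd.log   # may or may not exist, depending on the platform version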
I'm setting up a cluster out of personal machines. I installed CentOS 7 on the server and I'm trying to start the Slurm clients, but when I run this command:
pdsh -w n[00-09] systemctl start slurmd
I had this error:
n07: Job for slurmd.service failed because the control process exited with error code. See "systemctl status slurmd.service" and "journalctl -xe" for details.
pdsh@localhost: n07: ssh exited with exit code 1
I get that message for all the nodes.
[root@localhost ~]# systemctl status slurmd.service -l
● slurmd.service - Slurm node daemon
Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2020-12-22 18:27:30 CST; 27min ago
Process: 1589 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=203/EXEC)
Dec 22 18:27:30 localhost.localdomain systemd[1]: Starting Slurm node daemon...
Dec 22 18:27:30 localhost.localdomain systemd[1]: slurmd.service: control process exited, code=exited status=203
Dec 22 18:27:30 localhost.localdomain systemd[1]: Failed to start Slurm node daemon.
Dec 22 18:27:30 localhost.localdomain systemd[1]: Unit slurmd.service entered failed state.
Dec 22 18:27:30 localhost.localdomain systemd[1]: slurmd.service failed.
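As far as I understand, status=203/EXEC means systemd could not execute /usr/sbin/slurmd at all (missing binary, wrong permissions, or SELinux), so presumably a quick check across the nodes would look something like this:
pdsh -w n[00-09] 'ls -l /usr/sbin/slurmd'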
This is the slurm.conf file:
ClusterName=linux
ControlMachine=localhost
#ControlAddr=
#BackupController=
#BackupAddr=
#
SlurmUser=slurm
#SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
#PluginDir=
#FirstJobId=
#MaxJobCount=
#PlugStackConfig=
#PropagatePrioProcess=
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#Prolog=
#Epilog=
#SrunProlog=
#SrunEpilog=
#TaskProlog=
#TaskEpilog=
#TaskPlugin=
#TrackWCKey=no
#TreeWidth=50
#TmpFS=
#UsePAM=
#
#TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
#SCHEDULING
SchedulerType=sched/backfill
#SchedulerAuth=
#SelectType=select/linear
FastSchedule=1
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=100000
#PriorityWeightAge=1000
#PriorityWeightPartition=10000
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0
#
#LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
#JobCompLoc=
#
#ACCOUNTING
#JobAcctGatherType=jobacct_gather/linux
#JobAcctGatherFrequency=30
#
#AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStorageUser=
#
#COMPUTE NODES
# OpenHPC default configuration
TaskPlugin=task/affinity
PropagateResourceLimitsExcept=MEMLOCK
AccountingStorageType=accounting_storage/filetxt
Epilog=/etc/slurm/slurm.epilog.clean
NodeName=n[00-09] Sockets=1 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
PartitionName=normal Nodes=n[00-09] Default=YES MaxTime=24:00:00 State=UP
ReturnToService=1
The control machine name was set from the output of hostname -s.
I am trying to create a Hadoop cluster in GCP using the commands below:
cd bdutil
$ ./bdutil -b [Bucket Name] \
-z us-east1-b \
-n 2 \
-P [Project-ID] \
deploy
...
At the (y/n) prompt I enter y.
I am facing the issues below; please help me resolve them:
Mon Oct 8 05:35:49 UTC 2018: Exited 1 : gcloud --project=geslanu-218716 --quiet --verbosity=info compute instances
create geslanu-218716-w-0 --machine-type=n1-standard-1 --image-family=debian-8 --image-project=debian-cloud --netw
ork=default --tags=bdutil --scopes storage-full --boot-disk-type=pd-standard --zone=us-east1-b
Mon Oct 8 05:35:50 UTC 2018: Exited 1 : gcloud --project=geslanu-218716 --quiet --verbosity=info compute instances
create geslanu-218716-w-1 --machine-type=n1-standard-1 --image-family=debian-8 --image-project=debian-cloud --netw
ork=default --tags=bdutil --scopes storage-full --boot-disk-type=pd-standard --zone=us-east1-b
Mon Oct 8 05:35:50 UTC 2018: Exited 1 : gcloud --project=geslanu-218716 --quiet --verbosity=info compute instances
create geslanu-218716-m --machine-type=n1-standard-1 --image-family=debian-8 --image-project=debian-cloud --networ
k=default --tags=bdutil --scopes storage-full --boot-disk-type=pd-standard --zone=us-east1-b
Mon Oct 8 05:35:50 UTC 2018: Command failed: wait ${SUBPROC} on line 326.
Mon Oct 8 05:35:50 UTC 2018: Exit code of failed command: 1
Mon Oct 8 05:35:50 UTC 2018: Detailed debug info available in file: /tmp/bdutil-20181008-053541-yeq/debuginfo.txt
Mon Oct 8 05:35:50 UTC 2018: Check console output for error messages and/or retry your command.
Check the debug info file mentioned in the output (in my case /tmp/bdutil-20181111-105223-nET/debuginfo.txt); it contained:
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Invalid value for field 'resource.name': 'antonmobile_gmail-w-0'. Must be a match of regex '(?:[a-z](?:[-a-z0-9$
I personally had an issue where I used an underscore in UNIQUE_ID, which does not conform to the name restrictions.
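For illustration, the same naming rule with plain gcloud (GCE resource names must start with a lowercase letter and may only contain lowercase letters, digits, and hyphens):
# rejected: underscore in the instance name
$ gcloud compute instances create antonmobile_gmail-w-0 --zone=us-east1-b
# accepted: hyphen instead of underscore
$ gcloud compute instances create antonmobile-gmail-w-0 --zone=us-east1-b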
PS: I know it is probably too late to answer, but somebody else could have the same issue.