Different results when running shell command via ssh CLI vs. ssh shell - google-cloud-platform

I have some baffling behavior that I'm sure has a logical explanation, but I'm wasting a lot of time trying to figure it out.
Basically I'm getting different results for lsb_release -cs when run in different contexts against a Debian compute engine instance.
For example:
$ gcloud compute ssh instance-1 --command "lsb_release -cs"
bullseye
$ gcloud compute ssh instance-1 --command "echo $(lsb_release -cs)"
focal
$ gcloud compute ssh instance-1
user@instance-1:~$ lsb_release -cs
bullseye
user@instance-1:~$ echo $(lsb_release -cs)
bullseye
What in the world? For some reason using $(lsb_release -cs) passed via --command yields focal, but everywhere else it yields bullseye.

"$(lsb_release -cs)" was being evaluated by the local shell before ssh executed it.
Solution was to switch to single quotes: gcloud compute ssh instance-1 --command 'echo $(lsb_release -cs)'
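A quick way to see the expansion difference from the local machine, using hostname purely as an illustration (any command that prints something machine-specific works):

# Double quotes: $(hostname) is expanded by the local shell before gcloud runs,
# so the remote instance just echoes your workstation's hostname.
gcloud compute ssh instance-1 --command "echo $(hostname)"

# Single quotes: the literal text is sent to the remote shell, which performs
# the substitution there and prints the instance's hostname.
gcloud compute ssh instance-1 --command 'echo $(hostname)'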

Related

SSHOperator with ComputeEngineSSHHook

I am trying to run a command over SSH on a GCP VM from Airflow via the SSHOperator, as described here:
from airflow.providers.google.cloud.hooks.compute_ssh import ComputeEngineSSHHook
from airflow.providers.ssh.operators.ssh import SSHOperator

ssh_to_vm_task = SSHOperator(
    task_id="ssh_to_vm_task",
    ssh_hook=ComputeEngineSSHHook(
        instance_name=<MYINSTANCE>,
        project_id=<MYPROJECT>,
        zone=<MYZONE>,
        use_oslogin=False,
        use_iap_tunnel=True,
        use_internal_ip=False
    ),
    command="echo test_message",
    dag=dag
)
However, I get an airflow.exceptions.AirflowException: SSH operator error: [Errno 2] No such file or directory: 'gcloud' error.
Docker is installed via docker-compose following these instructions.
Other Airflow GCP operators (such as BigQueryCheckOperator) work correctly, so at first sight it does not seem like a configuration problem.
Could you please help me? Is this a bug?
It seems the issue is that gcloud was not installed in the Docker container by default. This has been solved by following the instructions here: it is necessary to add
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \
    && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - \
    && apt-get update -y \
    && apt-get install google-cloud-sdk -y
to the Dockerfile that is used to install Airflow and its dependencies.
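After rebuilding, a quick way to confirm the CLI is now available inside the container (the airflow-worker service name is an assumption; use whichever service actually executes the SSH task):
# Rebuild the image, restart the stack, and check that gcloud is on PATH
docker-compose build
docker-compose up -d
docker-compose exec airflow-worker gcloud --version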
Check whether TCP port 22 is allowed through the firewall for your GCP VM instance, and make sure the VM itself allows SSH access and is properly configured. Also make sure that the IP address you are connecting from is allowed through the firewall.
You can use the following command to check the ingress firewall rules for the network that contains the destination VM instance. Additionally, you can consult this link for more information.
This is an example of what you have to do.
gcloud compute firewall-rules list --filter="network=[NETWORK-NAME] AND direction=INGRESS" \
    --sort-by priority \
    --format="table(
        name,
        network,
        direction,
        priority,
        sourceRanges.list():label=SRC_RANGES,
        destinationRanges.list():label=DEST_RANGES,
        allowed[].map().firewall_rule().list():label=ALLOW,
        denied[].map().firewall_rule().list():label=DENY,
        sourceTags.list():label=SRC_TAGS,
        sourceServiceAccounts.list():label=SRC_SVC_ACCT,
        targetTags.list():label=TARGET_TAGS,
        targetServiceAccounts.list():label=TARGET_SVC_ACCT
    )"
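If no rule currently allows port 22, a minimal sketch of creating one (the rule name and the 0.0.0.0/0 source range are assumptions; narrow the range to your own IP where possible):
gcloud compute firewall-rules create allow-ssh \
    --network=[NETWORK-NAME] \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges=0.0.0.0/0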

How to detach a disk in a Google Cloud TPU VM instance?

I created a TPU-VM instance (not a normal Compute Engine instance) and attached an external disk to it using this command:
gcloud alpha compute tpus tpu-vm create TPU-VM-NAME \
    --zone=europe-west4-a \
    --accelerator-type=v3-8 \
    --version=v2-alpha \
    --data-disk source=[PATH/TO/DISK]
Now I want to detach that disk from the TPU-VM, but I cannot find the instance in the VM instances tab of the Google Cloud console (it is treated as a TPU instance, so it is not listed there). I can only find it in the TPUs tab, but there I cannot edit the disk out of the instance.
I tried using this command too, but it doesn't work:
gcloud compute instances detach-disk INSTANCE-NAME --disk=DISK-NAME
It says that the resource (projects/project-name/zone/instances/tpu-vm-name) was not found.
Detaching a disk for the TPU VM architecture is not supported right now.
Actually, it is supported according to this tutorial! You need to follow this configuration when you are in the TPU VM. Don't forget to create the disk before detaching it, and be sure you are using the same billing account for both the TPU VM and the disk. Otherwise, the system will throw an INTERNAL ERROR.
sudo lsblk
# Find the disk here and note the device name. Most likely it will be "sdb". Given that is correct, format the disk with this command:
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
# Mount the disk and give everyone write access
sudo mkdir -p /mnt/disks/flaxdisk
sudo mount -o discard,defaults /dev/sdb /mnt/disks/flaxdisk
sudo chmod a+w /mnt/disks/flaxdisk
# Configure automatic mounting on restarts
sudo cp /etc/fstab /etc/fstab.backup
# Find the UUID of the disk - you need this value in the following step
sudo blkid /dev/sdb
# Add this line to /etc/fstab with the correct UUID
UUID=52af08e4-f249-4efa-9aa3-7c7a9fd560b0 /mnt/disks/flaxdisk ext4 discard,defaults,nofail 0 2
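A sketch of appending that line without editing the file by hand, assuming the same device (/dev/sdb) and mount point as above:
echo "UUID=$(sudo blkid -s UUID -o value /dev/sdb) /mnt/disks/flaxdisk ext4 discard,defaults,nofail 0 2" \
  | sudo tee -a /etc/fstab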

Using SSH to access JupyterLab on GCP Doesn't Work Yet

I would like to ssh myself into a Jupyter Lab notebook on GCP. For that, I followed this guide by Google.
I already have CloudSDK installed on my PC, and when I type gcloud compute ssh --project $PROJECT_ID ... and all the rest, it throws the following error on my terminal:
ERROR: (gcloud.compute.ssh) Could not fetch resource:
Invalid value ' test-gpu'. Values must match the following regular expression:
'[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?|[1-9][0-9]{0,19}'
Unfortunately, I do not really understand this error message, as my instance is named "test-gpu", which was a perfectly valid name when creating the instance.
Any help would be appreciated!
It seems you have a space at the beginning of your instance name; please remove it and try again.
I was able to reproduce the issue on my Windows machine while trying to access the notebook as per the doc, using the command below:
set PROJECT_ID="my-project-id"
set ZONE="my-zone"
set INSTANCE_NAME="my-instance"
gcloud compute ssh --project $PROJECT_ID --zone $ZONE $INSTANCE_NAME -- -L 8080:localhost:8080
ERROR: (gcloud.compute.ssh) Could not fetch resource:
Invalid value 'my-instance'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?|[1-9][0-9]{0,19}'
However, when the command is written out as below on a Windows machine, it works as intended (possibly because the Windows shell handles the $-style variables differently):
$ gcloud compute ssh --project my-test-project --zone us-west1-b instance-name -- -L 8080:localhost:8080
At the same time, I was able to access the instance from a Chromebook using the command below:
$ gcloud compute ssh --project $PROJECT_ID --zone $ZONE $INSTANCE_NAME -- -L 8080:localhost:8080

How to make VM-instance with python3.5 in Google Cloud Platform

I have made a TensorFlow model using Python 3.5.5. I want to deploy it on the Google Cloud Platform, but since Google Cloud Platform supports Python 3.6 and Python 3.7, I cannot work out how to do it. If possible, please also guide me on how to access Google Cloud Storage bucket data in my TensorFlow model.
Thanks in advance.
You can either search here for an image with that specific Python version, or pick any Linux distribution image and install that specific Python version on it.
For the latter case, here's a working example:
ZONE=us-central1-a
INSTANCE_ID=my-vm-00
gcloud compute instances create $INSTANCE_ID \
    --machine-type n1-standard-1 \
    --image-project debian-cloud \
    --image debian-9-stretch-v20190326 \
    --metadata startup-script="sudo apt update && sudo apt install gcc make -y && wget https://www.python.org/ftp/python/3.5.5/Python-3.5.5.tgz && tar xvf Python-3.5.5.tgz && cd Python-3.5.5 && sudo ./configure --enable-optimizations && sudo make altinstall" \
    --subnet default \
    --zone $ZONE
Check Google Cloud SDK for more background on the command used above.
With the following command you can check which Python 3.5.x executables are available (you have to wait a while after the previous command returns, though, while the startup script finishes).
gcloud compute ssh $INSTANCE_ID --command 'for pythonv in python3 python3.5; do type $pythonv; $pythonv --version; done' --zone $ZONE
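If the interpreter does not show up yet, the startup script is most likely still compiling Python. One way to watch its progress is to follow the startup-script log (google-startup-scripts.service is the unit name on standard Debian GCE images; treat it as an assumption if your image differs):
gcloud compute ssh $INSTANCE_ID --zone $ZONE \
    --command 'sudo journalctl -u google-startup-scripts.service -f'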
Just remember to use the python3.5 executable when you need Python 3.5.5.
To deploy your code, you can use the following command:
gcloud compute scp --recurse my_code_local/ $INSTANCE_ID:~ --zone $ZONE
Or research a solution along the lines of Cloud Build.
To upload or download data to or from a Google Cloud Storage bucket from your Python application, you just need to use the Cloud Storage client libraries.
That page also links to examples of download and upload operations.
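If you only need to copy bucket data onto the VM itself (rather than reading it directly from your Python code), the gsutil tool that ships with the Cloud SDK is an alternative; the bucket and file names below are purely illustrative:
# Pull training data from the bucket onto the VM (hypothetical names)
gsutil cp gs://my-bucket/training-data.csv ~/data/
# Push a trained model back to the bucket
gsutil cp ~/model/saved_model.pb gs://my-bucket/models/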

CoreOS AWS userdata "docker run" on startup won't run

I'm trying to set up CoreOS on AWS to run specific commands on boot to download our DCOS cluster's info tarball and run the scripts contained within it. These scripts help add the instance as an "agent" to our DC/OS cluster.
However, I can't seem to get the docker run commands to run. I do see that the userdata creates the tee output file (which remains empty) and the /opt/dcos_install_tmp/ directory (which also remains empty).
The docker run commands here download an "awscli" container, fetch packages from S3 (using IAM instance profile credentials), and spit it out to the CoreOS file system.
Installing AWS CLI on CoreOS didn't seem straightforward (there's no package manager, no python), so I had to resort to this.
If I login to the instance and run the same commands by putting them in a script, I have absolutely no issues.
I check "journalctl --identifier=coreos-cloudinit" and found nothing to indicate issues. It just reports:
15:58:34 Parsing user-data as script
There is no "boot" log file for CoreOS in /var/log/ unlike in other AMIs.
I'm really stuck right now and would love some nudges in the right direction.
Here's my userdata (which I post as text during instance boot):
#!/bin/bash
/usr/bin/docker run -it --name cli governmentpaas/awscli aws s3 cp s3://<bucket>/dcos/dcos_preconfig.sh /root && /usr/bin/docker cp cli:/root/dcos_preconfig.sh . && /usr/bin/docker rm cli | tee -a /root/userdatalog.txt
/usr/bin/docker run -it --name cli governmentpaas/awscli aws s3 cp s3://<bucket>/dcos/dcos-install.tar /root && /usr/bin/docker cp cli:/root/dcos-install.tar . && /usr/bin/docker rm cli | tee -a /root/userdatalog.txt
sudo mkdir -p /opt/dcos_install_tmp
sudo tar xf dcos-install.tar -C /opt/dcos_install_tmp | tee -a /root/userdatalog.txt
sudo /bin/bash /opt/dcos_install_tmp/dcos_install.sh slave | tee -a /root/userdatalog.txt
Remove the -t flag from the docker run commands.
I had a similar problem: DigitalOcean: How to run Docker command on newly created Droplet via Java API
The problem ended up being the -t flag in the docker run command. Apparently this doesn't work because the script isn't running in a terminal, so there is no TTY to allocate. Remove the flag and it runs fine.
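For illustration, here is the first fetch step from the question's userdata with only that flag removed (the bucket placeholder is left as in the original):
/usr/bin/docker run -i --name cli governmentpaas/awscli aws s3 cp s3://<bucket>/dcos/dcos_preconfig.sh /root \
  && /usr/bin/docker cp cli:/root/dcos_preconfig.sh . \
  && /usr/bin/docker rm cli | tee -a /root/userdatalog.txt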