SSHOperator with ComputeEngineSSHHook - google-cloud-platform

I am trying to run a command over SSH on a GCP VM from Airflow via the SSHOperator, as described here:
ssh_to_vm_task = SSHOperator(
    task_id="ssh_to_vm_task",
    ssh_hook=ComputeEngineSSHHook(
        instance_name=<MYINSTANCE>,
        project_id=<MYPROJECT>,
        zone=<MYZONE>,
        use_oslogin=False,
        use_iap_tunnel=True,
        use_internal_ip=False
    ),
    command="echo test_message",
    dag=dag
)
However, I get an airflow.exceptions.AirflowException: SSH operator error: [Errno 2] No such file or directory: 'gcloud' error.
Airflow is installed via docker-compose, following these instructions.
Other Airflow GCP operators (such as BigQueryCheckOperator) work correctly. So at first sight it does not seem like a configuration problem.
Could you please help me? Is this a bug?

It seems the issue is that gcloud was not installed in the Docker container by default. This has been solved by following the instructions here: it is necessary to add
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - && apt-get update -y && apt-get install google-cloud-sdk -y
to the Dockerfile that is used to install Airflow and its dependencies.
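For context, a minimal sketch of such a Dockerfile, assuming the image used by docker-compose is built on the official apache/airflow base (the tag and the extra curl/gnupg install are assumptions about that base image):
```
# Hypothetical sketch: the base image tag is a placeholder for whatever your
# docker-compose setup builds on.
FROM apache/airflow:2.3.0

USER root

# curl and gnupg are needed for the repository setup below, so install them
# first in case the base image does not ship them.
RUN apt-get update -y && apt-get install -y curl gnupg \
    && echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" \
        | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \
    && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg \
        | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - \
    && apt-get update -y \
    && apt-get install -y google-cloud-sdk

# Drop back to the unprivileged user the official image runs as.
USER airflow
```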

Check that TCP port 22 is allowed through the firewall on your GCP VM instance, and make sure the VM instance itself allows SSH access and is properly configured. Furthermore, be sure that the IP address from which you are trying to SSH into the VM instance is allowed through the firewall.
You can use the following command in GCP to check the ingress firewall rules for the network that contains the destination VM instance. Additionally, you can consult this link for more information.
This is an example of what you have to do.
```
gcloud compute firewall-rules list \
    --filter="network=[NETWORK-NAME] AND direction=INGRESS" \
    --sort-by priority \
    --format="table(
        name,
        network,
        direction,
        priority,
        sourceRanges.list():label=SRC_RANGES,
        destinationRanges.list():label=DEST_RANGES,
        allowed[].map().firewall_rule().list():label=ALLOW,
        denied[].map().firewall_rule().list():label=DENY,
        sourceTags.list():label=SRC_TAGS,
        sourceServiceAccounts.list():label=SRC_SVC_ACCT,
        targetTags.list():label=TARGET_TAGS,
        targetServiceAccounts.list():label=TARGET_SVC_ACCT
    )"
```
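If port 22 turns out to be blocked, a rule along these lines would open it; the rule name and network are placeholders, and since the operator above uses an IAP tunnel, the source range shown is Google's documented IAP forwarding range (35.235.240.0/20):
```
# Sketch: allow SSH (TCP 22) from Cloud IAP's forwarding range into the network.
gcloud compute firewall-rules create allow-ssh-from-iap \
    --network=[NETWORK-NAME] \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges=35.235.240.0/20
```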

Related

Cloud Run: Forbidden error while accessing service

I have created a WordPress service using Cloud Run. I deployed it using the command below:
gcloud beta run deploy wp --image gcr.io/<project>/wp:v1 \
--add-cloudsql-instances <project>:us-central1:mysql2 \
--update-env-vars DB_HOST='127.0.0.1',DB_NAME=mysql2,DB_USER=wordpress,DB_PASSWORD=password,CLOUDSQL_INSTANCE='<project>:us-central1:mysql2'
The service is deployed fine, but when trying to access the service it shows the error below:
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/</code> from this server.</h2>
UPDATES:
The Dockerfile is as follows. I am following this:
https://github.com/acadevmy/cloud-run-wordpress
FROM wordpress:5.2.1-php7.3-apache
EXPOSE 80
# Use the PORT environment variable in Apache configuration files.
RUN sed -i 's/80/${PORT}/g' /etc/apache2/sites-available/000-default.conf /etc/apache2/ports.conf
# wordpress conf
COPY wordpress/wp-config.php /var/www/html/wp-config.php
# download and install cloud_sql_proxy
RUN apt-get update && apt-get -y install net-tools wget && \
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O /usr/local/bin/cloud_sql_proxy && \
chmod +x /usr/local/bin/cloud_sql_proxy
COPY wordpress/cloud-run-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["/usr/local/sbin/apache2ctl -D FOREGROUND"]
##docker-entrypoint.sh
#!/usr/bin/env bash
# Start the sql proxy
cloud_sql_proxy -instances=$CLOUDSQL_INSTANCE=tcp:3306 &
# Execute the rest of your ENTRYPOINT and CMD as expected.
The following can be seen in the Console Log.
We allowed unauthenticated invocations and now the error is
"Error establishing a database connection"
Additional Updates:
The DB is running with a private IP, so we are using Serverless VPC Access.
DB information is as follows:
gcloud sql instances list
NAME DATABASE_VERSION LOCATION TIER PRIMARY_ADDRESS PRIVATE_ADDRESS STATUS
mysql2 MYSQL_5_7 us-central1-b db-f1-micro - 10.0.100.5 RUNNABLE
This is Serverless VPC access range
testserverlessvpc kube-shared-vpc us-central1 192.168.60.0/28 200 300
Now I have added an additional parameter, shown below, to both the gcloud run deploy and gcloud run services update commands:
--vpc-connector projects/< HOST-Project >/locations/us-central1/connectors/testserverlessvpc
But gcloud run deploy fails with the error below:
Deploying new service... Internal system error, system will retry.

Connect to Memorystore from Cloud Run

I want to run a service on Google Cloud Run that uses Cloud Memorystore as cache.
I created a Memorystore instance in the same region as Cloud Run and used the example code to connect: https://github.com/GoogleCloudPlatform/golang-samples/blob/master/memorystore/redis/main.go. This didn't work.
Next I created a Serverless VPC Access connector, which didn't help. I use Cloud Run without a GKE cluster, so I can't change any configuration.
Is there a way to connect from Cloud Run to Memorystore?
To connect Cloud Run (fully managed) to Memorystore you need to use the mechanism called "Serverless VPC Access" or a "VPC Connector".
As of May 2020, Cloud Run (fully managed) has Beta support for the Serverless VPC Access. See Connecting to a VPC Network for more information.
Alternatives to using this Beta include:
Use Cloud Run for Anthos, where GKE provides the capability to connect to Memorystore if the cluster is configured for it.
Stay within fully managed Serverless but use a GA version of the Serverless VPC Access feature by using App Engine with Memorystore.
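If you go the Serverless VPC Access route, the rough shape is: create a connector in the same region as the service, then attach it at deploy time. A minimal sketch, where the connector name, network, IP range, service name, and image are placeholders:
```
# Create a Serverless VPC Access connector in the same region as the service.
gcloud compute networks vpc-access connectors create my-connector \
    --network=default \
    --region=us-central1 \
    --range=10.8.0.0/28

# Attach the connector so the service can reach Memorystore's private IP.
gcloud run deploy my-service \
    --image=gcr.io/my-project/my-image \
    --region=us-central1 \
    --platform=managed \
    --vpc-connector=my-connector
```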
While waiting for serverless VPC connectors on Cloud Run - Google said yesterday that announcements would be made in the near term - you can connect to Memorystore from Cloud Run using an SSH tunnel via GCE.
The basic approach is the following.
First, create a forwarder instance on GCE
gcloud compute instances create vpc-forwarder --machine-type=f1-micro --zone=us-central1-a
Don't forget to open port 22 in your firewall policies (it's open by default).
Then install the gcloud CLI via your Dockerfile
Here is an example for a Rails app. The Dockerfile makes use of a script for the entrypoint.
# Use the official lightweight Ruby image.
# https://hub.docker.com/_/ruby
FROM ruby:2.5.5
# Install gcloud
RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz
RUN mkdir -p /usr/local/gcloud \
&& tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz \
&& /usr/local/gcloud/google-cloud-sdk/install.sh
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin
# Generate SSH key to be used by the SSH tunnel (see entrypoint.sh)
RUN mkdir -p /home/.ssh && ssh-keygen -b 2048 -t rsa -f /home/.ssh/google_compute_engine -q -N ""
# Install bundler
RUN gem update --system
RUN gem install bundler
# Install production dependencies.
WORKDIR /usr/src/app
COPY Gemfile Gemfile.lock ./
ENV BUNDLE_FROZEN=true
RUN bundle install
# Copy local code to the container image.
COPY . ./
# Run the web service on container startup.
CMD ["bash", "entrypoint.sh"]
Finally open an SSH tunnel to Redis in your entrypoint.sh script
#!/bin/bash
# Memorystore config
MEMORYSTORE_IP=10.0.0.5
MEMORYSTORE_REMOTE_PORT=6379
MEMORYSTORE_LOCAL_PORT=6379
# Forwarder config
FORWARDER_ID=vpc-forwarder
FORWARDER_ZONE=us-central1-a
# Start tunnel to Redis Memorystore in background
gcloud compute ssh \
--zone=${FORWARDER_ZONE} \
--ssh-flag="-N -L ${MEMORYSTORE_LOCAL_PORT}:${MEMORYSTORE_IP}:${MEMORYSTORE_REMOTE_PORT}" \
${FORWARDER_ID} &
# Run migrations and start Puma
bundle exec rake db:migrate && bundle exec puma -p 8080
With the solution above Memorystore will be available to your application on localhost:6379.
There are a few caveats though
This approach requires the service account configured on your Cloud Run service to have the roles/compute.instanceAdmin role, which is quite powerful.
The SSH keys are baked into the image to speed up container boot time. That's not ideal.
There is no failover if your forwarder crashes.
I've written a longer and more elaborated approach in a blog post that improves the overall security and adds failover capabilities. The solution uses plain SSH instead of the gcloud CLI.
If you need something in your VPC, you can also spin up Redis on Compute Engine
It's more costly (especially for a cluster) than Redis Cloud, but it's a temporary solution if you have to keep the data in your VPC.
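A minimal sketch of that approach, where the instance name, zone, and machine type are placeholders (for real use you would also want persistence, authentication, and firewall rules limiting access to your VPC):
```
# Create a small VM to host Redis (name, zone, and size are placeholders).
gcloud compute instances create redis-vm \
    --machine-type=e2-small \
    --zone=us-central1-a

# Install Redis on the VM; by default it listens on localhost only,
# so adjust the bind settings to taste before pointing clients at it.
gcloud compute ssh redis-vm --zone=us-central1-a \
    --command="sudo apt-get update && sudo apt-get install -y redis-server"
```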

Can't customize port for Jupyter and Zeppelin at Google DataProc creation time

I have a Dataproc cluster that initializes Datalab and installs Jupyter and Zeppelin as optional components. I want to set the Jupyter port to 8124 and the Zeppelin port to 8081 at cluster creation time. I need them to be exclusively on these two ports and not any others. I used the following flags with gcloud dataproc clusters create at cluster creation time:
--metadata ZEPPELIN-PORT=8081 (tried --metadata zeppelin-port=8081 as well)
--metadata JUPYTER_PORT=8124
However, they are both still using their default ports, i.e., 8123 for Jupyter and 8080 for Zeppelin, while 8124 and 8081 are unavailable. What makes things worse is that, since Datalab also uses 8080 by default, I'm unable to access Datalab on that port, only Zeppelin.
I can customize the port AFTER creation time, but that's not ideal for my use cases.
Any suggestions are appreciated. Thank you.
Using the latest Dataproc image version you should be able to remap the ports:
Image 1.3 and 1.4:
Allow remapping Jupyter and Zeppelin Optional Component ports via dataproc:{jupyter,zeppelin}.port properties
https://cloud.google.com/dataproc/docs/release-notes#may_9_2019
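A sketch of what that looks like at creation time; the cluster name, image version, and component list are placeholders standing in for your existing create command, and the property names come from the release note above:
```
# Remap the Jupyter and Zeppelin ports via cluster properties (sketch).
gcloud dataproc clusters create my-cluster \
    --image-version=1.4 \
    --optional-components=ANACONDA,JUPYTER,ZEPPELIN \
    --properties="dataproc:jupyter.port=8124,dataproc:zeppelin.port=8081"
```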
Unfortunately there is indeed no way to do this in a first-class supported property at the moment, but it could become a feature in Dataproc someday in the future.
In the meantime, however, running an initialization action which modifies the ports should be effectively equivalent to modifying it through a property, with just a few seconds of delay to reboot the services.
The following init action will remap Jupyter to 8124 and Zeppelin to 8081 automatically at cluster-creation time, and it also works with the Dataproc Component Gateway if that is enabled.
#!/bin/bash
# change-ports.sh
ZEPPELIN_PORT=8081
JUPYTER_PORT=8124
readonly ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"
if [[ "${ROLE}" == 'Master' ]]; then
  if [ -f /etc/zeppelin/conf/zeppelin-env.sh ]; then
    echo "export ZEPPELIN_PORT=${ZEPPELIN_PORT}" \
      >> /etc/zeppelin/conf/zeppelin-env.sh
    systemctl restart zeppelin
  fi
  if [ -f /etc/jupyter/jupyter_notebook_config.py ]; then
    echo "c.NotebookApp.port = ${JUPYTER_PORT}" \
      >> /etc/jupyter/jupyter_notebook_config.py
    systemctl restart jupyter
  fi
  if [ -f /etc/knox/conf/topologies/default.xml ]; then
    sed -i "s/localhost:8080/localhost:${ZEPPELIN_PORT}/g" \
      /etc/knox/conf/topologies/default.xml
    sed -i "s/localhost:8123/localhost:${JUPYTER_PORT}/g" \
      /etc/knox/conf/topologies/default.xml
    systemctl restart knox
  fi
fi
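To apply it, upload the script to a GCS bucket and pass it at creation time; the bucket and cluster names below are placeholders:
```
# Stage the init action and reference it when creating the cluster (sketch).
gsutil cp change-ports.sh gs://my-bucket/change-ports.sh

gcloud dataproc clusters create my-cluster \
    --optional-components=JUPYTER,ZEPPELIN \
    --initialization-actions=gs://my-bucket/change-ports.sh
```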

User Data is not running on EC2 instance in Private VPC subnet

This is the user data used:
#!/bin/bash
yum install httpd -y
yum update -y
aws s3 cp s3://YOURBUCKETNAMEHERE/index.html /var/www/html/
service httpd start
chkconfig httpd on
A NAT gateway is configured for the private EC2 instance, and S3 full-access permissions are also granted.
Please help me troubleshoot!
You can add some code to the start of your user-data script to redirect the output to logs.
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
Then you can use those logs to troubleshoot from the AWS Console. Select the instance, then Actions menu -> Instance settings -> Get system log. Here is more documentation on what to add to your bash script, as well as a video that shows where to find the logs.
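For example, the original user data with that redirect added at the top would look like this (the bucket placeholder is unchanged):
```
#!/bin/bash
# Send all user-data output to a log file, syslog, and the system console.
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

yum install httpd -y
yum update -y
aws s3 cp s3://YOURBUCKETNAMEHERE/index.html /var/www/html/
service httpd start
chkconfig httpd on
```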

Configuring AWS Elastic Beanstalk Timezone for Auto Scaling

I have a single-instance server deployed on AWS Elastic Beanstalk that needs timezone configuration. I changed the timezone by logging into the EC2 environment with ssh and updating it with the Linux commands listed below:
sudo rm /etc/localtime
sudo ln -sf /usr/share/zoneinfo/Europe/Istanbul /etc/localtime
sudo reboot
Everything is fine while the server runs as a single instance. The problem arose when I wanted to use the Auto Scaling and Load Balancing features. On a single instance, updating the timezone on the Linux AMI is fine, but in auto scaling mode, because instances are created/destroyed/recreated according to the threshold metrics, all of that configuration is lost.
My simple question is: how can I change/configure the timezone for an auto scaling, load balancing environment in AWS Elastic Beanstalk?
You can configure the newly starting servers with ebextensions.
Here's an example that works for me. Add the following command into the file .ebextensions/timezone.config:
commands:
  set_time_zone:
    command: ln -f -s /usr/share/zoneinfo/US/Pacific /etc/localtime
The answers here only partially worked for me (I had errors when deploying with them). After some modifications, the following worked for me; I believe it has something to do with "cwd" and "permissions".
commands:
  0000_0remove_localtime:
    command: rm -rf /etc/localtime
  0000_1change_clock:
    command: sed -i 's/UTC/Asia\/Singapore/g' /etc/sysconfig/clock
    cwd: /etc/sysconfig
  0000_2link_singapore_timezone:
    command: ln -f -s /usr/share/zoneinfo/Asia/Singapore /etc/localtime
    cwd: /etc
For my first answer on StackOverflow ... I have to add new information to an excellent earlier answer.
For Amazon Linux 2 on Elastic Beanstalk, there is a new, simpler method of setting the time zone. Add the following commands to the file .ebextensions/xxyyzz.config:
container_commands:
  01_set_bne:
    command: "sudo timedatectl set-timezone Australia/Brisbane"
  02_restart_cron:
    command: "sudo systemctl restart crond.service"
I'm not sure if the second command is absolutely essential, but the instances certainly play nice with it there (especially with tasks due to happen right away!).
You can also configure it via ssh on the command line when connected to your Elastic Beanstalk instance:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#change_time_zone
sudo ln -sf /usr/share/zoneinfo/America/Montreal /etc/localtime
You can connect to your EB instance with the eb command line tool.
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb3-cmd-commands.html
eb ssh