Dataproc Hadoop/Spark job cannot connect to Cloud SQL via private IP

I am facing an issue setting up private IP access between Dataproc and Cloud SQL with a VPC network and peering, and would really appreciate help, since I have not been able to figure this out after 2 days of debugging and following pretty much all the docs.
What I have tried so far (with internal IP only):
Enabled "Private Google Access" on the default subnet and used the default subnetwork for Dataproc and Cloud SQL.
Created a new VPC network/subnetwork, used it to create the Dataproc cluster, and updated Cloud SQL to use that network.
Created an IP range and a "private service connection" to the "Google Cloud Platform" service provider, and enabled it, along with VPC network peering to "servicenetworking".
Explicitly added the Cloud SQL Client role to the default Dataproc compute service account (even though I didn't need this for other VM connectivity to Cloud SQL with the same account, because it has an admin ("Editor") role anyway).
All according to the doc https://cloud.google.com/sql/docs/mysql/private-ip and the other links there.
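For reference, the private services access part in gcloud terms was roughly the following (the network and range names are placeholders, not my exact ones):
gcloud compute addresses create google-managed-services-range \
    --global --purpose=VPC_PEERING --prefix-length=16 --network=my-vpc
gcloud services vpc-peerings connect --service=servicenetworking.googleapis.com \
    --ranges=google-managed-services-range --network=my-vpc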
Problem:
When I submit a Spark job on Dataproc that connects to this Cloud SQL instance, it fails with the following error: Communications link failure....
Caused by: java.net.ConnectException: Connection refused (Connection refused)
Test & debug:
The connectivity test passes from the exact internal IP addresses on both sides (Dataproc node and Cloud SQL instance).
The mysql command line client can connect fine from the Dataproc master node.
Cloud Logging does not show any denied connections or other issues connecting to MySQL.
Screenshots of the connectivity test on both the default and the new VPC network.
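For completeness, the manual check from the master node was simply (the Cloud SQL private IP is a placeholder here):
mysql --host=CLOUD_SQL_PRIVATE_IP --user=root -p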
Other Stack Overflow questions I referred to on using private IP:
Cannot connect to Cloud SQL from Cloud Run after enabling private IP and turning off public iP
How to access Cloud SQL from dataproc?
PS: I want to avoid the Cloud SQL proxy route for connecting to Cloud SQL from Dataproc, so I don't want to install the cloud_sql_proxy service via an initialization action.

A "Connection refused" normally means that nothing is listening on the other end. The logs also contain hints that the database connection is attempted to localhost, port 3307. This is the right port for the CloudSQL proxy, one higher than the usual MySQL port.
Check whether the metadata configuration for your cluster is correct:
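One way to inspect that metadata, assuming a cluster name and region (both placeholders here):
gcloud dataproc clusters describe my-cluster --region=us-central1 \
    --format="value(config.gceClusterConfig.metadata)"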
Workaround 1:
Check whether the cluster that is having issues runs a different version of the proxy (1.xx). A mismatch in Cloud SQL proxy versions appears to be behind this issue; you can pin the suitable version of the Cloud SQL proxy (1.xx).
Workaround 2:
Run the command journalctl -r -u cloud-sql-proxy.service | grep -i err and, based on the logs, check which SQL proxy instance is causing issues.
Check whether the root cause is the data project hitting the "SQL queries per 100 seconds per user" quota.
Actions:
Increase the quota and restart the affected cloud-sql-proxy services (by monitoring the jobs running on the master nodes that failed).
This is similar to the linked issue, but with a quota error preventing startup instead of the network errors described in that link. With the updated quota, the Cloud SQL proxy should not have this recur.
Here's a recommended set of next steps:
Reboot any nodes that appear to have a defunct/broken Cloud SQL proxy. systemd won't report the truth, but running mysql --host ... --port ... against the Cloud SQL proxy on the bad nodes would detect this.
Bump up the API quota immediately: in the Cloud Console go to "IAM and Admin" > "Quotas", search for the "Cloud SQL Admin API" and click through to it, then click the pencil to edit; you should be able to bump the quota to 300 as self-service without approval. If you want more than 300 per 100s, you might need to file an approval request.
If the quota usage is approaching 100 per 100s from time to time, update the quota to 300.
It's possible that the extra Cloud SQL proxy instances on the worker nodes are causing more load than necessary, compared with running the proxy only on the master node. If the cluster only uses a driver that runs on the master node, the worker nodes don't need to run the proxy.
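If that is the case for your cluster, one possible cleanup (untested, assuming only the driver needs the database) is to stop the proxy on the workers:
# on each worker node only; leave the master untouched
sudo systemctl stop cloud-sql-proxy
sudo systemctl disable cloud-sql-proxy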
To find the broken nodes, you can check which ones are responding on the Cloud SQL proxy port. You can loop over each hostname, SSH to it, and run this command:
nc -zv localhost 3307 || sudo systemctl restart cloud-sql-proxy
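For example, a sketch of that loop with gcloud, assuming the node names share a prefix and a zone (mycluster- and us-central1-a are placeholders):
for host in $(gcloud compute instances list --filter="name~^mycluster-" --format="value(name)"); do
  gcloud compute ssh "$host" --zone=us-central1-a \
    --command="nc -zv localhost 3307 || sudo systemctl restart cloud-sql-proxy"
done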
Or you can check the logs on each node to see which ones have logged a quota message like this:
grep cloud_sql_proxy /var/log/syslog | tail
and see if the very last message says "Error 429: Quota exceeded for quota group 'default' and limit 'USER-100s' of service 'sqladmin.googleapis.com' for consumer ..."
The nodes which aren't running the Cloud SQL proxy can be rebooted to start from scratch, or the proxy can be restarted on each with:
sudo systemctl restart cloud-sql-proxy

Related

Why are outbound SSH connections from Google CloudRun to EC2 instances unspeakably slow?

I have a Node API deployed to Google Cloud Run, and it is responsible for managing external servers (clean, new Amazon EC2 Linux VMs), including over SSH and SFTP. SSH and SFTP do eventually work, but the connections take 2-5 MINUTES to initiate. Sometimes they time out with handshake timeout errors.
The same service running on my laptop, connecting to the same external servers, has no issues and the connections are as fast as any normal SSH connection.
The deployment on Cloud Run is pretty standard. I'm running it with a service account that permits access to secrets, etc. Plenty of memory is allocated.
I have a VPC Connector set up, and have routed all traffic through the VPC connector, as per the instructions here: https://cloud.google.com/run/docs/configuring/static-outbound-ip
I also tried setting UseDNS no in the /etc/ssh/sshd_config file on the EC2 instances, as per some online suggestions about slow SSH logins, but that has not made a difference.
I have rebuilt and redeployed the project a few dozen times and all tests are on brand new EC2 instances.
I am attempting these connections using open source wrappers on the Node ssh2 library, node-ssh and ssh2-sftp-client.
Ideas?
Cloud Run allocates CPU only while an HTTP request is active. You probably don't have an active request while these connections are being made, and outside of an active request the CPU is throttled.
The best fit for this pipeline is Cloud Workflows plus regular Compute Engine instances.
You can set up a Workflow to start a Compute Engine VM for the task and stop it once it has finished the steps.
I am the author of the article Run shell commands and orchestrate Compute Engine VMs with Cloud Workflows, which will guide you through the setup.
Executing the Workflow can be triggered by Cloud Scheduler or by an HTTP ping.
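If it helps to see the shape of what the Workflow automates, the manual equivalent in plain gcloud is just (the instance name, zone, and task script are placeholders):
gcloud compute instances start instance-1 --zone=europe-north1-b
gcloud compute ssh instance-1 --zone=europe-north1-b --command="./run-task.sh"
gcloud compute instances stop instance-1 --zone=europe-north1-b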

Connect to Google Cloud SQL from Macbook without firewall rules or allowed networks?

I leave this here in case someone else struggles with the same issue.
Visual representation of what I am trying to reach from my MacBook
MacBook -> VPN -> On-Prem Firewall -> GCP Firewall -> Cloud SQL Instance NOT working - detailed workaround below
GCE VM -> GCP Firewall -> Cloud SQL Instance Working
I had an issue where I could connect to Google Cloud SQL from a GCE VM instance, but not from my MacBook, although I had firewall allow rules in place (which were correctly written).
I determined the problem was happening because I was on a work VPN that goes through an on-prem network that has its own firewall rules, so I had two firewalls to go through: one on-prem and one GCP. I can edit the GCP firewall rules, but am not allowed to do anything to the on-prem firewall.
The workaround I found is the below:
Steps to be done in Google Cloud GUI
Enable the SQL Admin API for the project your instance is part of.
Give the instance a public IP: Edit SQL instance > Connectivity > Public IP > Save
Don't authorize any external networks
Steps to be done locally on your MacBook
Install the gcloud SDK (don't forget to run gcloud init)
Install a mysql client
a. brew install mysql-client
b. echo 'export PATH="/usr/local/opt/mysql-client/bin:$PATH"' >> /Users/YOUR_USERNAME_HERE/.bash_profile
Download and install the SQL proxy (ignore the other steps 3, 4, and 5 from the SQL proxy article)
Disconnect from VPN
Run step 4 to start the SQL proxy
Connect to your instance from the mysql client (e.g. mysql -u test_user --host 127.0.0.1 -p)
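Putting steps 3 to 6 together, the whole thing is roughly (the instance connection name and user are placeholders):
# download the v1 proxy for macOS and make it executable
curl -o cloud_sql_proxy https://dl.google.com/cloudsql/cloud_sql_proxy.darwin.amd64
chmod +x cloud_sql_proxy
# start the proxy against your instance connection name
./cloud_sql_proxy -instances=MY_PROJECT:REGION:INSTANCE=tcp:3306
# in a second terminal, connect through the proxy
mysql -u test_user --host 127.0.0.1 -p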
Later edit: the same approach can be used by Windows users as well.
Any suggestions for optimization are welcome.

GCP Firewall allow connection from cloud build to compute engine instance

We have a GCE VM with a MySQL server. Firewall rules deny incoming connections from external IPs. Our Cloud Build process needs to perform DB migrations, so it has to connect to MySQL from Cloud Build. I want to add a firewall rule that allows only the Cloud Build workers to connect on port 3306 from their external IP addresses.
Cloud Build does not run on the internal network, so there is no way to connect from an internal IP.
I tried adding a rule scoped to a service account, but I can't see the Cloud Build service account in the list.
Currently, as mentioned by @guillaume blaquiere, there is a Feature Request for this. I recommend you follow it (star it) to receive all the updates. It seems the FR has been sent to the Cloud Build engineering team, who will evaluate it. Note that there is no ETA for the implementation yet.

Google Cloud Composer and Google Cloud SQL Proxy

I have a project with Cloud Composer and Cloud SQL.
I am able to connect to Cloud SQL because I edited the YAML of airflow-sqlproxy-service and added my Cloud SQL instance to the cloud proxy used for the airflow-db, mapping it to port 3307.
The workers can connect to airflow-sqlproxy-service on port 3307, but I think the webserver can't.
Do I need to add a firewall rule mapping port 3307 so the webserver/UI can connect to airflow-sqlproxy-service?
Screenshots: https://i.stack.imgur.com/LwKQK.png, https://i.stack.imgur.com/CJf7Q.png, https://i.stack.imgur.com/oC2dJ.png
Best regards.
Composer does not currently support configuring additional SQL proxies from the webserver. One workaround for cases like this is to have a separate DAG that loads Airflow Variables with the information needed from the other database (via the workers, which do have access), and then to generate a DAG based on the Variable, which the webserver can access.
https://github.com/apache/incubator-airflow/pull/4170 was recently merged (not yet available in Composer); it defines a CloudSQL connection type, which might cover these use cases in the future.

Cannot Connect by Cloud SQL Proxy from Cloud Shell By Proxy

I am following the Django sample for GAE and am having a problem connecting to the Cloud SQL instance via the proxy from Google Cloud Shell. It is possibly related to a permission setting, since I see the request is not authorized.
Other context:
"gcloud beta sql connect auth-instance --user=root" has no problem connecting.
I have a service account for the SQL Proxy Client.
I am possibly missing something. Could someone please shed some light? Thanks in advance.
Proxy log:
./cloud_sql_proxy -instances=auth-158903:asia-east1:auth-instance=tcp:3306
2017/02/17 14:00:59 Listening on 127.0.0.1:3306 for auth-158903:asia-east1:auth-instance
2017/02/17 14:00:59 Ready for new connections
2017/02/17 14:01:07 New connection for "auth-158903:asia-east1:auth-instance"
2017/02/17 14:03:16 couldn't connect to "auth-158903:asia-east1:auth-instance": dial tcp 107.167.191.26:3307: getsockopt: connection timed out
Client Log:
mysql -u root -p --host 127.0.0.1
Enter password:
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0
I also tried with a credential file, but still no luck:
./cloud_sql_proxy -instances=auth-158903:asia-east1:auth-instance=tcp:3306 -credential_file=Auth-2eede8ae0d0b.jason
2017/02/17 14:21:36 using credential file for authentication; email=sql-proxy-client@auth-158903.iam.gserviceaccount.com
2017/02/17 14:21:36 Listening on 127.0.0.1:3306 for auth-158903:asia-east1:auth-instance
2017/02/17 14:21:36 Ready for new connections
2017/02/17 14:21:46 New connection for "auth-158903:asia-east1:auth-instance"
2017/02/17 14:21:48 couldn't connect to "auth-158903:asia-east1:auth-instance": ensure that the account has access to "auth-158903:asia-east1:auth-instance" (and make sure there's no typo in that name). Error during get instance auth-158903:asia-east1:auth-instance: googleapi: Error 403: The client is not authorized to make this request., notAuthorized
I can reproduce this issue exactly if I give my service account only the "Cloud SQL Client" IAM role. When I give it the "Cloud SQL Viewer" role as well, it can then connect. I suggest you try this and see if it helps.
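If that fixes it for you too, granting the extra role from the command line looks like this (project and service account taken from your logs):
gcloud projects add-iam-policy-binding auth-158903 \
    --member="serviceAccount:sql-proxy-client@auth-158903.iam.gserviceaccount.com" \
    --role="roles/cloudsql.viewer"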
It looks like a network connectivity issue.
Read this carefully if you use a private IP :
https://cloud.google.com/sql/docs/mysql/private-ip
Note that the Cloud SQL instance is in a Google managed network and the proxy is meant to be used to simplify connections to the DB within the VPC network.
In short: running cloud-sql-proxy from a local machine will not work, because it's not in the VPC network. It should work from a Compute Engine VM that is connected to the same VPC as the DB.
What I usually do as a workaround is use gcloud ssh from a local machine and port forward through a small VM in Compute Engine, like:
gcloud beta compute ssh --zone "europe-north1-b" "instance-1" --project "my-project" -- -L 3306:cloud_sql_server_ip:3306
Then you can connect to localhost:3306 (make sure nothing else is listening there, or change the first port number to one that is free locally).
The Cloud SQL proxy uses port 3307 instead of the more usual MySQL port 3306, because it uses TLS in a different way and has different IP ACLs. As a consequence, firewalls that allow MySQL traffic won't allow the Cloud SQL proxy by default.
Take a look and see whether a firewall on your network blocks port 3307. To use the Cloud SQL proxy, authorize this port for outbound connections.
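For example, an egress allow rule would look roughly like this (the rule and network names are placeholders; note that on GCP egress is allowed by default unless you have explicit deny rules):
gcloud compute firewall-rules create allow-cloud-sql-proxy --network=my-vpc \
    --direction=EGRESS --action=ALLOW --rules=tcp:3307 --destination-ranges=0.0.0.0/0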