GCLB Connection Draining for Cloud Run - google-cloud-platform

How does GCLB connection draining apply to Cloud Run when Cloud Run + a serverless NEG is used as a backend?
Also, when using HTTP or HTTP/2 as the connection between the LB and Cloud Run, is there any difference in behavior between the two?
I am using Cloud Run in the above configuration, and I am getting 503 errors when a container instance shuts down, so I thought it might have to do with connection draining in GCLB.

Related

dataproc hadoop/spark job cannot connect to Cloud SQL via Private IP

I am facing an issue setting up private IP access between Dataproc and Cloud SQL with a VPC network and peering. I would really appreciate help, since I have not been able to figure this out after two days of debugging and following pretty much all the docs.
The setup I have tried so far (with internal IP only):
enabled "Private Google Access" on the default subnet and used the default subnetwork for Dataproc and Cloud SQL.
created a new VPC network/subnetwork, used it to create the Dataproc cluster, and updated Cloud SQL to use that network.
created an IP range and a "private service connection" to the "Google Cloud Platform" service provider and enabled it, along with VPC network peering to "servicenetworking".
explicitly added the Cloud SQL Client role to the default Dataproc compute service account (even though I didn't need this for other VM connectivity to Cloud SQL using the same role, because it's an admin ("Editor") role anyway).
All according to the doc: https://cloud.google.com/sql/docs/mysql/private-ip and the other links there.
Problem:
When I submit a Spark job on Dataproc that connects to this Cloud SQL instance, it fails with the following error: Communications link failure....
Caused by: java.net.ConnectException: Connection refused (Connection refused)
Tests & debugging:
connectivity tests all pass from the exact internal IP addresses on both sides (Dataproc node and Cloud SQL instance)
the mysql command-line client can connect fine from the Dataproc master node
Cloud Logging does not show any denials or issues connecting to MySQL
screenshot of the connectivity test on both the default and the new VPC network.
Other Stack Overflow questions I referred to on using private IP:
Cannot connect to Cloud SQL from Cloud Run after enabling private IP and turning off public IP
How to access Cloud SQL from dataproc?
PS: I want to avoid the Cloud SQL proxy route for connecting to Cloud SQL from Dataproc, so I don't want to install the cloud_proxy service via an initialization action.
A "Connection refused" normally means that nothing is listening on the other end. The logs also contain hints that the database connection is attempted to localhost, port 3307. This is the right port for the CloudSQL proxy, one higher than the usual MySQL port.
Check whether the metadata configuration for your cluster is correct:
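For example, you can dump the metadata the cluster was created with and confirm its Cloud SQL settings. A minimal sketch, where the cluster name and region are hypothetical placeholders for your own values:
# Show only the instance metadata section of the cluster config
# (cluster name and region are placeholders).
gcloud dataproc clusters describe my-cluster \
  --region=us-central1 \
  --format="yaml(config.gceClusterConfig.metadata)"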
Workaround 1:
Check whether the cluster that is having issues is running a different version of the proxy (1.xx). A difference in SQL proxy versions seems to be involved in this issue. You can pin a suitable 1.xx version of the Cloud SQL proxy.
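As a sketch of what pinning could look like: the v1 proxy binaries are published at versioned URLs, so a node can fetch an exact release instead of whatever "latest" resolves to (v1.33.2 here is just an example version):
# Download a specific, pinned v1 Cloud SQL proxy release (example version).
wget -O cloud_sql_proxy https://storage.googleapis.com/cloudsql-proxy/v1.33.2/cloud_sql_proxy.linux.amd64
chmod +x cloud_sql_proxy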
Workaround 2:
Run the command: journalctl -r -u cloud-sql-proxy.service | grep -i err
Based on the logs, check which SQL proxy is causing issues.
Check whether the root cause is the project hitting the "SQL queries per 100 seconds per user" quota.
Actions:
Increase the quota and restart the affected Cloud SQL proxy services (monitoring the jobs running on the master nodes that failed).
This is similar to the linked issue, but with a quota error preventing startup instead of the network errors described there. With the updated quota, the Cloud SQL proxy should not see this recur.
Here's a recommended set of next steps:
Reboot any nodes that appear to have a defunct/broken Cloud SQL proxy. systemd won't report the truth, but running mysql --host ... --port ... against the Cloud SQL proxy on the bad nodes would detect this.
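A minimal probe, assuming the proxy listens on its usual localhost:3307 (the user is a placeholder); if the proxy is dead, the connection attempt fails with "Connection refused":
# Probe the local Cloud SQL proxy; a dead proxy refuses the TCP
# connection even if systemd still reports the unit as active.
mysql --host=127.0.0.1 --port=3307 --user=root -p -e "SELECT 1"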
Bump up the API quota immediately. In the Cloud Console, go to "IAM & Admin" > "Quotas", search for the "Cloud SQL Admin API", and click through to it; then click the pencil to "edit", and you should be able to bump it to 300 as self-service, with no approval needed. If you want more than 300 per 100s, you might need to file an approval request.
If the quota usage approaches 100 per 100s from time to time, update the quota to 300.
It's possible that the extra Cloud SQL proxy instances on the worker nodes are causing more load than necessary, compared to running the proxy only on the master node. If the cluster only uses a driver that runs on the master node, then the worker nodes don't need to run the proxy.
To find the nodes that are broken, you can check which ones are responding on the Cloud SQL proxy port.
You can loop over each hostname, SSH to it, and run this command:
nc -zv localhost 3307 || sudo systemctl restart cloud-sql-proxy
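A sketch of that loop using gcloud compute ssh, where the node names and zone are hypothetical placeholders (Dataproc nodes follow the <cluster>-m / <cluster>-w-<n> naming pattern):
# Probe port 3307 on each node; restart the proxy wherever it is not listening.
for host in my-cluster-m my-cluster-w-0 my-cluster-w-1; do
  gcloud compute ssh "$host" --zone=us-central1-a \
    --command='nc -zv localhost 3307 || sudo systemctl restart cloud-sql-proxy'
done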
Or you could check the logs on each node to see which ones have logged a quota message like this:
grep cloud_sql_proxy /var/log/syslog | tail
and see if the very last message says "Error 429: Quota exceeded for quota group 'default' and limit 'USER-100s' of service 'sqladmin.googleapis.com' for consumer ..."
The nodes that aren't running the Cloud SQL proxy can be rebooted to start from scratch, or the proxy can be restarted on each with this command:
sudo systemctl restart cloud-sql-proxy

Cloud Run invoking another Cloud Run service with "Allow internal traffic and traffic from Cloud Load Balancing" fails with error 403

I have built a Django backend and deployed it to Cloud Run. I have also built a React frontend, which is also deployed to Cloud Run. The frontend calls the Django backend. Everything works while the backend allows all traffic, but when I change it to "Allow internal traffic and traffic from Cloud Load Balancing", I get a 403 error. Both services use a VPC connector, and both are unauthenticated Cloud Run services.
Focus on your architecture and where the code is running.
Your backend runs on Cloud Run.
Your frontend? It's served by Cloud Run, but executed in your browser.
That's why: your browser doesn't have a serverless VPC connector or anything like that, so the request to the backend comes from the internet, not from your Cloud Run frontend.

Why are outbound SSH connections from Google Cloud Run to EC2 instances unspeakably slow?

I have a Node API deployed to Google Cloud Run that is responsible for managing external servers (clean, new Amazon EC2 Linux VMs), including over SSH and SFTP. SSH and SFTP do eventually work, but the connections take 2-5 MINUTES to initiate. Sometimes they time out with handshake timeout errors.
The same service running on my laptop, connecting to the same external servers, has no issues, and the connections are as fast as any normal SSH connection.
The deployment on Cloud Run is pretty standard. I'm running it with a service account that permits access to secrets, etc. Plenty of memory is allocated.
I have a VPC connector set up and have routed all traffic through it, as per the instructions here: https://cloud.google.com/run/docs/configuring/static-outbound-ip
I also tried setting UseDNS no in the /etc/ssh/sshd_config file on the EC2 instances, as per some online suggestions about slow SSH logins, but that has not made a difference.
I have rebuilt and redeployed the project a few dozen times, and all tests are on brand-new EC2 instances.
I am attempting these connections using open-source wrappers around the Node ssh2 library: node-ssh and ssh2-sftp-client.
Ideas?
Cloud Run gives you CPU only while an HTTP request is active.
You probably don't have an active request while this work runs on Cloud Run, and outside of an active request the CPU is throttled.
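One mitigation (not part of the original answer) is to switch the service to "CPU always allocated", so background work keeps its CPU between requests. A sketch, where the service name and region are hypothetical placeholders:
# Keep CPU allocated outside of request handling
# (service name and region are placeholders).
gcloud run services update my-node-api --region=us-central1 --no-cpu-throttling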
The best fit for this kind of pipeline is Cloud Workflows with regular Compute Engine instances.
You can set up a workflow to start a Compute Engine instance for the task and stop it once it has finished the steps.
I am the author of the article Run shell commands and orchestrate Compute Engine VMs with Cloud Workflows; it will guide you through the setup.
Executing the workflow can be triggered by Cloud Scheduler or by an HTTP ping.

Encrypted links from Google Cloud Run svc to Cloud Run svc

Backstory (can possibly be skipped): The other day, I finished connecting to MySQL with full SSL from a Cloud Run service without really doing any SSL cert work, which was great!!! Just click "only allow SSL" in GCP, click "generate server certs", allow my Cloud Run service access to the database instance, swap out the TCP socket factory with Google's factory, set some props, and it worked, which was great!
PROBLEM:
NOW, I am trying to figure out secure Cloud Run service-to-service communication and am reading
https://cloud.google.com/run/docs/authenticating/service-to-service
which has us requesting a token over HTTP??? Why is this not over HTTPS? Is the communication from my Docker container to the token service actually encrypted?
Can I communicate HTTP to HTTP between two Cloud Run services and have it be encrypted?
thanks,
Dean
From https://cloud.google.com/compute/docs/storing-retrieving-metadata#is_metadata_information_secure:
When you make a request to get information from the metadata server, your request and the subsequent metadata response never leave the physical host that is running the virtual machine instance.
The traffic from your container to the metadata server at http://metadata/ stays entirely within your project, so SSL is not required; there is no opportunity for it to be intercepted.
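For reference, the token request that doc describes looks like this; the audience value is a hypothetical placeholder for the URL of the receiving service:
# Fetch an identity token from the metadata server. This plain-HTTP
# request never leaves the physical host running the container.
curl -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=https://receiving-service-abc123-uc.a.run.app"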

Get HTTP request logs from Kubernetes pods? (Running JupyterHub)

I am running a JupyterHub application on a Kubernetes cluster (specifically, managed Kubernetes on AWS, i.e. EKS). Each JupyterHub user gets their own pod when they spin up their JupyterHub notebook server.
I need to be able to monitor the HTTP requests being made from their notebook servers.
Is there any way for me to enable this type of logging? And if so, how could I consume these logs?
With the Istio service mesh, you will be able to trace all incoming/outgoing HTTP requests within your JupyterHub pods.
Alternatively, you may use Zipkin, a distributed tracing system.
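A minimal sketch of the Istio route, assuming Istio is already installed with Envoy access logging enabled (meshConfig.accessLogFile set to /dev/stdout) and that the JupyterHub pods live in a hypothetical jupyterhub namespace:
# Enable automatic Envoy sidecar injection for the namespace.
kubectl label namespace jupyterhub istio-injection=enabled
# Restart the hub/proxy deployments so they come back with sidecars;
# user notebook pods pick up the sidecar the next time they are spawned.
kubectl -n jupyterhub rollout restart deployment
# Each sidecar's access log then records every HTTP request from its pod:
kubectl -n jupyterhub logs <pod-name> -c istio-proxy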