Why are outbound SSH connections from Google CloudRun to EC2 instances unspeakably slow? - amazon-web-services

I have a Node API deployed to Google Cloud Run that is responsible for managing external servers (clean, new Amazon EC2 Linux VMs), including through SSH and SFTP. SSH and SFTP do eventually work, but the connections take 2-5 MINUTES to initiate. Sometimes they fail with handshake timeout errors.
The same service running on my laptop, connecting to the same external servers, has no issues and the connections are as fast as any normal SSH connection.
The deployment on CloudRun is pretty standard. I'm running it with a service account that permits access to secrets, etc. Plenty of memory allocated.
I have a VPC Connector set up, and have routed all traffic through the VPC connector, as per the instructions here: https://cloud.google.com/run/docs/configuring/static-outbound-ip
I also tried setting UseDNS no in the /etc/ssh/sshd_config file on the EC2 instance, as per some suggestions online about slow SSH logins, but that has not made a difference.
I have rebuilt and redeployed the project a few dozen times and all tests are on brand new EC2 instances.
I am attempting these connections using open source wrappers on the Node ssh2 library, node-ssh and ssh2-sftp-client.
Ideas?

By default, Cloud Run only gives your container full CPU while it is handling an HTTP request.
You probably don't have an active request while these connections are being made, and outside of an active request the CPU is throttled.
A better fit for this pipeline is Cloud Workflows combined with regular Compute Engine instances.
You can set up a Workflow that starts a Compute Engine instance for this task and stops it once the steps have finished.
I am the author of the article "Run shell commands and orchestrate Compute Engine VMs with Cloud Workflows", which walks you through the setup.
The Workflow can be triggered by Cloud Scheduler or by an HTTP call.
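If you stay on Cloud Run, one way to act on the throttling point above is to make sure the slow work finishes before the response is sent, so it runs while the request is still active. The following is only a minimal sketch using Node's built-in http module. `Task` and `createTaskServer` are hypothetical names invented for illustration; in a real service the task would wrap your node-ssh or ssh2-sftp-client calls.

```typescript
import * as http from "node:http";

// Hypothetical stand-in for the SSH/SFTP work; not part of any real library.
type Task = () => Promise<string>;

// Build a server whose handler waits for the task to settle *before* it
// ends the response, so the work runs while the request is still active.
function createTaskServer(task: Task): http.Server {
  return http.createServer((_req, res) => {
    task()
      .then((result) => {
        res.writeHead(200, { "Content-Type": "text/plain" });
        res.end(result);
      })
      .catch((err: unknown) => {
        // Report the failure to the caller instead of leaving the request open.
        res.writeHead(500, { "Content-Type": "text/plain" });
        res.end(err instanceof Error ? err.message : String(err));
      });
  });
}
```

The anti-pattern this avoids is calling `res.end()` first and starting the SSH connection afterwards, which would leave the handshake running after the request has completed.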

Related

How can I deploy and connect to a postgreSQL instance in AlloyDB without utilizing VM?

I have followed Google's quickstart docs for deploying a simple Cloud Run web server connected to AlloyDB. However, the docs all seem to point towards having to use a VM as a PostgreSQL client, which is then connected to my AlloyDB cluster instance. I believe a connection can only be made from within the same VPC and/or through a proxy service on the VM. (Please correct me if I'm wrong.)
If I only want to give access to services within the same VPC, is having a VM a must, or is there another way?
You're correct. AlloyDB currently only allows connecting via private IP, so the only way to talk directly to the instances is from within the same VPC. The reason all the tutorials (e.g. https://cloud.google.com/alloydb/docs/quickstart/integrate-cloud-run, which is likely the quickstart you mention) talk about a VM is that in order to create the databases themselves within the AlloyDB cluster, set user grants, etc., you need to be able to talk to it from inside the VPC. Another option, for example, would be to set up Cloud VPN to a local network to connect your LAN to the VPC directly. But that's slow, costly, and kind of a pain.
Cloud Run itself does not require the VM piece. The quickstart I linked to above walks through setting up the Serverless VPC Connector, which is the piece required to connect Cloud Run to AlloyDB. The VM in those instructions is only for configuring the PostgreSQL database itself. So once you've done all the configuration you need, you can shut down the VM so it's not costing you anything. If you need to step back in to make configuration changes, you can spin the VM back up, but it doesn't need to be running for the Cloud Run -> AlloyDB connection.
Public IP support for AlloyDB is on the roadmap, but I don't have any kind of timeframe for when it will be implemented.

EC2 instances connecting to lambda result in ConnectFailure

I'm trying to access Lambda functions from a Windows VM I have created in EC2 for dev purposes, but even a simple 'list functions' command fails to connect.
I have tried using the AWS CLI through PowerShell, the .NET SDK and the VS AWS Toolkit, but each of these times out after a long waiting period. I can, however, list other services such as my databases and S3 buckets.
[Image: AWS CLI failure message]
[Image: VS Toolkit failure message]
I have tried creating a new VM, with the same results. I've disabled Windows Firewall altogether, allowed all traffic through the security group, and have VPC endpoints for my subnet (ssm, ec2messages, lambda, ec2).
I have no trouble connecting to the lambda service through my own computer. On the VM, I have modified the .aws/credentials file to match the one on my computer for both the admin and current user but I still can't connect. This tells me that the problem isn't related to my access key credentials.
I'm reaching the end of the troubleshooting options I can think of so any help would be very much appreciated!
Update: using telnet, I cannot connect to lambda.ap-southeast-2 but I can connect to s3.ap-southeast-2 and lambda.ap-southeast-1. It seems lambda.ap-southeast-2 is being blocked somewhere, but it isn't Windows Firewall because that is off and the same problem happens on Ubuntu VMs.
In the VPC Management Console, I haven't set up any firewalls under Network Firewall or DNS Firewall, and my network ACL allows all traffic.
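The telnet check described above can also be scripted, which makes it easier to repeat against several endpoints from the same VM. This is a minimal sketch using only Node's built-in net module. `canConnect` is a hypothetical helper name, and the 5 second default timeout is an arbitrary assumption.

```typescript
import * as net from "node:net";

// Resolve to true if a TCP connection to host:port opens within timeoutMs,
// and to false on a connection error or timeout.
function canConnect(host: string, port: number, timeoutMs = 5000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    // Give up if the connection has not opened in time.
    const timer = setTimeout(() => finish(false), timeoutMs);
    const finish = (ok: boolean): void => {
      clearTimeout(timer);
      socket.destroy();
      resolve(ok);
    };
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}
```

Call it with whichever endpoint hostnames you tested with telnet, on the HTTPS port (443, assuming the default API endpoints), and compare the results for the region that works with the one that does not.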

Private service to service communication for Google Cloud Run

I'd like to have my Google Cloud Run services privately communicate with one another over non-HTTP and/or without having to add bearer authentication in my code.
I'm aware of this documentation from Google which describes how you can do authenticated access between services, although it's obviously only for HTTP.
I think I have a general idea of what's necessary:
Create a custom VPC for my project
Enable the Serverless VPC Connector
What I'm not totally clear on is:
Is any of this necessary? Can Cloud Run services within the same project already see each other?
How do services address one another after this?
Do I gain the ability to use simpler by-convention DNS names? For example, could I have each service in Cloud Run manifest on my VPC as a single first level DNS name like apione and apitwo rather than a larger DNS name that I'd then have to hint in through my deployments?
If not, is there any kind of mechanism for services to discover names?
If I put my managed Cloud SQL postgres database on this network, can I control its DNS name?
Finally, are there any other gotchas I might want to be aware of? You can assume my use case is very simple, two or more long lived services on Cloud Run, doing non-HTTP TCP/UDP communications.
I also found a potentially related Google Cloud Run feature request that is worth upvoting if this isn't currently possible.
Cloud Run services are only reachable through HTTP requests. You can't use other network protocols to reach them (SSH to log into instances, for example, or raw TCP/UDP communication).
However, Cloud Run can initiate these kinds of connections to external services (for instance, Compute Engine instances deployed in your VPC, thanks to the Serverless VPC Connector).
The Serverless VPC Connector lets you bridge the Google Cloud managed environment (where the Cloud Run, Cloud Functions and App Engine instances live) and the VPC of your project, where you have your own instances (Compute Engine, GKE node pools, ...).
Thus you can have a Cloud Run service that reaches Kubernetes pods on GKE through a TCP connection, if that's your requirement.
As for service discovery, it isn't available yet, but Google is actively working on it, and Ahmet (Google Cloud Developer Advocate for Cloud Run) recently released a tool for that. Nothing is really built in, though.

GCP Firewall allow connection from cloud build to compute engine instance

We have a GCE VM running a MySQL server. Firewall rules deny incoming connections from external IPs. Our Cloud Build process needs to perform DB migrations, so it needs to connect to MySQL from Cloud Build. I want to add a firewall rule that allows only Cloud Build to connect on port 3306 via the external IP address.
Cloud Build does not run on internal network so there is no way to connect from the internal IP.
I tried adding a rule for service account scope but I can't see cloud build service account in the list.
Currently, as mentioned by @guillaume blaquiere, there is a Feature Request for this. I recommend that you star it to receive all the updates. It seems the FR has been sent to the Cloud Build engineering team and they will evaluate it. Note that there is no ETA for the implementation yet.

Restrict network activity in Google Cloud Run

I'm using Cloud Run containers to run untrusted (user-supplied) code. The container receives a POST request, runs the code, and responds with the result. For security reasons, it's deployed on a locked down service account, but I also want to block all other network activity. How can this be accomplished?
Cloud Run (managed) currently doesn't offer firewall restrictions to selectively block inbound or outbound traffic by IP/host. I'm assuming you're trying to block connections initiated from the container to the outside. Cloud Run plans to add support for the Google Cloud VPC Service Controls feature in the future, so that might help.
However, if you have a chance to use Cloud Run for Anthos (on GKE), which has a similar developer experience but runs on Kubernetes clusters, you can easily write Kubernetes NetworkPolicy policies (I have some recipes for these) to control what sort of traffic can come from or go to the running containers. You can also use GCE firewall rules and VPC Service Controls when using a Kubernetes cluster.
Other than that, your only option in a Cloud Run (fully managed) environment is to use the Linux iptables command while starting your container to block certain network patterns. Importantly, note that Cloud Run (fully managed) runs in a gVisor sandbox, which emulates system calls, and many iptables features are currently not implemented or supported in gVisor. Looking at the issue tracker and patches, I can tell that it's on the roadmap and some features may even be working today.
You could couple the Cloud Run (managed) deployment to a VPC Network that doesn't have any internet access.
I figured this out for my use case (blocking all egress).
In the first generation of Cloud Run at least, there are two eth interfaces: eth0 and eth2. Blocking traffic on eth2 blocks egress traffic.
iptables -I OUTPUT -o eth2 -j DROP
Run this on startup of the container/app, and then ensure the application itself is not run as root (and hence cannot undo this).