Can a private Cloud Data Fusion instance connect to the internet? - google-cloud-platform

Our application is made of a Spring Boot app server deployed through Cloud Run and a Cloud SQL Postgres database.
The database is private and connected to a private VPC.
The app server can connect to the database through a gateway to this private VPC, provided by the Cloud Run configuration.
We'd like to feed this database with Cloud Data Fusion (CDF) periodically.
CDF should fetch data from AWS S3 and push it into our database.
We've designed and validated a pipeline for that purpose, but we're facing a network paradox:
Either CDF is public and can read from S3 over the internet, but can't reach the cloud database,
or CDF is private and can reach our database, but can't reach the internet to fetch from S3...
How can CDF both write to the private database and read data from the internet?
I'm surprised that a CDF instance, even a private one, can't establish an egress connection to an internet resource.

Cloud Data Fusion is a tool that helps you build pipelines (it is based on CDAP). If you set the Data Fusion instance to private, it's the access to the tool that is private, not the runtime! On Google Cloud, the pipeline runs on a Dataproc cluster.
So now, the question is: can your Dataproc cluster reach the internet and your database?
If your cluster runs in the same VPC as your Cloud SQL database's private IP connection, and there is no firewall rule that prevents the communication, it's OK.
If the Compute Engine instances that compose your cluster have public IPs, no problem, you can access public URLs. Otherwise, as John Hanley said, you can create a Cloud NAT to allow your Compute Engine instances to initiate calls to external URLs.
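A minimal sketch of that Cloud NAT setup, assuming a VPC named my-vpc and the region us-central1 (both placeholders):

# Create a Cloud Router in the VPC where the Dataproc cluster runs.
gcloud compute routers create dataproc-router \
    --network=my-vpc \
    --region=us-central1
# Attach a Cloud NAT so instances without public IPs can initiate outbound connections.
gcloud compute routers nats create dataproc-nat \
    --router=dataproc-router \
    --region=us-central1 \
    --nat-all-subnet-ip-ranges \
    --auto-allocate-nat-external-ips

With this in place, a fully private Dataproc cluster can still fetch from S3 over the internet while keeping its private-IP route to the database.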

Related

BigQuery data transfer service does not work when using VPC

I have an issue when migrating Redshift to BigQuery. So what have I done so far?
I created a VPN that connects the GCP VPC and the AWS VPC (the VPCs' IP ranges do not overlap).
The VPN works perfectly (I tested it: I created an EC2 instance and pinged its private IP from a GCP Compute Engine VM ---> it works).
I created a Redshift instance with the publicly accessible option ----> then created a BigQuery Data Transfer Service ----> it works.
BUT, when I create a Redshift cluster with NO publicly accessible option ----> then create the BigQuery Data Transfer Service, it gives me an error.
ERROR:
Unable to proceed: Could not connect with provided parameters: No suitable driver found for jdbc:redshift://redshift-cluster-1.cbr8ra8jmxgm.us-east-1.redshift.amazonaws.com:5439/dev
Also, I tried to ping the AWS Redshift IP address from a GCP Compute Engine VM -----> it does not respond.
What can be the reason?
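One thing worth noting about that ping test: Redshift security groups typically don't allow ICMP, so ping can fail even when the cluster is reachable. A TCP probe against the JDBC port from the GCP VM is a more telling sketch (hostname taken from the error message above; 5439 is the default Redshift port):

# Test TCP reachability of the Redshift endpoint over the VPN; ICMP may be blocked even when this port is open.
nc -vz redshift-cluster-1.cbr8ra8jmxgm.us-east-1.redshift.amazonaws.com 5439

If this times out as well, the Redshift security group and the AWS route tables for the VPN are the first places to look.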

Cloud SQL Proxy Private IP External application

Simple question: is there any way to connect to a GCP SQL database on a private IP through the Cloud SQL proxy from an external application? (local development environment)
I followed every step in the official tutorial to configure the Cloud SQL proxy with all requirements, but all connections fail with a sort of timeout.
To connect to a Cloud SQL instance using only a private IP through the Cloud SQL proxy, you need to install the proxy within a resource (for example, a Compute Engine instance) with access to the same VPC network as the Cloud SQL instance. Since your local development environment doesn't comply with that requirement, the connection fails.
You could move your local development environment to a Compute Engine instance located within the same VPC network as your Cloud SQL instance, or temporarily enable a public IP on the instance, authorize only your IP through the authorized networks option, and use the Cloud SQL proxy to gain access to the instance.
The Cloud SQL proxy will work with a private IP address as long as it can reach the private IP address.
See the "Connecting from an external source" section on the Configuring Private IP page for a list of steps to set up a VPN that can provide access to your VPC.

Connecting Google Cloud Run Service to Google Cloud SQL database

I have 2 Google Cloud services:
Google Cloud Run service (Node.js / Strapi)
Google Cloud SQL service (MySQL)
I have added the Cloud SQL connection to the Google Cloud Run service from the UI, and I have a public IP for the Google Cloud SQL service. On top of that, I have added the Run service IP to the authorized networks of the SQL service.
If I try to connect from another server (external to Google Cloud), I can easily connect to the Google Cloud SQL service and execute queries.
But if I try to connect from inside the GCloud Run service with exactly the same settings (IP, database_name, etc.), my connection hangs and I get a timeout error in the logs...
How do I properly allow GCloud SQL to accept connections from GCloud Run?
I looked for other answers in here, but they all look very old (from around 2015).
You can use 3 modes to access your database:
Use the built-in feature. In this case, you don't need to specify the IP address; a Unix socket is opened to communicate with the database, as described in the documentation.
Use the Cloud SQL private IP. This time, there is no need to configure a connection in the Cloud Run service; you won't use it because you will use the IP, not the Unix socket. This solution requires 2 things:
Firstly, attach your database to your VPC and give it a private IP.
Then, you need to route the private IP traffic of Cloud Run through your VPC. For this, you have to create a serverless VPC connector and attach it to the Cloud Run service.
Use the Cloud SQL public IP. This time again, there is no need to configure a connection in the Cloud Run service; you won't use it because you will use the IP, not the Unix socket. This approach needs more steps (and it's less secure):
You need to route all the egress traffic of Cloud Run through your VPC. For this, you have to create a serverless VPC connector and attach it to the Cloud Run service.
Deploy your Cloud Run service with the serverless VPC connector and the egress connectivity parameter set to "all".
Then create a Cloud NAT to route all the VPC connector IP range traffic to a single IP (or set of IPs). (The link is to the Cloud Functions documentation, but it works in exactly the same way.)
Finally, authorize the Cloud NAT IP(s) in the Cloud SQL authorized networks.
In your case, you have whitelisted the Cloud Run IP, but it's a shared IP (other services can use the same one! Be careful) and it's not always the same; there is a pool of IP addresses used by Google Cloud. A sketch of the first two modes follows.
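A minimal sketch of the first two modes, assuming placeholder names throughout (my-service, my-project:us-central1:my-instance, my-connector, my-vpc):

# Mode 1: built-in connection; Cloud Run mounts a Unix socket under /cloudsql/.
gcloud run services update my-service \
    --region=us-central1 \
    --add-cloudsql-instances=my-project:us-central1:my-instance

# Mode 2: private IP; create a serverless VPC connector and attach it to the service.
gcloud compute networks vpc-access connectors create my-connector \
    --region=us-central1 \
    --network=my-vpc \
    --range=10.8.0.0/28
gcloud run services update my-service \
    --region=us-central1 \
    --vpc-connector=my-connector

For the third mode, the same connector is attached together with the egress setting --vpc-egress=all-traffic, plus a Cloud NAT like the one sketched in the first answer.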

Connecting an AWS EC2 to a Google Cloud SQL instance locally using VPN Gateway

I have an AWS account with an EC2 instance in it that I am trying to connect to a Cloud SQL server (MySQL 5.6) inside of Google Cloud Platform.
I have successfully set up a VPN between AWS and GCP and can echo a message over nc between an EC2 instance on AWS and a VM on GCP.
As GCP managed DBs are not placed inside a VPC of my choosing, I followed this guide to give the DB a private IP and to then peer that with my Google VPC. I verified this works by accessing the DB via pymysql from a VM in GCP using the private IP of the DB.
However, my issue comes from connecting the EC2 instance inside AWS to the Cloud SQL DB in the same way. I have followed this guide to allow the use of the DB's private IP from an external source, but I seem to be stuck on how to set up the routing to the peered network the DB sits in using AWS routing.
The problem has been solved!
In the advertised routes settings of my Cloud Router, I had misunderstood the function of "Advertise all subnets visible to the Cloud Router (Default)".
I needed to instead choose "Create custom routes" and then the sub-option "Advertise all subnets visible to the Cloud Router".
This then allowed me to add the Cloud SQL subnet to my router so that its IP block propagates over to AWS.
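The equivalent gcloud sketch, assuming a router named my-router and a Cloud SQL peering range of 10.20.0.0/24 (both placeholders):

# Switch the Cloud Router to custom advertisements: keep advertising all visible subnets,
# and additionally advertise the Cloud SQL peering range so it propagates to AWS.
gcloud compute routers update my-router \
    --region=us-central1 \
    --advertisement-mode=CUSTOM \
    --set-advertisement-groups=ALL_SUBNETS \
    --set-advertisement-ranges=10.20.0.0/24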

Cannot connect to Cloud SQL from Cloud Run after enabling private IP and turning off public IP

I have a PostgreSQL Cloud SQL instance which I am connecting to via a Unix socket and the instance name from a Cloud Run container, as per the documentation. With a public IP, this connection works fine. I was looking to turn off the public IP and have only a private IP, so I would not be charged for the public IP going forward.
When I first created the Cloud SQL instance, I only enabled the public IP. A couple of days later, I enabled the private IP. For the associated network for the private IP, I accepted the default, as the Cloud Run instance is in the same project.
When I turn off the public IP, my application can no longer connect to the Cloud SQL instance. I get a connection refused error:
sqlalchemy.exc.InterfaceError: (pg8000.core.InterfaceError) ('communication error', ConnectionRefusedError(111, 'Connection refused'))
As stated above, I did follow the instructions on the Connecting to Cloud SQL from Cloud Run page:
https://cloud.google.com/sql/docs/postgres/connect-run
I even ran the gcloud command to update the existing deployed revision after turning off the public IP and having only the private IP available, but it made no difference.
Is a public IP required for a connection from Cloud Run to Cloud SQL? I do not see that in the connection documentation page. Or is there something else I missed when trying to switch over to having only a private IP? Or do I need to create a new Cloud SQL instance without a public IP and go through the instructions for connecting Cloud Run via an instance name again?
Is a public IP required for a connection from Cloud Run to Cloud SQL? I do not see that in the connection documentation page.
On the Connecting to Cloud SQL from Cloud Run page, it says "Note: These instructions require your Cloud SQL instance to have a public IP address configured."
Private IP access is access from a Virtual Private Cloud (VPC) network. In order to access your instance through a VPC, the resource you are connecting from needs to be part of the VPC. Cloud Run doesn't currently support VPC access, so you'll need to have a public IP for now.
TL;DR: open a case with Google support.
Your case is interesting because, by design, I think it's not yet supported.
In fact, when you create a Cloud SQL database with a private IP, a network peering is created between your VPC and the Cloud SQL VPC (or something equivalent).
In addition, today, it's not possible to plug your Cloud Run instance into your VPC. With Cloud Functions and App Engine, you have a serverless VPC connector; not yet with Cloud Run (it's coming!).
The serverless VPC connector performs the same thing as the Cloud SQL private IP, I mean a peering between your VPC and the Cloud Functions (or App Engine) VPC (or something equivalent).
And even when the serverless VPC connector becomes available on Cloud Run, it's not certain that it will work, because of network peering transitivity. In short, if you have a peering between VPC A -> VPC B and between VPC B -> VPC C, you can't reach VPC C from VPC A by performing a hop through VPC B. Replace A with the Cloud Run VPC, B with the VPC of your project, and C with the Cloud SQL VPC.
Only directly peered networks can communicate. Transitive peering is not supported. In other words, if VPC network N1 is peered with N2 and N3, but N2 and N3 are not directly connected, VPC network N2 cannot communicate with VPC network N3 over VPC Network Peering.
I didn't check with App Engine or Cloud Functions, but by this design it shouldn't work.
But I'm not sure; that's why a case with Google support will get you a clear answer and maybe some input on the roadmap. Any valuable information from Google support is welcome here!
I was also getting the following error when trying to connect to Postgres using the following command from Cloud Shell:
gcloud sql connect
it seems your client does not have ipv6 connectivity...
What I do is log in to one of the pods deployed using Google Kubernetes Engine with the following command:
kubectl exec --stdin --tty java-hello-world-7fdecb9894-smql4 -- /bin/bash
Then, the first time, I ran:
apt-get update
apt install postgresql-client
And now I can connect using:
psql -h postgres-private-ip -U username