GCP Dataflow and on-prem DB

Can we connect an on-prem SQL DB to Cloud Dataflow (GCP) without an API? Our databases do not provide APIs for data extraction.
Please help with this; we have been stuck on it for quite some time.

Yes, you can do this. If you have a look at the Beam documentation, there are several built-in database connectors, such as the JDBC IO connector, so you can connect to any database reachable via an IP:PORT, given the correct drivers.
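For illustration, here is a minimal sketch of such a pipeline reading from an on-prem SQL Server through Beam's JdbcIO; the driver class, IP, database, query, and credentials are placeholder assumptions, and the host must be reachable from the Dataflow workers (see the networking notes that follow).

```java
// Hypothetical Beam pipeline reading an on-prem database over JDBC.
// All connection details are placeholders; in practice, pass credentials
// via pipeline options or Secret Manager rather than hard-coding them.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class OnPremJdbcRead {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read rows over plain JDBC; further transforms and a sink would follow.
    PCollection<String> rows = pipeline.apply("ReadFromOnPremDb",
        JdbcIO.<String>read()
            .withDataSourceConfiguration(
                JdbcIO.DataSourceConfiguration.create(
                        "com.microsoft.sqlserver.jdbc.SQLServerDriver",     // driver jar must be on the worker classpath
                        "jdbc:sqlserver://10.0.0.5:1433;databaseName=mydb") // private IP reachable via VPN/Interconnect
                    .withUsername("etl_user")
                    .withPassword("secret"))
            .withQuery("SELECT id, name FROM customers")
            .withRowMapper((JdbcIO.RowMapper<String>) rs ->
                rs.getString("id") + "," + rs.getString("name"))
            .withCoder(StringUtf8Coder.of()));

    pipeline.run();
  }
}
```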
Now, a security note: you can choose to add a public IP to your on-prem database so that Dataflow can reach it. If you do this, secure it first (firewall rules, allowlisted IP ranges), and keep in mind that your Dataflow worker nodes then need a public IP (or you need to set up Cloud NAT) to access the internet.
A better solution is to create a VPN (or an Interconnect) from the VPC where you run the Dataflow workers. That way, you can reach the on-prem database through its private IP address, which is more secure!

Related

How can I deploy and connect to a PostgreSQL instance in AlloyDB without utilizing a VM?

I have followed the Google quickstart docs for deploying a simple Cloud Run web server that is connected to AlloyDB. However, the docs all seem to point toward having to utilize a VM for a PostgreSQL client, which is then connected to my AlloyDB cluster instance. I believe a connection can only be made from within the same VPC and/or via a proxy service on the VM (please correct me if I'm wrong).
I was wondering: if I only want to give access to services within the same VPC, is having a VM a must, or is there another way?
You're correct. AlloyDB currently only allows connecting via private IP, so the only way to talk directly to the instances is from within the same VPC. The reason all the tutorials (e.g. https://cloud.google.com/alloydb/docs/quickstart/integrate-cloud-run, which is likely the quickstart you mention) talk about a VM is that in order to create the databases themselves within the AlloyDB cluster, set user grants, etc., you need to be able to talk to it from inside the VPC. Another option, for example, would be to set up Cloud VPN to some local network to connect your LAN to the VPC directly. But that's slow, costly, and kind of a pain.
Cloud Run itself does not require the VM piece: the quickstart I linked to above walks through setting up the Serverless VPC Connector, which is the required piece to connect Cloud Run to AlloyDB. The VM in those instructions is only for configuring the PG database itself. So once you've done all the configuration you need, you can shut the VM down so it's not costing you anything. If you need to step back in to make configuration changes, you can spin the VM back up, but it's not something that needs to be running for the Cloud Run -> AlloyDB connection.
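For illustration, a minimal sketch of what the Cloud Run service's connection code can look like once the Serverless VPC Connector is attached: plain JDBC against the instance's private IP. The environment variable names and credentials here are assumptions, not anything the quickstart mandates.

```java
// Hypothetical Cloud Run code connecting to AlloyDB over its private IP,
// reachable because the service routes traffic through a Serverless VPC
// Connector. ALLOYDB_PRIVATE_IP and DB_PASS are placeholder env vars.
import java.sql.Connection;
import java.sql.DriverManager;

public class AlloyDbConnect {
  public static Connection connect() throws Exception {
    String host = System.getenv("ALLOYDB_PRIVATE_IP"); // e.g. 10.1.2.3, from the cluster details page
    String url = "jdbc:postgresql://" + host + ":5432/postgres";
    return DriverManager.getConnection(url, "postgres", System.getenv("DB_PASS"));
  }
}
```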
Providing public ip functionality for AlloyDB is on the roadmap, but I don't have any kind of timeframe for when it will be implemented.

Cloud Data Fusion to on-prem PostgreSQL database connection

I'm trying to establish a connection to an on-premises PostgreSQL database from Cloud Data Fusion, but I'm not able to resolve the host and port. Where can I find the host and port for the PostgreSQL DB, and does anything need to be done on the PostgreSQL side to make it accessible from Data Fusion?
I downloaded the PostgreSQL JDBC driver from the Cloud Data Fusion Hub. In the Data Fusion Studio, I selected PostgreSQL as a source. While filling in the properties, I'm not sure where to find the host/port of my PostgreSQL instance.
The answer is more "theoretical" than practical, but I will try to summarize it as simply as possible.
The first question you need to ask yourself is: is my PostgreSQL instance accessible via a public IP address? If so, this is quite easy; all you need is an authorized user, the correct password, and the public IP address of your instance (and the port you configured).
If your instance is not publicly accessible, which is often the case, then the first thing to do is to set up a VPN connection between your on-prem network and the GCP Virtual Private Cloud (VPC) that is peered to your Data Fusion instance (assuming you set up a private instance, which you usually do for security reasons). Once this is done, Data Fusion should be able to connect directly to the PostgreSQL source via the VPN route(s) exported from your peered GCP VPC.
I am happy to edit my answer with more detail based on your follow up questions.
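If it helps, here is a hypothetical smoke test to run from a VM inside the peered VPC (or across the VPN) before configuring the Data Fusion source; it checks that the same host, port, and credentials you would enter in the PostgreSQL source properties actually work over plain JDBC. All connection values below are placeholders.

```java
// Connectivity smoke test for the on-prem PostgreSQL source; requires the
// PostgreSQL JDBC driver on the classpath. Host, port, database, and
// credentials are placeholders for your own values.
import java.sql.Connection;
import java.sql.DriverManager;

public class PgConnectivityCheck {
  public static void main(String[] args) throws Exception {
    // Same host/port you would enter in the Data Fusion source properties.
    String url = "jdbc:postgresql://192.168.1.20:5432/mydb";
    try (Connection conn = DriverManager.getConnection(url, "pg_user", "secret")) {
      System.out.println("Connected: " + conn.getMetaData().getDatabaseProductVersion());
    }
  }
}
```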

Connecting to Cloud SQL from a host without a fixed IP

I have a server running at home, but I don't have a fixed IP address, so I use DDNS to update my domain's DNS when the IP changes, and it is working fine. My problem comes when trying to access a MySQL instance: currently it sits behind a VPC, so I have to manually add the new IP as an authorized network. I wonder if it is possible to do that with a REST API call; that way I can add a crontab entry on my server to check for changes every n minutes and update the authorized networks.
I read the Google documentation, but my understanding (I am not a native English speaker) is that this is only possible from an already-authorized network. Can somebody give me a clue?
Thanks in advance.
Take a look at installing and using the Cloud SQL Auth Proxy on your local server. This will remove the need to keep updating authorized networks when your IP changes.
"I wonder if it is possible to do that with a REST API call; that way I can add a crontab entry on my server to check for changes every n minutes and update the authorized networks."
Google Cloud provides the Cloud SQL Admin API. To modify the authorized networks, use the instances.patch API.
Google Cloud SQL Method: instances.patch
Modify this data structure to change the authorized networks:
Google Cloud SQL IP Configuration
You might find it easier to use the CLI to modify the authorized networks:
gcloud sql instances patch <INSTANCENAME> --authorized-networks=x.x.x.x/32
gcloud sql instances patch
I do not recommend constantly updating the authorized networks when not required. Use an external service to fetch your public IP and compare it with the last saved value; only update Cloud SQL if your public IP address has changed.
Below are common public services to determine your public IP address; a sketch of this check-and-update loop follows the list. Note that you should select one at random, as these services can rate-limit you. Some of the endpoints require query parameters to return only your IP address and not a web page; consult their documentation.
https://checkip.amazonaws.com/
https://ifconfig.me/
https://icanhazip.com/
https://ipecho.net/plain
https://api.ipify.org
https://ipinfo.io/ip
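As a sketch of that crontab job (the instance name, state-file path, and chosen IP service are placeholder assumptions): fetch the current public IP, compare it with the last saved value, and only call gcloud when it has changed.

```java
// Hypothetical cron job: patch Cloud SQL authorized networks only when the
// public IP actually changed. Instance name and state-file path are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class UpdateAuthorizedNetwork {
  public static void main(String[] args) throws Exception {
    Path stateFile = Path.of("/var/tmp/last-public-ip");

    // Fetch the current public IP from one of the services listed above.
    HttpResponse<String> resp = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create("https://checkip.amazonaws.com/")).build(),
        HttpResponse.BodyHandlers.ofString());
    String currentIp = resp.body().trim();

    String lastIp = Files.exists(stateFile) ? Files.readString(stateFile).trim() : "";
    if (currentIp.equals(lastIp)) {
      return; // IP unchanged: do not touch Cloud SQL.
    }

    // Patch the instance's authorized networks via the CLI, then persist the new IP.
    new ProcessBuilder("gcloud", "sql", "instances", "patch", "MY_INSTANCE",
        "--authorized-networks=" + currentIp + "/32")
        .inheritIO().start().waitFor();
    Files.writeString(stateFile, currentIp);
  }
}
```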
Note: I recommend that you use the Google Cloud SQL Auth Proxy. This provides several benefits including network traffic encryption. The auth proxy does not require that you whitelist your network.
Refer to my other answer for more details

How do I connect to Google Cloud SQL from Google Cloud Run via TCP?

Based on my current understanding, when I enable a service connection to my Cloud SQL instance in one of my revisions, the path /cloudsql/[instance name]/.s.PGSQL.5432 becomes populated. This is a UNIX socket connection.
Unfortunately, a 3rd party application I'm using doesn't support UNIX socket connections and as such I'm required to connect via TCP.
Does the Google Cloud SQL Proxy also offer a way for me to connect to Cloud SQL via something like localhost:5432, or an equivalent? Some of the documentation I'm reading suggests that I have to do elaborate networking configuration with private IPs just to enable TCP-based Cloud SQL for my Cloud Run revisions, but I feel like the Cloud SQL Proxy is already capable of giving me a TCP connection instead of a UNIX socket.
What is the right and most minimal way forward here, obviously assuming I do not have the ability to modify the code I'm running?
I've also cross posted this question to the Google Cloud SQL Proxy repo.
The most secure and easiest way is to use the private IP. It's not that long or hard; there are 3 steps:
1. Create a serverless VPC connector. Create it in the same region as your Cloud Run service, and note the VPC network that you use (by default it's "default").
2. Add the serverless VPC connector to the Cloud Run service. Route only the private IPs through this connector.
3. Add a private connection to your Cloud SQL database. Attach it to the same VPC network as your serverless VPC connector.
The cloud configuration is now done. Get your instance's Cloud SQL private IP and add it to the parameters of your Cloud Run service so the application can open a TCP connection to that IP.
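For illustration, a sketch under the assumption that the private IP was added to the Cloud Run service as an environment variable (the variable names and credentials are placeholders): the application then connects over ordinary TCP, exactly as it would to localhost:5432.

```java
// Hypothetical TCP connection to the Cloud SQL private IP from Cloud Run,
// routed through the serverless VPC connector. DB_HOST, DB_USER, and DB_PASS
// are placeholder env vars set as Cloud Run service parameters.
import java.sql.Connection;
import java.sql.DriverManager;

public class PrivateIpTcpConnect {
  public static Connection connect() throws Exception {
    String host = System.getenv("DB_HOST"); // Cloud SQL private IP, e.g. 10.10.0.3
    String url = "jdbc:postgresql://" + host + ":5432/mydb";
    return DriverManager.getConnection(url, System.getenv("DB_USER"), System.getenv("DB_PASS"));
  }
}
```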

JDBC connections from managed Cloud Run?

Looking for a way to JDBC from managed Cloud Run to a Cloud SQL instance that doesn't require opening up a public IP on the database.
Per the documentation, managed Cloud Run only supports UNIX socket access, which JDBC doesn't really support. I tried junixsocket (https://kohlschutter.github.io/junixsocket/dependency.html) and couldn't get it to work.
I'd prefer to not be reduced to having to run a SOCKS proxy :).
There is now documentation on connecting Cloud SQL to Cloud Run via JDBC: https://cloud.google.com/sql/docs/mysql/connect-run#connecting_to (click on the Java tab)
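For reference, the Java approach on that page uses the Cloud SQL JDBC Socket Factory, which lets an ordinary JDBC connection pool ride over the Unix socket Cloud Run exposes, with no public IP needed. A condensed sketch follows, with placeholder instance name, database, and credentials; this is the MySQL variant matching the linked page (for PostgreSQL, swap in com.google.cloud.sql.postgres.SocketFactory and a jdbc:postgresql URL).

```java
// Hypothetical HikariCP pool using the Cloud SQL JDBC Socket Factory.
// The instance connection name, database, and credentials are placeholders.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import javax.sql.DataSource;

public class CloudRunJdbc {
  static DataSource createPool() {
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:mysql:///mydb"); // database name only; no host needed
    config.setUsername("app_user");
    config.setPassword("secret");
    // Route the connection through the Cloud SQL socket factory.
    config.addDataSourceProperty("socketFactory", "com.google.cloud.sql.mysql.SocketFactory");
    config.addDataSourceProperty("cloudSqlInstance", "my-project:us-central1:my-instance");
    return new HikariDataSource(config);
  }
}
```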