Cloud Data Fusion to on-prem PostgreSQL database connection - google-cloud-platform

I'm trying to establish a connection to an on-premise PostgreSQL database from Cloud Data Fusion, but I'm not able to resolve the host and port. Where can I find the host and port for the PostgreSQL DB, and does anything need to be done on the PostgreSQL side so it can be accessed from Data Fusion?
I downloaded the PostgreSQL JDBC driver from the Cloud Data Fusion Hub. In Data Fusion Studio, I selected PostgreSQL as a source. While filling in the details in the properties, I'm not sure where to find the host/port for PostgreSQL.

The answer is more "theoretical" than practical, but I will try to summarize it as simply as possible.
The first question you need to ask yourself is: is my PostgreSQL instance accessible via a public IP address? If so, this is quite easy; all you need is an authorised user, the correct password, and the public IP address of your instance (and the port you configured).
If your instance is not publicly accessible, which is often the case, then the first thing to do is to set up a VPN connection between your on-prem network and the GCP Virtual Private Cloud (VPC) that is peered to your Data Fusion instance (assuming you set up a private instance, which you usually do for security reasons). Once this is done, Data Fusion should be able to connect directly to the PostgreSQL source via the VPN route(s) exported from your peered GCP VPC.
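In either case, the host and port come from the PostgreSQL server itself: the port is whatever is set in postgresql.conf (5432 by default), listen_addresses must allow non-local connections, and pg_hba.conf must permit the range your Data Fusion VPC (or VPN) connects from. As a minimal sketch, assuming placeholder host, database, and credentials, you can verify reachability from a machine on the peered network with psycopg2 before filling in the Data Fusion source properties:

# Minimal connectivity check; host, database and credentials are placeholders.
# pip install psycopg2-binary
import psycopg2

conn = psycopg2.connect(
    host="10.10.0.12",       # on-prem private IP reachable over the VPN (or the public IP)
    port=5432,               # PostgreSQL default; confirm with "SHOW port;" on the server
    dbname="inventory",
    user="fusion_reader",
    password="change-me",
    connect_timeout=5,
)
with conn.cursor() as cur:
    cur.execute("SELECT inet_server_addr(), inet_server_port();")
    print(cur.fetchone())    # the address and port the server itself reports
conn.close()

Whatever host and port work here are the values to enter in the PostgreSQL source properties in Data Fusion Studio.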
I am happy to edit my answer with more detail based on your follow up questions.

Related

connect local environment to CloudSQL with private IP

I have hosted my application in a Cloud Run container and connected it to Cloud SQL.
Everything is in a VPC network and is running smoothly. Now I would like to modify data in production from a database tool like DataGrid. Therefore I need to connect my local environment to my VPC network, which I did through a Cloud VPN tunnel. Now I would like to connect to the SQL instance.
Here I got stuck and I'm wondering how I can establish the connection.
It would be great if someone would know how I can solve this issue. Thanks!
My preferred solution is to use the public IP, BUT without whitelisting any network. In effect, it's as if you have a public IP but all connections are forbidden.
The solution here is to use the Cloud SQL Proxy to open a tunnel from your computer to the Cloud SQL database (which you reach on the public IP, but through a secured tunnel). It's exactly like a VPN connection: a secure tunnel.
You can do it like this:
Download the Cloud SQL Proxy
Launch it:
./cloud_sql_proxy -instances=<INSTANCE_CONNECTION_NAME>=tcp:3306
Connect your SQL client to localhost:3306
If port 3306 is already in use, feel free to use another one
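Once the proxy is listening, any local client can connect through the forwarded port. A minimal sketch, assuming a MySQL instance behind the proxy and placeholder credentials (for PostgreSQL you would forward 5432 and use a PostgreSQL client instead):

# pip install pymysql
import pymysql

# Connect through the local end of the Cloud SQL Proxy tunnel.
conn = pymysql.connect(
    host="127.0.0.1",        # the proxy listens locally
    port=3306,               # the tcp:<port> you passed to cloud_sql_proxy
    user="app_user",         # placeholder credentials
    password="change-me",
    database="appdb",
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print(cur.fetchone())
conn.close()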
If you prefer the private IP only (sometimes it's a security team requirement), I wrote an article on this.
If you use a VPN (and you are connected through Cloud VPN), take care to open the correct routes and firewall rules in both directions (in and out).

AWS EC2 for QuickBooks

AWS and network noob. I've been asked to migrate QuickBooks Desktop Enterprise to AWS. This seems easy in principle but I'm finding a lot of conflicting and confusing information on how best to do it. The requirements are:
Setup a Windows Server using AWS EC2
QuickBooks will be installed on the server, including a file share that users will map to.
Configure VPN connectivity so that the EC2 instance appears and behaves as if it were on-prem.
Allow additional off-site VPN connectivity as needed for ad hoc remote access
Cost is a major consideration, which is why I am doing this instead of getting someone who knows this stuff.
The on-prem network is very small - one Win2008R2 server (I know...) that hosts QB now and acts as a file server, 10-15 PCs/printers and a Netgear Nighthawk router with a static IP.
My approach was to first create a new VPC with a private subnet that will contain the EC2 instance and set up a site-to-site VPN connection with the Nighthawk for the on-prem users. I'm unclear as to whether I also need to create security group rules to only allow inbound traffic (UDP/TCP file-sharing ports) from the static IP, or if the VPN negates that need.
I'm trying to test this one step at a time and have an instance set up now. I am remote and am using my current IP address in the security group rules for the test (no VPN yet). I set up the file share but I am unable to access it from my computer. I can RDP and ping it, and I have turned on the firewall rules to allow NetBIOS and SMB, but still nothing. I just read another thread that says I need to set up a Storage Gateway, but before I do that, I wanted to see if that is really required or if there's another/better approach. I have to believe this is a common requirement but I seem to be missing something.
This is a bad approach for QuickBooks. Intuit explicitly recommends against using QuickBooks with a file share via VPN:
Networks that are NOT recommended
Virtual Private Network (VPN): connects computers over long distances via the Internet using an encrypted tunnel.
From here: https://quickbooks.intuit.com/learn-support/en-us/configure-for-multiple-users/recommended-networks-for-quickbooks/00/203276
The correct approach here is to host QuickBooks on the EC2 instance and let people RDP (Remote Desktop) into the EC2 Windows server to use QuickBooks. Do not let them install QuickBooks on their client machines and access the QuickBooks data file over the VPN link; have them RDP directly to the QuickBooks server and work there.

GCP Dataflow and on-prem DB

Can we connect an on-prem SQL DB to Cloud Dataflow (GCP) without an API? Our databases do not provide APIs for data extraction.
Please help with this; we have been stuck on it for quite some time.
Yes, you can do this. If you have a look at the Beam documentation, there are several built-in database connectors, such as the JDBC IO connector, so you can connect to any database with an IP:port and the correct drivers.
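As a rough sketch of that JDBC IO approach in the Beam Python SDK (it is a cross-language transform, so a Java runtime or an expansion service must be available; the host, database, table, and credentials below are placeholders):

# pip install apache-beam[gcp]
import apache_beam as beam
from apache_beam.io.jdbc import ReadFromJdbc

with beam.Pipeline() as p:
    (
        p
        | "ReadOrders" >> ReadFromJdbc(
            table_name="orders",                        # placeholder table
            driver_class_name="org.postgresql.Driver",
            jdbc_url="jdbc:postgresql://10.10.0.12:5432/salesdb",  # on-prem IP:port
            username="dataflow_reader",
            password="change-me",
        )
        | "Print" >> beam.Map(print)
    )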
Now, a security topic: you can choose to add a public IP to your on-prem database and access it from Dataflow. If you do this, you have to secure it (first of all), and then your Dataflow worker nodes need a public IP (or you need to set up Cloud NAT) to reach the internet.
A better solution is to create a VPN (or an Interconnect) from the same VPC as the one where you run the Dataflow workers. That way, you can use the on-prem database's private IP address to reach it, and it's more secure!
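If you take the VPN/private-IP route, you can also keep the Dataflow workers themselves off the public internet. A hedged sketch of the relevant pipeline options (the project, bucket, and subnetwork names are placeholders; the subnetwork must have a route to the on-prem network):

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
    "--subnetwork=regions/us-central1/subnetworks/dataflow-subnet",
    "--no_use_public_ips",   # workers get private IPs only
])
# pass these to beam.Pipeline(options=options) when building the pipeline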

How do I set up an AWS RDS instance for production so that I can regularly read in custom data from my personal computer

I have built a REST API in Spring that I am ready to deploy as the back-end for my company's website. It uses a MySQL RDS instance to store data. I'm going to host it on AWS and am currently in the process of learning how to do that. I connect to my database with Spring's JdbcTemplate and make SQL queries to create and edit tables.
There is a big concern I have that has not been addressed by any of the tutorials I've read: once everything is up and running on AWS, I will not have direct access to the database anymore, as it will only be accessible from behind my REST API, which makes the necessary queries. And the REST API will only be accessible by the front-end server (which is also on AWS). But I will regularly need to read in custom data in different formats.
Currently it is very easy to do that, because I can read in a random Excel file and directly call the methods that actually make SQL queries on startup of the server. But that only works because my test RDS database is publicly accessible, and I am pretty sure that is terrible practice.
So how can I set things up on AWS so that I can still connect to my database from my laptop and make custom SQL queries to my database?
I am following this tutorial (https://keyholesoftware.com/2017/09/26/using-docker-aws-to-build-deploy-and-scale-your-application/) to get my REST service up and running, and will have to set up the RDS instance separately.
The best option I know of is to SSH into an EC2 instance and then connect to RDS from there. If you're on a Mac, Sequel Pro makes this easy, since you can provide SSH settings along with your MySQL connection settings.
This can also be accomplished with SSH port forwarding, and then you can use your local SQL client. Here's a link to an article that appears to have correct information: MySQL SSH Tunnel
The only other secure option is to allow RDS connections from your IP. I can't verify that this still works, but my memory says I used to run my former company's RDS that way.
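For the port-forwarding route, here is a minimal sketch in Python using the third-party sshtunnel package (the bastion host, key path, RDS endpoint, and credentials are all placeholders):

# pip install sshtunnel pymysql
import pymysql
from sshtunnel import SSHTunnelForwarder

# Forward a local port through an EC2 bastion to the RDS endpoint,
# which stays closed to the public internet.
with SSHTunnelForwarder(
    ("ec2-bastion.example.com", 22),           # placeholder bastion host
    ssh_username="ec2-user",
    ssh_pkey="/path/to/my-key.pem",            # placeholder key file
    remote_bind_address=("mydb.abc123.us-east-1.rds.amazonaws.com", 3306),
) as tunnel:
    conn = pymysql.connect(
        host="127.0.0.1",
        port=tunnel.local_bind_port,           # the locally forwarded port
        user="admin",
        password="change-me",
        database="appdb",
    )
    with conn.cursor() as cur:
        cur.execute("SELECT NOW()")
        print(cur.fetchone())
    conn.close()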

Streamlining Azure setup with app and DB on separate VMs

A Django app of mine (with a PostgreSQL backend) is hosted across two separate Ubuntu VMs. I use Azure as my infrastructure provider, and the VMs are classic. Both are part of the same resource group and map to the same DNS as well (i.e. they both live on xyz.cloudapp.net). Currently, I have the following database URL defined in my app's settings.py:
DATABASE_URL = 'postgres://username:password@public_ip_address:5432/dbname'
The DB port 5432 is publicly open, and I'm assuming the above DB URL implies the web app is connecting to the DB as if it were on a remote machine. If so, that's not best practice: it has security repercussions, not to mention it adds anywhere from 20-30 milliseconds to a hundred milliseconds of latency to each query.
My question is, how does one set up such a Django+Postgres arrangement on Azure so that the database is only exposed on the private network? I want to keep the two-VM setup intact. An illustrative example would be nice - I'm guessing I'll have to replace the public IP address in my settings.py with a private IP? I can see a private IP address listed under Virtual machines (classic) > VMname > Settings > IP addresses in the Azure portal. Is this the one to use? If so, it's dynamically assigned, so wouldn't it change after a while? Looking forward to guidance on this.
In Classic (ASM) mode, the Cloud Service is the network security boundary and the Endpoints with ACLs are used to restrict access from the outside Internet.
A simple solution to secure access would be:
Ensure that the DB port (5432) is removed from the cloud service endpoints (to avoid exposing it to the entire Internet).
Get a static private IP address for the DB server.
Use the private IP address of the DB server in the connection string (see the connection-string sketch below).
Keep the servers in the same Cloud Service.
You can find detailed instructions here:
https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-static-private-ip-classic-pportal/
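On the Django side, the only change is the connection string. A minimal sketch, assuming the DB VM was given the static private IP 10.0.0.5 (a placeholder) and that the project parses DATABASE_URL with the dj-database-url package:

# settings.py
DATABASE_URL = 'postgres://username:password@10.0.0.5:5432/dbname'  # private IP instead of the public one

import dj_database_url
DATABASES = {'default': dj_database_url.parse(DATABASE_URL)}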
This should work. But for future implementations, I would recommend the more modern Azure Resource Manager (ARM) model, where you can benefit from many nice new features, including virtual networks (VNets), which give you more fine-grained security.