BigQuery data transfer service does not work when using VPC

I have an issue migrating from Redshift to BigQuery. Here is what I have done so far:
Created a VPN that connects the GCP VPC and the AWS VPC (the VPC IP ranges do not overlap).
The VPN works well. (I tested it: I created an EC2 instance and pinged its private IP from a GCP Compute Engine VM ---> it works.)
I created a Redshift cluster with the publicly accessible option enabled ----> then created a BigQuery Data Transfer Service transfer ----> it works.
BUT when I create a Redshift cluster that is NOT publicly accessible ----> then create the BigQuery Data Transfer Service transfer, I get an error.
ERROR:
Unable to proceed: Could not connect with provided parameters: No suitable driver found for jdbc:redshift://redshift-cluster-1.cbr8ra8jmxgm.us-east-1.redshift.amazonaws.com:5439/dev
I also tried to ping the AWS Redshift IP address from the GCP Compute Engine VM -----> it does not respond.
What can be the reason?
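One thing worth checking before blaming the transfer configuration: Redshift security groups typically allow only the cluster port (5439) and not ICMP, so a failed ping does not by itself prove the route is broken. Below is a minimal TCP-level check (a sketch, not part of the original post) that could be run from the GCP Compute Engine VM, reusing the cluster endpoint from the error message:

import socket

# Attempt a plain TCP connection to the Redshift endpoint over the VPN.
# Success means routing and the security group allow traffic on port 5439;
# failure points at routing, DNS resolution of the endpoint, or the security
# group, rather than at the transfer service itself.
host = "redshift-cluster-1.cbr8ra8jmxgm.us-east-1.redshift.amazonaws.com"
try:
    with socket.create_connection((host, 5439), timeout=10):
        print("TCP connection to Redshift on port 5439 succeeded")
except OSError as exc:
    print("TCP connection failed:", exc)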

Related

AWS MSK - Debezium Postgres Connector for AWS RDS - Failed to Connect

I'm currently facing the following issue when using the AWS MSK Connector (Debezium Postgres Connector):
[Worker-0509fac07b9701a23] [2022-01-19 04:55:28,759] ERROR Failed testing connection for jdbc:postgresql://debezium-cdc.fac07b9701a2.ap-south-1.rds.amazonaws.com:5432/ecommerce with user 'debezium' (io.debezium.connector.postgresql.PostgresConnector:133)
I've tested the AWS MSK cluster using Kafka clients on EC2, and I'm able to produce and consume messages. I've also set up the AWS MSK S3 Sink Connector, and that is working as well.
I've double-checked the security group configuration for AWS RDS; I'm able to connect to it from EC2.
I'm not sure what's causing this issue.
Here's the Connector Configuration
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
database.hostname=debezium-cdc.fac07b9701a2.ap-south-1.rds.amazonaws.com
database.port=5432
database.dbname=ecommerce
database.user=debezium
database.password=password
database.history.kafka.bootstrap.servers=b-2.awskafkatutorialclust.awskaf.c4.kafka.ap-south-1.amazonaws.com:9094,b-1.awskafkatutorialclust.awskaf.c4.kafka.ap-south-1.amazonaws.com:9094,b-3.awskafkatutorialclust.awskaf.c4.kafka.ap-south-1.amazonaws.com:9094
database.server.id=1
database.server.name=debezium-cdc
database.whitelist=ecommerce
database.history.kafka.topic=dbhistory.ecommerce
include.schema.changes=true
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
You need to set your AWS RDS database to Publicly accessible: No, because your AWS MSK cluster is in a private network (VPC) and cannot connect to public databases (read more: https://docs.aws.amazon.com/vpc/latest/userguide/how-it-works.html).
Try changing your RDS Postgres database to Publicly accessible: No, and then create the MSK connector again.
(Make sure that your AWS RDS database is in the same VPC and security group as your AWS MSK cluster.)
If you still want to connect to your private AWS RDS database from outside the VPC, you need to go through a bastion host (read more: https://aws.amazon.com/premiumsupport/knowledge-center/rds-connect-ec2-bastion-host/).
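Before recreating the connector, it can help to confirm the connectivity assumptions independently of MSK Connect. Here is a minimal sketch (not from the original post), assuming psycopg2 is available and the script is run from an EC2 instance in the same VPC and security group as the RDS instance, reusing the hostname and credentials from the connector configuration above:

import psycopg2

# Check that the RDS endpoint is reachable and that the 'debezium' user can
# authenticate from inside the VPC.
conn = psycopg2.connect(
    host="debezium-cdc.fac07b9701a2.ap-south-1.rds.amazonaws.com",
    port=5432,
    dbname="ecommerce",
    user="debezium",
    password="password",
    connect_timeout=10,
)
with conn.cursor() as cur:
    # The Debezium Postgres connector also requires logical decoding,
    # i.e. wal_level must be 'logical' (rds.logical_replication on RDS).
    cur.execute("SHOW wal_level;")
    print("wal_level =", cur.fetchone()[0])
conn.close()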

Whitelisting Azure Databricks Spark Cluster on AWS Redshift

Hi, I am not sure if anyone has come across this situation before. I have both Azure and AWS environments. I have a Spark cluster running on Azure Databricks, and a Python/PySpark script that I want to run on that cluster. In this script I want to write some data into an AWS Redshift cluster, which I plan to do using the psycopg2 library. Where can I find the IP address of the Azure Databricks Spark cluster so that I can whitelist it in the security group of the AWS Redshift cluster? I think at the moment I cannot write to the AWS Redshift cluster because the script is running on the Azure Databricks Spark cluster and the AWS Redshift cluster does not recognize requests coming from it.
I have a similar use case connecting from Azure Databricks to AWS RDS: I need to whitelist the Azure Databricks IPs in the AWS security group attached to RDS. Databricks associates clusters with dynamic IPs, so the IP changes each time a cluster is restarted.
I am currently trying this solution:
Create a public IP address in the Azure portal
Associate a public IP address to a virtual machine
https://learn.microsoft.com/en-us/azure/virtual-network/associate-public-ip-address-vm#azure-portal
Currently I am getting an error that I do not have permission to update the Databricks-associated VNet.
This is the simplest solution I could come up with.
If this doesn't work, the next option is to try a site-to-site connection to set up a tunnel between Azure and AWS. This would allow all the dynamic IPs to be authorised for read and write operations on AWS.
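For the whitelisting approach, one way to see which public IP the cluster is currently egressing from (and therefore what would need to be added to the Redshift/RDS security group) is to call an IP-echo service from a notebook cell. A minimal sketch, assuming the cluster has outbound internet access and the requests library installed; ifconfig.me is used here only as an example echo service:

import requests

# Ask an external echo service which public IP our outbound traffic appears from.
# With Databricks' dynamic IPs this value can change whenever the cluster restarts,
# so the security group entry would need to be refreshed accordingly.
egress_ip = requests.get("https://ifconfig.me/ip", timeout=10).text.strip()
print("Current cluster egress IP:", egress_ip)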

Can a private Cloud Data Fusion instance connect to the internet?

Our application is made of a Spring Boot app server deployed through Cloud Run and a Cloud SQL Postgres database.
The database is private and connected to a private VPC.
The app server can connect to the database through a gateway to this private VPC provided by the Cloud Run configuration.
We'd like to feed this database periodically with Cloud Data Fusion (CDF).
CDF should fetch data from AWS S3 and push it into our database.
We've designed and validated a pipeline for that purpose, but we're facing a network paradox:
Either CDF is public, can read from S3 over the internet, but can't reach the Cloud SQL database,
or CDF is private, can reach our database, but can't reach the internet to fetch from S3...
How can CDF both write to the private database and read data from the internet?
I'm surprised that a CDF instance, even a private one, can't establish an egress connection to an internet resource.
Cloud Data Fusion is a tool that helps you build pipelines (based on CDAP). If you make the Data Fusion instance private, it's the access to the tool that is private, not the runtime! On Google Cloud, the pipeline runs on a Dataproc cluster.
So now the question is: can your Dataproc cluster reach the internet and your database?
If your cluster runs in the same VPC as your Cloud SQL database's private IP connection, and there is no firewall rule that prevents the communication, the database side is OK.
If the Compute Engine instances that make up your cluster have public IPs, no problem, you can reach public URLs. Otherwise, as said by John Hanley, you can create a Cloud NAT to allow your Compute Engine instances to initiate calls to external URLs.
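Both conditions can be checked from a node that uses the same network path as the pipeline (for example, by running a small script on the Dataproc cluster). The following is a minimal sketch, not from the original answer; the Cloud SQL private IP below is a placeholder for your own instance's address:

import socket

def can_reach(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Public internet (S3 endpoint over HTTPS) -- needs a public IP or Cloud NAT.
print("S3 reachable:", can_reach("s3.amazonaws.com", 443))

# Private Cloud SQL IP (placeholder) -- needs the cluster to run in the VPC that is
# peered with the Cloud SQL private connection, with firewall rules allowing port 5432.
print("Cloud SQL reachable:", can_reach("10.0.0.3", 5432))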

Sagemaker Jupyter Notebook Cannot Connect to RDS

I have tried connecting to RDS from a SageMaker notebook. However, to connect to RDS, my public IP needs to be allowed for security reasons. When I run "curl ifconfig.me" on the SageMaker notebook instance, I can see that the public IP keeps changing from time to time.
What is the correct way to connect to RDS from a notebook on SageMaker? Do I need to crawl the RDS database with AWS Glue, query the crawled tables with Athena, and then fetch the query results from S3 in the SageMaker notebook?
RDS is just a managed database running on an EC2 instance. You can connect to that database in the same way you would connect from an application. For example, you can use a Python-based DB client library (depending on what DB flavor you're using, e.g. Postgres) and configure it with the connection string, just as you would connect any other application to your RDS instance.
I would not recommend connecting to the RDS instance through the public interface. You can place your notebook instance in the same VPC as your RDS instance, so that you can talk to RDS directly through the VPC.
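A minimal sketch of that approach, assuming a Postgres-flavored RDS instance, a notebook instance launched in the same VPC and security group, and psycopg2 installed; the endpoint, database name, and credentials below are placeholders:

import psycopg2

# Placeholder connection details -- replace with your own RDS endpoint and credentials.
conn = psycopg2.connect(
    host="my-db.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com",  # resolves to a private IP inside the VPC
    port=5432,
    dbname="mydb",
    user="myuser",
    password="mypassword",
    connect_timeout=10,
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()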

Connecting an AWS EC2 to a Google Cloud SQL instance locally using VPN Gateway

I have an AWS account with an EC2 instance in it that I am trying to connect to a Cloud SQL instance (MySQL 5.6) inside Google Cloud Platform.
I have successfully set up a VPN between AWS and GCP and can echo a message over nc between an EC2 instance on AWS and a VM on GCP.
As GCP managed DBs are not placed inside a VPC of my choosing, I followed this guide to give the DB a private IP and then peer that network with my Google VPC. I verified this works by accessing the DB via pymysql from a VM in GCP using the private IP of the DB.
However, my issues come from connecting the EC2 instance inside AWS to the Cloud SQL DB in the same way. I have followed this guide to allow the use of the DB's private IP from an external source, but I am stuck on how to set up AWS routing to the peered network that the DB is sitting in.
The problem has been sorted!
In the Advertised routes settings of my Cloud Router, I had misunderstood the function of "Advertise all subnets visible to the Cloud Router (Default)".
I needed to instead choose "Create custom routes", and then the sub-option "Advertise all subnets visible to the Cloud Router".
This then allowed me to add the Cloud SQL subnet as an advertised route on my Cloud Router, so that its IP block propagates over to AWS.
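Once the custom route advertisement is in place, the same pymysql test used from the GCP VM can be repeated from the EC2 instance to confirm the route works end to end. A minimal sketch, assuming pymysql is installed on the EC2 instance; the private IP and credentials below are placeholders:

import pymysql

# Placeholder values -- replace with your Cloud SQL private IP and credentials.
conn = pymysql.connect(
    host="10.20.0.3",        # Cloud SQL private IP, reached over the VPN and VPC peering
    port=3306,
    user="myuser",
    password="mypassword",
    database="mydb",
    connect_timeout=10,
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print("Connected to MySQL:", cur.fetchone()[0])
conn.close()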