Setting up a Data Fusion instance to connect with a secured Dataproc cluster - google-cloud-platform

We have a secured Dataproc cluster and are able to successfully SSH into it with individual user IDs using the command:
gcloud compute ssh cluster-name --tunnel-through-iap
But when we create a profile, attach it to the Data Fusion instance, and configure the pipeline to run, it throws a connection timeout:
java.io.IOException: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection timed out (Connection timed out)
at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:88) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillPreparer.lambda$start$0(RemoteExecutionTwillPreparer.java:436) ~[na:na]
How can we configure a Data Fusion pipeline to run against a secured Dataproc cluster? Kindly let me know.

Some information to give more context on this question:
Judging from the option --tunnel-through-iap, you are most probably using Tunneling with SSH, and cluster-name is the name of the instance in the Dataproc cluster you want to connect to. The linked page also provides information about the option --internal-ip, which connects to an instance only through its internal IP.
The Data Fusion documentation explains the procedure for setting up private IP addresses to limit access to your instance.
Hence, a private IP instance combined with the --internal-ip option could be a good way to connect to your instance (keeping the cluster secured) once the firewall rules are correctly configured.
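As a quick illustration (the zone below is a placeholder, and a private IP setup also requires the corresponding firewall and network configuration), the internal-IP variant of the SSH command, run from a host inside the same VPC network, might look like:
gcloud compute ssh cluster-name --internal-ip --zone <zone>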

Related

How do I SSH tunnel to a remote server whilst remaining on my machine?

I have a Kubernetes cluster to administer which is in its own private subnet on AWS. To allow us to administer it, we have a Bastion server on our public subnet. Tunnelling directly through to our cluster is easy. However, we need our deployment machine to establish a tunnel and execute commands against the Kubernetes server, such as running Helm and kubectl. Does anyone know how to do this?
Many thanks,
John
In AWS
Scenario 1
By default, this API server endpoint is public to the internet, and access to the API server is secured using a combination of AWS Identity and Access Management (IAM) and native Kubernetes Role Based Access Control (RBAC).
If that's the case, you can run kubectl commands from your Concourse server (which has internet access) using the kubeconfig file provided. If you don't have the kubeconfig file, follow these steps.
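If you need to generate that kubeconfig on the Concourse server, a minimal sketch using the AWS CLI (cluster name and region are placeholders) would be:
aws eks update-kubeconfig --region <region> --name <cluster-name>
kubectl get svc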
Scenario 2
When you have the private cluster endpoint enabled (which seems to be your case):
When you enable endpoint private access for your cluster, Amazon EKS creates a Route 53 private hosted zone on your behalf and associates it with your cluster's VPC. This private hosted zone is managed by Amazon EKS, and it doesn't appear in your account's Route 53 resources. In order for the private hosted zone to properly route traffic to your API server, your VPC must have enableDnsHostnames and enableDnsSupport set to true, and the DHCP options set for your VPC must include AmazonProvidedDNS in its domain name servers list. For more information, see Updating DNS Support for Your VPC in the Amazon VPC User Guide.
You can either modify your private endpoint (steps here) or follow these steps.
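As a rough sketch of the first option (names and region are placeholders; double-check the current EKS docs before changing endpoint access), the endpoint settings can be adjusted with the AWS CLI:
aws eks update-cluster-config --region <region> --name <cluster-name> --resources-vpc-config endpointPublicAccess=true,endpointPrivateAccess=true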
There are probably simpler ways to get it done, but the first solution that comes to my mind is setting up simple SSH port forwarding.
Assuming that you have SSH access to both machines, i.e. Concourse has SSH access to the Bastion and the Bastion has SSH access to the cluster, it can be done as follows:
First, set up so-called local SSH port forwarding on the Bastion (pretty well described here):
ssh -L <kube-api-server-port>:localhost:<kube-api-server-port> ssh-user@<kubernetes-cluster-ip-address-or-hostname>
Now you can access your Kubernetes API from the Bastion with:
curl localhost:<kube-api-server-port>
However, it still isn't what you need. Now you need to forward it to your Concourse machine. On Concourse, run:
ssh -L <kube-api-server-port>:localhost:<kube-api-server-port> ssh-user@<bastion-server-ip-address-or-hostname>
From now on you have your Kubernetes API available on localhost of your Concourse machine, so you can e.g. access it with curl:
curl localhost:<kube-api-server-port>
or incorporate it into your .kube/config.
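For the kubeconfig route, a minimal sketch could look like the following (the context and cluster names are just examples; skipping TLS verification is only acceptable for a quick test, otherwise keep the CA and proper credentials):
kubectl config set-cluster tunneled --server=https://localhost:<kube-api-server-port> --insecure-skip-tls-verify=true
kubectl config set-context tunneled --cluster=tunneled --user=<existing-user>
kubectl config use-context tunneled
kubectl get nodes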
Let me know if it helps.
You can also make such a tunnel more persistent. You can find more on that here.
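One common way to keep such a tunnel alive (assuming autossh is installed on the Concourse machine) is something like:
autossh -M 0 -f -N -L <kube-api-server-port>:localhost:<kube-api-server-port> ssh-user@<bastion-server-ip-address-or-hostname>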

cannot connect to Redis Instance in GCP

I created an instance on GCP, but I am not able to access it.
This is similar to this one, but the proposed solution isn't working for me:
Unable to telnet to GCP MemoryStore
I have tried to telnet to it. I am in the same project and region, but apparently I need to be in the same network since it's a private IP. But what if you want to connect using Cloud Shell? Also, how would an application running on my local machine access it?
I also included a firewall rule to make sure incoming connections are allowed.
To connect a client to a Cloud Memorystore for Redis instance, the client and the instance must be located in the same region, in the same project and in the same VPC network. Please check the “Networking” document, where you’ll find information on basic network settings, limited and unsupported networks, network peering, and IP address ranges.
You can connect to Redis from different GCP products like a Compute Engine VM, a Google Kubernetes Engine cluster or a Google Kubernetes Engine pod, but you can’t connect directly from Cloud Shell or from your local machine since they are not in your VPC network.
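For example, from a Compute Engine VM in the same region and VPC network (replace <redis-instance-ip> with the host shown on your instance's details page), a quick connectivity check could be:
telnet <redis-instance-ip> 6379
redis-cli -h <redis-instance-ip> -p 6379 ping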
It may also have to do with a missing peering connection to your network. Check in your console at https://console.cloud.google.com/networking/peering/ to see if the peering is set up properly.
If you are using Terraform, you can refer to the following docs: https://www.terraform.io/docs/providers/google/r/redis_instance.html

Unable to connect to SSH on Google Cloud Platform

We are unable to connect to a VM via SSH on Google Cloud Platform.
We are trying with the help of the 'SSH' button available in the browser.
But the following message is received:
We are unable to connect to the VM on the port 22.
We have tried to stop and start the VM, but that did not help.
You need to create a firewall rule that enables SSH access on port 22 for your VMs. It is better to make the 'Target' a network tag instead of enabling SSH access for all of the machines on your VPC network.
You can use the CLI to perform this operation, using the default VPC:
gcloud compute firewall-rules create <rule-name> --allow tcp:22 --network "default" --source-ranges "<source-range>"
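For example (the rule and tag names below are only illustrative), you could scope the rule to a network tag and then attach that tag to the VM:
gcloud compute firewall-rules create allow-ssh-tagged --allow tcp:22 --network "default" --source-ranges "<source-range>" --target-tags "ssh-enabled"
gcloud compute instances add-tags <vm-name> --zone <zone> --tags "ssh-enabled"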

problems connecting to AWS DocumentDB

I created a cluster and an instance of DocumentDB in Amazon. When I try to connect from my local shell (macOS), it displays the following message:
When I try with MongoDB Compass Community:
mongodb://Mobify:<My-Password>@docdb-2019-04-07-23-28-45.cluster-cmffegva7sne.us-east-2.docdb.amazonaws.com:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0
It loads for many minutes and in the end shows this result:
After solving this problem, I would like to know whether it is possible to connect a DocumentDB cluster to an instance in another availability zone... I have my DocumentDB in Ohio and an EC2 instance in São Paulo... is that possible?
Amazon DocumentDB clusters are deployed in a VPC to provide strong network isolation from the Internet. To connect to your cluster from outside of the VPC, please see the following: https://docs.aws.amazon.com/documentdb/latest/developerguide/connect-from-outside-a-vpc.html
AWS DocumentDB is hosted in a VPC (virtual private cloud), which has its own specific subnets and security groups; basically, anything that resides in a VPC is not publicly accessible.
DocumentDB is deployed in a VPC. In order to access it, you need to create an EC2 instance or use AWS Cloud9.
Let's access AWS DocumentDB from an EC2 instance using SSH tunneling.
Create an EC2 instance (preferably Ubuntu) of any configuration and select the same VPC in which your DocumentDB cluster is hosted.
After the EC2 instance is completely initialized, start an SSH tunnel and bind local port 27017 to port 27017 on the DocumentDB cluster host.
ssh -i "<ec2-private-key>" -L 27017:docdb-2019-04-07-23-28-45.cluster-cmffegva7sne.us-east-2.docdb.amazonaws.com:27017 ubuntu#<ec2-host> -N
Now your localhost is tunneled to the cluster through EC2 on port 27017. Connect with mongosh or mongo, enter your cluster password, and you will be logged in and able to execute queries.
mongosh --sslAllowInvalidHostnames --ssl --sslCAFile rds-combined-ca-bundle.pem --username Mobify --password
Note: the SSL options are deprecated. Use TLS instead; just replace ssl with tls in the command above.
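For example, the TLS form of the command above (assuming a mongosh version that supports these flags) would be:
mongosh --tlsAllowInvalidHostnames --tls --tlsCAFile rds-combined-ca-bundle.pem --username Mobify --password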

Not able to connect to AWS DocumentDB from my ubuntu EC2 machine

I can't connect to my Amazon DocumentDB from my Amazon EC2 Ubuntu machine. I've checked the security group of the Amazon DocumentDB and it's currently assigned to the default one, which allows "all traffic".
I've tried the following command, straight out of the AWS instances page, but I receive the error message included below.
I've followed this aws guide https://docs.aws.amazon.com/documentdb/latest/developerguide/getting-started.connect.html
Mongo shell command from the EC2 Ubuntu machine:
mongo --ssl --host mydatabasename.23scnncsd3.eu-west-1.docdb.amazonaws.com:27017 --sslCAFile rds-combined-ca-bundle.pem --username webuser --password mypassword
The error message I receive from within the Ubuntu command prompt is below.
Error message
MongoDB shell version v3.6.11
connecting to: mongodb://mydatabasename.23scnncsd3.eu-west-1.docdb.amazonaws.com:27017/?gssapiServiceName=mongodb
2019-03-11T21:39:37.587+0000 W NETWORK [thread1] Failed to connect to 172.31.45.184:27017 after 5000ms milliseconds, giving up.
2019-03-11T21:39:37.595+0000 E QUERY [thread1] Error: couldn't connect to server mydatabasename.23scnncsd3.eu-west-1.docdb.amazonaws.com:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:263:13
@(connect):1:6
exception: connect failed
Am I doing something wrong? Any help appreciated!
Many thanks,
Update
Amazon DocumentDB deploys clusters within a VPC, which act as a strong network boundary to other VPCs and the Internet. When you are connecting to your cluster, ensure that the client machine is in the same region and the same VPC as the cluster.
Alternatively, if your development environment is in a different Amazon VPC, you can also use VPC Peering and connect to your Amazon DocumentDB cluster from another Amazon VPC in the same region or a different region.
For more information on troubleshooting: https://docs.aws.amazon.com/documentdb/latest/developerguide/troubleshooting.html
Connecting to an Amazon DocumentDB cluster from outside a VPC: https://docs.aws.amazon.com/documentdb/latest/developerguide/connect-from-outside-a-vpc.html
Had the same problem.
The Availability Zone, VPC, and Security Groups are the same for the EC2 instance and the DocumentDB instance, but it still failed to connect.
For some reason, the US documentation is missing one step that is present in the CN documentation:
https://docs.amazonaws.cn/en_us/documentdb/latest/developerguide/connect-ec2.html
All you need to do is add another inbound rule to the Security Group for TCP on port 27017. This worked for me.
https://i.stack.imgur.com/lOqov.png
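If you prefer the CLI over the console, an equivalent inbound rule could be added like this (the security group ID is a placeholder; --source-group assumes the EC2 instance uses the same security group, otherwise pass a CIDR with --cidr instead):
aws ec2 authorize-security-group-ingress --group-id <security-group-id> --protocol tcp --port 27017 --source-group <security-group-id>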