Neptune fails to bulk load (connection time-out) - amazon-web-services

I am trying to bulk load into a Neptune DB through Neptune notebooks. I initially tried %load, but after entering all the information it fails after about 5 seconds.
Then I tried using curl and requests.post.
I even tried a plain requests.get(endpoint), where the endpoint is
https://your-neptune-endpoint:port/loader
The default port is 8182, but I noticed that my cluster uses 7999, so I've been using that one.
There is a magic command I run in the notebook, and it reports port=7999 and port_proxy=8182.
Everything points to me being unable to connect because an inbound rule should exist in my security group, but I already have that rule.
Not sure what else to do. Is there a way I can test whether every access point and inbound rule is working as expected?
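For reference, a minimal sketch of this kind of connectivity test (the host and port are placeholders for your own cluster values, and the script assumes it runs from inside the cluster's VPC). A raw TCP check separates a network-level block (security group, route table) from an HTTP-level problem:

import socket
import requests

# Hypothetical placeholders -- substitute your own cluster endpoint and port.
NEPTUNE_HOST = "your-neptune-endpoint"
NEPTUNE_PORT = 7999

# Step 1: raw TCP check. If this times out, the problem is network-level
# (security group, route table, or the client not being in the VPC),
# not the loader API itself.
try:
    with socket.create_connection((NEPTUNE_HOST, NEPTUNE_PORT), timeout=5):
        print("TCP connection succeeded")
except OSError as exc:
    print(f"TCP connection failed: {exc}")

# Step 2: HTTP check against the loader endpoint. Any HTTP response,
# even an error status, means the port and inbound rule are working.
try:
    resp = requests.get(f"https://{NEPTUNE_HOST}:{NEPTUNE_PORT}/loader", timeout=5)
    print(resp.status_code, resp.text)
except requests.exceptions.RequestException as exc:
    print(f"HTTP request failed: {exc}")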

Related

Cannot SSH into the GCP VM instances that used to work

I created a few GCP VM instances yesterday all using the same configuration but running different tasks.
I could SSH into those instances via the GCP console and they were all working fine.
Today I wanted to check whether the tasks are done, but I cannot SSH into any of those instances via the browser anymore... The error message reads:
Connection via Cloud Identity-Aware Proxy Failed
Code: 4010
Reason: destination read failed
You may be able to connect without using the Cloud Identity-Aware Proxy.
So I retried with Cloud Identity-Aware Proxy disabled. But then it reads:
Connection Failed
An error occurred while communicating with the SSH server. Check the server and the network configuration.
Running
gcloud compute instances list
displayed all my instances, and their status was RUNNING.
But when I ran
gcloud compute instances get-serial-port-output [instance-name]
using the [instance-name] returned by the above command (this is to check whether the instance's boot disk has run out of free space), it returned:
(gcloud.compute.instances.get-serial-port-output) Could not fetch serial port output: The resource '...' was not found
Some extra info:
I'm accessing the VM instances from the same internet connection (my home internet) and everything else is the same
I'm the owner of the project
My account is using a GCP free trial with $300 credit
The instances have machine type c2-standard-4 and use a Linux Deep Learning image
The gcloud config looks right to me:
$ gcloud config list
[component_manager]
disable_update_check = True
[compute]
gce_metadata_read_timeout_sec = 5
[core]
account = [my_account]
disable_usage_reporting = True
project = [my_project]
[metrics]
environment = devshell
Update:
I reset one of the instances and now I can successfully SSH into it. However, the job running on that instance stopped after the reset.
I want to keep the jobs running on the other instances. Is there a way to SSH into the other instances without resetting them?
Your issue is on the VM side. The tasks you're running leave the SSH service unable to accept incoming connections, which is why you could only connect after the restart.
You should be able to see the instance's serial console output using gcloud compute instances get-serial-port-output [instance-name], but if for some reason you can't, you can use the GCP console instead: go to the instance's details, click Serial port 1 (console), and you will see the output.
You may even interact with your VM (log in) via the serial console. This is particularly useful if something stopped the SSH service, but it requires a login and password, so you first have to access the VM or use a startup script to add a user with a password. But then again, this requires a restart.
In either case it seems that restarting your VMs is the best option. But you may try to figure out what is causing the SSH service to stop after some time by inspecting the logs. Or you can create your own logs (disk space, memory, CPU, etc.) by using cron with df -Th /mountpoint/path | tail -n1 >> /name_of_the_log_file.log.
You can, for example, use cron to check and restart the SSH service, as in the sketch below.
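A minimal sketch of such a check, run from cron as root (the log path and the systemd unit name "ssh" are assumptions; on some images the unit is called "sshd"):

import shutil
import subprocess
from datetime import datetime

LOG_FILE = "/var/log/vm_health.log"  # hypothetical log path

# Log free disk space on the root filesystem.
usage = shutil.disk_usage("/")
free_gb = usage.free / 1024**3
with open(LOG_FILE, "a") as log:
    log.write(f"{datetime.now().isoformat()} free_root_gb={free_gb:.2f}\n")

# Check the SSH daemon and restart it if it is not active.
result = subprocess.run(["systemctl", "is-active", "--quiet", "ssh"])
if result.returncode != 0:
    subprocess.run(["systemctl", "restart", "ssh"])
    with open(LOG_FILE, "a") as log:
        log.write(f"{datetime.now().isoformat()} restarted ssh service\n")

A crontab entry such as */5 * * * * /usr/bin/python3 /path/to/check_ssh.py would run it every five minutes.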
And if something doesn't work the way it's supposed to (according to the documentation), go to the Issue Tracker and create a new issue to get more help.

AWS RDS pg_transport failed to download file data

When running the following command
SELECT transport.import_from_server(%s,5432,'My RDS ADMIN USER',%s,%s,%s,true);
I get the following response from the command:
AWS RDS pg_transport failed to download file data
Both RDS instances are in the same region and the same VPC; both have security groups allowing the connection between them, and the SG only has an inbound rule for 5432.
I'm unable to find documentation or any further info on possible causes of the failure.
The steps I followed are from https://aws.amazon.com/blogs/database/migrating-databases-using-rds-postgresql-transportable-databases/
but with existing RDS instances, both running PostgreSQL 11.5, and custom data instead of the data from the tutorial.
Any advice?
Could you please recheck whether your source instance's security group allows connections from the destination instance?
Also recheck all the parameters that you have set in the source and destination parameter groups.
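A hedged sketch of how the security group check could be done with boto3 (the region and group IDs are placeholders; the point is to confirm the source group's inbound rules include port 5432 from the destination's security group):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

SOURCE_SG = "sg-0123456789abcdef0"  # hypothetical source instance security group
DEST_SG = "sg-0fedcba9876543210"    # hypothetical destination instance security group

rules = ec2.describe_security_groups(GroupIds=[SOURCE_SG])["SecurityGroups"][0]["IpPermissions"]

allows_5432 = any(
    perm.get("FromPort") == 5432
    and perm.get("ToPort") == 5432
    and any(pair.get("GroupId") == DEST_SG for pair in perm.get("UserIdGroupPairs", []))
    for perm in rules
)
print("Source SG allows 5432 from destination SG:", allows_5432)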
I had this before; it seems to be a bug within pg_transport.
The advice from AWS was to use a larger instance class on both the source and target instances. It seems to be stable using db.m5.4xlarge.

Timeout when trying to retrieve EC2 instance-id metadata from within it

I'm launching a Windows 10 EC2 instance and trying to retrieve its instance-id from CMD with the command:
curl http://169.254.169.254/latest/meta-data/instance-id
This worked until yesterday, but now it fails every time, raising a Timeout error.
curl: (7) Failed to connect to 169.254.169.254 port 80: Timed out
I've looked through AWS's documentation about retrieving EC2 metadata and didn't find anything about an expiration time for the retrieval attempt. Also, I've tried creating an AMI from my instance and launching a new instance based on that AMI to force some sort of "refresh" of a possible expiration time, and it didn't work.
I've searched within the IAM roles for something related to metadata retrieval permissions, but nothing seems to fit my issue.
I've also tried the answers from here but nothing was specific enough to my problem.
What could have happened? This worked for about two months straight and suddenly it stopped working.
Workaround that fixed it
Another post, regarding a similar problem, had an answer that fixed my problem.
I simply ran C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 and the script applied the default configuration of a freshly launched EC2 Windows instance. I still don't know why this problem happened, but this solution works for anyone who has nothing to lose in their configuration.
From the workaround that you shared, it seems the reason you were not able to get the instance ID is that the routes for your instance somehow got misconfigured. To retrieve the instance ID from the metadata service, the route for 169.254.169.254 must point to the instance's correct gateway. This problem generally occurs with Windows Server 2016 or later when you launch an instance from a custom AMI in a subnet different from that of the parent instance from which the AMI was created.
When you ran the command, it scheduled the InitializeInstance.ps1 script, and during the next boot it reconfigured the routes.
In future, if you see any such issue, make sure the IP 169.254.169.254 points to the correct gateway, which you can check with the ipconfig /all and route print commands. If you find that the routes are misconfigured, you can use the route delete and route add commands with the proper parameters to correct them, or simply schedule the InitializeInstance.ps1 script, which will correct the routes the next time the instance boots.
Please refer to: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2launch.html
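As a quick check before rebooting, a small sketch of the two tests described above (assuming Python is installed on the Windows instance; the route print filter syntax is standard Windows tooling):

import subprocess
import requests

# Any HTTP response here means the metadata service is reachable;
# a timeout points at the route for 169.254.169.254.
try:
    resp = requests.get(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=3
    )
    print("instance-id:", resp.text)
except requests.exceptions.RequestException as exc:
    print("metadata request failed:", exc)

# Print the Windows route table entries for the link-local range so you
# can see which gateway/interface 169.254.169.254 would be sent through.
print(subprocess.run(["route", "print", "169.254.*"],
                     capture_output=True, text=True).stdout)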

Load data from S3 into Aurora Serverless using AWS Glue

According to Moving data from S3 -> RDS using AWS Glue,
an instance is required to add a connection to a data target. However, my RDS is serverless, so there is no instance available. Does Glue support this case?
I recently tried to connect Aurora MySQL Serverless with AWS Glue, and it failed with a timeout error:
Check that your connection definition references your JDBC database with
correct URL syntax, username, and password. Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago.
The driver has not received any packets from the server.
I think the reason is that Aurora Serverless doesn't have any continuously running instances, so you cannot reference an instance in the connection URL, and that's why Glue cannot connect.
So you need to make sure the DB is running; only then will your JDBC connection work.
If your DB runs in a private VPC, you can follow this link:
Nat Creation
EDIT:
Instead of a NAT gateway, you can also use a VPC endpoint for S3.
Here is a really good blog that explains it step by step.
Or see the AWS documentation.
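If you go the VPC endpoint route, here is a hedged boto3 sketch of creating the gateway endpoint (the region, VPC ID, and route table ID are placeholders; the route table should be the one used by the Glue connection's subnet):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Gateway endpoint for S3 attached to the subnet's route table, so Glue
# can reach S3 from a private subnet without a NAT gateway.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",            # hypothetical VPC ID
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # hypothetical route table ID
)
print(response["VpcEndpoint"]["VpcEndpointId"])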
AWS Glue supports this scenario, i.e., it works well for loading data from S3 into Aurora Serverless using an AWS Glue job. The engine version I'm currently using is 8.0.mysql_aurora.3.02.0.
Note: if you get an error saying Data source rejected establishment of connection, message from server: "Too many connections", you can increase the ACUs (mine is currently set to min 4 / max 8 ACUs, for reference), as the maximum number of connections depends on the ACU capacity.
I was able to build the connection using JDBC.
One very important thing: you need at least one subnet whose security group opens ALL TCP ports, though you can restrict that rule to the subnet itself.
With that setting the connection test passes, and the crawler can also create tables.
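For reference, a hedged sketch of what such a JDBC connection definition can look like when created with boto3 (the connection name, cluster endpoint, database, credentials, subnet, security group, and availability zone are all placeholders):

import boto3

glue = boto3.client("glue", region_name="us-east-1")  # assumed region

glue.create_connection(
    ConnectionInput={
        "Name": "aurora-serverless-jdbc",  # hypothetical connection name
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            # Aurora Serverless exposes only the cluster endpoint,
            # so that is what goes into the JDBC URL.
            "JDBC_CONNECTION_URL": "jdbc:mysql://my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:3306/mydb",
            "USERNAME": "admin",
            "PASSWORD": "REPLACE_ME",
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",           # hypothetical subnet
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],  # hypothetical security group
            "AvailabilityZone": "us-east-1a",
        },
    }
)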

Setting up a second connection with AWS Glue to a target RDS/Mysql instance fails

I'm trying to set up an ETL job with AWS Glue that should pull data from the production database on RDS/Aurora, run some very lightweight data manipulation (mainly removing some columns), and then output to another RDS/MySQL instance used as a "data warehouse". Each component is in its own VPC. RDS/Aurora <> AWS Glue works; however, I'm having a hard time figuring out what's wrong with the AWS Glue <> RDS/MySQL connection. The error is the generic "Check that your connection definition references your JDBC database with correct URL syntax, username, and password. Could not create connection to database server."
I've been following this step-by-step guide https://aws.amazon.com/blogs/big-data/connecting-to-and-running-etl-jobs-across-multiple-vpcs-using-a-dedicated-aws-glue-vpc/ and, I think, I covered all the points. To debug, I also spun up a new EC2 instance in the same AWS Glue VPC and subnet and tried to access the output database from it, which worked.
Comparing the first, working connection with the second one doesn't reveal any obvious difference, and the fact that I was able to connect from an EC2 instance makes me even more confused about where the problem is.
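One way to compare the two connection definitions side by side (a hedged sketch; the connection names are placeholders) is to pull them with boto3 and diff the properties Glue actually uses:

import boto3

glue = boto3.client("glue", region_name="us-east-1")  # assumed region

# Hypothetical connection names: the working Aurora one and the failing MySQL one.
for name in ["aurora-source-connection", "mysql-warehouse-connection"]:
    conn = glue.get_connection(Name=name, HidePassword=True)["Connection"]
    print(name)
    print("  properties:", conn.get("ConnectionProperties"))
    print("  physical requirements:", conn.get("PhysicalConnectionRequirements"))

Note that an EC2 test instance exercises the subnet's network path but not necessarily the same security group the Glue connection uses, so the SecurityGroupIdList is usually the first field worth comparing.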