I am trying to connect my MySQL database, installed on an EC2 instance, with Glue; the purpose is to extract some information and move it to Redshift, but I am receiving the following error:
Check that your connection definition references your JDBC database with correct URL syntax, username, and password. Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
This is the format I am using: jdbc:mysql://host:3306/database
I am using the same VPC, security group, and subnet as the instance.
I know the user/password are correct because I can connect to the database with SQL Developer.
What do I need to check? Is it possible to use AWS Glue with MySQL running on my instance?
Thanks in advance.
In the JDBC connection URL that you mentioned, use the private IP of the EC2 instance (where MySQL is installed) as the host:
jdbc:mysql://ec2_instance_private_ip:3306/database
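For reference, a rough sketch of what that connection could look like when created with boto3; the name, IDs, private IP and credentials below are placeholders, not values from the question:

import boto3

glue = boto3.client("glue")

glue.create_connection(
    ConnectionInput={
        "Name": "mysql-on-ec2",                      # placeholder connection name
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:mysql://10.0.1.23:3306/database",  # EC2 private IP
            "USERNAME": "glue_user",
            "PASSWORD": "********",
        },
        # Must match the EC2 instance's VPC networking so Glue can reach it.
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)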
Yes, it is possible to use AWS Glue with MySQL running on your EC2 instance, but you should first use DMS to migrate your databases.
Moreover, your target database (Redshift) has a different schema than the source database (MySQL); that is what we call a heterogeneous database migration (the schema structure, data types, and database code of the source and target are quite different), so you also need AWS SCT.
Check this out:
As you can see, I'm not sure you can migrate straight from MySQL in an EC2 instance to Amazon Redshift.
Hope this helps
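If you do go the DMS route, a minimal boto3 sketch of the two endpoints might look like the following; server names, credentials and identifiers are made-up placeholders, and you still need a replication instance plus a replication task on top of this:

import boto3

dms = boto3.client("dms")

# Source endpoint: the MySQL server on the EC2 instance (assumed private IP).
dms.create_endpoint(
    EndpointIdentifier="mysql-on-ec2-source",
    EndpointType="source",
    EngineName="mysql",
    ServerName="10.0.1.23",
    Port=3306,
    Username="dms_user",
    Password="********",
)

# Target endpoint: the Redshift cluster.
dms.create_endpoint(
    EndpointIdentifier="redshift-target",
    EndpointType="target",
    EngineName="redshift",
    ServerName="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    Port=5439,
    Username="awsuser",
    Password="********",
    DatabaseName="dev",
)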
We need to replicate data from an Amazon RDS (MySQL) database to an Aurora PostgreSQL database. Each database is in a different AWS account and region.
The data must be replicated to the PostgreSQL instance every 6 hours, so we need a guarantee that the data is not duplicated even if a field of a record was updated.
Which method, tool, or design is best for this? (It could be something outside AWS.)
You could use one of the following services provided by AWS:
Database Migration Service
Glue
The AWS Database Migration Service supports:
Using an Amazon-Managed MySQL-Compatible Database as a Source for AWS DMS
Using a PostgreSQL Database as a Target for AWS Database Migration Service
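For the Glue option, a rough sketch of a job you could schedule every 6 hours is below; it simply re-reads the source table and overwrites the target, so re-runs cannot duplicate rows. Hosts, credentials and table names are placeholders, and awsglue is only available inside the Glue job environment:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read the source table from RDS MySQL over JDBC.
src = (spark.read.format("jdbc")
       .option("url", "jdbc:mysql://source-rds.example.internal:3306/appdb")
       .option("dbtable", "orders")
       .option("user", "reader")
       .option("password", "********")
       .load())

# Overwrite the same table in Aurora PostgreSQL: a full refresh each run,
# so an updated field never produces a duplicated row.
(src.write.format("jdbc")
    .option("url", "jdbc:postgresql://target-aurora.example.internal:5432/warehouse")
    .option("dbtable", "orders")
    .option("user", "writer")
    .option("password", "********")
    .mode("overwrite")
    .save())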
According to Moving data from S3 -> RDS using AWS Glue,
I found that an instance is required to add a connection to a data target. However, my RDS database is serverless, so there is no instance available. Does Glue support this case?
I tried to connect Aurora MySQL Serverless with AWS Glue recently, and it failed with a timeout error:
Check that your connection definition references your JDBC database with correct URL syntax, username, and password. Communications link failure. The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
I think the reason was that Aurora Serverless doesn't have any continuously running instances, so there is no instance to point the connection URL at, and that's why Glue cannot connect.
So you need to make sure that the DB instance is running; only then will your JDBC connection work.
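One way to check that from code, as a hedged sketch (the cluster identifier is a placeholder, and the Capacity field and resume call apply to Aurora Serverless v1):

import boto3

rds = boto3.client("rds")
cluster = rds.describe_db_clusters(DBClusterIdentifier="my-serverless-cluster")["DBClusters"][0]
print(cluster["Status"], cluster.get("Capacity"))   # e.g. "available", 2

# If the cluster is paused, you can wake it by setting a capacity explicitly:
# rds.modify_current_db_cluster_capacity(DBClusterIdentifier="my-serverless-cluster", Capacity=2)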
If your DB runs in a private VPC, you can follow this link:
Nat Creation
EDIT:
Instead of NAT GW, you can also use the VPC endpoint for S3.
Here is a really good blog that explains step by step.
Or AWS documentation
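As a rough sketch, the S3 gateway endpoint can be created like this with boto3 (VPC ID, route table ID and region are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],   # route table used by the Glue subnet
)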
AWS Glue supports this scenario, i.e., it works well to load data from S3 into Aurora Serverless using an AWS Glue job. The engine version I'm currently using is 8.0.mysql_aurora.3.02.0.
Note: if you get an error saying Data source rejected establishment of connection, message from server: "Too many connections", you can increase the ACUs (mine is currently set to min 4 / max 8 ACUs, for reference), since the maximum number of connections depends on the ACU capacity.
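For reference, a sketch of widening the ACU range with boto3 (assuming Aurora Serverless v2, which matches the engine version above; the cluster identifier is a placeholder):

import boto3

rds = boto3.client("rds")
rds.modify_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",
    ServerlessV2ScalingConfiguration={"MinCapacity": 4, "MaxCapacity": 8},
    ApplyImmediately=True,
)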
I was able to build the JDBC connection.
One very important thing: you should have at least one subnet that opens ALL TCP ports, but you can restrict that rule's source to the subnet itself.
With that setting the connection test passes, and the crawler can create tables as well.
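In practice that "open all TCP" requirement is usually satisfied with a self-referencing inbound rule on the connection's security group; here is a sketch with a placeholder group ID:

import boto3

ec2 = boto3.client("ec2")
SG_ID = "sg-0123456789abcdef0"
ec2.authorize_security_group_ingress(
    GroupId=SG_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 0,
        "ToPort": 65535,
        # Self-referencing: only resources in this same security group may connect.
        "UserIdGroupPairs": [{"GroupId": SG_ID}],
    }],
)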
I have a PostgreSQL database which is, in effect, "on premises", but I have credentials and a JDBC connection string. I want to read a table with AWS Glue, use it as a source in a job, and write to S3.
But it is asking for a VPC? I don't understand. Can I hard-code the connection in the job? This seems like such a basic task for an ETL environment. What am I missing?
Glue can connect to any database using JDBC. This is a good toolbox to fast-track PySpark coding.
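For illustration only (host, credentials, table and bucket are placeholders), a Glue job script could read the on-premises table over JDBC and land it in S3 roughly like this:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read the on-premises PostgreSQL table over JDBC.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://onprem-host.example.com:5432/sales")
      .option("dbtable", "public.orders")
      .option("user", "etl_user")
      .option("password", "********")
      .load())

# Write it to S3 as Parquet.
df.write.mode("overwrite").parquet("s3://my-bucket/exports/orders/")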
Basically, you need to understand where you are physically located in the AWS environment, and identify or create a VPC. From there, set up your network ACL and security group.
Good luck!
I'm trying to set up an ETL job with AWS Glue that should pull data from the production database on RDS/Aurora, run some very lightweight data manipulation (mainly removing some columns), and then output to another RDS/MySQL instance used as a "data warehouse". Each component is in its own VPC. RDS/Aurora <> AWS Glue works; however, I'm having a hard time figuring out what's wrong with the AWS Glue <> RDS/MySQL connection: the error is a generic "Check that your connection definition references your JDBC database with correct URL syntax, username, and password. Could not create connection to database server."
I've been following this step-by-step guide https://aws.amazon.com/blogs/big-data/connecting-to-and-running-etl-jobs-across-multiple-vpcs-using-a-dedicated-aws-glue-vpc/ and, I think, I covered all the points. To debug, I also spun up a new EC2 instance in the same AWS Glue VPC and subnet, tried to access the output database from it, and it worked.
Comparing the first, working connection with the second one doesn't reveal any obvious difference, and the fact that I was able to connect from an EC2 instance makes me even more confused about where the problem is.
I have a local MySQL database and I want to migrate the data inside of it to a remote MySQL database (using RDS on AWS). How can I migrate my data between the two instances?
AWS DMS helps you migrate large, terabyte-scale databases to the AWS Cloud easily and securely. During migration, the source database remains fully operational, minimizing downtime.
But judging from your question, you want a homogeneous data migration, and as per the AWS documentation:
If you're performing a homogeneous migration, use your engine's native tools, such as MySQL dump or MySQL replication.
Refer to this answer for using a SQL dump with larger data sets.
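As a rough illustration of the native-tools route (hosts, credentials and the database name are placeholders; mysqldump/mysql must be installed locally and the target database must already exist on RDS):

import subprocess

LOCAL_DB = "appdb"
RDS_HOST = "mydb.abc123xyz.us-east-1.rds.amazonaws.com"

# Dump the local database to a file...
with open("appdb.sql", "wb") as dump:
    subprocess.run(["mysqldump", "-h", "127.0.0.1", "-u", "root", "-psecret", LOCAL_DB],
                   stdout=dump, check=True)

# ...then load the dump into the RDS instance.
with open("appdb.sql", "rb") as dump:
    subprocess.run(["mysql", "-h", RDS_HOST, "-u", "admin", "-psecret", LOCAL_DB],
                   stdin=dump, check=True)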
Thanks
Use the AWS Database Migration Service that is available in AWS. You need to provide your database endpoint, i.e. your on-premises database server endpoint, set your DB engine parameters to your requirements, and launch. It takes 10-15 minutes to migrate your data to the cloud, and from there you can continue accessing your database from AWS itself.
The other method is to take a recent backup of your on-premises database, launch an EC2 instance in AWS and install the database engine you are using on premises, copy the backup file from your system to the cloud, and launch the database from that backup file. Then set up an RDS instance of the same type that you installed on EC2, and connect the endpoints.