How do I import a local MySQL db to RDS db instance? - amazon-web-services

I've created a RDS instance called realcardiodb (the engine is mysql)
and I've exported my database from my localhost. File is saved locally called localhostrealcardio.sql
Most research says to use mysqldump to import data from a local system to a web server, but my system doesn't even recognize mysqldump.
C:\xampp\mysql>mysqldump
'mysqldump' is not recognized as an internal or external command, operable program or batch file.
How do I resolve this error should I use mysqldump? (I definitely have mysql install on my system)
Is there a better utility I should use?
Any help is appreciated, especially if you have experience importing mysql to aws rds.
Thanks!
DK
Update 7/31/2012
So I got the error resolved. mysqldump is in the bin directory C:\xampp\mysql\bin>mysqldump
AWS provides the folloinwg instructions for uploading a local database to RDS:
mysqldump acme | mysql --host=hostname --user=username --password acme
Can someone break this down for me?
1) Is the first 'acme' (after mysqldump command) the name of my local database or the exported sql file I saved locally?
2)Is the hostname the IP address, Public DNS, RDS Endpoint or neither?
3)The username and password I assume is the RDS credentials and the second acme is the name of the database I created in RDS.
Thanks!

This is how I did it for a couple instances that had data in the MySQl tables.
The steps to creating an RDS database instance:
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.MySQL.html
Note: Make sure the RDS instance has a security group configured that relates to the EC2 security group.
http://docs.amazonwebservices.com/AmazonRDS/latest/UserGuide/USER_Workin...
Before we go forward, let me provide a list of what some of the following placeholders are:
host.address.for.rds.server = this will be what is referred to as the "end point" in your RDS description/settings page.
rdsusername = the master user account which you created during RDS setup.
rdsdatabase = a blank database which you created inside the server on your RDS instance.
backupfile.sql = the sql dump file your made of your pre-existing installation's database.
Once you've created a fresh RDS database instance, and have configured its security settings, log into this server (from within an ssh session to your EC2 server) and then create an empty database inside the instance using basic SQL commands.
mysql -h host.address.for.rds.server -P 3306 -u rdsusername -p
(enter your password)
create database rdsdatabase;
Then quit out of the MySQL environment inside your RDS server.
\q
This tutorial assumes you already have a backup from your old database. If you don't, go create one now. After that, you’re ready to import that sql dump file into the empty database waiting on your RDS server.
mysql -h host.address.for.rds.server -u rdsusername -p rdsdatabase < backupfile.sql
It might take a few seconds to complete, depending on the size of the sql dump file. Your indication that it is finished is that the bash command prompt reappears.
Note: the command “mysqlimport” is used when imported data directly into an existing table inside a database. It might seem like we’re “importing” data, but this is not what we’re actually doing in this situation. The database we are migrating to has no tables yet, and the sql dump file we’re using contains the sql commands to generate the tables it needs.
Confirm the Transfer
Now, if you didn't get any error messages, then your sql transfer probably worked. If you want, you can double check to see if it did by connecting to your RDS database server, looking up the database you created, and check to see if the tables are now present.
mysql -h host.address.for.rds.server -P 3306 -u rdsusername -p
(enter your password)
use rdsdatabase;
show tables;

I prefer using MySQL workbench. It's much more easier & user friendly than the command line way.
It provides a simple GUI.
MySQL workbench or SQL Yog.
These are the steps that I did.
1) Install MySQL Workbench.
2) In AWS console, there must be a security group for your RDS instance.
Add an inbound rule to that group for allowing connections from your machine.
It's simple. Add your IP-address.
3) Open MySQL workbench, Add a new connection.
4) Give the connection a name you prefer.
5) Choose connection method- Standard TCP/IP
6) Enter your RDS endpoint in the field of Hostname.
7) Port:3306
8) Username: master username (the one which which you created during the creation of your RDS instance)
9)Password: master password
10) Click Test Connection to check your connection.
11) If connection is successful, click OK.
12) Open the connection.
13) you will see your database 'realcardiodb' there.
14) Now you can export your mysqldump file to this database. Go to-> Server. Click Data Import.
15) You can check whether the data has been migrated by simply opening a blank SQL file & typing in basic SQL commands like use database, select * from table;
That's it. Viola.

If you have a backup.sql in your PC, No need to transfer to EC2. Just give below line on your terminal in your PC.
$ mysql -h rdsinstance-hostaddress-ending.rds.amazonaws.com -u rds_username -p rds_database < /path/to/your/backup.sql
Enter password: paswd_mysql_user
That's all.
Import backup directly from existing remote server
SSH connect to your remote server
Get the remote server mysql backup (backup/path/backupfile.sql)
Import backup file to RDS mysql while you in remote server shell
mysql -h your-mysql-instance.region.rds.amazonaws.com -u db_username -p db_name < backup/path/backupfile.sql
Note:
I have tried all the above criteria to import my existing backup to new RDS database, including through EC2 as in AWS documentation. It was a 10GB backup. So I have tried tables by tables as well. It shows process completed but some data were missing for large tables. So I had to write a DB to DB data migration script.

Using work bench :
setup connection
go to management tab and click on data import/restore
click on import from self contained file .
choose your mysqlbackup.sql file.
select default database.
click on start import button.
Using command line (On Windows ) :
mysqldump -u <localuser>
--databases world
--single-transaction
--compress
--order-by-primary
-p<localpassword> | mysql -u <rds-user-name>
--port=3306
--host=ednpoint
-p<rds-password>
For more detail please refer :
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.SmallExisting.html
or
https://docs.bitnami.com/aws/how-to/migrate-database-rds/#using-phpmyadmin-110
Hope it helps.

The step by step instruction on how to migrate already existing db on mysql/mariadb to already running RDS instance.

Here is the AWS RDS Mysql document to import customer data into RDS
http://aws.amazon.com/articles/2933
Create flat files containing the data to be loaded
Stop any applications accessing the target DB Instance
Create a DB Snapshot
Disable Amazon RDS automated backups
Load the data using mysqlimport
Enable automated backups again

Related

Reading And Writing to MYSQL in AWS Glue

enter image description hereI'm able to connect to MYSQL while running my Pyspark Code Locally in juypter notebook, but the same code I am getting Communication error in AWS Glue while running the code. I have added MySQL jar in jar files required while creating the job in AWS Glue.
Reading from MYSQL
dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/read").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "student").option("user", "root").option("password", "root").load()
Writing to MYSQL
df = sc.parallelize([[25, 'Prem'],
[20, 'Kate'],
[20, 'Kate'],
[40, 'Cheng']]).toDF(["Depy_id","Dept_name"])
df.write.format('jdbc').options(
url='jdbc:mysql://localhost/test',
driver='com.mysql.jdbc.Driver',
dbtable='dept',
user='root',
password='root').mode('overwrite').save()
Please note that you have to provide a valid database URL, not a localhost. I believe your jupyter notebook was run locally on a laptop, in the same local environment where your mysql is running as well.
AWS Glue runs within an AWS environment, and behind the scene it would launch number of EC2 instances depending upon the DPU configuration. If your URL is configured as LOCALHOST, then the EC2 instance where the pyspark code is running, would look for a mysql database on the same node.
Please make sure you have a valid public IP for the mysql database, and try to setup a connection in the AWS Glue as suggested by bdcloud, and retry again. If you dont want to create a connection, you can hard code the connection parameters in the code and try again. If you cannot get a public IP for the installed mysql database, maybe you can try setup an RDS Mysql on AWS, and use it for testing.
Sample code snippet:
conn = mysql.connector.connect(host=url, user=uname, password=pwd, database=dbase)
cur = conn.cursor()
insertQry = "INSERT INTO emp (id, emp_name, dept, designation, address1, city, state, active_start_date, is_active) SELECT (SELECT coalesce(MAX(ID),0) + 1 FROM atlas.emp) id, tmp.emp_name, tmp.dept, tmp.designation, tmp.address1, tmp.city, tmp.state, tmp.active_start_date, tmp.is_active from EMP_STG tmp ON DUPLICATE KEY UPDATE dept=tmp.dept, designation=tmp.designation, address1=tmp.address1, city=tmp.city, state=tmp.state, active_start_date=tmp.active_start_date, is_active =tmp.is_active ;"
n = cur.execute(insertQry)
print (" CURSOR status :", n)
Please refer to the AWS Glue connections section:
yes that's true I am able to connect it as above by just adding the connection to the job as well as changing the local host to the respective

Unable to do a Restore from S3 with RDS SQL Server

I am attempting to do a restore from S3 in AWS RDS (SQL Server). On the page when I can select the engine, I select SQL Server. But the options to select the Edition are all grayed out and I cannot select one and move on. You can see this from the screen shot below. Note, this does not happen if I simply attempt to create an instance of SQL Server in RDS. I can then select an Edition.
It looks like you can't do it straight from the console and the Database has to exist already.
Restoring a Database
To restore your database, you call the rds_restore_database stored
procedure.
The following parameters are required:
#restore_db_name – The name of the database to restore.
#s3_arn_to_restore_from – The Amazon S3 bucket that contains the backup file, and the name of the file.
The following parameters are optional:
#kms_master_key_arn – If you encrypted the backup file, the key to use to decrypt the file.
Example Without Encryption
exec msdb.dbo.rds_restore_database
#restore_db_name='database_name',
#s3_arn_to_restore_from='arn:aws:s3:::bucket_name/file_name_and_extension';
Example With Encryption
exec msdb.dbo.rds_restore_database
#restore_db_name='database_name',
#s3_arn_to_restore_from='arn:aws:s3:::bucket_name/file_name_and_extension',
#kms_master_key_arn='arn:aws:kms:region:account-id:key/key-id';
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/SQLServer.Procedural.Importing.html

How to export SQL Output directly to CSV on Amazon RDS

Amazon doesn't give Access to RDS Server directly ( they expose it only through service RDS) hence, "select into outfile" doesn't work..
Even the master user does not have privileges of FILE.
I created ticket with Amazon; talked at length with them.. They suggested few work-around like using Data Pipeline etc.. but all are too complicated..
Surely one of the way could be to use tool like MYSql Workbench --> execute query --> Export to CSV.. Only problem with this approach is that you need to execute same query twice on server and is problematic if your output is having thousands of rows.
Just write the query in a file a.sql. The SQL Should be in this format:
select concat( '"',Product_id,'","', Subcategory,'","', ifnull(Product_type,''),'","', ifnull(End_Date,''), '"') as data from tablename
mysql -h xyz.abc7zdltfa3r.ap-southeast-1.rds.amazonaws.com -u query -pxyz < a.sql > deepak.csv
Output will be there in file deepak.csv

Connect IntelliJ to Amazon Redshift

I'm using the latest version of IntelliJ and I've just created a cluster in Amazon Redshift. How do I connect IntelliJ to Redshift so that I can query it from my favorite IDE?
Download a jdbc driver:
http://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html#download-jdbc-driver
On IntelliJ: View |Tool Windows | Database
Click on "Data Source
Properties" ()
Click Add (+) and select "Database Driver":
Uncheck "JDBC drivers", and add a jdbc driver, select a class from the dropdown and select a PostgreSQL dialect:
6.Add a new connection, and use this datasource for your connection: (+ | Data Source | RedShift).
7.Set URL templates:
jdbc:redshift://[{host::localhost}[:{port::5439}]][/{database::postgres}?][\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
jdbc:redshift://\[{host:ipv6:\:\:1}\][:{port::5439}][/{database::postgres}?][\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
jdbc:redshift:{database::postgres}[\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
You can connect IntelliJ to Redshift by the using the JDBC Driver supplied by Amazon. In the Redshift Console, go to "Connect Client" to get the driver.
Then, in the IntelliJ Data Source window, add the JAR as a Driver file, and use the following settings:
Class: com.amazon.redshift.jdbc41.Driver
URL template: jdbc:redshift://{host}:{port}/{database}
Common Pitfalls:
If the driver file is not readable or marked as in quarantine by OS X, you won't be able to select the driver class.
For a more detailed guide, see this blog post: Connecting IntelliJ to Redshift
Note: There is no native Redshift support in IntelliJ yet. IntelliJ Issue DBE-1459
Update for 2019: I've just created a PostgreSQL connection and then filled the usual Redshift settings (don't forget port: 5439), no need to download Amazon's JDBC driver.
Only little issue is that the syntax check doesn't know Redshift specificities such as AS and some functions, but queries execute correctly.
Update for 2020: PyCharm (and possibly all other JetBrains IDEs) now supports connecting to Redshift through IAM AWS credentials without manual driver installation.
Here are the detailed setup instructions:
Grant a redshift:GetClusterCredentials permission to your AWS user. Either create and attach a new policy (docs) or use an existing one such as AmazonRedshiftFullAccess (not recommended: too permissive).
Create an AWS access key (access key id + secret access key pair) for your user (docs).
Create a text configuration file ~/.aws/credentials (no extension) with the following content (docs):
[default] # arbitrary profile name, will be used later
region = <your region>
aws_access_key_id = <your access key id> # created on the previous step
aws_secret_access_key = <your secret access key>
Create a new PyCharm database connection of type Amazon Redshift and set it up (docs):
Choose connection type = IAM cluster/region (right under the «General» tab of the connection settings window).
Authentication = AWS Profile
User = {your AWS login}
Profile = default or the one you have used in credentials file.
The credentials can possibly be provided through AccessKeyID/SecretAccessKey connection settings on the «Advanced» tab but it did not work for me (due to NullPointerException if Profile field is empty).
Database = {your database}, choose an existing one to not face non descriptive errors from the driver.
Region = {your region}
Cluster = {cluster name}, get it from Redshift AWS console.
Setup the connection:
Check necessary databases in the «Schemas» tab.
«Advanced» tab: AutoCreate = true (literal lowercase true as the setting value). This will automatically create a new database user with your AWS login.
Test connection.

ssh through python onto amazon ec2

I have two databases one on my local machine and one on my amazon ec2 instance.Now what I
do is I run a python program on my local machine which makes changes to the databse on my local machine.I want these changes to be reflected onto the database on amazon ec2 instance,
periodically.I want to do this in python.A script that logs onto the amazon server establishes a connection with the database there and makes the changes.
I came across some modules like pexcept,fabric and paramiko.But I am struggling with the
key authentication.
The way I ssh from my terminal is ssh -i my_rsa_file.pem username#ip_address.There is no password.How do I go about this ??
Also I want to know whether simply using Popen in subprocess to execute the login command work ?
The Boto EC2 documentation here describes the EC2 instance object, of which "key_pair" is an attribute. Look about 3/4 of the way down, under "boto.ec2.instance".
http://boto.readthedocs.org/en/latest/ref/ec2.html
So, e.g., you could run some instances as follows, and then store the first instance as "inst":
reservation = conn.run_instances(...)
inst = reservation.instances[0]
To retrieve your key-pair name as a unicode string, just use:
kp_name = inst.key_name
You can then retrieve the corresponding Boto object using get_key_pair:
kp_obj = conn.get_key_pair(kp_name)
Of course, this is a silly example, since I would have needed my key pair name to run_instances in the first place. May you find a more fruitful application!