MySQL Workbench low migration speed - amazon-web-services

I'm using MySQL Workbench (8.0.25) to migrate data from one database to another on the same server, and it takes 10+ minutes, which seems like a very long time.
The database, on AWS, is a single AZ db.t2.micro database located in Paris, eu-west-3.
Using speedtest.net, I seem to have 25 Mb/s download and around the same upload speed, and browsing feels consistent with that. I am very far from the datacenter (I'm in Buenos Aires and it's in Paris), but I always have a VPN turned on whose t2.small server is located in the same datacenter as the database (eu-west-3). In particular, the VPN was on during the test, so the 25 Mb/s should be representative of what I can expect during a migration of the eu-west-3 database.
The database is of reasonable size: when I export it from MySQL Workbench, the total dump is 26 MB.
I'm currently living in Buenos Aires. When I was in France, this migration would take around 30s (I had 150 Mb/s download back then and probably around 50 Mb/s upload).
Could you help me understand why it takes so long despite the still-decent connection I have here? Thanks in advance.

I can't fit all of this in a comment, so I'm going to reply here.
I am not familiar with MySQL Workbench, so I don't know how it does the migration. I would like you to do the migration through the CLI and see if the issue really is RDS.
I have this script (taken from this forum a while ago; I don't remember the source) which does this migration very easily.
If the destination database already exists, you need to either rename it (here is how) or delete it altogether (drop database $dbname;).
Create a bash script on an instance from which you can access the RDS instance, and copy the following into it.
#!/bin/bash
# Moves every table from the old database into a new one, then drops the old DB.
# Set these two variables to match your environment before running:
rootuser="your_mysql_user"
hostname="your-rds-endpoint"
set -e
start=$(date +%s)
# -p<password> has no space; replace 'password' with your actual password
mysqlconn="mysql -u $rootuser -ppassword -h $hostname"
olddb=$1
newdb=$2
$mysqlconn -e "CREATE DATABASE $newdb"
# List every table in the old database
params=$($mysqlconn -N -e "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES \
    WHERE table_schema='$olddb'")
# RENAME TABLE is a metadata-only operation, so each rename is nearly instant
for name in $params; do
    $mysqlconn -e "RENAME TABLE $olddb.$name TO $newdb.$name"
done
$mysqlconn -e "DROP DATABASE $olddb"
end=$(date +%s)
runtime=$(( (end - start) / 60 ))
echo "Total time taken is ${runtime} minutes"
You can run it as $scriptname.sh $olddb $newdb. If it takes a long time, run it with nohup $scriptname.sh $olddb $newdb & instead, so it keeps running if your session disconnects.
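For example, assuming you saved it as rename_db.sh:
chmod +x rename_db.sh
./rename_db.sh old_database new_database
# or, to keep it running after you disconnect and capture the output:
nohup ./rename_db.sh old_database new_database > rename.log 2>&1 &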
This way you can be sure whether the culprit is MySQL Workbench or RDS.

Related

Errors importing large CSV file to DynamoDB using Lambda

I want to import a large CSV file (around 1 GB, with 2.5m rows and 50 columns) into DynamoDB, so I have been following this blog from AWS.
However, it seems I'm up against a timeout issue. I've got to ~600,000 rows ingested, and it falls over.
Reading the CloudWatch log, I think the timeout is occurring due to the boto3 read of the CSV file (it opens the entire file first, iterates through it, and batches rows up for writing)... I tried reducing the file size (3 columns, 10,000 rows as a test), and I got a timeout after 2,500 rows.
Any thoughts here?!
TIA :)
I really appreciate the suggestions (Chris & Jarmod). After trying and failing to break things programmatically into smaller chunks, I decided to look at the approach in general.
Through research I understood there were 4 options:
Lambda function - as per the above, this fails with a timeout.
AWS Data Pipeline - doesn't have a template for importing CSV to DynamoDB.
Manual entry - of 2.5m items? No thanks! :)
Use an EC2 instance to load the data to RDS and use DMS to migrate to DynamoDB
The last option actually worked well. Here's what I did:
Create an RDS database (I used the db.t2.micro tier as it was free) and create a blank table in it.
Create an EC2 instance (free Linux tier) and:
On the EC2 instance: use SCP to upload the CSV file to the ec2 instance
On the EC2 instance: first run sudo yum install mysql to get the tools needed, then use mysqlimport with the --local option to import the CSV file into the RDS MySQL database, which took literally seconds to complete (a sketch of the commands follows these steps).
At this point I also did some data cleansing to remove some white spaces and some character returns that had crept into the file, just using standard SQL queries.
Using DMS I created a replication instance, endpoints for the source (rds) and target (dynamodb) databases, and finally created a task to import.
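For reference, the EC2-side import commands looked roughly like this (the endpoint, database, and file names are placeholders; note that mysqlimport maps each file to the table named after it, so the CSV must be named after the target table):
sudo yum install -y mysql
mysqlimport --local \
    -h mydb-instance.xxxxxxxx.eu-west-1.rds.amazonaws.com \
    -u admin -p \
    --fields-terminated-by=',' --lines-terminated-by='\n' \
    mydb mytable.csv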
The import took around 4hr 30m
After the import, I removed the EC2, RDS, and DMS objects (and associated IAM roles) to avoid any potential costs.
Fortunately, I had a flat structure to work with, and it was only one table. I needed the cheap speed of DynamoDB; otherwise, I'd have stuck with RDS (I almost did halfway through the process!!!)
Thanks for reading, and best of luck if you have the same issue in the future.

How to skip slave replication errors on Google Cloud SQL 2nd Gen

I am in the process of migrating a database from an external server to Cloud SQL 2nd Gen. I have been following the recommended steps; the 2TB mysqldump completed and replication started. However, I got an error:
'Error ''Access denied for user ''skip-grants user''#''skip-grants host'' (using password: NO)'' on query. Default database: ''mondovo_db''. Query: ''LOAD DATA INFILE ''/mysql/tmp/SQL_LOAD-0a868f6d-8681-11e9-b5d3-42010a8000a8-6498057-322806.data'' IGNORE INTO TABLE seoi_volume_update_tracker FIELDS TERMINATED BY ''^#^'' ENCLOSED BY '''' ESCAPED BY ''\'' LINES TERMINATED BY ''^|^'' (keyword_search_volume_id)'''
Two questions:
1) I'm guessing the error has come about because Cloud SQL requires LOAD DATA LOCAL INFILE instead of LOAD DATA INFILE? However, I'm quite sure we only run LOAD DATA LOCAL INFILE on the master, so I'm not sure how LOCAL gets dropped during replication. Is that possible?
2) I can't stop the slave to skip the error and restart, since SUPER privileges aren't available, so I'm not sure how to skip this error and also avoid it while the final sync happens. Suggestions?
There was no way to work around the slave replication error in Google Cloud SQL, so I had to come up with another way.
Since replication wasn't going to work, I had to copy all of the databases. However, because the aggregate size of all my DBs was 2TB, it was going to take a long time.
The final strategy that took the least amount of time:
1) Prerequisite: you need at least 1.5x the current database size in free disk space on your SQL drive. My 2TB DB was on a 2.7TB SSD, so I had to temporarily move everything to a 6TB SSD before I could proceed with the steps below. DO NOT proceed without sufficient disk space; you'll waste a lot of your time, as I did.
2) Install cloudsql-import on your server. You can't proceed without it, and this took me a while to discover. It facilitates the quick transfer of your SQL dumps to Google.
3) I had multiple databases to migrate, so if you're in a similar situation, pick one at a time, and for the sites that access that DB, prevent any further inserts/updates. I put a "Website under Maintenance" notice on each site while I executed the operations outlined below.
4) Run the commands in the steps below in a separate screen. I launched a few processes in parallel on different screens.
screen -S DB_NAME_import_process
5) Run a mysqldump using the following command and note, the output is an SQL file and not a compressed file:
mysqldump {DB_NAME} --hex-blob --default-character-set=utf8mb4 --skip-set-charset --skip-triggers --no-autocommit --single-transaction --set-gtid-purged=off > {DB_NAME}.sql
6) (Optional) For my largest DB of around 1.2TB, I also split the DB backup into individual table SQL files using the script mentioned here: https://stackoverflow.com/a/9949414/1396252
7) For each of the dumped files, I converted the INSERT commands into INSERT IGNORE, because I didn't want duplicate-key errors during the import process.
sed 's/^INSERT/INSERT IGNORE/g' {DB_OR_TABLE_NAME}.sql > new_{DB_OR_TABLE_NAME}_ignore.sql
8) On Google Cloud SQL, create a database with the same name as the one you want to import. Also create a global user that has permission to access all the databases.
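For step 8, that can look roughly like this (names are placeholders; grant only what you actually need):
CREATE DATABASE mondovo_db;
CREATE USER 'import_user'@'%' IDENTIFIED BY 'a_strong_password';
GRANT ALL PRIVILEGES ON *.* TO 'import_user'@'%';
FLUSH PRIVILEGES;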
9) Now, we import the SQL files using the cloudsql-import plugin. If you split the larger DB into individual table files in Step 6, use the cat command to combine a batch of them into a single file and make as many batch files as you see appropriate.
Run the following command:
cloudsql-import --dump={DB_OR_TABLE_NAME}.sql --dsn='{DB_USER_ON_GCLOUD}:{DB_PASSWORD}@tcp({GCLOUD_SQL_PUBLIC_IP}:3306)/{DB_NAME_CREATED_ON_GOOGLE}'
10) While the process is running, you can detach from the screen session using Ctrl+a then d (or refer here) and then reconnect to the screen later to check on progress. You can create another screen session and repeat the same steps for each of the DBs/batches of tables that you need to import.
Because of the large sizes I had to import, it took me a day or two, if I remember right (it's been a few months), but I know it was much faster than any other way. I had tried using Google's copy utility to copy the SQL files to Cloud Storage and then using Cloud SQL's built-in visual import tool, but that was slow and not as fast as cloudsql-import. I would recommend this method up until Google fixes the inability to skip slave errors.

How to get ec2 instance details with price details using aws cli

How do I get EC2 instance details (name, id, type, region, volume, platform, on-demand/reserved) together with instance price details, using the AWS API from the CLI, and write them to a CSV file?
Thanks in advance.
Similar to my answer here: get ec2 pricing programmatically?
You can do something like the following:
aws pricing get-products \
    --service-code AmazonEC2 \
    --filters "Type=TERM_MATCH,Field=instanceType,Value=m5.xlarge" \
              "Type=TERM_MATCH,Field=location,Value=US East (N. Virginia)" \
    --region us-east-1 |
  jq -rc '.PriceList[]' |
  jq -r '[ .product.attributes.servicecode,
           .product.attributes.location,
           .product.attributes.instancesku?,
           .product.attributes.instanceType,
           .product.attributes.usagetype,
           .product.attributes.operatingSystem,
           .product.attributes.memory,
           .product.attributes.physicalProcessor,
           .product.attributes.processorArchitecture,
           .product.attributes.vcpu,
           .product.attributes.currentGeneration,
           .terms.OnDemand[].priceDimensions[].unit,
           .terms.OnDemand[].priceDimensions[].pricePerUnit.USD,
           .terms.OnDemand[].priceDimensions[].description ] | @csv'
I recommend you use Ansible with the EC2 dynamic inventory to do this.
Ansible can gather all this information with queries like the following, which returns the platform, for example:
ansible -i ec2.py -m debug -a "var=ec2_platform" all
You'll then write a playbook in YAML to gather the information you need and write it to a CSV file.
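As a rough, untested sketch of such a playbook (assuming the classic ec2.py dynamic inventory has populated the ec2_* host facts; file and column names are illustrative):
- hosts: all
  gather_facts: no
  tasks:
    - name: Write instance details to a CSV on the control machine
      run_once: yes
      delegate_to: localhost
      copy:
        dest: ./instances.csv
        content: |
          name,id,type,region,platform
          {% for h in play_hosts %}
          {{ h }},{{ hostvars[h].ec2_id }},{{ hostvars[h].ec2_instance_type }},{{ hostvars[h].ec2_region }},{{ hostvars[h].ec2_platform | default('') }}
          {% endfor %}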
I don't know an easy way to get the exact price of the servers for amazon-ec2; there are a lot of factors to take into account: the OS, the disk space, the server type, whether it is reserved or not, etc.
But I got a good approximation using what I described above.
Here is the explanation for dynamic inventory with ansible and ec2:
http://docs.ansible.com/ansible/intro_dynamic_inventory.html
Hope it helped!
If your aim is not to automate the pricing of your servers, you can get a one-shot answer from this URL:
https://aws.amazon.com/fr/ec2/pricing/on-demand/
You'll need to know:
server type (e.g. m3.large)
reservation type (reserved or on-demand)
OS type (Linux, Windows, RHEL, ...)
the hour coverage (it depends on whether you shut down your server during the night, etc.)
Then you'll have a good approximation of the price.
If you want more detail, you'll have to look at your network and data activity, and that is not that easy to calculate...
Another approach would be to go into your billing console and look at what you paid for the past month. But this won't work if you want to estimate the price of a new server.
Hope it helped.

Backup MySQL Amazon RDS

I am trying to set up a replica outside of AWS, with the master running on AWS RDS, and I do not want any downtime on the master. So I set up my slave node, and now I want to back up my current database from AWS.
mysqldump -h RDS ENDPOINT -u root -p --skip-lock-tables --single-transaction --flush-logs --hex-blob --master-data=2 --all-databases > /root/dump.sql
I tested it on my VM and it worked fine, but when trying it against RDS it gives me this error:
mysqldump: Couldn't execute 'FLUSH TABLES WITH READ LOCK': Access denied for user 'root'@'%' (using password: YES) (1045)
Is it because I do not have the SUPER privilege, and how do I fix this problem? Any suggestions are appreciated.
RDS does not allow even the master user the SUPER privilege, and this is required in order to execute FLUSH TABLES WITH READ LOCK. (This is an unfortunate limitation of RDS).
The failing statement is being generated by the --master-data option, which is, of course, necessary if you want to be able to learn the precise binlog coordinates where the backup begins. FLUSH TABLES WITH READ LOCK acquires a global read lock on all tables, which allows mysqldump to START TRANSACTION WITH CONSISTENT SNAPSHOT (as it does with --single-transaction) and then SHOW MASTER STATUS to obtain the binary log coordinates, after which it releases the global read lock because it has a transaction that will keep the visible data in a state consistent with that log position.
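Put differently, the sequence mysqldump needs to run looks roughly like this sketch:
FLUSH TABLES WITH READ LOCK;                -- the statement RDS denies
START TRANSACTION WITH CONSISTENT SNAPSHOT; -- what --single-transaction uses
SHOW MASTER STATUS;                         -- binlog file/position recorded in the dump
UNLOCK TABLES;                              -- safe now; the snapshot keeps the view consistent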
RDS breaks this mechanism by denying the SUPER privilege and providing no obvious workaround.
There are some hacky options available to properly work around this, none of which may be particularly attractive:
do the backup during a period of low traffic. If the binlog coordinates have not changed between the time you start the backup and the time the backup has begun writing data to the output file or destination server (assuming you used --single-transaction), then this will work, because you know the coordinates didn't change while the process was running.
observe the binlog position on the master right before starting the backup, and use those coordinates with CHANGE MASTER TO. If your master's binlog_format is set to ROW, this should work, though you will likely have to skip past a few initial errors; you should not encounter any errors after that. This works because row-based replication is very deterministic and will stop if it tries to insert something that's already there or delete something that's already gone. Once past the errors, you will be at the true binlog coordinates where the consistent snapshot actually started.
as in the previous item, but, after restoring the backup try to determine the correct position by using mysqlbinlog --base64-output=decode-rows --verbose to read the master's binlog at the coordinates you obtained, checking your new slave to see which of the events must have already been executed before the snapshot actually started, and using the coordinates determined this way to CHANGE MASTER TO.
use an external process to obtain a read lock on each and every table on the server, which will stop all writes; observe that the binlog position from SHOW MASTER STATUS has stopped incrementing, start the backup, and release those locks.
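A sketch of that last approach, run from a session you keep open for the duration (the table list here is abbreviated and the names are placeholders):
LOCK TABLES db1.table_a READ, db1.table_b READ;  -- one entry per table on the server
SHOW MASTER STATUS;   -- repeat until File/Position stop changing, then record them
-- ... run mysqldump from a second session while these locks are held ...
UNLOCK TABLES;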
If you use any of these approaches other than perhaps the last one, it's especially critical that you do table comparisons to be certain your slave is identical to the master once it is running. If you hit subsequent replication errors... then it wasn't.
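One common way to do that comparison, assuming you can install Percona Toolkit on a host that can reach the master, is pt-table-checksum, e.g. pt-table-checksum --host=master-host --user=checksum_user --ask-pass; verify that its replica-discovery options work in your topology before trusting the result.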
Probably the safest option -- but also maybe the most annoying, since it seems like it should not be necessary -- is to begin by creating an RDS read replica of your RDS master. Once it is up and synchronized with the master, you can stop replication on the read replica by executing the RDS-provided stored procedure CALL mysql.rds_stop_replication (introduced in RDS 5.6.13 and 5.5.33), which doesn't require the SUPER privilege.
With replication stopped on the RDS read replica, take your mysqldump from the read replica, which will now have an unchanging data set as of a specific set of master coordinates. Restore this backup to your off-site slave, and then use the read replica's master coordinates from SHOW SLAVE STATUS, Exec_Master_Log_Pos and Relay_Master_Log_File, as your CHANGE MASTER TO coordinates.
The value shown in Exec_Master_Log_Pos on a slave is the start of the next transaction or event to be processed, and that's exactly where your new slave needs to start reading on the master.
Then you can decommission the RDS read replica once your external slave is up and running.
Thanks Michael, I think the most correct solution, and the one recommended by AWS, is to do the replication using a read replica as a source, as explained here.
With an RDS master, an RDS read replica, and an instance with MySQL ready, the steps to get an external slave are:
On the master, increase the binlog retention period.
mysql> CALL mysql.rds_set_configuration('binlog retention hours', 12);
On the read replica, stop replication to avoid changes during the backup.
mysql> CALL mysql.rds_stop_replication;
On the read replica, note the binlog status (Master_Log_File and Read_Master_Log_Pos).
mysql> SHOW SLAVE STATUS;
On the server instance, take a backup and import it (using mydumper, as suggested by Max, can speed up the process).
mysqldump -h RDS_READ_REPLICA_IP -u root -p YOUR_DATABASE > backup.sql
mysql -u root -p YOUR_DATABASE < backup.sql
On the server instance, set it up as a slave of the RDS master.
mysql> CHANGE MASTER TO MASTER_HOST='RDS_MASTER_IP',MASTER_USER='myrepladmin', MASTER_PASSWORD='pass', MASTER_LOG_FILE='mysql-bin-changelog.313534', MASTER_LOG_POS=1097;
Replace MASTER_LOG_FILE and MASTER_LOG_POS with the values of Master_Log_File and Read_Master_Log_Pos you saved earlier. You also need a user on the RDS master for slave replication.
mysql> START SLAVE;
On the server instance, check whether replication succeeded.
mysql> SHOW SLAVE STATUS;
On the RDS read replica, resume replication.
mysql> CALL mysql.rds_start_replication;
For the RDS binlog position you can use mydumper with --lock-all-tables; it will use LOCK TABLES ... READ just to get the binlog coordinates and then release the lock, instead of FTWRL.
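For example (a sketch; the endpoint, credentials, and paths are placeholders), mydumper records the coordinates it captured in the metadata file inside the output directory:
mydumper --host=your-rds-endpoint.rds.amazonaws.com \
    --user=admin --password='your_password' \
    --database=mydb \
    --lock-all-tables \
    --outputdir=/backups/mydb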
Michael's answer is extremely helpful and focuses on the main sticking point: you simply cannot GRANT the required SUPER privilege on RDS, and therefore you can't use the --master-data flag that would make things so much easier.
I read that it may be possible to work around this by creating or modifying a Database Parameter Group via the API, but I think using the RDS procedures is a better option.
The multi-tiered replication approach works well, though, and can include tiers outside RDS/VPC so it's possible to replicate from "Classic" EC2 to VPC using this method.
A lot of the necessary functionality is only in later releases of MySQL 5.5 and 5.6, and I strongly recommend you run the same version on all the DBs involved in the replication stack, so you may have to do an upgrade of your old DB before all of this, which means yet more tedium and replication and so on.
I faced a similar problem; a quick workaround is:
Create an EBS volume for extra space, or extend the current EBS volume on EC2 (or, if you already have spare space, use that).
Use the mysqldump command without the --master-data or --flush-logs options to generate a complete (FULL) backup of the db:
mysqldump -h hostname --routines -uadmin -p12344 test_db > filename.sql
Here admin is the DB user and 12344 is the password; test_db is the database name.
The above takes a backup of a single DB; if you need all DBs, specify --all-databases instead of naming a database.
Create a cron job for this command to run once a day so it automatically generates the dump.
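For example, a crontab entry like the following (paths and credentials are placeholders; note that % must be escaped as \% inside crontab):
0 2 * * * mysqldump -h your-rds-endpoint --routines -u admin -p'your_password' test_db > /backups/test_db_$(date +\%F).sql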
Please note that this will incur extra cost if your DB size is huge, as it creates a complete DB dump.
Hope this helps.
You need to:
1- Create a read replica on AWS.
2- Make sure this instance is catching up with the master.
3- Stop the replication and get the log_file and log_position parameters with:
show slave status \G
4- Dump the database and use the parameters logged in step 3 to start replication on your own server.
5- Start the slave.
Detailed instructions here.
Either things have changed since @Michael - sqlbot's response, or there is a misunderstanding going on here (which could be on my part).
You can use COPY to import a CSV file into RDS, at least on the PostgreSQL version; you just need to use FROM STDIN instead of directly naming the file,
which means you end up piping things like:
cat data.csv | psql postgresql://server:5432/mydb -U user -c "COPY \"mytable\" FROM STDIN DELIMITER ',' "

How do I import a local MySQL db to RDS db instance?

I've created an RDS instance called realcardiodb (the engine is MySQL),
and I've exported my database from my localhost. The file is saved locally as localhostrealcardio.sql.
Most research says to use mysqldump to import data from a local system to a web server, but my system doesn't even recognize mysqldump.
C:\xampp\mysql>mysqldump
'mysqldump' is not recognized as an internal or external command, operable program or batch file.
How do I resolve this error? Should I be using mysqldump? (I definitely have MySQL installed on my system.)
Is there a better utility I should use?
Any help is appreciated, especially if you have experience importing mysql to aws rds.
Thanks!
DK
Update 7/31/2012
So I got the error resolved. mysqldump is in the bin directory C:\xampp\mysql\bin>mysqldump
AWS provides the following instructions for uploading a local database to RDS:
mysqldump acme | mysql --host=hostname --user=username --password acme
Can someone break this down for me?
1) Is the first 'acme' (after the mysqldump command) the name of my local database, or the exported sql file I saved locally?
2) Is the hostname the IP address, the public DNS, the RDS endpoint, or none of these?
3) I assume the username and password are the RDS credentials, and the second acme is the name of the database I created in RDS.
Thanks!
This is how I did it for a couple of instances that had data in their MySQL tables.
The steps to creating an RDS database instance:
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.MySQL.html
Note: Make sure the RDS instance has a security group configured that relates to the EC2 security group.
http://docs.amazonwebservices.com/AmazonRDS/latest/UserGuide/USER_Workin...
Before we go forward, let me provide a list of what some of the following placeholders are:
host.address.for.rds.server = this will be what is referred to as the "end point" in your RDS description/settings page.
rdsusername = the master user account which you created during RDS setup.
rdsdatabase = a blank database which you created inside the server on your RDS instance.
backupfile.sql = the sql dump file you made of your pre-existing installation's database.
Once you've created a fresh RDS database instance, and have configured its security settings, log into this server (from within an ssh session to your EC2 server) and then create an empty database inside the instance using basic SQL commands.
mysql -h host.address.for.rds.server -P 3306 -u rdsusername -p
(enter your password)
create database rdsdatabase;
Then quit out of the MySQL environment inside your RDS server.
\q
This tutorial assumes you already have a backup from your old database. If you don't, go create one now. After that, you’re ready to import that sql dump file into the empty database waiting on your RDS server.
mysql -h host.address.for.rds.server -u rdsusername -p rdsdatabase < backupfile.sql
It might take a few seconds to complete, depending on the size of the sql dump file. Your indication that it is finished is that the bash command prompt reappears.
Note: the command "mysqlimport" is used to import data directly into an existing table inside a database. It might seem like we're "importing" data here, but this is not what we're actually doing in this situation. The database we are migrating to has no tables yet, and the sql dump file we're using contains the sql commands to generate the tables it needs.
Confirm the Transfer
Now, if you didn't get any error messages, then your sql transfer probably worked. If you want, you can double check to see if it did by connecting to your RDS database server, looking up the database you created, and check to see if the tables are now present.
mysql -h host.address.for.rds.server -P 3306 -u rdsusername -p
(enter your password)
use rdsdatabase;
show tables;
I prefer using MySQL Workbench. It's much easier and more user-friendly than the command-line way, and it provides a simple GUI. (SQLyog works too.)
These are the steps that I did:
1) Install MySQL Workbench.
2) In the AWS console, there must be a security group for your RDS instance.
Add an inbound rule to that group allowing connections from your machine.
It's simple: add your IP address.
3) Open MySQL workbench, Add a new connection.
4) Give the connection a name you prefer.
5) Choose connection method- Standard TCP/IP
6) Enter your RDS endpoint in the field of Hostname.
7) Port:3306
8) Username: master username (the one you created during the creation of your RDS instance)
9) Password: master password
10) Click Test Connection to check your connection.
11) If connection is successful, click OK.
12) Open the connection.
13) you will see your database 'realcardiodb' there.
14) Now you can import your mysqldump file into this database. Go to Server and click Data Import.
15) You can check whether the data has been migrated by simply opening a blank SQL file and typing basic SQL commands like use database; select * from table;
That's it. Voila!
If you have a backup.sql on your PC, there's no need to transfer it to EC2. Just run the line below in a terminal on your PC:
$ mysql -h rdsinstance-hostaddress-ending.rds.amazonaws.com -u rds_username -p rds_database < /path/to/your/backup.sql
Enter password: paswd_mysql_user
That's all.
Import a backup directly from an existing remote server:
SSH into your remote server.
Take the remote server's MySQL backup (backup/path/backupfile.sql).
Import the backup file into RDS MySQL while you are in the remote server's shell:
mysql -h your-mysql-instance.region.rds.amazonaws.com -u db_username -p db_name < backup/path/backupfile.sql
Note:
I tried all the above approaches to import my existing backup into a new RDS database, including going through EC2 as in the AWS documentation. It was a 10GB backup, so I also tried importing table by table. The process showed as completed, but some data was missing for large tables, so I had to write a DB-to-DB data migration script.
Using Workbench:
Set up the connection.
Go to the Management tab and click on Data Import/Restore.
Click on "Import from Self-Contained File".
Choose your mysqlbackup.sql file.
Select the default database.
Click the Start Import button.
Using the command line (on Windows), where ^ continues the command onto the next line in cmd.exe:
mysqldump -u <localuser> ^
    --databases world ^
    --single-transaction ^
    --compress ^
    --order-by-primary ^
    -p<localpassword> | mysql -u <rds-user-name> ^
    --port=3306 ^
    --host=endpoint ^
    -p<rds-password>
For more detail, please refer to:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.SmallExisting.html
or
https://docs.bitnami.com/aws/how-to/migrate-database-rds/#using-phpmyadmin-110
Hope it helps.
Here are step-by-step instructions for migrating an existing MySQL/MariaDB database to an already-running RDS instance.
Here is the AWS RDS MySQL document on importing customer data into RDS:
http://aws.amazon.com/articles/2933
Create flat files containing the data to be loaded
Stop any applications accessing the target DB Instance
Create a DB Snapshot
Disable Amazon RDS automated backups
Load the data using mysqlimport
Enable automated backups again
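As a rough sketch of steps 1 and 5 (all names and paths are illustrative; note that SELECT ... INTO OUTFILE writes to the filesystem of the source MySQL server and requires the FILE privilege there):
# 1) Create a flat file from a table on the source server:
mysql -u local_user -p -e "SELECT * FROM acme.customers INTO OUTFILE '/tmp/customers.txt' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'"
# 5) Load it into RDS; mysqlimport maps customers.txt to the table named customers:
mysqlimport --local --compress -h your-rds-endpoint -u master_user -p --fields-terminated-by=',' acme /tmp/customers.txt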