I have one server at Rackspace and I'm already running a cron job every night to process something (an account-related operation that sends me an email every midnight). My application is in Groovy on Grails. Now I want to back up my MySQL database (called myfleet) every midnight and put that file in Amazon S3. How can I do that? Do I need to write a Java or Groovy file to process that, or can it be done from the Linux box itself? I already have an Amazon S3 account (the bucket name is fleetBucket).
You can also use STDOUT and the AWS CLI tool to pipe the output of your mysqldump straight to S3:
mysqldump -h [db_hostname] -u [db_user] -p[db_passwd] [databasename] | aws s3 cp - s3://[s3_bucketname]/[mysqldump_filename]
For example:
mysqldump -h localhost -u db_user -ppassword test-database | aws s3 cp - s3://database-mysqldump-bucket/test-database-dump.sql
The mysqldump command outputs to STDOUT by default. Using - as the input argument for aws s3 cp tells the AWS CLI tool to use STDIN for the input.
mysqldump --host=$HOST --user=$USER --password=$PASSWORD $DB_NAME --routines --single-transaction | gzip -9 | aws s3 cp - s3://bucket/database/filename.sql.gz
This will compress the dump with gzip and store it directly in S3, without writing a local file.
Should be pretty straightforward:
- Back up your database using mysqldump:
mysqldump -u [uname] -p[pass] myfleet | gzip -9 > myfleet.sql.gz
- Upload your dump file to S3 using a command line client (e.g. http://s3tools.org/s3cmd):
s3cmd put myfleet.sql.gz s3://<bucketname>/myfleet.sql.gz
Just add this to your cron job. You might want to use some kind of numbering or date scheme for the dump files if you want to keep several versions; see the sketch below.
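As an illustration only (not from the original answer), a crontab entry with date-stamped filenames might look like the following; the /backups directory is an assumed location, and [uname], [pass] and <bucketname> are placeholders as above:
# Run at midnight every night: dump, compress, and upload with a date-stamped name (% must be escaped in crontab).
0 0 * * * mysqldump -u [uname] -p[pass] myfleet | gzip -9 > /backups/myfleet-$(date +\%F).sql.gz && s3cmd put /backups/myfleet-$(date +\%F).sql.gz s3://<bucketname>/myfleet-$(date +\%F).sql.gz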
If the source DB is on AWS and is Aurora MySQL, you can back up directly to S3 with a command like:
SELECT * FROM employees INTO OUTFILE S3 's3-us-west-2://aurora-select-into-s3-pdx/sample_employee_data';
See https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html for details.
Related
We have an ERP system that uses a SQL Server database, which generates DB backup files on a Windows PC every day.
Our manager hopes that the day's DB backup file can be uploaded to an AWS S3 bucket automatically and regularly every day. We have already registered an AWS account for this and created a dedicated bucket. Considering the storage cost of AWS S3, only the ten most recent DB backup files should be kept in the bucket.
It seems that we need the AWS S3 CLI and some scripting to meet this requirement. Should it be implemented in .bat code or Windows PowerShell? How do we check whether there are already ten DB backup files in the S3 bucket, and if so, how do we delete the oldest one before uploading the latest?
Is there any similar sample code? Thanks!
You can create a cron job for this. You also need to install the AWS CLI. If you are hosting your DB on Linux, this script might help you:
#!/bin/bash
NOW=$(date +"%Y-%m-%d")
NOW_TIME=$(date +"%Y-%m-%d %T %p")
NOW_MONTH=$(date +"%Y-%m")
MYSQL_HOST="localhost"
MYSQL_PORT="3306"
MYSQL_DATABASE="wordpress"
MYSQL_USER="user"
MYSQL_PASSWORD="password"
BACKUP_DIR="/home/mkyong/backup/$NOW_MONTH"
BACKUP_FULL_PATH="$BACKUP_DIR/$MYSQL_DATABASE-$NOW.sql.gz"
AMAZON_S3_BUCKET="s3://mkyong/backup/linode/mysql/$NOW_MONTH/"
AMAZON_S3_BIN="/home/mkyong/.local/bin/aws"
#################################################################
mkdir -p ${BACKUP_DIR}
backup_mysql() {
    mysqldump -h ${MYSQL_HOST} \
        -P ${MYSQL_PORT} \
        -u ${MYSQL_USER} \
        -p${MYSQL_PASSWORD} ${MYSQL_DATABASE} | gzip > ${BACKUP_FULL_PATH}
}

upload_s3() {
    ${AMAZON_S3_BIN} s3 cp ${BACKUP_FULL_PATH} ${AMAZON_S3_BUCKET}
}
backup_mysql
upload_s3
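The question also asks how to keep only the ten most recent backup files in the bucket. Here is a minimal sketch of that retention step, assuming a flat prefix and filenames without spaces; the bucket and prefix names are placeholders, not from the original post:
#!/bin/bash
BUCKET_PREFIX="s3://my-backup-bucket/backups/"
KEEP=10
# `aws s3 ls` prints "date time size key"; sort newest first by date and time,
# skip the first $KEEP entries, and delete everything older.
aws s3 ls "$BUCKET_PREFIX" | sort -k1,2 -r | awk '{print $4}' | tail -n +$((KEEP + 1)) |
while read -r key; do
    aws s3 rm "${BUCKET_PREFIX}${key}"
done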
To elaborate,
There is a tar.gz file on my AWS S3, let's call it example.tar.gz.
So, what I want to do is download the extracted contents of example.tar.gz to /var/home/.
One way to do it is to simply download the tar.gz, extract it, then delete the tar.gz.
However, I don't want to use space downloading the tar.gz file, I just want to download the extracted version or only store the extracted version.
Is this possible?
Thanks!
What you need is the following:
aws s3 cp s3://example-bucket/file.tar.gz - | tar -xz
This will stream file.tar.gz from S3 and extract it directly (in memory) into the current directory. No temporary files, no extra storage, and no cleanup after this one command.
Make sure you write the command exactly as above.
Today I tested with Python Boto3 and the AWS CLI, and I noticed that the tar.gz was extracted automatically when the file was downloaded.
There isn't currently a way you can do this with S3.
You could create the following script, though, and just run it whenever you wish to download the tar, as long as you have the IAM role / access keys set up.
#!/bin/bash
aws s3 cp "s3://$1/$2" "$3"
tar -xvf "$3"
rm "$3"
Then just call the script using ./myScript BUCKET_NAME FILE_LOCATION OUTPUT_FILE
I would like to ask about a processing task I am trying to complete using a data pipeline in AWS, but I have not been able to get it to work.
Basically, I have two data nodes representing two MySQL databases, from which the data is supposed to be extracted periodically and placed in an S3 bucket. This copy activity works fine, selecting daily every row that has been added, say, in the last day.
However, that bucket containing the collected data as CSVs should become the input for an EMR activity, which will be processing those files and aggregating the information. The problem is that I do not know how to remove or move the already processed files to a different bucket so I do not have to process all the files every day.
To clarify, I am looking for a way to move or remove already processed files in an S3 bucket from a pipeline. Can I do that? Is there any other way I can only process some files in an EMR activity based on a naming convention or something else?
Even better, create a DataPipeline ShellCommandActivity and use the aws command line tools.
Create a script with these two lines:
sudo yum -y upgrade aws-cli
aws s3 rm $1 --recursive
The first line ensures you have the latest aws tools.
The second one removes a directory and all its contents. The $1 is an argument passed to the script.
In your ShellCommandActivity:
"scriptUri": "s3://myBucket/scripts/theScriptAbove.sh",
"scriptArgument": "s3://myBucket/myDirectoryToBeDeleted"
The details on how the aws s3 command works are at:
http://docs.aws.amazon.com/cli/latest/reference/s3/index.html
1) Create a script which takes an input path and deletes the files there using hadoop fs -rmr <s3path> (a rough sketch of such a script is shown below).
2) Upload the script to S3.
In EMR, use these pre-steps:
1) hadoop fs -copyToLocal s3://scriptname .
2) chmod +x scriptname
3) run script
That's pretty much it.
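A minimal sketch of the cleanup script from step 1 above; the path argument is a placeholder:
#!/bin/bash
# Delete the already-processed files at the S3 path passed as the first argument.
# (hadoop fs -rmr is deprecated on newer Hadoop releases; hadoop fs -rm -r does the same.)
hadoop fs -rmr "$1"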
Another approach without using EMR is to install s3cmd tool through ShellCommandActivity in a small EC2 instance, then you can use s3cmd in pipeline to operate your S3 repo in whatever way you want.
A tricky part of this approach is to configure s3cmd through a configuration file safely (basically pass access key and secret), as you can't just ssh into the EC2 instance and use 's3cmd --configure' interactively in a pipeline.
To do that, you create a config file in the ShellCommandActivity using 'cat'. For example:
cat <<EOT >> s3.cfg
blah
blah
blah
EOT
Then use the '-c' option to point to the config file every time you call s3cmd, like this:
s3cmd -c s3.cfg ls
It sounds complicated, but it works.
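For reference, a minimal sketch of what that s3.cfg might contain; the section and key names follow s3cmd's usual config format, and the values are placeholders:
cat <<EOT >> s3.cfg
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
EOT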
I have a gzipped file on a local machine and want to load it to Redshift.
My command looks like this:
\COPY tablename FROM 's3://redshift.manifests/copy_from_yb01_urlinfo.txt' REGION 'us-east-1' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' SSH GZIP;
But I get a message "s3:/redshift.manifests/copy_from_yb01_urlinfo.txt: No such file or directory".
But this file is even public: https://s3.amazonaws.com/redshift.manifests/copy_from_yb01_urlinfo.txt.
Moreover, the user whose credentials I use has full access to S3 and Redshift: http://c2n.me/iEnI5l.png
Even weirder, I can access that file just fine with the same credentials from the AWS CLI:
> aws s3 ls redshift.manifests
2014-08-01 19:32:13 137 copy_from_yb01_urlinfo.txt
How to diagnose that further?
Just in case, I connect to my Redshift cluster via psql (PostgreSQL cli):
PAGER=more LANG=C psql -h ....us-east-1.redshift.amazonaws.com -p 5439 -U ... -d ...
edit:
I uploaded the file to S3 and got the same error on COPY...
And again I uploaded it and ran COPY with the same credentials:
\COPY url_info FROM 's3://redshift-datafiles/url_info_1.copy.gz' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' GZIP;
I am going to despair...
Since you are trying to copy to Redshift using a manifest file, you need to add the MANIFEST parameter at the end, like:
\COPY tablename FROM 's3://redshift.manifests/copy_from_yb01_urlinfo.txt' REGION 'us-east-1' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' SSH GZIP MANIFEST;
Oh.
The fix was to remove the backslash at the beginning of the command: \copy is psql's client-side command, which tries to read a local file and so cannot fetch from S3, while the server-side COPY can.
I can't remember why I started writing it that way... actually, I had already been writing it like that back when I exported data from a local PostgreSQL installation.
This is so stupid :) One small rubber duck could have saved me a day or two.
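For clarity, the corrected statement from the edit above, with the leading backslash removed so the server-side COPY runs (credentials elided as in the original):
COPY url_info FROM 's3://redshift-datafiles/url_info_1.copy.gz' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' GZIP;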
I am trying to register an Amazon image, and I keep getting the error Client.null: null.
I am able to browse to the URL and see the xml file.
The command I execute is:
ec2-register output.raw.manifest.xml -U <URL>
Client.null: null
any idea what could be the problem?
Thanks!
Keep in mind that this command is used to register instance store images rather than EBS-backed images.
Usually the XML manifest file, along with its series of 10GB image files, is uploaded to S3 prior to registering the AMI. Are you sure the bundle is in one of your S3 buckets?
Did you run something like this from the instance you want to create the image from?
ec2-bundle-vol -d /<someplace-where-you-have-a-lot-of-space> -k YOUR_PRIVATE_KEY -c YOUR_CERTIFICATE -u YOUR_ACCOUNT_NUMBER
ec2-upload-bundle -b YOUR_BUCKET_NAME -m output.raw.manifest.xml -a YOUR_ACCESS_KEY -s YOUR_SECRET_KEY
Then you can run:
ec2-register output.raw.manifest.xml
You can also register your image from the AWS console once you have created the bundle.
There are several blogs that talk about how to do this too. For example:
http://www.ryannitz.org/tech-notes/2009/08/09/create-amazon-ec2-ami/
Finally, if you are registering an EBS-backed AMI, you can simply use:
ec2-create-image <instance id>