Can't make COPY from remote host to Redshift work - amazon-web-services

I have a gzipped file on a local machine and want to load it to Redshift.
My command looks like this:
\COPY tablename FROM 's3://redshift.manifests/copy_from_yb01_urlinfo.txt' REGION 'us-east-1' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' SSH GZIP;
But I get a message "s3:/redshift.manifests/copy_from_yb01_urlinfo.txt: No such file or directory".
But the file is even public: https://s3.amazonaws.com/redshift.manifests/copy_from_yb01_urlinfo.txt.
Moreover, the user whose credentials I use has full access to S3 and Redshift: http://c2n.me/iEnI5l.png
Even weirder, I can access that file perfectly well with the same credentials from the AWS CLI:
> aws s3 ls redshift.manifests
2014-08-01 19:32:13 137 copy_from_yb01_urlinfo.txt
How can I diagnose this further?
Just in case, I connect to my Redshift cluster via psql (PostgreSQL cli):
PAGER=more LANG=C psql -h ....us-east-1.redshift.amazonaws.com -p 5439 -U ... -d ...
edit:
I uploaded the file itself to S3 - same error on COPY...
I uploaded it once again and ran COPY with the same credentials:
\COPY url_info FROM 's3://redshift-datafiles/url_info_1.copy.gz' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' GZIP;
I am going to despair...

Since you are trying to copy to Redshift using a manifest file, you need to add the MANIFEST keyword at the end, like:
\COPY tablename FROM 's3://redshift.manifests/copy_from_yb01_urlinfo.txt' REGION 'us-east-1' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' SSH GZIP MANIFEST;

Oh.
The fix was to remove the backslash at the beginning of the command.
I can't remember why I started writing it that way... actually, I had already been typing it like that back when I exported the data from my local PostgreSQL installation.
This is so stupid :) One small rubber duck could have saved me a day or two.
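For the record, the working statement is the last command above minus the leading backslash. psql's \copy is a client-side meta-command that looks for a local file (hence the "No such file or directory" message), while plain COPY is executed by Redshift itself, which then reads from S3:
COPY url_info FROM 's3://redshift-datafiles/url_info_1.copy.gz' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' GZIP;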

Related

Unable to locate credentials aws cli

I am using AWS CLI version 2 on CentOS with Nginx and PHP 7.1. The following command works fine when I run it directly on the command line:
aws s3 cp files/abc.pdf s3://bucketname/
but when I run the same command from the index.php file using the following code
echo exec("aws s3 cp files/abc.pdf s3://bucketname/ 2>&1");
then it gives the error:
upload failed: Unable to locate credentials
@Jass Add your credentials in "~/.aws/credentials" or "~/.aws/config" and put them under the [default] profile, or else use a named profile in case you have multiple accounts.
Also verify: if you are exporting the keys as environment variables, they only apply to that terminal session. So either execute the PHP from the same terminal where you exported the keys, or add them to ~/.aws/credentials.
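For reference, the [default] profile in ~/.aws/credentials looks roughly like this (keys truncated here):
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...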
I tried this and it worked for me, and I believe it should work for you as well. In your PHP code (index.php), try exporting the credential file location like below:
echo exec("export AWS_SHARED_CREDENTIALS_FILE=/<path_to_aws_folder>/.credentials; aws s3 cp files/abc.pdf s3://bucketname/ 2>&1");
When you run from your command line, the AWS CLI picks up the credentials from your home directory, i.e. ~/.aws/credentials (the default). When index.php is executed, the CLI looks for that file in the executing user's home directory, which apparently is not the same as your home directory, and hence it cannot find the credentials. With the above change you are explicitly pointing it to your AWS credentials.
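If you want to confirm what the CLI actually sees from inside PHP, one quick check (mirroring the exec() wrapping used in the question) is to run aws sts get-caller-identity, which prints the identity in use or the same "Unable to locate credentials" error:
echo exec("aws sts get-caller-identity --output text 2>&1");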

How can I dump a database directly to an s3 bucket? [duplicate]

I have a server at Rackspace and I already run a cron job every night to process something (an account-related operation that sends me an email every midnight). My application is in Groovy on Grails. Now I want to back up my MySQL database (called myfleet) every midnight and put that file in Amazon S3. How can I do that? Do I need to write a Java or Groovy file to process it, or can it be done from the Linux box itself? I already have an Amazon S3 account (the bucket name is fleetBucket).
You can also use STDOUT and the AWS CLI tool to pipe the output of your mysqldump straight to S3:
mysqldump -h [db_hostname] -u [db_user] -p[db_passwd] [databasename] | aws s3 cp - s3://[s3_bucketname]/[mysqldump_filename]
For example:
mysqldump -h localhost -u db_user -ppassword test-database | aws s3 cp - s3://database-mysqldump-bucket/test-database-dump.sql
The mysqldump command outputs to STDOUT by default. Using - as the input argument for aws s3 cp tells the AWS CLI tool to use STDIN for the input.
mysqldump --host=$HOST --user=$USER --password=$PASSWORD $DB_NAME --routines --single-transaction | gzip -9 | aws s3 cp - s3://bucket/database/filename.sql.gz
will store the file directly to S3.
Should be pretty straightforward:
- back up your database using mysqldump:
mysqldump -u [uname] -p[pass] myfleet | gzip -9 > myfleet.sql.gz
- upload your dump file to S3 using a command-line client (e.g. http://s3tools.org/s3cmd):
s3cmd put myfleet.sql.gz s3://<bucketname>/myfleet.sql.gz
Just add this to your cron job (you might want to use some kind of numbering scheme for the dump files, in case you want to keep several versions).
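For example, a minimal crontab entry combining the pipe-to-S3 approach above with a date-stamped file name (the db user/password are placeholders, the AWS CLI is assumed to be installed and configured for the cron user, and % must be escaped as \% inside crontab):
0 0 * * * mysqldump -u dbuser -p'dbpass' myfleet | gzip -9 | aws s3 cp - s3://fleetBucket/myfleet-$(date +\%Y-\%m-\%d).sql.gz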
If the source DB is on AWS and is of type Aurora MySQL, you can back up directly to S3 with a command like:
SELECT * FROM employees INTO OUTFILE S3 's3-us-west-2://aurora-select-into-s3-pdx/sample_employee_data';
See https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html for details.

I am getting exception while copying file from a Linux machine to GCS

I am getting an exception when I run the below command on a Linux machine.
I am trying to copy a text file from the Linux machine to a Google Cloud Storage bucket.
I have created a service account.
Steps I followed:
export GOOGLE_APPLICATION_CREDENTIALS=/home/test/shubham_test/xyz.json
Here, xyz.json is the key file that was downloaded while creating the service account.
gsutil cp test.txt gs://my-bucket/
I got the below exception:
ServiceException: 401 Anonymous caller does not have storage.objects.create access to my-bucket/test.txt.
I was going to advise using gcloud auth, but the answer is already present there ;)
Hope this does the trick: Automating gsutil commands
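In short, the usual fix (which I believe is what the linked answer boils down to) is to activate the service account with gcloud so that gsutil picks up its credentials; the key file path below is the one from the question:
gcloud auth activate-service-account --key-file=/home/test/shubham_test/xyz.json
gsutil cp test.txt gs://my-bucket/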

delete s3 files from a pipeline AWS

I would like to ask about a processing task I am trying to complete using a data pipeline in AWS, but I have not been able to get it to work.
Basically, I have 2 data nodes representing 2 MySQL databases, from which data is supposed to be extracted periodically and placed in an S3 bucket. This copy activity is working fine, selecting every day the rows that were added the previous day (today - 1 day).
However, that bucket containing the collected data as CSVs should become the input for an EMR activity, which will be processing those files and aggregating the information. The problem is that I do not know how to remove or move the already processed files to a different bucket so I do not have to process all the files every day.
To clarify, I am looking for a way to move or remove already processed files in an S3 bucket from a pipeline. Can I do that? Is there any other way I can only process some files in an EMR activity based on a naming convention or something else?
Even better, create a DataPipeline ShellCommandActivity and use the aws command line tools.
Create a script with these two lines:
sudo yum -y upgrade aws-cli
aws s3 rm $1 --recursive
The first line ensures you have the latest aws tools.
The second one removes a directory and all its contents. The $1 is an argument passed to the script.
In your ShellCommandActivity:
"scriptUri": "s3://myBucket/scripts/theScriptAbove.sh",
"scriptArgument": "s3://myBucket/myDirectoryToBeDeleted"
The details on how the aws s3 command works are at:
http://docs.aws.amazon.com/cli/latest/reference/s3/index.html
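For context, a rough sketch of how those two fields might sit inside a full pipeline activity definition (the id and the MyEc2Resource reference are hypothetical placeholders):
{
  "id": "DeleteProcessedFiles",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEc2Resource" },
  "scriptUri": "s3://myBucket/scripts/theScriptAbove.sh",
  "scriptArgument": "s3://myBucket/myDirectoryToBeDeleted"
}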
1) Create a script which takes an input path and deletes the files using hadoop fs -rmr s3path (a sketch follows this list).
2) Upload the script to S3.
In EMR, use the pre-steps:
1) hadoop fs -copyToLocal s3://scriptname .
2) chmod +x scriptname
3) run the script
That's pretty much it.
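A minimal sketch of the delete script from step 1, assuming it runs on a node where the hadoop client is on the PATH (the script name and argument handling are placeholders):
#!/bin/bash
# delete the already-processed files under the S3 path passed as the first argument
hadoop fs -rmr "$1"
It would be invoked from the EMR step as, for example, ./scriptname s3://myBucket/processed/.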
Another approach, without using EMR, is to install the s3cmd tool through a ShellCommandActivity on a small EC2 instance; you can then use s3cmd in the pipeline to operate on your S3 repo in whatever way you want.
The tricky part of this approach is configuring s3cmd safely through a configuration file (basically passing the access key and secret), since you can't just SSH into the EC2 instance and run 's3cmd --configure' interactively in a pipeline.
To do that, you create a config file in the ShellCommandActivity using 'cat'. For example:
cat <<EOT >> s3.cfg
blah
blah
blah
EOT
Then use '-c' option to attach the config file every time you call s3cmd like this:
s3cmd -c s3.cfg ls
Sounds complicated, but it works.
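For reference, the minimal settings s3cmd needs in such a config file are roughly these (keys truncated; everything else can stay at its defaults):
[default]
access_key = AKIA...
secret_key = ...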

ec2-register Client.null: null

I am trying to register an Amazon image, and I keep getting the error Client.null: null.
I am able to browse to the URL and see the xml file.
The command I execute is:
ec2-register output.raw.manifest.xml -U <URL>
Client.null: null
Any idea what could be the problem?
Thanks!
Keep in mind that this command is used to register instance-store images rather than EBS-backed images.
Usually the XML manifest, along with a series of 10GB image files, is uploaded to S3 prior to registering the AMI. Are you sure the bundle is in one of your S3 buckets?
Did you run something like this from the instance you want to create the image from?:
ec2-bundle-vol -d /<someplace-where-you-have-a-lot-of-space> -k YOUR_PRIVATE_KEY -c YOUR_CERTIFICATE -u YOUR_ACCOUNT_NUMBER
ec2-upload-bundle -b YOUR_BUCKET_NAME -m output.raw.manifest.xml -a YOUR_ACCESS_KEY -s YOUR_SECRET_KEY
Then you can run:
ec2-register output.raw.manifest.xml
You can also register your image from the AWS console once you have created the bundle.
There are several blogs that talk about how to do this too. For example:
http://www.ryannitz.org/tech-notes/2009/08/09/create-amazon-ec2-ami/
Finally, if you are registering an EBS-backed AMI, you can simply use:
ec2-create-image <instance id>
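(For reference, the rough equivalent with the current AWS CLI would be something like the following; the image name is a placeholder:)
aws ec2 create-image --instance-id <instance id> --name my-ebs-backed-image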