gsutil - issue with cp, rsync when using federated user AWS keys

I'm attempting a simple rsync (or cp) from AWS S3 to GCP Storage.
For example:
gsutil rsync -d -r -n s3://mycustomer-src gs://mycustomer-target
When I attempt this on a VM on GCP, I get the error message below.
Note that if I install the AWS CLI on the VM, I can access and browse the AWS S3 contents just fine. The AWS credentials are stored in the ~/.aws/credentials file.
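For reference, a profile written by a federation script typically carries a session token alongside the temporary key pair; a minimal sketch of such a ~/.aws/credentials entry, with placeholder values:
# ~/.aws/credentials (all values are placeholders)
[default]
aws_access_key_id = ASIA...
aws_secret_access_key = EXAMPLE_SECRET
aws_session_token = EXAMPLE_TOKEN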
Building synchronization state...
Caught non-retryable exception while listing s3://musiclab-etl-dev/: AccessDeniedException: 403 InvalidAccessKeyId
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>ASIAJ3XGCQ7RGZYPD5UA</AWSAccessKeyId><RequestId>CE8919045C68DEC4</RequestId><HostId>i7oMBM61US3FyePJka8O+rjoHSo1rIZbRGnVZvIGkjEVPh6lXdbp03pZOtJ68F3pPdAAW1UvF5s=</HostId></Error>
CommandException: Caught non-retryable exception - aborting rsync
Is this a bug in gsutil? Any workarounds or tips appreciated.
NOTE - The client's AWS account is set up for federated access and requires using AWS keys obtained via a script similar to this one:
https://aws.amazon.com/blogs/security/how-to-implement-a-general-solution-for-federated-apicli-access-using-saml-2-0/
The AWS keys are set to expire when the session token expires.
If I use a different AWS account (no federation) with typical AWS keys (non-expiring), the rsync (or cp) works fine.

It appears that gsutil still uses the legacy AWS_SECURITY_TOKEN instead of AWS_SESSION_TOKEN. If your script doesn't set it up automatically, you can do it manually like this:
export AWS_SECURITY_TOKEN=$AWS_SESSION_TOKEN
After this you should be able to use gsutil normally.
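Putting it together, a minimal sketch of the environment setup before running gsutil, assuming your federation script hands you the three standard values (placeholders below):
# export the temporary credentials produced by the federation script
export AWS_ACCESS_KEY_ID=ASIA...
export AWS_SECRET_ACCESS_KEY=EXAMPLE_SECRET
export AWS_SESSION_TOKEN=EXAMPLE_TOKEN
# mirror the session token under the legacy name that gsutil/boto reads
export AWS_SECURITY_TOKEN=$AWS_SESSION_TOKEN
gsutil rsync -d -r s3://mycustomer-src gs://mycustomer-target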

Related

gsutil rsync with s3 buckets gives InvalidAccessKeyId error

I am trying to copy all the data from an AWS S3 bucket to a GCS bucket. According to this answer, the rsync command should be able to do that, but I am receiving the following error when I try:
Caught non-retryable exception while listing s3://my-s3-source/: AccessDeniedException: 403 InvalidAccessKeyId
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>{REDACTED}</AWSAccessKeyId><RequestId>{REDACTED}</RequestId><HostId>{REDACTED}</HostId></Error>
CommandException: Caught non-retryable exception - aborting rsync
This is the command I am trying to run
gsutil -m rsync -r s3://my-s3-source gs://my-gcs-destination
I have the AWS CLI installed, and it works fine with the same AccessKeyId, listing buckets as well as objects in the bucket.
Any idea what I am doing wrong here?
gsutil can work with both Google Storage and S3.
gsutil rsync -d -r s3://my-aws-bucket gs://example-bucket
You just need to configure it with both your Google and AWS S3 credentials. For the S3 side, add the Amazon S3 credentials to ~/.aws/credentials, or store them in the .boto configuration file for gsutil. However, when you access an Amazon S3 bucket with gsutil, the Boto library uses your ~/.aws/credentials file to override other credentials, such as any stored in ~/.boto.
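If you take the .boto route, the AWS keys go in the [Credentials] section; a minimal sketch with placeholder values:
# ~/.boto (values are placeholders)
[Credentials]
aws_access_key_id = AKIA...
aws_secret_access_key = EXAMPLE_SECRET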
=== 1st update ===
Also make sure you have the correct IAM permissions on the GCP side and the correct AWS IAM credentials. In addition, if you have a prior version of Migrate for Compute Engine (formerly Velostrata), use this documentation and make sure you set up the VPN, IAM credentials, and AWS network. If you are using the current version (5.0), use the following documentation to check that everything is configured correctly.

How to re-format cURL command to AWS CLI s3 sync call

I have a curl command:
curl -C - "https://fastmri-dataset.s3.amazonaws.com/knee_singlecoil_train.tar.gz?AWSAccessKeyId=<my-key>&Signature=<my-signature>&Expires=1634085391
I'm having trouble using AWS CLI sync, I'm doing this:
aws s3 sync . s3://fastmri-dataset/knee_singlecoil_train.tar.gz
I have the aws configure file setup with the access key, I set the secret up with the signature. I didn't otherwise get a secret.
Any help is appreciated.
Your first link is an Amazon S3 pre-signed URL, which is a time-limited URL that provides temporary access to a private object. It can be accessed via an HTTP/S call.
The AWS CLI command you have shown instructs the AWS CLI to synchronize the local directory with a file on S3. (This is actually incorrect, since you cannot sync a directory to a file.)
These two commands are incompatible. The AWS CLI cannot use a pre-signed URL. It requires AWS credentials with an Access Key and Secret Key to make AWS API calls.
So, you cannot reformat the curl command to an AWS CLI command.
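If the pre-signed URL is all you have, the download has to go through plain HTTP; a sketch using the URL from the question (-C - resumes a partial download, -o names the output file):
curl -C - -o knee_singlecoil_train.tar.gz \
  "https://fastmri-dataset.s3.amazonaws.com/knee_singlecoil_train.tar.gz?AWSAccessKeyId=<my-key>&Signature=<my-signature>&Expires=1634085391"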

AWS CLI: Could not connect to the endpoint URL

Was able to set up a pull from an S3 bucket on a Mac seamlessly, but have been struggling with an identical process on a PC (Windows). Here is what I have done -- any help along the way would be much appreciated.
Installed awscli using pip
Ran aws configure in the command prompt and entered the proper access key ID and secret access key.
Ran the s3 code: G:\>aws s3 cp --recursive s3://url-index-given/ . (where the url was replaced with url-index-given for example purposes).
And got this error:
fatal error: Could not connect to the endpoint URL: "https://url-index-given.s3.None.amazonaws.com/?list-type=2&prefix=&encoding-type=url"
I have tried uninstalling the awscli package and followed this process recommended by Amazon without any errors.
The error indicates that you have given an invalid value for Region when using aws configure. (See the None in the URL? That is where the Region normally goes.)
You should run aws configure again and give it a valid region (e.g. us-west-2).
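A quick sketch of fixing just the region without re-running the full interactive prompt (us-west-2 is only an example):
# write the region into the default profile
aws configure set region us-west-2
# or override it for the current shell session
export AWS_DEFAULT_REGION=us-west-2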

How to configure aws CLI to s3 cp with anonymous user

I need to download files recursively from an S3 bucket. The S3 bucket allows anonymous access.
How can I list the files and download them without providing an AWS access key, i.e. as an anonymous user?
My command is:
aws s3 cp s3://anonymous#big-data-benchmark/pavlo/text/tiny/rankings/uservisits uservisit --region us-east --recursive
The AWS CLI complains:
Unable to locate credentials. You can configure credentials by running "aws configure"
You can use the --no-sign-request option. Note that the anonymous# prefix is not needed with the CLI, and the region must be a full region name such as us-east-1:
aws s3 cp s3://big-data-benchmark/pavlo/text/tiny/rankings/uservisits uservisit --region us-east-1 --recursive --no-sign-request
You probably have to provide an access key and secret key, even if you're doing anonymous access; I don't see an option for anonymous access in the AWS CLI.
Another way to do this is to hit the HTTP endpoint and grab the files that way.
In your case: http://big-data-benchmark.s3.amazonaws.com
You will get an XML listing of all the keys in the bucket. You can extract the keys and issue a request for each. Not the fastest thing out there, but it will get the job done.
For example: http://big-data-benchmark.s3.amazonaws.com/pavlo/sequence-snappy/5nodes/crawl/000741_0
For getting the files, curl should be enough. For parsing the XML, you can go as low-level as sed or as high-level as a proper language, depending on what you like (see the sketch below).
Hope this helps.
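A minimal sketch of that approach, assuming the public endpoint and prefix from the question; the grep/sed pipeline is just one way to pull the <Key> values out of the listing:
# list keys under the prefix via the unauthenticated endpoint,
# then fetch each object with curl
BASE="http://big-data-benchmark.s3.amazonaws.com"
curl -s "$BASE/?prefix=pavlo/text/tiny/rankings/uservisits" \
  | grep -o '<Key>[^<]*</Key>' \
  | sed -e 's|<Key>||' -e 's|</Key>||' \
  | while read -r key; do
      curl -s -o "$(basename "$key")" "$BASE/$key"
    done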

Amazon S3 sync to local machine failed

I'm new to AWS and I'm trying to download a bunch of files from my S3 bucket to my local machine using aws s3 sync as described in http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html.
I used the following command:
aws s3 sync s3://outputbucket/files/ .
I got the following error:
A client error (AccessDenied) occurred when calling the ListObjects operation: Access Denied
Completed 1 part(s) with ... file(s) remaining
Even though I have configured my access key ID & secret access key as described in http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html
Where might the problem be?
Assuming that you are an Administrator and/or you have set your credentials properly, it is possible that you are using an old AWS CLI.
I encountered this while using the packaged AWS CLI with Ubuntu 14.04.
The solution that worked for me is to remove the AWS CLI prepackaged with Ubuntu, and download it from python-pip instead:
sudo apt-get remove awscli
sudo apt-get install python-pip
sudo pip install awscli
Many thanks to this link:
https://forums.aws.amazon.com/thread.jspa?threadID=173124
To perform a file sync, two sets of permissions are required:
s3:ListBucket, so the ListObjects call can obtain the list of files to copy
s3:GetObject, to access the objects themselves
If you are using your "root" user that comes with your AWS account, you will automatically have these permissions.
If you are using a user created within Identity and Access Management (IAM), you will need to assign these permissions to the User. The easiest way is to assign the AmazonS3FullAccess policy, which gives access to all S3 functions.
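If you'd rather not grant full access, a minimal policy sketch scoped to the bucket from the question (outputbucket) would look roughly like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::outputbucket"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::outputbucket/*"
    }
  ]
}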
In my case the credentials stored in ~/.aws/config were being clobbered by a competing profile sourced in ~/.zshrc. Run env | grep AWS to check.