Ensure data is not lost during AWS S3 sync - amazon-web-services

I need to copy all contents of an S3 bucket to another S3 bucket. Planning to use s3 sync.
aws s3 sync s3://sourcebucket s3://destinationbucket
After this process, is there any way to verify that all data has been migrated to the new bucket (i.e. no data is missed or lost)?
Or is there any guarantee that data will not be lost (is this specified anywhere in the official documentation)?

Assuming you want this verification done after the sync has completed: S3 exposes an MD5 hash of each object as its ETag (for objects uploaded in a single part). You can traverse your local directory, check that each object exists in the S3 bucket, and verify integrity by comparing the local MD5 hash with the remote ETag.
(https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html#RESTObjectGET-responses)
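A minimal sketch of that check in Python with boto3, assuming the objects were uploaded in a single part (so the ETag is the plain MD5) and using placeholder bucket and directory names:

    import hashlib
    import os

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def verify_local_against_bucket(local_dir, bucket):
        """Return keys that are missing from S3 or whose ETag does not match the local MD5."""
        mismatches = []
        for root, _, files in os.walk(local_dir):
            for name in files:
                path = os.path.join(root, name)
                key = os.path.relpath(path, local_dir).replace(os.sep, "/")
                with open(path, "rb") as f:
                    local_md5 = hashlib.md5(f.read()).hexdigest()
                try:
                    head = s3.head_object(Bucket=bucket, Key=key)
                except ClientError:
                    mismatches.append(key)  # object missing on S3
                    continue
                if head["ETag"].strip('"') != local_md5:
                    mismatches.append(key)
        return mismatches

    print(verify_local_against_bucket("/path/to/local/dir", "destinationbucket"))

A key in the returned list is either missing entirely or has a non-matching ETag, which points to a corrupted or multipart-uploaded copy.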

Related

Is there a way to use the CLI to copy data from a presigned URL to my own bucket?

I have a presigned URL for a file in a vendor's S3 bucket. I want to copy that file into my own bucket. I'd rather not copy it to the machine I'm running the copy from. My thought was to use the CLI s3 sync or cp commands to copy the file from one bucket to another. But those commands require s3:// URLs, not https://.
I tried converting the HTTP URL by replacing "https://bucketname.s3.region.amazonaws.com" with "s3://bucketname", but that gives an Access Denied error with s3 sync and a Bad Request with s3 cp. Is there any way to do this, or do I need to download it locally with HTTP, then upload to my bucket with the CLI?
The problem here is that you need to authenticate against two different accounts: the source to read and the destination to write. If you had access to both, i.e. if the credentials you use to read could also write to your own bucket, you would be able to bypass the middleman.
That's not the case here, so your best bet is to download the file first, then authenticate with your own account and put the object there.
Amazon S3 has a built-in CopyObject command that can read from an S3 bucket and write to an S3 bucket without needing to download the data. To use this command, you require credentials that have GetObject permission on the source bucket and PutObject permission on the destination bucket. The credentials themselves can be issued by either the AWS account that owns the source bucket or the AWS account that owns the destination bucket. Thus, you would need to work with the account admins who control the 'other' AWS account.
If this is too difficult and your only way of accessing the source object is via a pre-signed URL, then you cannot use the CopyObject command. Instead, you would need to download the source file and then separately upload it to Amazon S3.
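For illustration, a minimal sketch with boto3, assuming you have obtained one set of credentials with both permissions; the bucket and key names are placeholders:

    import boto3

    # These credentials need s3:GetObject on the source bucket
    # and s3:PutObject on the destination bucket.
    s3 = boto3.client("s3")

    # Server-side copy: the object data never leaves S3.
    s3.copy_object(
        CopySource={"Bucket": "vendor-bucket", "Key": "exports/report.csv"},
        Bucket="my-bucket",
        Key="exports/report.csv",
    )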

Copying data quickly from one S3 bucket to another S3 bucket in a different account, using only the access key ID and secret access key credentials of both

I have an access key ID and secret access key for each of the two AWS buckets, which belong to different accounts. I have to copy data from one location to the other; is there a way to do it faster?
I have tried MapReduce-based DistCp, but it does not provide satisfactory performance.
The best way to copy data between Amazon S3 buckets in different accounts is to use a single set of credentials that has permission to read from the source bucket and write to the destination bucket.
You can then use these credentials with the CopyObject() command, which will copy the object between the S3 buckets without the need to download and upload the objects. The copy will be fully managed by the Amazon S3 service, even if the buckets are in different accounts and even different regions. The copy will not involve transferring any data to/from your own computer.
If you use the AWS CLI aws s3 cp --recursive or aws s3 sync commands, the copies will be performed in parallel, making for very fast copies of the objects.
There are two ways to perform a copy:
Push
Use a set of credentials from the Source account that has permission to read from the source bucket
Add a Bucket Policy on the destination bucket that permits Write access for these credentials
When performing the copy, use ACL=bucket-owner-full-control to assign ownership of the object to the destination account
OR
Pull
Use a set of credentials from the Destination account that has permission to write to the destination bucket
Add a Bucket Policy on the source bucket that permits Read access for these credentials
(No ACL is required because "pulling" the file will automatically give ownership to the account issuing the command)
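A minimal sketch of the "push" variant with boto3, assuming placeholder bucket names and a placeholder CLI profile, and assuming the destination account has already attached a bucket policy that lets the source-account user call PutObject on the destination bucket:

    import boto3

    # All calls run with SOURCE-account credentials.
    session = boto3.Session(profile_name="source-account")  # placeholder profile
    s3 = session.client("s3")

    # Server-side copy; the ACL hands ownership of the new object
    # to the destination (bucket-owning) account.
    s3.copy_object(
        CopySource={"Bucket": "source-bucket", "Key": "data/part-0000.parquet"},
        Bucket="destination-bucket",
        Key="data/part-0000.parquet",
        ACL="bucket-owner-full-control",
    )

The "pull" variant looks the same, except that the session uses destination-account credentials, the source bucket carries the policy granting read access, and the ACL argument can be dropped.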

AWS S3 SSE : s3 managed keys encrypted object upload using python

I am simply trying to upload an encrypted object to an S3 bucket.
I have gone through the AWS documentation on SSE.
The most confusing part is that I am not clear on:
1. Whether we need to set the default server-side encryption option to AES256 (I am assuming this is the S3-managed key) on the bucket before uploading an object to S3,
or
2. whether we can directly upload to the S3 bucket without having any server-side encryption option set for that bucket?
Assuming the second point is true, I have tried to upload an object to S3 specifying extra arguments:
s3_con.upload_file('abc.txt','s3_key_path/abc.txt',ExtraArgs={"ServerSideEncryption": "AES256"})
I was able to upload the file using the above line of code, but the file was not encrypted.
So I guess I need to try the first point before uploading to the bucket.
How can I upload an encrypted object using server-side encryption with an S3-managed key in Python, and what steps do I need to follow?
The file is encrypted. Look at the Properties > Encryption tab in the AWS console for that S3 object.
You can see the contents because SSE-S3 (AES-256) is transparent at-rest encryption. S3 encrypts the object as it's written to disk, and decrypts it as it's read from disk. Because you have permission to get the object, that process is transparent to you.
You also have other encryption options including KMS managed keys, your own managed keys, and doing client-side encryption prior to sending to S3.
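One way to confirm this programmatically is to inspect the object's metadata after the upload; a quick sketch with boto3, using placeholder bucket and key names:

    import boto3

    s3 = boto3.client("s3")

    # head_object reports the server-side encryption applied at rest;
    # for SSE-S3 this comes back as "AES256".
    response = s3.head_object(Bucket="my-bucket", Key="s3_key_path/abc.txt")
    print(response.get("ServerSideEncryption"))  # expected: AES256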

Atomicity with AWS S3 Writes and Reads

I have an SFTP server that uses AWS S3 for storage. There are folders on the SFTP server that are mapped to corresponding S3 objects. Multiple applications write files to the SFTP server/S3 at unknown intervals, and there is a process that periodically reads the S3 objects. In a way, S3 is being used as a queue.
Let's say I start uploading a file/writing to an S3 object. If there is an HTTP GET or read request on the S3 object while the write is in progress, will you see a partial object?
I performed a test where I was uploading a large file to the SFTP server and ran the aws s3 ls command, and I could see the file listed although the upload had not completed.
I know S3 has read-after-write consistency with HTTP PUT, but how does this work in general?

Import data from URL to Amazon S3

I have a file with a pre-signed URL.
I would like to upload that file directly to my S3 bucket without downloading it first (I know how to do it with the intermediate step, but I want to avoid it).
Any suggestion?
Thanks in advance
There is no method supported by S3 that will accomplish what you are trying to do.
S3 does not support a request type that says, essentially, "go to this url and whatever you fetch from there, save it into my bucket under the following key."
The only option here is to fetch what you want, and then upload it. If the objects are large, and you don't want to dedicate the necessary disk space, you could fetch it in parts from the origin and upload it in parts using multipart upload... or if you are trying to save bandwidth somewhere, even the very small t1.micro instance located in the same region as the S3 bucket will likely give you very acceptable performance for doing the fetch and upload operation.
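A rough sketch of that fetch-and-upload approach in Python, streaming the response body so the full file never needs to be written to disk; the URL, bucket, and key are placeholders, and it assumes the requests and boto3 libraries are available:

    import boto3
    import requests

    presigned_url = "https://vendor-bucket.s3.amazonaws.com/file.bin?X-Amz-Signature=abc"  # placeholder
    s3 = boto3.client("s3")

    # Stream the HTTP response and hand the file-like body to S3;
    # upload_fileobj splits large streams into multipart uploads.
    with requests.get(presigned_url, stream=True) as response:
        response.raise_for_status()
        s3.upload_fileobj(response.raw, Bucket="my-bucket", Key="file.bin")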
The single exception to this is where you are copying an object from S3, to S3, and the object is under 5 GB in size. In this case, you send a PUT request to the target bucket, accompanied by:
x-amz-copy-source: /source_bucket/source_object_key
That's not quite a "URL" and I assume you do not mean copying from bucket to bucket where you own both buckets, or you would have asked this more directly... but this is the only thing S3 has that resembles the behavior you are looking for at all.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
You can't use the signed URL here... the credentials you use to send the PUT request have to have permission to both fetch and store.