How to copy a private AWS bucket to Azure Storage - amazon-web-services

I want to copy a folder with large files in it to Azure Storage.
I found this article that shows how to copy a public AWS bucket to Azure: https://microsoft.github.io/AzureTipsAndTricks/blog/tip220.html
But how can I do this if the AWS bucket is private? How can I pass the credentials to azcopy so it can copy my files from the AWS bucket to Azure directly?

From Copy data from Amazon S3 to Azure Storage by using AzCopy | Microsoft Docs:
AzCopy is a command-line utility that you can use to copy blobs or files to or from a storage account. This article helps you copy objects, directories, and buckets from Amazon Web Services (AWS) S3 to Azure blob storage by using AzCopy.
The article explains how to provide both Azure and AWS credentials.
One thing to note is that Amazon S3 cannot 'send' data to Azure Blob Storage. Therefore, something will need to call GetObject() on S3 to retrieve the data, and then send it to Azure. I'm assuming that Azure Blob Storage cannot directly request data from Amazon S3, so it means that the data will be 'downloaded' from S3, then 'uploaded' to Azure. To improve efficiency, run the AzCopy command either from an AWS or an Azure virtual machine, to reduce the latency of sending via your own computer.
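As a rough illustration of that flow (a sketch only; the bucket, storage account, container and SAS token below are placeholders), the linked article has AzCopy read the AWS credentials for the S3 source from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, so driving the copy from a small Python script could look like this:
# Sketch: run AzCopy against a private S3 source by supplying AWS credentials
# through the environment variables AzCopy reads for S3 sources.
import os
import subprocess

env = os.environ.copy()
env["AWS_ACCESS_KEY_ID"] = "<your-aws-access-key-id>"          # placeholder
env["AWS_SECRET_ACCESS_KEY"] = "<your-aws-secret-access-key>"  # placeholder

subprocess.run(
    [
        "azcopy", "copy",
        "https://s3.amazonaws.com/<bucket>/<folder>",                 # private S3 source
        "https://<account>.blob.core.windows.net/<container>?<SAS>",  # Azure destination with SAS token
        "--recursive",
    ],
    env=env,
    check=True,
)
Running the same script from a VM in either cloud, as suggested above, keeps the data path short.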

One solution, albeit not an ideal one: you could request an AWS Snowball export of your bucket data, then use the Azure Import/Export service to ship the data into Azure for ingestion.

Have you tried generating a pre-signed URL with a limited TTL covering the duration of the copy?
https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
That way you can just execute azcopy copy <aws_presigned_url> <azure_sas_url>.
The URLs contain the information needed for authentication/authorization on both sides.
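A pre-signed URL covers a single object, so you would generate one per file; a minimal boto3 sketch with placeholder bucket and key names:
# Sketch: create a time-limited pre-signed GET URL for a private object,
# which can then be used as the source of the azcopy copy command above.
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "<bucket>", "Key": "<folder>/<file>"},
    ExpiresIn=3600,  # TTL in seconds; keep it just long enough for the copy
)
print(url)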

Related

Artifacts Migration from GCP non-China to AWS China region

I need to transfer my artifacts (zips and container images) stored in the GCP us-west1 region on Cloud Storage and Container Registry to an AWS China region S3 bucket and ECR.
The solution I found shows how to transfer data from a non-China AWS account to an AWS China account.
My questions are:
Can I directly transfer artifacts using the above solution (in the link) from GCP to AWS (China), or do I have to transfer artifacts from GCP -> AWS (non-China) -> AWS (China)?
Can the solution in the link be implemented for any cloud, or is it valid only for AWS?
The gsutil tool on GCP could be used to perform this transfer, given that it supports transferring between cloud providers. According to the documentation, it should support any cloud provider storage service which uses HMAC authentication. After adding your credentials, you should be able to transfer files on Cloud Storage to AWS S3 using any available gsutil command combined with available wildcards. For example, this command should transfer every object from a GCP bucket to a S3 bucket:
gsutil cp gs://GCS_BUCKET_NAME/** s3://S3_BUCKET_NAME
While this should work for AWS buckets, I have not tested it with AWS buckets located in China. In case it does not work, you should first transfer the objects to a non-China AWS bucket, and then use the guide you have to move them to the China region.
The process would not be much different when dealing with container images, as they are also stored in an automatically created Cloud Storage bucket. You need to review the permissions you have over this bucket, in case you run into permission errors. Otherwise, you can pull images from Container Registry into a local directory, and use the gsutil tool to transfer them into your S3 bucket:
gsutil cp -r <source_dir> s3://S3_BUCKET_NAME

When is a file available to download from Amazon S3?

I can't find some information about Amazon S3; I hope you can help me. When is a file available for a user to download after a POST upload? I mean a small JSON file that doesn't require much processing. Is it available to download immediately after uploading? Or does Amazon S3 work in some kind of batch sessions, so it always takes a few hours?
According to the doc,
Amazon S3 provides strong read-after-write consistency for PUTs and DELETEs of objects in your Amazon S3 bucket in all AWS Regions.
This means that your objects are available to download immediately after they are uploaded.
An object that is uploaded to an Amazon S3 bucket is available right away. There is no time period that you have to wait. That means if you are writing a client app that uses these objects, you can access them as soon as they are uploaded.
In case anyone is wondering how to programmatically interact with objects located in an Amazon S3 bucket through code, here is an example of uploading and reading objects in an Amazon S3 bucket from a client web app:
Creating an example AWS photo analyzer application using the AWS SDK for Java
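The linked sample is in Java; purely as a minimal illustration of the read-after-write behaviour described above (bucket and key below are placeholders), the same round trip in Python with boto3 looks like this:
# Sketch: upload a small JSON object and read it back immediately.
import json
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="<bucket>",
    Key="data/example.json",
    Body=json.dumps({"hello": "world"}).encode("utf-8"),
    ContentType="application/json",
)

# No waiting or polling is needed; the GET succeeds right after the PUT returns.
obj = s3.get_object(Bucket="<bucket>", Key="data/example.json")
print(json.loads(obj["Body"].read()))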

Ingest files from GCP Cloud Storage to AWS S3 bucket dynamically

I am working on a multi-cloud (AWS and GCP) pet project that is based on a serverless architecture.
There are files generated by the business logic within GCP (using Cloud Functions and Pub/Sub), and they are stored in GCP Cloud Storage. I want to ingest these files dynamically into an AWS S3 bucket from Cloud Storage.
One possible way is by using the gsutil tool (Exporting data from Google Cloud Storage to Amazon S3), but this would require a compute instance and running the gsutil commands manually, which I want to avoid.
In answering this I'm reminded a bit of a Rube Goldberg type setup but I don't think this is too bad.
From the Google side you would create a Cloud Function that is notified when a new file is created. You would use the Object Finalize event. This function would get the information about the file and then call an AWS Lambda fronted by AWS API Gateway.
The GCP Function would pass the bucket and file information to the AWS Lambda. On the AWS side, the Lambda would hold your GCP credentials and use the GCP client API to download the file and then upload it to S3.
Something like: Cloud Storage (Object Finalize) -> Cloud Function -> API Gateway -> Lambda -> S3.
All serverless on both GCP and AWS. Testing isn't bad as you can keep them separate - make sure that GCP is sending what you want and make sure that AWS is parsing and doing the correct thing. There is likely some authentication that needs to happen from the GCP cloud function to API gateway. Additionally, the API gateway can be eliminated if you're ok pulling AWS client libraries into the GCP function. Since you've got to pull GCP libraries into the AWS Lambda this shouldn't be much of a problem.
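A rough sketch of the Lambda side of that flow (not a tested implementation): the payload field names, bucket name and credential setup below are assumptions, the google-cloud-storage package is bundled with the function, and GCP service-account credentials are available to it (for example via GOOGLE_APPLICATION_CREDENTIALS):
# Sketch: API Gateway invokes this handler with the bucket/object info sent by
# the GCP Cloud Function; the handler pulls the file from Cloud Storage and
# pushes it to S3.
import json

import boto3
from google.cloud import storage

DEST_BUCKET = "<your-s3-bucket>"  # placeholder

s3 = boto3.client("s3")
gcs = storage.Client()

def handler(event, context):
    body = json.loads(event["body"])   # payload posted by the Cloud Function
    src_bucket = body["bucket"]
    blob_name = body["name"]

    # Download from Cloud Storage into Lambda's /tmp, then upload to S3.
    local_path = "/tmp/" + blob_name.rsplit("/", 1)[-1]
    gcs.bucket(src_bucket).blob(blob_name).download_to_filename(local_path)
    s3.upload_file(local_path, DEST_BUCKET, blob_name)

    return {"statusCode": 200, "body": json.dumps({"copied": blob_name})}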

AWS S3 copy object downloads it locally?

I am interested to know whether, when copying an S3 object from one bucket to another, the object gets downloaded to the client, even temporarily?
I am using AWS javascript SDK: s3.copyObject(...)
Thanks
The objects are not downloaded locally; the copy is executed on the AWS side.
Such copies are also free within the same region:
Transfers between S3 buckets or from Amazon S3 to any service(s) within the same AWS Region are free.
A recent AWS blog post explains copying between buckets:
How can I copy objects between Amazon S3 buckets?
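The question uses the JavaScript SDK; for illustration only, the equivalent call in boto3 (bucket and key names are placeholders). The copy is performed server-side by S3, so no object bytes pass through the machine running this code:
# Sketch: server-side copy of an object between two buckets.
import boto3

s3 = boto3.client("s3")
s3.copy_object(
    CopySource={"Bucket": "<source-bucket>", "Key": "path/to/object"},
    Bucket="<destination-bucket>",
    Key="path/to/object",
)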

AWS Amplify Storage module - Angular 6

I'm in the process of developing a web application (using Angular 6) that uses AWS Amplify.
The storage module provided by Amplify lets you store your files at three protection levels (public, protected & private). I have a requirement to process an uploaded file via a Lambda function.
My question is whether the S3 buckets (and 'folders') created via Amplify are available to Lambda functions (as the buckets seem to be locked down for use only via the app)?
Would changing CORS on the S3 bucket do the trick? Any help appreciated.
An S3 bucket that is created by Amplify CLI is like any other S3 bucket. You can access it provided that appropriate permissions are in place. https://docs.aws.amazon.com/AmazonS3/latest/user-guide/set-permissions.html
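As a minimal illustration (not an Amplify-specific API; the bucket name and key below are placeholders), a Lambda function can read a file that the app uploaded through Amplify Storage with plain boto3, provided its execution role grants s3:GetObject on the Amplify-created bucket:
# Sketch: read an object that was uploaded via Amplify Storage.
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Files uploaded at the "public" level land under the "public/" prefix by default.
    obj = s3.get_object(
        Bucket="<amplify-created-bucket>",
        Key="public/uploads/example.csv",
    )
    data = obj["Body"].read()
    return {"statusCode": 200, "body": "read {} bytes".format(len(data))}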