Artifacts Migration from GCP non-China to AWS China region

I need to transfer my artifacts (zips and container images) stored in the GCP us-west1 region in Cloud Storage and Container Registry to an AWS China region S3 bucket and ECR.
The solution I found shows how to transfer data from an AWS non-China account to an AWS China account.
My questions are:
Can I directly transfer artifacts using the above solution (in the link) from GCP to AWS (China), or do I have to transfer artifacts from GCP -> AWS (non-China) -> AWS (China)?
Can the solution in the link be implemented for any cloud, or is it valid only for AWS?

The gsutil tool on GCP could be used to perform this transfer, since it supports transferring between cloud providers. According to the documentation, it should support any cloud provider storage service that uses HMAC authentication. After adding your credentials, you should be able to transfer files from Cloud Storage to AWS S3 using any available gsutil command combined with the available wildcards. For example, this command should transfer every object from a GCP bucket to an S3 bucket:
gsutil cp gs://GCS_BUCKET_NAME/** s3://S3_BUCKET_NAME
While this should work for AWS buckets, I have not tested it with buckets located in the AWS China regions. If it does not work, you should first transfer the objects to a non-China AWS bucket and then use the guide you have to move them to the China region.
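For the HMAC credentials themselves, gsutil reads them from its boto configuration file. A minimal sketch, assuming placeholder keys and bucket names (note that the China regions use a separate S3 endpoint under amazonaws.com.cn, so check whether your gsutil/boto version lets you point at it):

# Add AWS HMAC credentials to the gsutil/boto configuration (~/.boto)
cat >> ~/.boto <<'EOF'
[Credentials]
aws_access_key_id = <AWS_ACCESS_KEY_ID>
aws_secret_access_key = <AWS_SECRET_ACCESS_KEY>
EOF

# Copy every object in parallel; bucket names are placeholders
gsutil -m cp gs://GCS_BUCKET_NAME/** s3://S3_BUCKET_NAME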
The process would not be much different for container images, as they are also stored in an automatically created Cloud Storage bucket. Review the permissions you have on this bucket in case you run into permission errors. Alternatively, you can pull the images from Container Registry into a local directory and use the gsutil tool to transfer them into your S3 bucket:
gsutil cp -r <source_dir> s3://S3_BUCKET_NAME
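As a hedged sketch of that approach, with hypothetical project, image, and bucket names (on the AWS side you would docker load each tarball, tag the image for your ECR repository, and docker push it):

# Export an image from Container Registry to a local tarball
docker pull gcr.io/MY_PROJECT/my-image:latest
mkdir -p ./images
docker save gcr.io/MY_PROJECT/my-image:latest -o ./images/my-image.tar

# Copy the exported tarballs to the S3 bucket
gsutil cp -r ./images s3://S3_BUCKET_NAME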

Related

Using AWS DMS to sync 2 S3 buckets cross-account

I want to use AWS DMS to sync an S3 bucket in one account to another S3 bucket that belongs to another account.
Can I set it up so that this happens automatically?
I tried to look in the documentation but didn't find any explanation of syncing S3 to S3 cross-account in real time.
S3 Replication does what you need, without needing to use DMS
Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets. Buckets that are configured for object replication can be owned by the same AWS account or by different accounts.
[ref]
Specifically, see the documentation on Configuring replication when source and destination buckets are owned by different accounts
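As a rough sketch of what that cross-account configuration could look like (bucket names, account IDs, and the role ARN are placeholders; versioning must be enabled on both buckets and the replication role needs the appropriate permissions):

# Hypothetical replication configuration for the source bucket
cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::111111111111:role/s3-replication-role",
  "Rules": [
    {
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::destination-bucket-in-account-b",
        "Account": "222222222222",
        "AccessControlTranslation": { "Owner": "Destination" }
      }
    }
  ]
}
EOF

aws s3api put-bucket-replication \
  --bucket source-bucket-in-account-a \
  --replication-configuration file://replication.json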

S3 objects created using the UNLOAD command from the Redshift cluster are not accessible to the bucket owner user of the AWS account

I have an AWS S3 and Redshift question:
A company uses two AWS accounts for accessing various AWS services. The analytics team has just configured an Amazon S3 bucket in AWS account A for writing data from the Amazon Redshift cluster provisioned in AWS account B. The team has noticed that the files created in the S3 bucket using the UNLOAD command from the Redshift cluster are not accessible to the bucket owner user of AWS account A, which created the S3 bucket.
What could be the reason for this denial of permission for resources belonging to the same AWS account?
I tried to reproduce the scenario from the question, but I can't.
I don't understand S3 Object Ownership and bucket ownership.
You are not the only person confused by Amazon S3 object ownership. When writing files from one AWS account to a bucket owned by a different AWS account, it is possible for the 'ownership' of the objects to remain with the 'sending' account. This causes all sorts of problems.
Fortunately, AWS has introduced a feature into S3 called Object Ownership (the 'Edit Object Ownership' setting in the console) that avoids all these issues:
By setting "ACLs disabled" for an S3 Bucket, objects will always be owned by the AWS Account that owns the bucket.
So, you should configure this option on the S3 bucket in AWS account A (the account that owns the bucket) and it should all work nicely.
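As a minimal sketch (the bucket name is a placeholder), the same setting can be applied from the CLI:

# Disable ACLs so every object in the bucket is owned by the bucket owner's account (account A)
aws s3api put-bucket-ownership-controls \
  --bucket my-unload-bucket \
  --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'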
The problem is that the bucket owner in account A does not have access to files that were uploaded by account B. Usually that is solved by specifying the ACL parameter when uploading files: --acl bucket-owner-full-control. Since the upload is done via Redshift, you need to tell Redshift to assume a role in account A for the UNLOAD command, so the files don't change ownership and continue to belong to account A. Check the following page for more examples of configuring cross-account LOAD/UNLOAD: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-s3-cross-account/
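For a plain S3 upload, the ACL fix mentioned above looks roughly like this (file and bucket names are hypothetical); for Redshift itself, the equivalent is pointing UNLOAD at a role that belongs to account A, as described in the linked article:

# Grant the bucket owner full control over the uploaded object
aws s3 cp report.csv s3://my-unload-bucket/reports/report.csv --acl bucket-owner-full-control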

How to copy an AWS private bucket to Azure storage

I want to copy a folder with large files in it to Azure Storage.
I found this article that shows how to copy a public AWS bucket to Azure: https://microsoft.github.io/AzureTipsAndTricks/blog/tip220.html
But how can I do this if the AWS bucket is private? How can I pass the credentials to azcopy so that it copies my files from the AWS bucket to Azure directly?
From Copy data from Amazon S3 to Azure Storage by using AzCopy | Microsoft Docs:
AzCopy is a command-line utility that you can use to copy blobs or files to or from a storage account. This article helps you copy objects, directories, and buckets from Amazon Web Services (AWS) S3 to Azure blob storage by using AzCopy.
The article explains how to provide both Azure and AWS credentials.
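In short, AzCopy reads the AWS keys from environment variables and authorizes the Azure side with a SAS token. A minimal sketch, with placeholder bucket, account, and container names:

# AWS credentials for the private source bucket
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>

# Copy the folder recursively; the destination is authorized via the SAS token
azcopy copy \
  'https://s3.amazonaws.com/my-private-bucket/my-folder' \
  'https://myaccount.blob.core.windows.net/mycontainer/my-folder?<SAS-token>' \
  --recursive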
One thing to note is that Amazon S3 cannot 'send' data to Azure Blob Storage. Therefore, something needs to call GetObject() on S3 to retrieve the data and then send it to Azure. I'm assuming that Azure Blob Storage cannot directly request data from Amazon S3, so the data will be 'downloaded' from S3 and then 'uploaded' to Azure. To improve efficiency, run the AzCopy command from either an AWS or an Azure virtual machine, to reduce the latency of sending it via your own computer.
One solution, albeit not an ideal one, is that you could request an AWS Snowball with your bucket data on it and then use the Azure Import/Export service to send the data to Azure for ingestion.
Have you tried generating a pre-signed URL with a limited TTL on it for the duration of the copy?
https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
That way you can just execute azcopy copy <aws_presigned_url> <azure_sas_url>.
The URLs contain the information you need for authn/authz on both sides.
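A rough sketch of that approach, with hypothetical object and container names (note that a pre-signed URL covers a single object, so a large folder would need one URL per file):

# Create a pre-signed URL for the object, valid for one hour
aws s3 presign s3://my-private-bucket/bigfile.zip --expires-in 3600

# Hand the URL to azcopy together with a SAS URL for the destination blob
azcopy copy \
  '<aws_presigned_url>' \
  'https://myaccount.blob.core.windows.net/mycontainer/bigfile.zip?<SAS-token>'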

How to use Spark to read data from one AWS account and write to another AWS account?

I have Spark jobs running on an EKS cluster to ingest AWS logs from S3 buckets.
Now I have to ingest logs from another AWS account. I have managed to use the settings below to successfully read data from the other account with the Hadoop AssumedRoleCredentialProvider.
But how do I save the dataframe back to an S3 bucket in my own AWS account? There seems to be no way to set the Hadoop S3 config back to my own AWS account.
// Assume a role in the other account, using the instance profile as the base credentials
spark.sparkContext.hadoopConfiguration.set("fs.s3a.assumed.role.external.id","****")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider","org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.assumed.role.credentials.provider","com.amazonaws.auth.InstanceProfileCredentialsProvider")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.assumed.role.arn","****")

// Read the cross-account logs with the assumed-role credentials
val data = spark.read.json("s3a://cross-account-log-location")
data.count

// Changing the global credentials provider back to InstanceProfileCredentialsProvider does not work
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider","com.amazonaws.auth.InstanceProfileCredentialsProvider")
data.write.parquet("s3a://bucket-in-my-own-aws-account")
As per the Hadoop documentation, different S3 buckets can be accessed with different S3A client configurations, using per-bucket configuration options that include the bucket name.
E.g. fs.s3a.bucket.<bucket name>.access.key
Check the below URL: http://hadoop.apache.org/docs/r2.8.0/hadoop-aws/tools/hadoop-aws/index.html#Configurations_different_S3_buckets
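A hedged sketch of what that could look like for this case, with hypothetical bucket and role names (Spark's spark.hadoop.* prefix forwards the options to the Hadoop configuration): the cross-account bucket is pinned to the assumed role, while every other bucket, including your own, keeps the default instance-profile credentials.

# Per-bucket S3A credentials passed at submit time (names are placeholders)
spark-submit \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider \
  --conf spark.hadoop.fs.s3a.bucket.cross-account-log-location.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
  --conf spark.hadoop.fs.s3a.bucket.cross-account-log-location.assumed.role.arn=arn:aws:iam::111111111111:role/log-reader \
  --conf spark.hadoop.fs.s3a.bucket.cross-account-log-location.assumed.role.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider \
  --class com.example.IngestLogs my-spark-job.jar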

Sync AWS Commercial S3 bucket to AWS GovCloud

I need to sync files from an S3 bucket that resides in a commercial/open market region (such as us-east-1) into an S3 bucket in GovCloud. I can get sync to work if I'm syncing between non-GovCloud regions, for example us-east-1 to us-west-1. Does anyone know how to perform this kind of sync with GovCloud? Also, this sync needs to occur unattended and on a scheduled basis, so a manual UI/tool will not suffice. It must be something that can be scripted, such as with an SDK or the CLI.
Thanks in advance.