Can I run an S3 Batch Operations copy job from the source account?

I am trying to run an S3 Batch Operations copy job to copy a large amount of data from one S3 bucket to another.
Source account: contains the S3 bucket with the objects.
Destination account: contains the S3 bucket with the manifest, and the destination S3 bucket for the objects.
I need to run the Batch Operations job in the source account, or in a third account altogether.
So far, I have been able to do the following:
Run an S3 Batch Operations job within the same AWS account: https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-managing-jobs.html
Run an S3 Batch Operations job from the destination account: https://aws.amazon.com/blogs/storage/cross-account-bulk-transfer-of-files-using-amazon-s3-batch-operations/
However, when I try to create a Batch Operations job in the source account, I get errors.
When I enter the manifest file from the destination account, I get the error:
Unable to get the manifest object’s ETag. Specify a different object to continue.
When I enter the destination S3 bucket from the destination account, I get the error:
Insufficient permissions to access <s3 bucket>
Is there a configuration change that would let me run the Batch Operations job from the source account?

Each Amazon S3 Batch Operations job is associated with an IAM role.
That IAM role needs permission to access the S3 buckets in the other AWS account (or permission to access any S3 bucket).
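A minimal sketch of the two documents such a role needs in the source account: a trust policy that lets the Batch Operations service assume it, and a permissions policy covering the source, destination, manifest and report buckets. The bucket names are hypothetical placeholders, the action lists are modeled on the copy example in the AWS Batch Operations permissions documentation, and KMS-encrypted objects would need extra kms: permissions that are not shown.

```python
def batch_copy_role_policies(source_bucket, destination_bucket, manifest_bucket, report_bucket):
    """Return (trust_policy, permissions_policy) as dicts for an S3 Batch
    Operations copy role. Assumption: plain SSE-S3 objects, no KMS."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The Batch Operations service assumes this role to run the job.
                "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Write the copies into the destination bucket (other account).
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:PutObjectAcl", "s3:PutObjectTagging"],
                "Resource": f"arn:aws:s3:::{destination_bucket}/*",
            },
            {   # Read the source objects (this account).
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectAcl", "s3:GetObjectTagging", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{source_bucket}", f"arn:aws:s3:::{source_bucket}/*"],
            },
            {   # Read the manifest (other account).
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{manifest_bucket}/*",
            },
            {   # Write the job completion report.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{report_bucket}/*",
            },
        ],
    }
    return trust_policy, permissions_policy
```

You could serialize the two dicts with json.dumps and pass them to aws iam create-role --assume-role-policy-document and aws iam put-role-policy --policy-document (with the usual --role-name and --policy-name arguments) in the source account. Because the buckets are in another account, these permissions are necessary but not sufficient: the bucket owner must also allow the role.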
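In addition, the buckets in the other AWS account (in this case the manifest bucket and the destination bucket) need bucket policies that permit that IAM role to access them: at minimum s3:GetObject on the manifest bucket and s3:PutObject on the destination bucket.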
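A sketch of the matching bucket policies that the destination account would attach, one per bucket. The role ARN, account ID and bucket names are hypothetical. If the report bucket also lives in the destination account, it would need a similar policy allowing s3:PutObject. The console's "Unable to get the manifest object's ETag" error suggests the identity creating the job may also need to read the manifest, in which case that principal would have to be added to the manifest bucket policy too.

```python
import json


def cross_account_bucket_policy(bucket, role_arn, actions):
    """Bucket policy, attached by the bucket owner, that lets one IAM role
    from another account perform `actions` on the objects in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBatchOperationsRole",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": list(actions),
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Hypothetical: the Batch Operations role lives in the source account 111111111111.
ROLE_ARN = "arn:aws:iam::111111111111:role/batch-copy-role"

# Manifest bucket: the role only needs to read the manifest object.
manifest_policy = cross_account_bucket_policy(
    "manifest-bucket", ROLE_ARN, ["s3:GetObject", "s3:GetObjectVersion"]
)
# Destination bucket: the role needs to write the copied objects.
destination_policy = cross_account_bucket_policy(
    "destination-bucket", ROLE_ARN, ["s3:PutObject", "s3:PutObjectAcl", "s3:PutObjectTagging"]
)

print(json.dumps(destination_policy, indent=2))
```

The printed JSON is what you would paste into the bucket's Permissions tab or pass to aws s3api put-bucket-policy in the destination account.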

Related

Is it possible to copy S3 bucket content to a bucket in another account without using a bucket policy?

I want to copy S3 objects to a bucket in a different account, but the requirements rule out using a bucket policy.
Is it possible to copy content from one bucket to another without using a bucket policy?
You cannot use native S3 object replication between different accounts without using a bucket policy. As stated in the permissions documentation:
When the source and destination buckets aren't owned by the same accounts, the owner of the destination bucket must also add a bucket policy to grant the owner of the source bucket permissions to perform replication actions
You could write a custom application that uses IAM roles to replicate objects, but this will likely be quite involved as you'll need to track the state of the bucket and all of the objects written to it.
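A minimal sketch of the core of such an application, assuming you hold two sets of credentials (for example, one from a role in each account): it reads each object with the source client and writes it with the destination client, so no bucket policy is involved. It is a one-shot copy that streams every byte through the machine running it; it does not track state for ongoing replication. The function name and arguments are hypothetical, and the clients are passed in so the logic can be exercised without AWS.

```python
def copy_between_accounts(src_client, dst_client, src_bucket, dst_bucket, prefix=""):
    """Copy every object under `prefix` from src_bucket to dst_bucket by
    reading with the source account's S3 client and writing with the
    destination account's S3 client. Returns the list of copied keys."""
    copied = []
    paginator = src_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" key.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            body = src_client.get_object(Bucket=src_bucket, Key=key)["Body"]
            # upload_fileobj reads the stream and switches to a multipart
            # upload for large objects, so the object is not held in memory.
            dst_client.upload_fileobj(body, dst_bucket, key)
            copied.append(key)
    return copied
```

With boto3 you would build the two clients from separate profiles, for example boto3.Session(profile_name="source-profile").client("s3") and boto3.Session(profile_name="destination-profile").client("s3") (profile names hypothetical), and pass them in.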
Install the AWS CLI,
run aws configure to set the source bucket's credentials as the default, and
follow the instructions at https://github.com/Shi191099/S3-Copy-old-data-without-Policy.git

Unable to configure SageMaker execution Role with access to S3 bucket in another AWS account

Requirement: Create a SageMaker Ground Truth labeling job whose input/output location points to an S3 bucket in another AWS account.
High-level steps followed: let's say Account_A hosts the SageMaker Ground Truth labeling job and Account_B holds the S3 bucket.
1. Create role AmazonSageMaker-ExecutionRole in Account_A with 3 policies attached:
AmazonSageMakerFullAccess
Account_B_S3_AccessPolicy: policy with the necessary S3 permissions to access the S3 bucket in Account_B
AssumeRolePolicy: assume-role policy for arn:aws:iam::Account_B:role/Cross-Account-S3-Access-Role
2. Create role Cross-Account-S3-Access-Role in Account_B with 1 policy and 1 trust relationship attached:
S3_AccessPolicy: policy with the necessary S3 permissions to access the S3 bucket in this Account_B
TrustRelationship: for principal arn:aws:iam::Account_A:role/AmazonSageMaker-ExecutionRole
Error: While trying to create the SageMaker Ground Truth labeling job with AmazonSageMaker-ExecutionRole as the IAM role, it throws the error AccessDenied: Access Denied - The S3 bucket 'Account_B_S3_bucket_name' you entered in Input dataset location cannot be reached. Either the bucket does not exist, or you do not have permission to access it. If the bucket does not exist, update Input dataset location with a new S3 URI. If the bucket exists, give the IAM entity you are using to create this labeling job permission to read and write to this S3 bucket, and try your request again.
In your high-level step 2, the approach should change to using a resource policy on your S3 bucket that allows Account A to write to it, rather than expecting Account A to assume a role in Account B, which I don't believe SageMaker will do. The general approach is therefore:
The SageMaker execution role in Account A is given an IAM policy that allows access to the Account B bucket (basically what you've done).
The Account B bucket is given a resource policy that allows Account A to access it.
The following article gives additional help on this topic: How can I provide cross-account access to objects that are in Amazon S3 buckets?
I reverted to the original approach, where the SageMaker execution role was granted access directly through the S3 bucket policy.
While creating the GT job from the console:
(i) The console expects the user creating the job to also have access to the data in the cross-account S3 bucket; I updated the bucket policy to grant access to both the SageMaker execution role and the user.
(ii) The console expects the manifest to be in an S3 bucket in the job's own account; it fails with a 403 if the manifest is in the cross-account S3 bucket, even though the SageMaker execution role had access to that bucket.
While creating the GT job from the CLI, the above restrictions don't apply, and I was able to create the GT job.
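A sketch of the Account B bucket policy described in (i), granting both the Account A execution role and the console user access. The bucket name, account ID and user name are hypothetical, and the action set (list the bucket, read and write objects) is an assumed minimum; check the Ground Truth documentation for the exact permissions your job needs. The execution role's own IAM policy in Account A must still allow the same actions.

```python
def ground_truth_bucket_policy(bucket, principal_arns):
    """Bucket policy for the Account B bucket that lets each principal in
    principal_arns (e.g. the SageMaker execution role and the console user
    from Account A) list the bucket and read/write its objects."""
    principals = {"AWS": list(principal_arns)}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level action: the resource is the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Principal": principals,
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # Object-level actions: the resource is every key in the bucket.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Principal": principals,
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = ground_truth_bucket_policy(
    "account-b-labeling-bucket",  # hypothetical bucket name
    [
        "arn:aws:iam::111111111111:role/AmazonSageMaker-ExecutionRole",
        "arn:aws:iam::111111111111:user/labeling-admin",  # hypothetical console user
    ],
)
```

Splitting bucket-level and object-level actions into separate statements matters because s3:ListBucket is evaluated against the bucket ARN, while s3:GetObject and s3:PutObject are evaluated against object ARNs.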

I want my lambda code to directly upload files into an Amazon S3 bucket of a different account

So I have a Lambda function that triggers an Amazon SageMaker processing job, and this job currently writes a few files to my Amazon S3 bucket. I have set output_uri='s3://outputbucket-in-my-acc/'. Now I want the same files to be uploaded directly to a different AWS account instead of mine. How do I achieve this? I want no trace of the files to be stored in my account.
I found a similar solution here but this copies the file into the different account while the original files are still present in the source account:
AWS Lambda put data to cross account s3 bucket
Your Lambda function (Account A) needs to assume a role in the other account (Account B) that has permission to write to the S3 location. For that, you need to establish trust between the accounts with a cross-account role.
Then, in your Lambda function's code, assume the role in Account B and execute the S3 command.
Find an example with boto3 here: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-assume-iam-role/
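A minimal sketch of the assume-role step, assuming the Account B role's trust policy names the Lambda function's execution role as a principal. The helper name, role ARN and session name are hypothetical; the STS client is passed in so the logic can be tested without AWS.

```python
def assumed_role_client_kwargs(sts_client, role_arn, session_name="cross-account-upload"):
    """Call STS AssumeRole and return keyword arguments for
    boto3.client("s3", ...) so that S3 calls run as the Account B role."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    # Temporary credentials: all three values are required, including the token.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

Inside the handler you would call it with boto3.client("sts"), then create boto3.client("s3", **kwargs) and call put_object or upload_file on the Account B bucket. The temporary credentials expire (one hour by default), so long-running work should re-assume the role.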
The SageMaker APIs for job creation typically (always?) include a RoleArn, which is the IAM role that SageMaker assumes to do work on your behalf. That IAM role must have the permissions Amazon SageMaker needs to complete its task (e.g. s3:PutObject on the relevant S3 bucket) and must have a trust policy allowing the SageMaker service (sagemaker.amazonaws.com) to assume the role.

How do I copy S3 objects from one AWS account to another?

I have read-only access to a source S3 bucket. I cannot change permissions or anything of the sort on this source account and bucket. I do not own this account.
I would like to sync all files from the source bucket to my destination bucket. I own the account that contains the destination bucket.
I have separate sets of credentials for the source bucket that I do not own and the destination bucket that I do own.
Is there a way to use the AWS CLI to sync between buckets using two sets of credentials?
aws s3 sync s3://source-bucket/ --profile source-profile s3://destination-bucket --profile default
If not, how can I set up permissions on my owned destination bucket so that I can sync with the CLI?
The built-in S3 copy mechanism, at the API level, requires the request be submitted to the target bucket, identifying the source bucket and object inside the request, and using a single set of credentials that has both authorization to read from the source and write to the target.
This is the only supported way to copy from one bucket to another without downloading and uploading the files.
The standard solution is found at http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html.
You can grant their user access to write your bucket or they can grant your user access to their bucket... but copying from one bucket to another without downloading and re-uploading the files is impossible without the complicity of both account owners to establish a single set of credentials with both privileges.
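A sketch of that single-credential server-side copy, assuming one S3 client whose credentials can both read the source bucket and write the destination bucket (granted by one of the two owners as described above). The function name is hypothetical; the client is passed in so it can be exercised without AWS.

```python
def server_side_copy(s3_client, src_bucket, dst_bucket, prefix=""):
    """Copy objects with S3's server-side copy using ONE set of credentials
    that can read src_bucket and write dst_bucket. The object data does not
    pass through this process. Returns the list of copied keys."""
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # client.copy is boto3's managed copy: it issues CopyObject, or
            # multipart UploadPartCopy for objects above the multipart threshold,
            # which also avoids CopyObject's 5 GB single-request limit.
            s3_client.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key)
            copied.append(key)
    return copied
```

This is the API-level equivalent of aws s3 sync run under a single profile: the request goes to the destination bucket and names the source object, so the same credentials must be authorized on both sides.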
Use rclone for this. It's convenient, but I believe it downloads and re-uploads the files, which makes it slow for large data volumes.
rclone --config=creds.cfg copy source:bucket-name1/path/ destination:bucket-name2/path/
creds.cfg:
[source]
type = s3
provider = AWS
access_key_id = AAA
secret_access_key = bbb
[destination]
type = s3
provider = AWS
access_key_id = CCC
secret_access_key = ddd
For this use case, I would consider Cross-Region Replication Where Source and Destination Buckets Are Owned by Different AWS Accounts
... you set up cross-region replication on the source
bucket owned by one account to replicate objects in a destination
bucket owned by another account.
The process is the same as setting up cross-region replication when
both buckets are owned by the same account, except that you do one
extra step—the destination bucket owner must create a bucket policy
granting the source bucket owner permission for replication actions.
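A sketch of that extra step: the bucket policy the destination owner would attach so the source account's replication role can write into the bucket. It is roughly modeled on the cross-account example in the S3 replication documentation; the role ARN and bucket name are hypothetical. Replication also requires versioning on both buckets, and it only copies objects written after the rule is in place; existing objects need S3 Batch Replication.

```python
def replication_destination_policy(destination_bucket, source_role_arn):
    """Bucket policy attached by the destination-bucket owner that lets the
    source account's replication role replicate objects into the bucket."""
    principal = {"AWS": source_role_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Object-level replication actions.
                "Sid": "ReplicateObjects",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ReplicateObject", "s3:ReplicateDelete"],
                "Resource": f"arn:aws:s3:::{destination_bucket}/*",
            },
            {   # Bucket-level versioning checks made during replication setup.
                "Sid": "BucketVersioning",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketVersioning", "s3:PutBucketVersioning"],
                "Resource": f"arn:aws:s3:::{destination_bucket}",
            },
        ],
    }
```

Note that this still requires the destination owner's cooperation and the source owner's ability to configure replication on the source bucket, which the read-only scenario in the question does not provide.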
