How to enable bucket replication using the boto3 'put_bucket_replication' method? - python-2.7

I need to enable bucket replication for my S3 bucket.
I referred to the two links below and wrote the code shown further down, but replication is not enabled. Strangely, a file was created in the source bucket instead.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_bucket_replication
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTreplication.html
Could anyone help me with this?
Source bucket name = tst;
destination bucket name = rst
import boto3

s3 = boto3.client(
    's3',
    endpoint_url=S3_URL,
    aws_access_key_id=ACCESS_ID,
    aws_secret_access_key=SECRET_KEY,
    region_name=REGION,
)

s3.put_bucket_replication(
    Bucket='tst',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::10000003:root',
        'Rules': [
            {
                'Status': 'Enabled',
                'Destination': {'Bucket': 'arn:aws:s3:::rst'},
            },
        ],
    },
)
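For reference, a minimal sketch of a configuration that typically works, assuming both buckets are reachable with the same credentials. Replication requires versioning to be enabled on both buckets, and the Role should be an IAM role that Amazon S3 can assume (the role ARN and rule ID below are hypothetical):

import boto3

s3 = boto3.client('s3')  # assumes credentials/region/endpoint are configured as in the question

# Replication requires versioning on both the source and the destination bucket.
s3.put_bucket_versioning(Bucket='tst', VersioningConfiguration={'Status': 'Enabled'})
s3.put_bucket_versioning(Bucket='rst', VersioningConfiguration={'Status': 'Enabled'})

s3.put_bucket_replication(
    Bucket='tst',
    ReplicationConfiguration={
        # Hypothetical IAM role that S3 assumes to replicate objects.
        'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',
        'Rules': [
            {
                'ID': 'replicate-everything',   # hypothetical rule ID
                'Status': 'Enabled',
                'Prefix': '',                   # empty prefix = replicate all objects
                'Destination': {'Bucket': 'arn:aws:s3:::rst'},
            },
        ],
    },
)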

Related

automatically move object from one s3 bucket to another

I want to automatically move objects from the first S3 bucket to the second bucket. As and when a file is created or uploaded to the first bucket, it should be moved across to the second bucket. There shouldn't be any copy of the file left on the source bucket after the transfer.
I have seen examples of aws s3 sync, but that leaves a copy on the source bucket and it's not automated.
The aws s3 mv command from the CLI will move the files across, but how do I automate the process? Creating a Lambda notification and sending the files to the second bucket could work, but I am looking for a simpler, more automated solution. Not sure if there is anything we could do with SQS? Is there anything we can set on the source bucket that would automatically send the object to the second? Appreciate any ideas.
There is no "move" command in Amazon S3. Instead, it involves CopyObject() and DeleteObject(). Even the AWS CLI aws mv command does a Copy and Delete.
The easiest way to achieve your objective is:
Configure an Event on the source bucket to trigger an AWS Lambda function
Code the Lambda function to copy the object to the target bucket, then delete the source object
Here's some sample code:
import boto3
import urllib.parse

TARGET_BUCKET = 'my-target-bucket'

def lambda_handler(event, context):
    # Get incoming bucket and key
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    source_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Copy object to different bucket
    s3_resource = boto3.resource('s3')
    copy_source = {
        'Bucket': source_bucket,
        'Key': source_key
    }
    s3_resource.Bucket(TARGET_BUCKET).Object(source_key).copy(copy_source)

    # Delete the source object
    s3_resource.Bucket(source_bucket).Object(source_key).delete()
It will copy the object to the same path in the destination bucket and then delete the source object.
The Lambda function should be assigned an IAM Role that has sufficient permission: at least GetObject and DeleteObject on the source bucket, and PutObject on the destination bucket.
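As a sketch, such a policy could be attached inline to the function's execution role with boto3 (the bucket names, role name, and policy name below are hypothetical; the same document can of course be created in the IAM console instead):

import json
import boto3

iam = boto3.client('iam')

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read and delete objects in the (hypothetical) source bucket
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::my-source-bucket/*"
        },
        {
            # Write objects into the (hypothetical) target bucket
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::my-target-bucket/*"
        }
    ]
}

# Attach as an inline policy on the Lambda function's execution role (hypothetical name)
iam.put_role_policy(
    RoleName='my-lambda-move-role',
    PolicyName='s3-move-objects',
    PolicyDocument=json.dumps(policy),
)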

Are incoming files via AWS Transfer Family into S3 taggable?

At the moment I am facing a problem: I can't determine whether a file was PUT via AWS Transfer Family or via the S3 GUI.
Is there any chance to tag files by default which are PUT on S3 via AWS Transfer Family?
Regards
Ribase
There is S3 object metadata described in the Transfer Family user guide for post-upload processing, which indicates that Transfer Family uploaded the object.
One use case and application of using the metadata is when an SFTP user has an inbox and an outbox. For the inbox, objects are put by an SFTP client. For the outbox, objects are put by the post upload processing pipeline. If there is an S3 event notification, the downstream service on the processor side can do an S3 HeadObject call for the metadata, dismiss if it does not have the metadata, and only process incoming files.
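A minimal sketch of that HeadObject check in boto3, assuming a hypothetical bucket and key; the exact metadata key names (assumed here to be user-agent and user-agent-id) should be confirmed against the Transfer Family user guide:

import boto3

s3 = boto3.client('s3')

def was_uploaded_via_transfer_family(bucket, key):
    """Return True if the object carries the Transfer Family upload metadata."""
    head = s3.head_object(Bucket=bucket, Key=key)
    metadata = head.get('Metadata', {})
    # Assumed metadata keys; S3 returns user metadata keys in lower case.
    return 'user-agent' in metadata or 'user-agent-id' in metadata

# Example: only process objects that were put by Transfer Family (hypothetical names)
if was_uploaded_via_transfer_family('my-inbox-bucket', 'incoming/file.csv'):
    print('Process this file')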
You could also use Transfer Family managed workflows to apply a Tag step. An example of application of using the Tag step can be found in demo 1 of the AWS Transfer Family managed workflows demo video.
Configure the S3 bucket where Transfer Family is writing the files to trigger a Lambda using an Event Notification.
Use this Boto3 code in the Lambda. It will tag the file with the principal that placed the file in S3. If it is Transfer Family, then it is the role that was assigned to Transfer Family to write the files to the bucket. If it is a user uploading the files via the Console, then it will be that user's role.
import boto3
import json
import urllib.parse

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    principal = event['Records'][0]['userIdentity']['principalId']

    try:
        s3 = boto3.client('s3')
        response = s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={
                'TagSet': [
                    {
                        'Key': 'Principal',
                        'Value': str(principal)
                    },
                ]
            }
        )
    except Exception as e:
        print('Error {}.'.format(e))

<Error><Code>AuthorizationHeaderMalformed</Code>

I'm using Lambda to modify some CSV files from an S3 bucket and write them to a different S3 bucket using the AWS JavaScript SDK. The buckets for getObject and putObject are in different regions. The Lambda is in the same region as the destination bucket. But the modified files in the destination bucket have this error in them:
AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-west-2'.
Whenever the source and destination buckets are in same region, I get the proper modified files.
What changes do I need to make for this to work when the source and destination buckets are in different regions?
Thanks
The S3 service is global, but each bucket is regional, which means that when you need to use a bucket you have to address it in the same region where the bucket exists.
If I understood correctly, your source bucket is in us-west-2 and your destination bucket is in us-east-1.
So you need to use something like this:
s3_source = boto3.client('s3', region_name='us-west-2')
... your logic to get and handle the file ...
s3_destination = boto3.client('s3', region_name='us-east-1')
... your logic to write the file ...
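As a fuller sketch in boto3 (hypothetical bucket and key names; the same idea applies to the JavaScript SDK by constructing one client per region):

import boto3

# One client per region, matching where each bucket lives.
s3_source = boto3.client('s3', region_name='us-west-2')
s3_destination = boto3.client('s3', region_name='us-east-1')

# Read the CSV from the source bucket (us-west-2)
obj = s3_source.get_object(Bucket='my-source-bucket', Key='input/data.csv')
body = obj['Body'].read()

# ... your logic to modify the CSV contents here ...

# Write the result to the destination bucket (us-east-1)
s3_destination.put_object(Bucket='my-destination-bucket', Key='output/data.csv', Body=body)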

How to access Google Cloud Storage bucket using aws-cli

I have access to both AWS and Google Cloud Platform.
Is it possible to do the following:
List Google Cloud Storage bucket using aws-cli
PUT a CSV file to Google Cloud Storage bucket using aws-cli
GET an object(s) from Google Cloud Storage bucket using aws-cli
It is possible. Per GCP documentation
The Cloud Storage XML API is interoperable with ... services such as Amazon Simple Storage Service (Amazon S3)
To do this you need to enable Interoperability in the Settings screen in the Google Cloud Storage console. From there you can create a storage access key.
Configure the AWS CLI with those keys, i.e. aws configure.
You can then use the aws s3 command with the --endpoint-url flag set to https://storage.googleapis.com.
For example:
MacBook-Pro:~$ aws s3 --endpoint-url https://storage.googleapis.com ls
2018-02-09 14:43:42 foo.appspot.com
2018-02-09 14:43:42 bar.appspot.com
2018-05-02 20:03:08 etc.appspot.com
aws s3 --endpoint-url https://storage.googleapis.com cp test.md s3://foo.appspot.com
upload: ./test.md to s3://foo.appspot.com/test.md
I had a requirement to copy objects from a GC storage bucket to S3 using AWS Lambda.
The Python boto3 library allows listing and downloading objects from a GC bucket.
Below is sample Lambda code to copy the "sample-data-s3.csv" object from a GC bucket to an S3 bucket.
import boto3
import io

s3 = boto3.resource('s3')

google_access_key_id = "GOOG1EIxxMYKEYxxMQ"
google_access_key_secret = "QifDxxMYSECRETKEYxxVU1oad1b"
gc_bucket_name = "my_gc_bucket"

def get_gcs_objects(google_access_key_id, google_access_key_secret, gc_bucket_name):
    """Gets GCS objects using the boto3 SDK"""
    client = boto3.client(
        "s3",
        region_name="auto",
        endpoint_url="https://storage.googleapis.com",
        aws_access_key_id=google_access_key_id,
        aws_secret_access_key=google_access_key_secret,
    )

    # Call GCS to list objects in gc_bucket_name
    response = client.list_objects(Bucket=gc_bucket_name)

    # Print object names
    print("Objects:")
    for blob in response["Contents"]:
        print(blob)

    # Download one object from GCS and upload it to the AWS S3 bucket
    s3_object = s3.Object('my_aws_s3_bucket', 'sample-data-s3.csv')
    f = io.BytesIO()
    client.download_fileobj(gc_bucket_name, "sample-data.csv", f)
    s3_object.put(Body=f.getvalue())

def lambda_handler(event, context):
    get_gcs_objects(google_access_key_id, google_access_key_secret, gc_bucket_name)
You can loop through the objects in response["Contents"] to download all objects from the GC bucket; a sketch follows below.
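For example, a minimal sketch of that loop, factored as a helper; the bucket names are hypothetical and gcs_client / s3_resource are the same two clients constructed in the snippet above (note that list_objects returns at most 1000 keys per call):

import io

def copy_all_objects(gcs_client, s3_resource, gc_bucket_name, aws_bucket_name):
    """Copy every listed object from the GCS bucket to the AWS S3 bucket."""
    response = gcs_client.list_objects(Bucket=gc_bucket_name)
    for blob in response.get("Contents", []):
        buf = io.BytesIO()
        gcs_client.download_fileobj(gc_bucket_name, blob["Key"], buf)
        s3_resource.Object(aws_bucket_name, blob["Key"]).put(Body=buf.getvalue())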
Hope this helps someone who wants to use AWS lambda to transfer objects from GC bucket to s3 bucket.
~$ aws configure
AWS Access Key ID [****************2ZL8]:
AWS Secret Access Key [****************obYP]:
Default region name [None]: us-east-1
Default output format [None]:
~$ aws s3 ls --endpoint-url=<east-region-url>
2019-02-18 12:18:05 test
~$ aws s3 cp test.py s3://<bucket-name> --endpoint-url=<east-region-url>
~$ aws s3 mv s3://<bucket-name>/<filename> test1.txt --endpoint-url=<east-region-url>
Unfortunately this is not possible.
Could you maybe update your question with why you want to do this? Maybe we know of an alternative solution to your question.

AWS CLI syncing S3 buckets with multiple credentials

I have read-only access to a source S3 bucket. I cannot change permissions or anything of the sort on this source account and bucket. I do not own this account.
I would like to sync all files from the source bucket to my destination bucket. I own the account that contains the destination bucket.
I have a separate sets of credentials for the source bucket that I do not own and the destination bucket that I do own.
Is there a way to use the AWS CLI to sync between buckets using two sets of credentials?
aws s3 sync s3://source-bucket/ --profile source-profile s3://destination-bucket --profile default
If not, how can I set up permissions on the destination bucket that I own so that I can sync with the CLI?
The built-in S3 copy mechanism, at the API level, requires the request be submitted to the target bucket, identifying the source bucket and object inside the request, and using a single set of credentials that has both authorization to read from the source and write to the target.
This is the only supported way to copy from one bucket to another without downloading and uploading the files.
The standard solution is found at http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html.
You can grant their user access to write to your bucket, or they can grant your user access to their bucket... but copying from one bucket to another without downloading and re-uploading the files is impossible without the cooperation of both account owners to establish a single set of credentials that has both privileges.
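For illustration, once a single set of credentials can read the source bucket and write the destination bucket (for example via a bucket policy on the source bucket), a server-side copy looks like this in boto3 (hypothetical bucket and key names):

import boto3

# A single session whose credentials can read the source bucket
# and write to the destination bucket.
s3 = boto3.resource('s3')

copy_source = {'Bucket': 'source-bucket', 'Key': 'path/to/object.csv'}

# The copy request is sent to the destination bucket and S3 copies the
# data server-side, so it is never downloaded to the machine running this.
s3.Bucket('destination-bucket').Object('path/to/object.csv').copy(copy_source)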
Use rclone for this. It's convenient, but I believe it does download and upload the files, which makes it slow for large data volumes.
rclone --config=creds.cfg copy source:bucket-name1/path/ destination:bucket-name2/path/
creds.cfg:
[source]
type = s3
provider = AWS
access_key_id = AAA
secret_access_key = bbb
[destination]
type = s3
provider = AWS
access_key_id = CCC
secret_access_key = ddd
For this use case, I would consider Cross-Region Replication Where Source and Destination Buckets Are Owned by Different AWS Accounts
... you set up cross-region replication on the source bucket owned by one account to replicate objects in a destination bucket owned by another account.
The process is the same as setting up cross-region replication when both buckets are owned by the same account, except that you do one extra step: the destination bucket owner must create a bucket policy granting the source bucket owner permission for replication actions.
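A minimal sketch of that extra step, run by the destination bucket owner; the account ID, role name, and bucket name below are hypothetical:

import json
import boto3

s3 = boto3.client('s3')

# Bucket policy granting the source account's replication role permission
# to replicate objects into the destination bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:role/s3-replication-role"},
            "Action": ["s3:ReplicateObject", "s3:ReplicateDelete"],
            "Resource": "arn:aws:s3:::destination-bucket/*"
        }
    ]
}

s3.put_bucket_policy(Bucket='destination-bucket', Policy=json.dumps(policy))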