S3 presigned URL between S3 buckets using Lambda

Hope all is well. I am trying to copy a file that is in my S3 bucket to another bucket. However, I want to use a Lambda function to upload it to the other bucket using an S3 presigned URL, as I want it to have an expiration feature in the new bucket. I passed the object's file URL as the key when uploading to the destination bucket, but it does not seem to work. Some guidance would be appreciated.
import json
import time

import boto3

s3 = boto3.client('s3')
time.sleep(10)

bucket_name_file = 'mybucketname'

# Fetch the most recently modified object from the bucket
response = s3.list_objects_v2(Bucket=bucket_name_file)
all_objects = response['Contents']
latest = max(all_objects, key=lambda x: x['LastModified'])
my_file_name = latest['Key']
url_of_my_filename = 'https://' + bucket_name_file + '.s3.amazonaws.com/' + my_file_name

###################################################
destination_bucket_to_send = 'my_destination_bucket'
url = s3.generate_presigned_url(
    'put_object',
    Params={
        'Bucket': destination_bucket_to_send,
        'Key': url_of_my_filename,
    },
    ExpiresIn=20000,
)

It would appear that your goal is to use Amazon Translate to translate some text via a Transcription Job. You then want to offer the resulting translation via a temporary URL.
To accomplish this, you can create an Amazon S3 pre-signed URL on the object that was created by the transcription job. This URL can then be used from the Internet to obtain the translation. Once the expiry period has passed, the URL will no longer provide access to the object.
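For illustration, a minimal sketch of generating a presigned GET URL on an existing object (the bucket and key names here are placeholders, not from the question):

import boto3

s3 = boto3.client('s3')

# Placeholder bucket/key for the object produced by the job
source_bucket = 'my-source-bucket'
object_key = 'transcripts/output.json'

# Presigned GET URL that stops working after one hour
download_url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': source_bucket, 'Key': object_key},
    ExpiresIn=3600,
)
print(download_url)

Note that the URL is signed for the object's key within the bucket, not for its full https:// URL.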

Related

AWS S3 copy files from one bucket to another using boto3

I am planning to use the AWS Python SDK (Boto3) to copy files from one bucket to another. Below is the sample code I got from the AWS documentation:
dest_object.copy_from(CopySource={
    'Bucket': self.object.bucket_name,
    'Key': self.object.key
})
My question is how do I trigger this code and where should I deploy this code?
I originally thought of a Lambda function, but I am looking for alternative options in case Lambda times out for larger files (1 TB, etc.).
Can I use Airflow to trigger this code somehow, maybe by invoking it through Lambda? Looking for suggestions from AWS experts.
The easiest way to copy new files to another bucket is to use Amazon S3 Replication. It will automatically copy new objects to the selected bucket, no code required.
However, this will not meet your requirement of deleting the incoming file after it is copied. Therefore, you should create an AWS Lambda function and add an S3 trigger. This will trigger the Lambda function whenever an object is created in the bucket.
The Lambda function should:
Extract the bucket and object name from the event parameter
Copy the object to the target bucket
Delete the original object
The code would look something like:
import boto3
import urllib.parse

TARGET_BUCKET = 'target_bucket'  # Change this

def lambda_handler(event, context):
    s3_resource = boto3.resource('s3')

    # Loop through each incoming object
    for record in event['Records']:

        # Get incoming bucket and key
        source_bucket = record['s3']['bucket']['name']
        source_key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Copy object to different bucket
        copy_source = {
            'Bucket': source_bucket,
            'Key': source_key
        }
        s3_resource.Bucket(TARGET_BUCKET).Object(source_key).copy(copy_source)

        # Delete original object
        s3_resource.Bucket(source_bucket).Object(source_key).delete()
The copy process is unlikely to approach the 15-minute limit of AWS Lambda, but it is worth testing on large objects.
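For very large objects, the managed copy can be tuned with a transfer configuration; a rough sketch (the bucket/key names are placeholders, and the threshold and concurrency values are illustrative assumptions, not recommendations):

import boto3
from boto3.s3.transfer import TransferConfig

s3_resource = boto3.resource('s3')

# Illustrative values only: multipart copy above 100 MB, 10 parallel threads
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    max_concurrency=10,
)

copy_source = {'Bucket': 'source-bucket', 'Key': 'big-object.dat'}  # placeholders
s3_resource.Bucket('target-bucket').Object('big-object.dat').copy(
    copy_source,
    Config=config,
)

The copy itself runs server-side within S3, so the Lambda function does not stream the data through itself; only the elapsed time counts against the timeout.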

Are incoming files via AWS Transfer Family into S3 taggable?

At the moment I am facing a problem: I can't determine whether a file was PUT via AWS Transfer Family or via the S3 console.
Is there any way to tag files by default when they are PUT on S3 via AWS Transfer Family?
Regards
Ribase
There is S3 object metadata, described in the Transfer Family user guide under post-upload processing, which indicates that Transfer Family uploaded the object.
One use case and application of using the metadata is when an SFTP user has an inbox and an outbox. For the inbox, objects are put by an SFTP client. For the outbox, objects are put by the post upload processing pipeline. If there is an S3 event notification, the downstream service on the processor side can do an S3 HeadObject call for the metadata, dismiss if it does not have the metadata, and only process incoming files.
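For example, a downstream Lambda handling the S3 event could check that metadata with a HeadObject call before processing. A minimal sketch; the exact metadata key and value checked here are assumptions and should be verified against the Transfer Family user guide:

import boto3

s3 = boto3.client('s3')

def was_uploaded_via_transfer_family(bucket, key):
    """Return True if the object carries the assumed Transfer Family metadata."""
    response = s3.head_object(Bucket=bucket, Key=key)
    metadata = response.get('Metadata', {})
    # Assumed metadata name/value; confirm the exact key in the user guide
    return metadata.get('user-agent') == 'AWSTransfer'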
You could also use Transfer Family managed workflows to apply a Tag step. An example of application of using the Tag step can be found in demo 1 of the AWS Transfer Family managed workflows demo video.
Configure the S3 bucket where Transfer Family is writing the files to trigger a Lambda using an Event Notification.
Use this Boto3 code in the Lambda. It will tag the file with the principal that placed the file in S3. If it is Transfer Family, then it is the role that was assigned to Transfer Family to write the files to the bucket. If it is a user uploading the files via the Console, then it will be that user's role.
import boto3
import json
import urllib.parse

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    principal = event['Records'][0]['userIdentity']['principalId']

    try:
        s3 = boto3.client('s3')
        response = s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={
                'TagSet': [
                    {
                        'Key': 'Principal',
                        'Value': str(principal)
                    },
                ]
            }
        )
    except Exception as e:
        print('Error {}.'.format(e))

AWS pre-signed URL returns Signature mismatch on new bucket

I have the following code to generate a pre-signed URL:
params = {'Bucket': bucket_name, 'Key': object_name}
response = s3_client.generate_presigned_url('get_object',
                                            Params=params,
                                            ExpiresIn=expiration)
That works fine on the old bucket I have been using for the last year:
https://old-bucket.s3.amazonaws.com/test_image.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIxxxxxxxxxxE%2F20210917%2Feu-north-1%2Fs3%2Faws4_request&X-Amz-Date=20210917T210448Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=54e173601fec5f140dd901b0eae1dafbcd8d7ee8b8f311fdc1b120ca447cdd0c
I can paste this URL into a browser and download the file. The file is AWS-KMS encrypted.
But the same AWS-KMS encrypted file uploaded to a newly created bucket returns the following URL:
https://new-bucket.s3.amazonaws.com/test_image.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIxxxxxxxxxxE%2F20210917%2Feu-north-1%2Fs3%2Faws4_request&X-Amz-Date=20210917T210500Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=2313e0131d4251f9fba522fc8e9880d960f674f3449e141848bd38ca19e1b528
which returns a SignatureDoesNotMatch error:
The request signature we calculated does not match the signature you provided. Check your key and signing method.
There are no changes in the source code, just a different bucket name provided to the generate_presigned_url function.
The IAM user I am providing to boto3.client has write/read permissions for both buckets.
Comparing properties and permissions for both buckets, and for the files I am requesting from them, everything looks the same.
GetObject and PutObject work fine for both buckets when dealing with the file directly. The issue only occurs when using a pre-signed URL.
So are there any settings/permissions/rules/anything else that need to be configured or enabled to make pre-signed URLs work with a particular S3 bucket?
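For reference, a minimal sketch of pinning the client to the bucket's region and to Signature Version 4, which KMS-encrypted objects require for presigned requests (the region is taken from the URLs above; the bucket and key names are placeholders):

import boto3
from botocore.config import Config

# Region copied from the URLs above; bucket/key are placeholders
s3_client = boto3.client(
    's3',
    region_name='eu-north-1',
    config=Config(signature_version='s3v4'),
)

params = {'Bucket': 'new-bucket', 'Key': 'test_image.png'}
url = s3_client.generate_presigned_url(
    'get_object',
    Params=params,
    ExpiresIn=3600,
)
print(url)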

Not able to retrieve processed file from S3 Bucket

I'm an AWS newbie trying to use Textract API, their OCR service.
As far as I understand, I need to upload files to an S3 bucket and then run Textract on them.
I have the bucket set up and the file inside it:
I have the permissions:
But when I run my code, it throws an error.
import boto3
import trp
# Document
s3BucketName = "textract-console-us-east-1-057eddde-3f44-45c5-9208-fec27f9f6420"
documentName = "ok0001_prioridade01_x45f3.pdf"
# Amazon Textract client
textract = boto3.client('textract',
                        region_name="us-east-1",
                        aws_access_key_id="xxxxxx",
                        aws_secret_access_key="xxxxxxxxx")

# Call Amazon Textract
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': s3BucketName,
            'Name': documentName
        }
    },
    FeatureTypes=["TABLES"])
Here is the error I get:
botocore.errorfactory.InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the AnalyzeDocument operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
What am I missing? How could I solve that?
You are missing an S3 access policy; you should add the AmazonS3ReadOnlyAccess policy if you want a quick solution.
A good practice is to apply the least-privilege access principle and grant access only when needed. So I'd advise you to create a specific policy that grants access to your S3 bucket textract-console-us-east-1-057eddde-3f44-45c5-9208-fec27f9f6420 only, and only in the us-east-1 region.
Amazon Textract currently supports PNG, JPEG, and PDF formats. Looks like you are using PDF.
Once you have a valid format, you can use the Python S3 API to read the object's data. Once you read the object, you can pass the byte array to the analyze_document method. To see a full example of how to use the AWS SDK for Python (Boto3) with Amazon Textract to detect text, form, and table elements in document images, see:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/python/example_code/textract/textract_wrapper.py
Try following that code example to see if your issue is resolved.
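A rough sketch of that approach, reusing the bucket and document names from the question (whether synchronous analyze_document accepts PDF bytes should be checked against the current Textract limits):

import boto3

s3 = boto3.client('s3', region_name='us-east-1')
textract = boto3.client('textract', region_name='us-east-1')

# Names taken from the question
s3_bucket = "textract-console-us-east-1-057eddde-3f44-45c5-9208-fec27f9f6420"
document = "ok0001_prioridade01_x45f3.pdf"

# Read the object's bytes and pass them directly to Textract
obj = s3.get_object(Bucket=s3_bucket, Key=document)
document_bytes = obj['Body'].read()

response = textract.analyze_document(
    Document={'Bytes': document_bytes},
    FeatureTypes=["TABLES"],
)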
"Could you provide some clearance on the params to use"
I just ran the Java V2 example and it works perfecly. In this example, i am using a PNG file located in a specific Amazon S3 bucket.
Here are the parameters that you need:
Make sure when implementing this in Python, you set the same parameters.

Getting S3 public policy using boto3

I want to get the bucket policy for various buckets. I tried the following code snippet (picked from the boto3 documentation):
conn = boto3.resource('s3')
bucket_policy=conn.BucketPolicy('demo-bucket-py')
print(bucket_policy)
But here's the output I get:
s3.BucketPolicy(bucket_name='demo-bucket-py')
What should I rectify here? Or is there another way to get the access policy for S3?
Try print(bucket_policy.policy). More information on that here.
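A small sketch of that resource-based approach, reusing the bucket name from the question:

import boto3

s3 = boto3.resource('s3')
bucket_policy = s3.BucketPolicy('demo-bucket-py')

# The .policy attribute holds the bucket policy document as a JSON string
print(bucket_policy.policy)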
This worked for me:
import boto3
# Create an S3 client
s3 = boto3.client('s3')
# Call to S3 to retrieve the policy for the given bucket
result = s3.get_bucket_policy(Bucket='my-bucket')
print(result)
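If you want the policy as a dictionary rather than a raw JSON string, the returned Policy field can be parsed; a small sketch under the same assumptions (the bucket name is a placeholder):

import json

import boto3

s3 = boto3.client('s3')

# get_bucket_policy returns the policy document as a JSON string under 'Policy'
result = s3.get_bucket_policy(Bucket='my-bucket')
policy = json.loads(result['Policy'])
print(policy['Statement'])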
To perform this, you need to configure or pass your keys, like this: s3 = boto3.client("s3", aws_access_key_id=access_key_id, aws_secret_access_key=secret_key). But a much better way is to run the aws configure command and enter your credentials there (setup docs: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html). Once you set it up, you won't need to enter your keys again in your code; boto3 or the AWS CLI will automatically fetch them behind the scenes.
You can even set up different profiles to work with different accounts.
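For illustration, a named profile created with aws configure --profile can then be selected from code like this (a minimal sketch; the profile name is a placeholder):

import boto3

# 'dev' is a placeholder profile name created with: aws configure --profile dev
session = boto3.Session(profile_name='dev')
s3 = session.client('s3')

print([b['Name'] for b in s3.list_buckets()['Buckets']])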