Copying existing files in a s3 Bucket to another s3 bucket - amazon-web-services

I have an existing S3 bucket which contains a large number of files. I want to run a Lambda function every minute and copy those files to another (destination) S3 bucket.
My function is:
import boto3

s3 = boto3.resource('s3')
clientname = boto3.client('s3')

def lambda_handler(event, context):
    bucket = 'test-bucket-for-transfer-check'
    try:
        response = clientname.list_objects(
            Bucket=bucket,
            MaxKeys=5
        )
        for record in response['Contents']:
            key = record['Key']
            copy_source = {
                'Bucket': bucket,
                'Key': key
            }
            try:
                destbucket = s3.Bucket('serverless-demo-s3-bucket')
                destbucket.copy(copy_source, key)
                print('{} transferred to destination bucket'.format(key))
            except Exception as e:
                print(e)
                print('Error getting object {} from bucket {}.'.format(key, bucket))
                raise e
    except Exception as e:
        print(e)
        raise e
Now how can I make sure the function copies new files each time it runs?

Suppose the two buckets in question are Bucket-A and Bucket-B,
and the task is to copy files from Bucket-A --> Bucket-B.
Lambda-1 (the code in the question) reads files from Bucket-A and copies them one by one to Bucket-B in a loop.
In the same loop it also puts one entry in a DynamoDB table, say
"Copy_Logs", with columns File_Key and Flag,
where File_Key is the object key of the file
and Flag is set to false, indicating the state of the copy operation.
Now configure events on Bucket-B to invoke a Lambda-2 on every put and multipart upload.
Lambda-2 will read the object key from the S3 notification payload and update the respective record in the DynamoDB table, setting the flag to true.
You then have records in the DynamoDB table "Copy_Logs" showing which files were copied successfully and which were not.
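A minimal sketch of what that Lambda-2 could look like, assuming the table is named Copy_Logs with a File_Key partition key and a boolean Flag attribute (both names taken from the description above):

```python
def keys_from_event(event):
    """Pull the object keys out of an S3 notification payload."""
    return [record['s3']['object']['key'] for record in event['Records']]

def lambda_handler(event, context):
    import boto3  # deferred so the helper above is usable on its own
    table = boto3.resource('dynamodb').Table('Copy_Logs')  # table name assumed
    for key in keys_from_event(event):
        # Flip the Flag from false to true to confirm the copy
        table.update_item(
            Key={'File_Key': key},
            UpdateExpression='SET Flag = :t',
            ExpressionAttributeValues={':t': True}
        )
```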
Hope this helps.
[Update]
I missed the last part. It's a one-way sync. You could also get the lists of file names from both buckets, take the difference of these sets, and upload only the missing files to the target bucket.
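A sketch of that set-difference approach (bucket names would be your own; listing is paginated because list_objects_v2 returns at most 1,000 keys per call):

```python
def missing_keys(source_keys, dest_keys):
    """Keys that exist in the source bucket but not yet in the destination."""
    return set(source_keys) - set(dest_keys)

def list_all_keys(s3_client, bucket):
    # Paginate: list_objects_v2 returns at most 1,000 keys per call
    paginator = s3_client.get_paginator('list_objects_v2')
    return [obj['Key']
            for page in paginator.paginate(Bucket=bucket)
            for obj in page.get('Contents', [])]

def sync_once(source_bucket, dest_bucket):
    import boto3  # deferred so the pure helpers above are usable on their own
    s3 = boto3.client('s3')
    for key in missing_keys(list_all_keys(s3, source_bucket),
                            list_all_keys(s3, dest_bucket)):
        # Managed copy handles large objects (multipart) transparently
        s3.copy({'Bucket': source_bucket, 'Key': key}, dest_bucket, key)
```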

Related

unable to get object metadata from S3. Check object key, region and/or access permissions.

I have a Lambda function that scans for text and is triggered by an S3 bucket. I get this error when trying to upload a photo directly into the S3 bucket using the browser:
Unable to get object metadata from S3. Check object key, region, and/or access permissions
However, if I hardcode the key (e.g., image01.jpg) of a file that is in my bucket, there are no errors.
import json
import boto3

def lambda_handler(event, context):
    # Get bucket and file name
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    location = key[:17]

    s3Client = boto3.client('s3')
    client = boto3.client('rekognition', region_name='us-east-1')
    response = client.detect_text(Image={'S3Object':
                                         {'Bucket': 'myarrowbucket', 'Name': key}})
    detectedText = response['TextDetections']
I am confused, as it was working a few weeks ago, but now I am getting this error.
ANSWER
I have seen this question answered many times and I tried every solution. The one that worked for me was the 'key' name: I was getting the metadata error when the filename contained special characters (e.g. - or _), but when I changed the names of the uploaded files it worked. Hope this answer helps someone.
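For what it's worth, a common cause of this symptom is that S3 URL-encodes object keys in the event notification (spaces arrive as + and special characters as %XX sequences), so the raw key from the payload does not match the real object key. Decoding the key before passing it to Rekognition may avoid having to rename files; a small sketch:

```python
import urllib.parse

def decode_s3_key(raw_key):
    """S3 event notifications URL-encode object keys; decode before
    passing the key to other AWS APIs such as Rekognition."""
    return urllib.parse.unquote_plus(raw_key)

# Inside the handler it would be used as:
# key = decode_s3_key(event['Records'][0]['s3']['object']['key'])
```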

How to add s3 trigger function in the lambda function to read the file

I need to add an S3 trigger to the Lambda function in the source code itself, instead of creating the trigger in the AWS console. I need that trigger to read a file when it is uploaded to a particular folder of the S3 bucket. I have already done this by creating the S3 trigger in the console with the help of a prefix. Can someone help with creating this S3 trigger in the Lambda function source code itself? Below is the source code of the Lambda function that reads the file.
import json
import urllib.parse
import boto3

print('Loading function')

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + response['ContentType'])
        return response['ContentType']
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
The AWS Lambda function would presumably only be run when the trigger detects a new object.
However, you are asking how to create the trigger from the AWS Lambda function, but that function will only be run when a trigger exists.
So it is a cart-before-the-horse situation.
Instead, you can create the trigger in Amazon S3 from the AWS Lambda console or via an API call, but this is done as part of the definition of the Lambda function. It can't be done from within the source code of the function, since that code isn't run until the trigger already exists!
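To illustrate the API route, the trigger can be attached with a boto3 call made from outside the function, for example in a deployment script. The bucket name, function ARN, and prefix below are placeholders, and S3 must separately be granted permission to invoke the function (e.g. via the Lambda add_permission API):

```python
def notification_config(function_arn, prefix):
    """Build an S3 notification configuration that invokes a Lambda
    function for every object created under the given prefix."""
    return {
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': function_arn,
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': prefix}
            ]}},
        }]
    }

def add_s3_trigger(bucket, function_arn, prefix):
    import boto3  # deferred so the helper above is usable on its own
    boto3.client('s3').put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=notification_config(function_arn, prefix)
    )
```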

Grouping S3 Files Into Subfolders

I have a pipeline that moves approximately 1 TB of data, all CSV files. In this pipeline there are hundreds of files with different names. They have a date component, which is automatically partitioned. My question is how to use the CDK to automatically create subfolders based on the name of the file. In other words, the data comes in as broad category, but our data scientists need it at one more level of detail.
It appears that your requirement is to move incoming objects into folders based on information in their filename (Key).
This could be done by adding a trigger on the Amazon S3 bucket that triggers an AWS Lambda function when a new object is created.
Here is some code from Moving file based on filename with Amazon S3:
import boto3
import urllib.parse

def lambda_handler(event, context):
    # Get the bucket and object key from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Only copy objects that were uploaded to the bucket root (to avoid an infinite loop)
    if '/' not in key:
        # Determine destination directory based on the key
        directory = key  # Your logic goes here to extract the directory name

        # Copy object
        s3_client = boto3.client('s3')
        s3_client.copy_object(
            Bucket=bucket,
            Key=f"{directory}/{key}",
            CopySource={'Bucket': bucket, 'Key': key}
        )

        # Delete source object
        s3_client.delete_object(
            Bucket=bucket,
            Key=key
        )
You would need to modify the code that determines the name of the destination directory based on the key of the new object.
It also assumes that new objects will come into the top level (root) of the bucket and then be moved into sub-directories. If, instead, new objects are coming in under a given path (e.g. incoming/), then set the S3 trigger to operate only on that path and remove the if '/' not in key logic.
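For example, if the incoming filenames followed a convention like sales_2023-01-01.csv (a hypothetical convention, since the actual naming scheme isn't given) and the destination folder is the category before the first underscore, the extraction logic could be as simple as:

```python
def directory_for(key):
    """Destination folder derived from a filename such as 'sales_2023-01-01.csv'
    (hypothetical naming convention): the part before the first underscore."""
    return key.split('_', 1)[0]
```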

How to replicate objects arriving in Amazon S3 via an AWS Lambda function

I want to replicate a few prefixes into another bucket.
I do not want to use the sync or replication services; I want to do it with Lambda only. I have an ongoing migration running, so data is coming in every 20 minutes, and no transformation is required.
Can someone help me with how I can use AWS Lambda for sync and copy?
You can configure a Trigger for an AWS Lambda function so that it is invoked when an object is added to the Amazon S3 bucket. The function will receive information about the object that triggered the function, including its Bucket and Key.
You should code the Lambda function to call CopyObject() to copy that object to the desired destination bucket. Here's an example in Python:
import boto3
import urllib.parse

def lambda_handler(event, context):
    TARGET_BUCKET = 'my-target-bucket'

    # Get the bucket and object key from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Copy object
    s3_client = boto3.client('s3')
    s3_client.copy_object(
        Bucket=TARGET_BUCKET,
        Key=key,
        CopySource={'Bucket': bucket, 'Key': key}
    )

I would like to know how to import data into the app by modifying this lambda code

import boto3
import json

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = "cloud-translate-output"
    key = "key value"
    try:
        data = s3.get_object(Bucket=bucket, Key=key)
        json_data = data["Body"].read()
        return {
            "response_code": 200,
            "data": str(json_data)
        }
    except Exception as e:
        print(e)
        raise e
I'm making an iOS app with Xcode, and I want to use AWS to bring data from S3 into the app, in the order app --> API Gateway --> Lambda --> S3. If I upload data to bucket number 1 of S3, CloudFormation will translate the uploaded text file and automatically save it to bucket number 2. I then want to import the text data file stored in bucket number 2 back into the app through Lambda. Instead of a hard-coded key value, is there a way to use only the name of the bucket?
if I upload this data to bucket number 1 of s3, the cloudformation will translate the uploaded text file and automatically save it to bucket number 2
Sadly, this is not how CloudFormation works. It can't automatically read or translate files from buckets, or upload them to new buckets.
I would stick with a Lambda function; it is better suited to such tasks.
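On the "only the name of the bucket" part of the question: the Lambda could list the bucket and fetch the most recently modified object instead of relying on a hard-coded key. A sketch, reusing the bucket name from the question:

```python
def pick_newest(objects):
    """Choose the listing entry with the latest LastModified timestamp."""
    return max(objects, key=lambda obj: obj['LastModified']) if objects else None

def lambda_handler(event, context):
    import boto3  # deferred so pick_newest is usable on its own
    s3 = boto3.client('s3')
    bucket = "cloud-translate-output"
    objects = [obj
               for page in s3.get_paginator('list_objects_v2').paginate(Bucket=bucket)
               for obj in page.get('Contents', [])]
    newest = pick_newest(objects)
    if newest is None:
        return {"response_code": 404, "data": ""}
    data = s3.get_object(Bucket=bucket, Key=newest['Key'])
    return {"response_code": 200, "data": str(data["Body"].read())}
```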