get the content of a .txt file from S3 and display its content in HTML with lambda - amazon-web-services

I have connected an S3 bucket to a Lambda function on AWS. I need to get the content of a .txt file hosted in the bucket and display that content as HTML. Each time a file is created in S3, the Lambda function must display the content of that file as HTML.
That is, if the .txt file contains "Hello", I need to display "Hello" in HTML.
How can I achieve this? So far, this is what I have:
import json
import urllib.parse
import boto3
print('Loading function')
s3 = boto3.client('s3')
def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        str_response = ""
        # Use a separate loop variable so the object key above is not overwritten
        for field in response:
            str_response = str_response + "\t" + field + ":" + str(response[field]) + "\n"
        print("Full response from S3: \n"+ str_response+"\n CONTENT TYPE: " + response['ContentType'])
        return response['ContentType']
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
I searched and found this:
bucket.put_object(Key='index.html', Body=data, ContentType='text/html')
I don't know where it goes or how to use it.
Thanks
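A minimal sketch of one way this could fit together, not a definitive answer. It assumes the file is UTF-8 text and that the HTML page should be written back to the same bucket under a hypothetical key (the original key plus ".html"). The snippet found in the question, bucket.put_object(...), is the boto3 resource form; the sketch uses the client form, s3.put_object(Bucket=..., Key=..., Body=..., ContentType=...), because the handler above already uses a client. The helper name render_html and the optional s3 parameter are made up for illustration.

```python
import html
import urllib.parse


def render_html(text):
    """Wrap plain text in a minimal HTML page, escaping special characters."""
    return (
        "<!DOCTYPE html>\n"
        "<html><body><pre>" + html.escape(text) + "</pre></body></html>"
    )


def lambda_handler(event, context, s3=None):
    # Assumption: the client is created lazily so the helper can be tried
    # locally without AWS credentials. Lambda itself only passes event/context.
    if s3 is None:
        import boto3
        s3 = boto3.client("s3")

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"], encoding="utf-8")

    # If the function is triggered by the same bucket it writes to, the HTML
    # file would trigger it again, so files that are already HTML are skipped.
    if key.endswith(".html"):
        return None

    # response["Body"] is a stream; read() returns the raw bytes of the file.
    text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    page = render_html(text)

    # Hypothetical output key: "<original key>.html" in the same bucket.
    s3.put_object(
        Bucket=bucket,
        Key=key + ".html",
        Body=page.encode("utf-8"),
        ContentType="text/html",
    )
    return page
```

With a .txt file containing "Hello", the page written to S3 would contain "Hello" inside the pre element. Restricting the S3 trigger to the .txt suffix would be another way to avoid the function re-triggering itself.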

Related

Lambda task timeout but no application log

I have a Python Lambda function that is triggered by S3 uploads to a specific folder. The function processes the uploaded file and writes the output to another folder in the same S3 bucket.
The issue is that when I do a bulk upload using the AWS console, some files do not get processed. I ended up setting up a dead letter queue to catch these invocations. The message in the queue contains a request ID, which I tried to find in the Lambda logs.
These are the logs for the request ID:
Now the odd part is that in the Python code, the first line after the imports is print('Loading function'), which does not show up in the Lambda log.
I have added the Python code here. It should still print "Processing file name: " + key, which is inside the handler, right?
import urllib.parse
from datetime import datetime
import boto3
from constants import CONTENT_TYPE, XML_EXTENSION, VALIDATING
from xml_process import *
from s3Integration import download_file
print('Loading function')
s3 = boto3.client('s3')
def lambda_handler(event, context):
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    print("Processing file name: " + key)
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        xml_content = response["Body"].read()
        content_type = response["ContentType"]
        tree = ET.fromstring(xml_content)
        key_file_name = key.split("/")[1]
        # Creating a temporary copy by downloading file to get the namespaces
        temp_file_name = "/tmp/" + key_file_name
        download_file(key, temp_file_name)
        namespaces = {node[0]: node[1] for _, node in ET.iterparse(temp_file_name, events=['start-ns'])}
        for name, value in namespaces.items():
            ET.register_namespace(name, value)
        # Preparing path for file processing
        processed_file = key_file_name.split(".")[0] + "_processed." + key_file_name.split(".")[1]
        print(processed_file, "processed")
        db_record = XMLMapping(file_path=key,
                               processed_file_path=processed_file,
                               uploaded_by="lambda",
                               status=VALIDATING, uploaded_date=datetime.now(), is_active=True)
        session.add(db_record)
        session.commit()
        if key_file_name.split(".")[1] == XML_EXTENSION:
            if content_type in CONTENT_TYPE:
                xml_parse(tree, db_record, processed_file, True)
            else:
                print("Content Type is not valid. Provided value: ", content_type)
        else:
            print("File extension is not valid. Provided extension: ", key_file_name.split(".")[1])
        return "success"
    except Exception as e:
        print(e)
        raise e
I don't think it's a permission issue, as other files uploaded in the same batch were processed successfully.
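One detail that may explain part of this: module-level statements such as print('Loading function') run once when a new execution environment is initialized (a cold start), not on every invocation, so that line appears only in the log stream where the environment was created. A hedged sketch of logging inside the handler with the request ID, so a request ID from the dead letter queue can be searched for directly. The helper name describe_invocation is made up for illustration, and context.aws_request_id is the request ID attribute of the Lambda context object.

```python
import urllib.parse

# Runs once per execution environment (cold start), not once per invocation.
print("Loading function")


def describe_invocation(event, context):
    """Return one log line per S3 record, tagged with the Lambda request ID."""
    lines = []
    for record in event.get("Records", []):
        key = urllib.parse.unquote_plus(
            record["s3"]["object"]["key"], encoding="utf-8"
        )
        lines.append("request_id={} key={}".format(context.aws_request_id, key))
    return lines


def lambda_handler(event, context):
    # Log before any other work, so that even an invocation that later
    # times out leaves a searchable line behind.
    for line in describe_invocation(event, context):
        print(line)
    # ... the processing from the question would follow here ...
    return "success"
```

If no such line shows up for a given request ID, the invocation probably never reached the handler body, which would point away from the processing code itself.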

Copying S3 objects from one account to other using Lambda python

I'm using boto3 to copy files from an S3 bucket in one account to a bucket in another account. I need functionality similar to aws s3 sync. Please see my code below. My company has decided to 'PULL' from the other S3 bucket (source account). Please don't suggest replication, S3 Batch, an S3-triggered Lambda, etc. We have gone through all these options, and my management does not want to do any configuration on the source side. Can you please review this code and let me know whether it works for thousands of objects? The source bucket has nearly 10000 objects. We will create this Lambda function in the destination account and create a CloudWatch event to trigger it once a day.
I am checking the ETag so that modified files are copied across when this function is triggered.
Edit: I simplified my code just to see whether pagination works. It works if I don't add client.copy(). If I add this line in the for loop, then after reading 3 or 4 objects it throws "errorMessage": "2021-08-07T15:29:07.827Z 82757747-7b72-4f29-ae9f-22e95f969d6c Task timed out after 3.00 seconds". Please advise. Please note that the 'test/' folder in my source bucket has around 1100 objects.
import os
import logging
import boto3  # was missing: boto3.client() is called below
import botocore
logger = logging.getLogger()
logger.setLevel(os.getenv('debug_level', 'INFO'))
client = boto3.client('s3')
def handler(event, context):
    main(event, logger)
def main(event, logger):
    try:
        SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
        DEST_BUCKET = os.environ.get('DST_BUCKET')
        REGION = os.environ.get('REGION')
        prefix = 'test/'
        # Create a reusable Paginator
        paginator = client.get_paginator('list_objects_v2')
        print ('after paginator')
        # Create a PageIterator from the Paginator
        page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET,Prefix = prefix)
        print ('after page iterator')
        index = 0
        for page in page_iterator:
            # 'Contents' is absent when a page holds no objects
            for obj in page.get('Contents', []):
                index += 1
                print ("I am looking for {} in the source bucket".format(obj['ETag']))
                copy_source = {'Bucket': SOURCE_BUCKET, 'Key': obj['Key']}
                client.copy(copy_source, DEST_BUCKET, obj['Key'])
        logger.info("number of objects copied {}:".format(index))
    except botocore.exceptions.ClientError as e:
        raise
This version works fine if I increase the Lambda timeout to 15 min and the memory to 512 MB. It checks whether the source object already exists in the destination before copying.
import boto3
import os
import logging
import botocore
from botocore.client import Config
logger = logging.getLogger()
logger.setLevel(os.getenv('debug_level', 'INFO'))
config = Config(connect_timeout=5, retries={'max_attempts': 0})
client = boto3.client('s3', config=config)
#client = boto3.client('s3')
def handler(event, context):
    main(event, logger)
def main(event, logger):
    try:
        DEST_BUCKET = os.environ.get('DST_BUCKET')
        SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
        REGION = os.environ.get('REGION')
        prefix = ''
        # Create a reusable Paginator
        paginator = client.get_paginator('list_objects_v2')
        print ('after paginator')
        # Create a PageIterator from the Paginator
        page_iterator_src = paginator.paginate(Bucket=SOURCE_BUCKET,Prefix = prefix)
        page_iterator_dest = paginator.paginate(Bucket=DEST_BUCKET,Prefix = prefix)
        print ('after page iterator')
        index = 0
        for page_source in page_iterator_src:
            for obj_src in page_source['Contents']:
                flag = "FALSE"
                for page_dest in page_iterator_dest:
                    for obj_dest in page_dest['Contents']:
                        # checks if source ETag already exists in destination
                        if obj_src['ETag'] in obj_dest['ETag']:
                            flag = "TRUE"
                            break
                    if flag == "TRUE":
                        break
                if flag != "TRUE":
                    index += 1
                    client.copy_object(Bucket=DEST_BUCKET, CopySource={'Bucket': SOURCE_BUCKET, 'Key': obj_src['Key']}, Key=obj_src['Key'],)
                    print ("source ETag {} and destination ETag {}".format(obj_src['ETag'],obj_dest['ETag']))
                    print ("source Key {} and destination Key {}".format(obj_src['Key'],obj_dest['Key']))
                    print ("Number of objects copied{}".format(index))
        logger.info("number of objects copied {}:".format(index))
    except botocore.exceptions.ClientError as e:
        raise
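The nested loops above list the destination bucket again for every source object, which is likely a large part of the run time. A hedged sketch of an alternative: list each bucket once, build a key-to-ETag mapping, and copy only the keys that are missing or different. This is not the code from the question. It compares ETags per key rather than looking for the ETag anywhere in the destination, and it assumes ETags are comparable across the two buckets, which may not hold for multipart uploads. The function names and the min_ms safety margin are made up for illustration; the client and the Lambda context are passed in as arguments.

```python
def list_etags(client, bucket, prefix=""):
    """Map each key under prefix to its ETag, following pagination."""
    etags = {}
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when a page has no objects.
        for obj in page.get("Contents", []):
            etags[obj["Key"]] = obj["ETag"]
    return etags


def keys_to_copy(src_etags, dest_etags):
    """Keys that are missing in the destination or whose ETag differs."""
    return sorted(
        key for key, etag in src_etags.items() if dest_etags.get(key) != etag
    )


def sync(client, src_bucket, dest_bucket, prefix="", context=None, min_ms=10000):
    """Copy new or changed objects; return how many were copied."""
    src = list_etags(client, src_bucket, prefix)
    dest = list_etags(client, dest_bucket, prefix)
    copied = 0
    for key in keys_to_copy(src, dest):
        # Assumption: stop shortly before the Lambda timeout and let the
        # next scheduled run pick up the remaining keys.
        if context is not None and context.get_remaining_time_in_millis() < min_ms:
            break
        client.copy({"Bucket": src_bucket, "Key": key}, dest_bucket, key)
        copied += 1
    return copied
```

Inside the handler this could be called as sync(client, SOURCE_BUCKET, DEST_BUCKET, prefix, context), with the default 3 second timeout still raised, since copying thousands of objects will not fit in it.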

Multiple Email Endpoints of a SNS Topic

I have a Lambda function that gets triggered by a PUT event in an S3 bucket. This function needs to send an email to all subscribers of an SNS topic.
The Lambda function code is:
import json
import boto3
print('Loading function')
s3 = boto3.client('s3')
sns = boto3.client('sns')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    eventname = event['Records'][0]['eventName']
    sns_message = str("A new file has been uploaded in our S3 bucket. Please find the details of file uploaded below\n\nBUCKET NAME: "+ bucket +"\nFILE NAME: " + key)
    try:
        if eventname == "ObjectCreated:Put":
            subject= "New data available in S3 Bucket [" + bucket +"]"
            sns_response = sns.publish(TargetArn='<SNS ARN ID>',Message= str(sns_message),Subject= str(subject))
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
However, the email is sent to only one of the email endpoints subscribed to the SNS topic. Any idea what I am missing?
There was an error in my code. I was using TargetArn instead of TopicArn. Changing the code to use TopicArn solved the issue.
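For reference, a small sketch of the corrected call, following the fix described above. The helper names build_notification and notify are made up for illustration, and the SNS client and topic ARN are passed in as arguments so the message building can be checked without AWS.

```python
def build_notification(bucket, key):
    """Build the email subject and body for a newly uploaded object."""
    subject = "New data available in S3 Bucket [" + bucket + "]"
    message = (
        "A new file has been uploaded in our S3 bucket. "
        "Please find the details of file uploaded below\n\n"
        "BUCKET NAME: " + bucket + "\nFILE NAME: " + key
    )
    return subject, message


def notify(sns, topic_arn, bucket, key):
    """Publish to the topic so that every subscribed endpoint is notified."""
    subject, message = build_notification(bucket, key)
    # TopicArn (not TargetArn) is the change reported to fix the issue.
    return sns.publish(TopicArn=topic_arn, Message=message, Subject=subject)
```

In the handler this would be called as notify(sns, '<SNS ARN ID>', bucket, key) inside the existing eventname check.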

Multiple Emails getting sent from SNS triggered by Lambda

I have a Lambda function that gets triggered by a new PUT request in a particular S3 bucket and sends an email through an SNS topic.
The Lambda function code is listed below:
import json
import boto3
print('Loading function')
s3 = boto3.client('s3')
sns = boto3.client('sns')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    eventname = event['Records'][0]['eventName']
    sns_message = str("A new file has been uploaded in our S3 bucket. Please find the details of file uploaded below\n\nBUCKET NAME: "+ bucket +"\nFILE NAME: " + key)
    try:
        if eventname == "ObjectCreated:Put":
            subject= "New data available in S3 Bucket [" + bucket +"]"
            sns_response = sns.publish(TargetArn='<SNS ARN ID>',Message= str(sns_message),Subject= str(subject))
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
However, two emails are sent when a new object is uploaded to the S3 bucket: one with the subject set in the Lambda function, and another with the subject "AWS Notification" containing the JSON listed below.
{"version":"1.0","timestamp":"2021-05-28T13:38:24.548Z","requestContext":{"requestId":"<Request ID>","functionArn":"<Lambda Function ARN>:$LATEST","condition":"Success","approximateInvokeCount":2},"requestPayload":{"Records":[{"eventVersion":"2.1","eventSource":"aws:s3","awsRegion":"us-east-1","eventTime":"2021-05-28T13:35:24.743Z","eventName":"ObjectCreated:Put","userIdentity":{"principalId":"<my ID>"},"requestParameters":{"sourceIPAddress":"<IP Address>"},"responseElements":{"x-amz-request-id":"ABCD","x-amz-id-2":"EFGH"},"s3":{"s3SchemaVersion":"1.0","configurationId":"<Config ID>","bucket":{"name":"<Bucket Name>","ownerIdentity":{"principalId":"<ID>"},"arn":"<Bucket ARN>"},"object":{"key":"<File Key>","size":632088,"eTag":"<Etag>","versionId":"<Version ID>","sequencer":"<Seq>"}}}]},"responseContext":{"statusCode":200,"executedVersion":"$LATEST"},"responsePayload":null}
I want only one email to be sent. How can I implement that?
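The JSON in the second email (version, requestContext with "condition":"Success", requestPayload, responsePayload) has the shape of the record Lambda sends to a destination configured for asynchronous invocations. So one possible cause, not confirmed by anything in the question, is an on-success destination that points at the same SNS topic. A hedged sketch for checking this with boto3's get_function_event_invoke_config; the helper name configured_destinations is made up for illustration, and the Lambda client is passed in as an argument.

```python
def configured_destinations(lambda_client, function_name):
    """Return the on-success and on-failure destination ARNs, if any.

    Note: the underlying call raises a ResourceNotFoundException client
    error when the function has no event invoke configuration at all.
    """
    config = lambda_client.get_function_event_invoke_config(
        FunctionName=function_name
    )
    destinations = config.get("DestinationConfig", {})
    return {
        "OnSuccess": destinations.get("OnSuccess", {}).get("Destination"),
        "OnFailure": destinations.get("OnFailure", {}).get("Destination"),
    }
```

If OnSuccess turns out to be the ARN of the topic used in sns.publish, removing that destination in the function's configuration should leave only the email sent by the code.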

Dump lambda output to csv and have it email as an attachment

I have a Lambda function that generates a list of untagged buckets in an AWS environment. Currently I send the output directly to a Slack channel. Instead, I would like the Lambda function to write the output to a CSV file and email it as a report. Here is the code; let me know if you need any other details.
import boto3
from botocore.exceptions import ClientError
import urllib3
import json
http = urllib3.PoolManager()
def lambda_handler(event, context):
    #Printing the S3 buckets with no tags
    s3 = boto3.client('s3')
    s3_re = boto3.resource('s3')
    buckets = []
    print('Printing buckets with no tags..')
    for bucket in s3_re.buckets.all():
        s3_bucket = bucket
        s3_bucket_name = s3_bucket.name
        try:
            response = s3.get_bucket_tagging(Bucket=s3_bucket_name)
        except ClientError:
            buckets.append(bucket)
            print(bucket)
    for bucket in buckets:
        data = {"text": "%s bucket has no tags" % (bucket)}
        r = http.request("POST", "https://hooks.slack.com/services/~/~/~",
                         body = json.dumps(data),
                         headers = {"Content-Type": "application/json"})
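A minimal sketch of one way to do this, under some assumptions: the report is built in memory with the standard csv module, attached to a message built with email.message.EmailMessage, and sent through Amazon SES with send_raw_email. SES is my choice here, not something from the question, and it needs a verified sender address. The function names, column names, subject, and file name are made up for illustration, and the SES client is passed in as an argument.

```python
import csv
import io
from email.message import EmailMessage


def buckets_to_csv(bucket_names):
    """Render the untagged bucket names as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["bucket_name", "status"])
    for name in bucket_names:
        writer.writerow([name, "no tags"])
    return buffer.getvalue()


def build_report_email(csv_text, sender, recipient):
    """Build an email with the CSV report attached."""
    msg = EmailMessage()
    msg["Subject"] = "Untagged S3 buckets report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The list of untagged buckets is attached.")
    msg.add_attachment(
        csv_text.encode("utf-8"),
        maintype="text",
        subtype="csv",
        filename="untagged_buckets.csv",
    )
    return msg


def send_report(ses, msg):
    """Send the message through SES (assumes a verified sender address)."""
    return ses.send_raw_email(
        Source=msg["From"],
        Destinations=[msg["To"]],
        RawMessage={"Data": msg.as_bytes()},
    )
```

In the handler above, the second loop could be replaced by something like send_report(boto3.client('ses'), build_report_email(buckets_to_csv([b.name for b in buckets]), sender, recipient)), where sender and recipient are addresses you supply.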