Get information about a file uploaded to S3 - python-2.7

I have created a Lambda function that sends an email whenever a file is uploaded to an S3 bucket. Now I want to include all the information related to that file, such as its name, size, upload date and time and, if possible, where it comes from.
I can see all of this information in the AWS console, but I want it in the email body.
I am using the Serverless Framework v1.22.0.
Here is my code:
import json
import boto3
import botocore
import logging
import sys
import os
import traceback
from botocore.exceptions import ClientError
from pprint import pprint
from time import strftime, gmtime

email_from = '********#*****.com'
email_to = '********#*****.com'
email_subject = 'new event on s3 '
email_body = 'a new file is uploaded'

# set up simple logging at INFO level
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def sthree(event, context):
    """Send email whenever a file is uploaded to S3"""
    body = {}
    status_code = 200
    email_body = str(context)
    try:
        s3 = boto3.client('s3')
        ses = boto3.client('ses')
        ses.send_email(Source=email_from,
                       Destination={'ToAddresses': [email_to]},
                       Message={'Subject': {'Data': email_subject},
                                'Body': {'Text': {'Data': email_body}}})
    except Exception as e:
        print(traceback.format_exc())
        status_code = 500
        # exceptions are not JSON-serializable, so store their text
        body["message"] = str(e)
    response = {
        "statusCode": status_code,
        "body": json.dumps(body)
    }
    return response

Here is the event json structure sent by S3 upon object creation:
http://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html
You can get the file names, sizes, source IP addresses and event times like this:
for record in event['Records']:
    filename = record['s3']['object']['key']
    filesize = record['s3']['object']['size']
    source = record['requestParameters']['sourceIPAddress']
    eventTime = record['eventTime']
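Putting those fields together, a minimal sketch of building the email body for every record could look like this. It assumes a Python 3 runtime (on Python 2.7 the decoding function is `urllib.unquote_plus`), and `format_upload_details` is a hypothetical helper name. Object keys arrive URL-encoded in the event (a space becomes `+`), so the key is decoded before it is shown:

```python
import urllib.parse


def format_upload_details(event):
    """Build a plain-text email body describing every object in an S3 event."""
    lines = []
    for record in event['Records']:
        obj = record['s3']['object']
        lines.append("File Name : " + urllib.parse.unquote_plus(obj['key']))
        lines.append("Bucket : " + record['s3']['bucket']['name'])
        # 'size' is in bytes; .get() guards against events that omit it
        lines.append("File Size : %s bytes" % obj.get('size', 'unknown'))
        lines.append("Upload Time : " + record['eventTime'])
        lines.append("Source IP : " + record['requestParameters']['sourceIPAddress'])
        lines.append("Uploaded by : " + record['userIdentity']['principalId'])
        lines.append("")  # blank line between records
    return "\n".join(lines)
```

The returned string could then be passed as `email_body` to `ses.send_email` in the handler from the question.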

import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    email_from = 'XXXXXXXXX#XXX.com'
    email_to = 'XXXXXXXXX#XXX.com'
    email_subject = 'new event on s3'
    record = event[u'Records'][0]
    email_body = ("File Name :" + record[u's3'][u'object'][u'key'] + "\n" +
                  "File Size :" + str(record[u's3'][u'object'][u'size']) + "\n" +
                  "Upload Time :" + record[u'eventTime'] + "\n" +
                  "User Details :" + record[u'userIdentity'][u'principalId'])
    ses = boto3.client('ses')
    ses.send_email(Source=email_from,
                   Destination={'ToAddresses': [email_to]},
                   Message={'Subject': {'Data': email_subject},
                            'Body': {'Text': {'Data': email_body}}})
    print("Function execution Completed !!!")

Related

Calling OpenSearch Service via Lambda, but the request is not getting posted

I am posting data to OpenSearch Service via Lambda, but when I go to the OpenSearch Service endpoint URL to check, I get the error below:
{
"error" : "no handler found for uri [/lambda-s3-index/lambda-type/_search] and method [GET]"
}
I tried printing the response while posting and got a 400. Below is the code:
import boto3
import requests
from requests_aws4auth import AWS4Auth
import os
import json
import datetime

region = 'us-east-1'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

index = 'lambda-s3-index'
type = 'lambda-type'
host = os.environ['ES_DOMAIN_URL']
url = host + '/' + index + '/' + type
headers = { "Content-Type": "application/json" }

s3 = boto3.client('s3')
bucket = os.environ['S3_BUCKET']

# Lambda execution starts here
def handler(event, context):
    sensorID = event['sensorID']
    timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    temperature = event['temperature']
    document = { "sensorID": sensorID, "timestamp": timestamp, "temperature": temperature }
    print(document)

    # post to S3 for storage
    s3.put_object(Body=json.dumps(document).encode(), Bucket=bucket, Key=sensorID+"-"+timestamp+".json")

    # post to amazon elastic search for indexing and kibana use
    r = requests.post(url, auth=awsauth, json=document, headers=headers)
    print(r)

    response = "Data Uploaded"
    return {
        "Response" : response,
        "sensorID" : sensorID,
        "temperature": temperature
    }
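One likely cause is the custom mapping type in `url` (`/lambda-s3-index/lambda-type`). The "no handler found" error for `/lambda-s3-index/lambda-type/_search` suggests the domain runs a version without mapping-type support (mapping types were removed in OpenSearch 2.x), where documents are indexed with `POST /<index>/_doc` and searched with `GET /<index>/_search`. A minimal sketch under that assumption; `build_index_url`, `build_search_url` and `build_document` are hypothetical helper names:

```python
import datetime


def build_index_url(host, index):
    """Return the document-indexing endpoint for an index (typeless _doc API)."""
    return host.rstrip('/') + '/' + index + '/_doc'


def build_search_url(host, index):
    """Return the search endpoint for an index (no mapping type in the path)."""
    return host.rstrip('/') + '/' + index + '/_search'


def build_document(event, now=None):
    """Build the document body from the incoming event, as in the question."""
    if now is None:
        now = datetime.datetime.now()
    return {
        "sensorID": event['sensorID'],
        "timestamp": now.strftime("%Y%m%d%H%M%S"),
        "temperature": event['temperature'],
    }


# In the handler (requests, awsauth and headers as in the question):
#   r = requests.post(build_index_url(host, index), auth=awsauth,
#                     json=build_document(event), headers=headers)
#   print(r.status_code, r.text)  # the response body explains a 400
```

Printing `r.text` rather than just `r` shows the error message the domain returns, which is usually enough to confirm or rule out the mapping-type issue.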

Lambda task timeout but no application log

I have a Python Lambda function that is triggered by S3 uploads to a specific folder. It processes the uploaded file and writes the output to another folder in the same S3 bucket.
The issue is that when I do a bulk upload using the AWS console, some files do not get processed. I ended up setting up a dead-letter queue to catch these invocations. While inspecting the message in the queue, I found a request ID, which I tried to find in the Lambda logs.
These are the logs for the request ID:
Now the odd part is that in the Python code, the first line after the imports is print('Loading function'), which does not show up in the Lambda log.
I've added the Python code below. Shouldn't it still print "Processing file name: " + key, which is inside the handler?
import urllib.parse
from datetime import datetime
import boto3
from constants import CONTENT_TYPE, XML_EXTENSION, VALIDATING
from xml_process import *
from s3Integration import download_file

print('Loading function')
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    print("Processing file name: " + key)
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        xml_content = response["Body"].read()
        content_type = response["ContentType"]
        tree = ET.fromstring(xml_content)
        key_file_name = key.split("/")[1]
        # Creating a temporary copy by downloading file to get the namespaces
        temp_file_name = "/tmp/" + key_file_name
        download_file(key, temp_file_name)
        namespaces = {node[0]: node[1] for _, node in ET.iterparse(temp_file_name, events=['start-ns'])}
        for name, value in namespaces.items():
            ET.register_namespace(name, value)
        # Preparing path for file processing
        processed_file = key_file_name.split(".")[0] + "_processed." + key_file_name.split(".")[1]
        print(processed_file, "processed")
        db_record = XMLMapping(file_path=key,
                               processed_file_path=processed_file,
                               uploaded_by="lambda",
                               status=VALIDATING, uploaded_date=datetime.now(), is_active=True)
        session.add(db_record)
        session.commit()
        if key_file_name.split(".")[1] == XML_EXTENSION:
            if content_type in CONTENT_TYPE:
                xml_parse(tree, db_record, processed_file, True)
            else:
                print("Content Type is not valid. Provided value: ", content_type)
        else:
            print("File extension is not valid. Provided extension: ", key_file_name.split(".")[1])
        return "success"
    except Exception as e:
        print(e)
        raise e
I don't think it's a permission issue, as other files uploaded in the same batch were processed successfully.
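Independent of the missing log lines, two parts of the handler are fragile for bulk uploads. It only reads `event['Records'][0]`, although the event format allows more than one record, and `key.split("/")[1]` / `key_file_name.split(".")[1]` raise `IndexError` for keys without a folder or extension, or pick the wrong part for nested folders and names with several dots. A minimal sketch of a more defensive approach, assuming a Python 3 runtime; `iter_uploaded_keys` and `processed_name` are hypothetical helpers:

```python
import posixpath
import urllib.parse


def iter_uploaded_keys(event):
    """Yield (bucket, key) for every record in an S3 event, URL-decoding the key."""
    for record in event.get('Records', []):
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'], encoding='utf-8')
        yield bucket, key


def processed_name(key):
    """Return (file_name, extension, processed_file_name) for an S3 key.

    posixpath.basename takes the part after the last '/', and splitext splits
    on the last '.', so nested folders and multi-dot names are handled.
    """
    file_name = posixpath.basename(key)
    stem, ext = posixpath.splitext(file_name)  # ext includes the dot, e.g. '.xml'
    return file_name, ext.lstrip('.'), stem + '_processed' + ext
```

The handler would then loop `for bucket, key in iter_uploaded_keys(event):` and compare the returned extension with `XML_EXTENSION`, which the original code compares without a leading dot.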

Send POST request from AWS SES received email content

I'm trying to forward email received by AWS SES to a Slack webhook.
The workflow I tried is:
my personal email -> SES -> S3 -> Lambda -> POST Request
I've been stuck on this Lambda function because it doesn't send the POST request to the webhook URL:
from ast import parse
import boto3
import ConfigParser
import urllib3
import json
from email.parser import FeedParser
from email.header import decode_header

http = urllib3.PoolManager()

def lambda_handler(event, context):
    try:
        record = event["Records"][0]
        bucket_region = record["awsRegion"]
        bucket_name = record["s3"]["bucket"]["name"]
        mail_object_key = record["s3"]["object"]["key"]
        s3 = boto3.client('s3', region_name=bucket_region)
        mail_object = s3.get_object(Bucket = bucket_name, Key = mail_object_key)
        mail_body = ''
        try:
            mail_body = mail_object["Body"].read().decode('utf-8')
        except:
            mail_body = mail_object["Body"].read()
        parser = FeedParser()
        parser.feed(mail_body)
        parsed_mail = parser.close()
        (d_sub, sub_charset) = decode_header(parsed_mail['Subject'])[0]
        subject = d_sub.decode(sub_charset)
        payload = parsed_mail.get_payload(decode=parsed_mail['Content-Transfer-Encoding'])
        body_charset = parsed_mail.get_content_charset()
        body = payload.decode(body_charset)
        url = "MY_SLACK_WEBHOOK_URL"
        msg = {
            "Content": parsed_mail
        }
        encoded_msg = json.dumps(msg).encode('utf-8')
        resp = http.request('POST',url, body=encoded_msg)
        print({
            "message": parsed_mail,
            "status_code": resp.status,
            "response": resp.data
        })
    except:
        print('Mail received, but I got some error.')
Could anyone please look at my code?
This is the CloudWatch log when the Lambda function is triggered:
START RequestId: 00f3e7db-807e-48e9-a775-6f0117431b83 Version: $LATEST
Mail received, but I got some error.
END RequestId: 00f3e7db-807e-48e9-a775-6f0117431b83
REPORT RequestId: 00f3e7db-807e-48e9-a775-6f0117431b83 Duration: 1933.59 ms Billed Duration: 1934 ms Memory Size: 128 MB Max Memory Used: 74 MB Init Duration: 317.15 ms
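The bare `except:` hides the real exception; printing `traceback.format_exc()` there would show it. Two likely culprits in the code above: `json.dumps(msg)` raises `TypeError` because `parsed_mail` is an `email.message.Message`, which is not JSON-serializable, and `d_sub.decode(sub_charset)` fails whenever the subject is not MIME-encoded, because `decode_header` then returns a `None` charset. Slack incoming webhooks also expect a JSON object with a `text` field rather than `Content`. A minimal sketch, assuming a Python 3 runtime and a plain-text (or multipart with a text/plain part) message; `build_slack_payload` is a hypothetical helper:

```python
import json
from email import message_from_bytes
from email.header import decode_header, make_header


def build_slack_payload(raw_mail):
    """Turn raw RFC 822 bytes (e.g. the S3 object body) into a Slack webhook payload."""
    msg = message_from_bytes(raw_mail)
    # make_header() joins all decoded chunks and copes with a None charset
    subject = str(make_header(decode_header(msg.get('Subject', ''))))

    # Use the first text/plain part; a non-multipart message is its own single part
    body = ''
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            raw = part.get_payload(decode=True) or b''  # undoes base64 / quoted-printable
            body = raw.decode(part.get_content_charset() or 'utf-8', errors='replace')
            break

    # Slack incoming webhooks read the message from the "text" field
    return json.dumps({"text": "*" + subject + "*\n" + body}).encode('utf-8')


# In the handler (http = urllib3.PoolManager() as in the question):
#   raw_mail = mail_object["Body"].read()  # read the stream once; a second read returns b''
#   resp = http.request('POST', url, body=build_slack_payload(raw_mail),
#                       headers={'Content-Type': 'application/json'})
```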

I would like to export a DynamoDB table to an S3 bucket in CSV format using Python (Boto3)

This question was asked earlier here:
How to write dynamodb scan data's in CSV and upload to s3 bucket using python?
I have amended the code as advised in the comments. It now looks like this:
import csv
import boto3
import json

dynamodb = boto3.resource('dynamodb')
db = dynamodb.Table('employee_details')

def lambda_handler(event, context):
    AWS_BUCKET_NAME = 'session5cloudfront'
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(AWS_BUCKET_NAME)
    path = '/tmp/' + 'employees.csv'
    try:
        response = db.scan()
        myFile = open(path, 'w')
        for i in response['Items']:
            csv.register_dialect('myDialect', delimiter=' ', quoting=csv.QUOTE_NONE)
            with myFile:
                writer = csv.writer(myFile, dialect='myDialect')
                writer.writerows(i)
            print(i)
    except :
        print("error")
    bucket.put_object(
        ACL='public-read',
        ContentType='application/csv',
        Key=path,
        # Body=json.dumps(i),
    )
    # print("here")
    body = {
        "uploaded": "true",
        "bucket": AWS_BUCKET_NAME,
        "path": path,
    }
    # print("then here")
    return {
        "statusCode": 200,
        "body": json.dumps(body)
    }
I am a novice, so please help me fix this code: it fails to write the data into the file created in the S3 bucket.
Thanks
I have revised the code to be simpler and to also handle paginated responses for tables with more than 1MB of data:
import csv
import boto3
import json

TABLE_NAME = 'employee_details'
OUTPUT_BUCKET = 'my-bucket'
TEMP_FILENAME = '/tmp/employees.csv'
OUTPUT_KEY = 'employees.csv'

s3_resource = boto3.resource('s3')
dynamodb_resource = boto3.resource('dynamodb')
table = dynamodb_resource.Table(TABLE_NAME)

def lambda_handler(event, context):
    with open(TEMP_FILENAME, 'w') as output_file:
        writer = csv.writer(output_file)
        header = True
        first_page = True

        # Paginate results
        while True:

            # Scan DynamoDB table
            if first_page:
                response = table.scan()
                first_page = False
            else:
                response = table.scan(ExclusiveStartKey = response['LastEvaluatedKey'])

            for item in response['Items']:

                # Write header row?
                if header:
                    writer.writerow(item.keys())
                    header = False

                writer.writerow(item.values())

            # Last page?
            if 'LastEvaluatedKey' not in response:
                break

    # Upload temp file to S3
    s3_resource.Bucket(OUTPUT_BUCKET).upload_file(TEMP_FILENAME, OUTPUT_KEY)
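One caveat with `writer.writerow(item.values())`: DynamoDB is schemaless, so items can have different attributes, and the attribute order within each returned item may not match the header row written from the first item. A hedged variation uses `csv.DictWriter`, which writes each value under its own column name. This sketch assumes the whole table fits in Lambda memory (it collects all items first to build the header); `scan_all_items` and `write_items_csv` are hypothetical names, and `table` is the boto3 `Table` resource from the answer:

```python
import csv


def scan_all_items(table):
    """Yield every item from a boto3 DynamoDB Table, following LastEvaluatedKey pages."""
    kwargs = {}
    while True:
        response = table.scan(**kwargs)
        for item in response['Items']:
            yield item
        if 'LastEvaluatedKey' not in response:
            break
        kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']


def write_items_csv(items, output_file):
    """Write items to CSV with one column per attribute name seen in any item."""
    items = list(items)
    fieldnames = sorted({name for item in items for name in item})
    # restval fills in attributes that a given item does not have
    writer = csv.DictWriter(output_file, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(items)
    return len(items)


# In the handler:
#   with open(TEMP_FILENAME, 'w', newline='') as output_file:
#       write_items_csv(scan_all_items(table), output_file)
#   s3_resource.Bucket(OUTPUT_BUCKET).upload_file(TEMP_FILENAME, OUTPUT_KEY)
```

Sorting the column names keeps the header stable between runs; if a fixed, known schema is preferred, `fieldnames` could be hard-coded instead.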

AWS Lambda error: module initialization error 'us-east-1' or No module named 'StringIO'

I am trying to use AWS Lambda for mass email sending, with the code from the link below:
https://aws.amazon.com/cn/premiumsupport/knowledge-center/mass-email-ses-lambda/
from __future__ import print_function

import StringIO
import csv
import json
import os
import urllib
import zlib

from time import strftime, gmtime
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

import boto3
import botocore
import concurrent.futures

__author__ = 'Said Ali Samed'
__date__ = '10/04/2016'
__version__ = '1.0'

# Get Lambda environment variables
region = os.environ['us-east-1']
max_threads = os.environ['10']
text_message_file = os.environ['email_body.txt']
html_message_file = os.environ['email_body.html']

# Initialize clients
s3 = boto3.client('s3', region_name=region)
ses = boto3.client('ses', region_name=region)
send_errors = []
mime_message_text = ''
mime_message_html = ''


def current_time():
    return strftime("%Y-%m-%d %H:%M:%S UTC", gmtime())


def mime_email(subject, from_address, to_address, text_message=None, html_message=None):
    msg = MIMEMultipart('alternative')
    msg['Subject'] = subject
    msg['From'] = from_address
    msg['To'] = to_address
    if text_message:
        msg.attach(MIMEText(text_message, 'plain'))
    if html_message:
        msg.attach(MIMEText(html_message, 'html'))
    return msg.as_string()


def send_mail(from_address, to_address, message):
    global send_errors
    try:
        response = ses.send_raw_email(
            Source=from_address,
            Destinations=[
                to_address,
            ],
            RawMessage={
                'Data': message
            }
        )
        if not isinstance(response, dict):  # log failed requests only
            send_errors.append('%s, %s, %s' % (current_time(), to_address, response))
    except botocore.exceptions.ClientError as e:
        send_errors.append('%s, %s, %s, %s' %
                           (current_time(),
                            to_address,
                            ', '.join("%s=%r" % (k, v) for (k, v) in e.response['ResponseMetadata'].iteritems()),
                            e.message))


def lambda_handler(event, context):
    global send_errors
    global mime_message_text
    global mime_message_html
    try:
        # Read the uploaded csv file from the bucket into python dictionary list
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')
        response = s3.get_object(Bucket=bucket, Key=key)
        body = zlib.decompress(response['Body'].read(), 16+zlib.MAX_WBITS)
        reader = csv.DictReader(StringIO.StringIO(body),
                                fieldnames=['from_address', 'to_address', 'subject', 'message'])
        # Read the message files
        try:
            response = s3.get_object(Bucket=bucket, Key=text_message_file)
            mime_message_text = response['Body'].read()
        except:
            mime_message_text = None
            print('Failed to read text message file. Did you upload %s?' % text_message_file)
        try:
            response = s3.get_object(Bucket=bucket, Key=html_message_file)
            mime_message_html = response['Body'].read()
        except:
            mime_message_html = None
            print('Failed to read html message file. Did you upload %s?' % html_message_file)

        if not mime_message_text and not mime_message_html:
            raise ValueError('Cannot continue without a text or html message file.')

        # Send in parallel using several threads
        e = concurrent.futures.ThreadPoolExecutor(max_workers=max_threads)
        for row in reader:
            from_address = row['from_address'].strip()
            to_address = row['to_address'].strip()
            subject = row['subject'].strip()
            message = mime_email(subject, from_address, to_address, mime_message_text, mime_message_html)
            e.submit(send_mail, from_address, to_address, message)
        e.shutdown()
    except Exception as e:
        print(e.message + ' Aborting...')
        raise e

    print('Send email complete.')

    # Remove the uploaded csv file
    try:
        response = s3.delete_object(Bucket=bucket, Key=key)
        if 'ResponseMetadata' in response.keys() and response['ResponseMetadata']['HTTPStatusCode'] == 204:
            print('Removed s3://%s/%s' % (bucket, key))
    except Exception as e:
        print(e)

    # Upload errors if any to S3
    if len(send_errors) > 0:
        try:
            result_data = '\n'.join(send_errors)
            logfile_key = key.replace('.csv.gz', '') + '_error.log'
            response = s3.put_object(Bucket=bucket, Key=logfile_key, Body=result_data)
            if 'ResponseMetadata' in response.keys() and response['ResponseMetadata']['HTTPStatusCode'] == 200:
                print('Send email errors saved in s3://%s/%s' % (bucket, logfile_key))
        except Exception as e:
            print(e)
            raise e

    # Reset publish error log
    send_errors = []


if __name__ == "__main__":
    json_content = json.loads(open('event.json', 'r').read())
    lambda_handler(json_content, None)
But it fails. When I choose Python 2.7, the error is:
module initialization error 'us-east-1'
When I choose Python 3.6, the error is:
Unable to import module 'lambda_function': No module named 'StringIO'
Can anyone tell me what the problem is?
In Python 3, the StringIO module no longer exists. Instead, import the io module and use io.StringIO.
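Swapping in io.StringIO is not the only Python 3 change this part of the handler needs: `zlib.decompress` returns bytes, which must be decoded to text before being wrapped in `io.StringIO`, and `urllib.unquote_plus` moved to `urllib.parse.unquote_plus`, which already returns `str`, so the trailing `.decode('utf8')` goes away. Other Python 2-only idioms in the script, such as `e.message` and `dict.iteritems()`, would also need changing. A minimal sketch of the CSV-reading step, assuming UTF-8 CSV content; `object_key_from_event` and `read_recipient_rows` are hypothetical helpers:

```python
import csv
import io
import urllib.parse
import zlib

FIELDNAMES = ['from_address', 'to_address', 'subject', 'message']


def object_key_from_event(event):
    """Python 3 replacement for urllib.unquote_plus(...).decode('utf8')."""
    return urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])


def read_recipient_rows(gzipped_body):
    """Decompress a .csv.gz body (bytes) and return its rows as dicts."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header, as in the original script
    text = zlib.decompress(gzipped_body, 16 + zlib.MAX_WBITS).decode('utf-8')
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDNAMES)
    return list(reader)
```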
The problem with the v27 version is presumably that the following statement is failing:
region = os.environ['us-east-1']
This raises a KeyError because there is no environment variable named us-east-1. Instead, use AWS_REGION or AWS_DEFAULT_REGION. See the full list of Lambda environment variables.
Please set the environment variables as described in step 4 of the article:
"Configure Lambda environment variables appropriate to your usage scenario. For example, the following variables would be valid for a given use case:
REGION=us-east-1, MAX_THREADS=10, TEXT_MESSAGE_FILE=email_body.txt, HTML_MESSAGE_FILE=email_body.html."
The code in the question replaces the names of the environment variables with their values, so Python looks for an environment variable named, for example, 'us-east-1', which doesn't exist.
This is the original code
# Get Lambda environment variables
region = os.environ['REGION']
max_threads = os.environ['MAX_THREADS']
text_message_file = os.environ['TEXT_MESSAGE_FILE']
html_message_file = os.environ['HTML_MESSAGE_FILE']
You can also hard-code the values, like below:
# Get Lambda environment variables
region = 'us-east-1'
max_threads = '10'
text_message_file = 'email_body.txt'
html_message_file = 'email_body.html'
But I'd suggest setting the environment variables instead (and using the version of the script provided by the article's author). For how to set environment variables in Lambda, see this article :)
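Whichever way the values are supplied, note that environment variables are always strings, while `concurrent.futures.ThreadPoolExecutor(max_workers=...)` expects an integer (on Python 3, passing '10' raises a TypeError). A minimal sketch, assuming the variable names from step 4 of the article and falling back to the `AWS_REGION` variable that Lambda sets; `load_config` and its default values are hypothetical:

```python
import os


def load_config(environ=os.environ):
    """Read the mass-mailer settings from environment variables (always strings)."""
    return {
        # Lambda sets AWS_REGION itself; REGION is the article's optional override
        'region': environ.get('REGION') or environ.get('AWS_REGION', 'us-east-1'),
        # ThreadPoolExecutor(max_workers=...) needs an int, not the string '10'
        'max_threads': int(environ.get('MAX_THREADS', '10')),
        'text_message_file': environ.get('TEXT_MESSAGE_FILE', 'email_body.txt'),
        'html_message_file': environ.get('HTML_MESSAGE_FILE', 'email_body.html'),
    }
```

Passing the mapping as a parameter keeps the function easy to try locally with a plain dict instead of the real environment.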