Decode stream data from Firehose and send to HTTP endpoint

I have set up a Kinesis Firehose to stream MSK logs to an HTTP endpoint. The HTTP endpoint is a log collector (fluentd) application. The data I receive in the fluentd app is base64 encoded, and I want to decode it so that additional parsing can be done in the log collector. For this reason I tried to include data transformation with a Lambda in the Firehose stream.
This is the Lambda code:
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        payload = base64.b64decode(record['data']).decode('utf-8')
        data_dict = json.loads(payload)
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': data_dict
        }
        output.append(output_record)
    print('Successfully processed {} records.'.format(len(event['records'])))
    print(output)
    return {'records': output}
I see the data decoded in the Lambda logs; however, the Firehose logs indicate that processing failed with the below error message:
{.."errorMessage":"Check your function and make sure the output is in required format. In addition to that, make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed",..}
Can Firehose handle decoded data, or should I encode it back in the Lambda and handle the decoding in the fluentd application?
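For reference, the error message above matches Firehose's transformation contract: the data field of every record returned by the Lambda must itself be base64 encoded. A minimal sketch of a handler that decodes, parses, and re-encodes before returning (the parsing step here is just a placeholder, not the author's actual transformation):
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Decode and parse the incoming payload
        payload = base64.b64decode(record['data']).decode('utf-8')
        data_dict = json.loads(payload)

        # Firehose requires 'data' to be base64-encoded again on the way out
        encoded = base64.b64encode(json.dumps(data_dict).encode('utf-8')).decode('utf-8')
        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': encoded
        })
    return {'records': output}
The decoding would then be handled on the fluentd side.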

Related

AWS SQS python sqs.receive_message returning only one message output

I am using the Amazon SQS Python SDK (boto3) to see how many messages there are in the queue for a given queue URL. In the Amazon GUI console I can see there are 3 messages within the queue for that queue URL. However, I do not get more than 1 message as output every time I run the command. Below is my code:
import boto3
import json
from botocore.exceptions import ClientError

def GetSecretKeyAndAccesskey():
    # code to pull secret key and access key
    return (aws_access_key, aws_secret_key)

# Create SQS client
aws_access_key_id, aws_secret_access_key = GetSecretKeyAndAccesskey()
sqs = boto3.client(
    'sqs',
    aws_access_key_id=str(aws_access_key_id),
    aws_secret_access_key=str(aws_secret_access_key),
    region_name='eu-west-1'
)

response = sqs.receive_message(
    QueueUrl='my_queue_url',
    AttributeNames=[
        'All',
    ],
    MaxNumberOfMessages=10,
)

print(response["Messages"][0])
Every time I run the code I get a different message ID, and if I change my print statement to check for the next list index, I get "list index out of range", meaning that only one message was returned:
print(response["Messages"][1])
C:\>python testing.py
d4e57e1d-db62-4fc5-8233-c5576cb2603d
C:\>python testing.py
857858e9-55dc-4d23-aead-3c6622feccc5
First, you need to add "WaitTimeSeconds" to turn on long polling and collect more messages during a single connection.
The other issue is that if you only put 3 messages on the queue, they get separated on the backend systems as part of the redundancy of the AWS SQS service. So when you call SQS, it connects you to one of the systems and delivers the single message that's available. If you increase the number of total messages, you'll get more messages per request.
I wrote this code to demonstrate the functionality of SQS and allow you to play around with the concept and test.
import json

import boto3

session = boto3.Session(region_name="us-east-2", profile_name="dev")
sqs = session.client('sqs')

def get_message():
    response = sqs.receive_message(QueueUrl='test-queue', MaxNumberOfMessages=10, WaitTimeSeconds=10)
    return len(response["Messages"])

def put_messages(seed):
    for message_number in range(seed):
        body = {"test": "message {}".format(message_number)}
        sqs.send_message(QueueUrl='test-queue', MessageBody=json.dumps(body))

if __name__ == '__main__':
    put_messages(2)
    print(get_message())

How to retrieve delivery_attempt from an event-triggered Cloud Function?

I am writing a Python Cloud Function and I would like to retrieve the "delivery_attempt" attribute.
Based on the documentation, the Cloud Function gets only 2 parameters: event (of type PubsubMessage) and context.
def hello_pubsub(event, context):
    """Background Cloud Function to be triggered by Pub/Sub.

    Args:
        event (dict): The dictionary with data specific to this type of
            event. The `@type` field maps to
            `type.googleapis.com/google.pubsub.v1.PubsubMessage`.
            The `data` field maps to the PubsubMessage data
            in a base64-encoded string. The `attributes` field maps
            to the PubsubMessage attributes if any is present.
        context (google.cloud.functions.Context): Metadata of triggering event
            including `event_id` which maps to the PubsubMessage
            messageId, `timestamp` which maps to the PubsubMessage
            publishTime, `event_type` which maps to
            `google.pubsub.topic.publish`, and `resource` which is
            a dictionary that describes the service API endpoint
            pubsub.googleapis.com, the triggering topic's name, and
            the triggering event type
            `type.googleapis.com/google.pubsub.v1.PubsubMessage`.
    Returns:
        None. The output is written to Cloud Logging.
    """
How can I retrieve the delivery_attempt on a message?
When triggering a Cloud Function via Pub/Sub, one does not have access to the delivery attempt field. Instead, the way to get access is to set up an HTTP-based Cloud Function and use the trigger URL as the push endpoint in a subscription you create separately.
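Note that Pub/Sub only populates the delivery attempt field when the subscription has a dead-letter policy configured. A rough sketch of creating such a push subscription with the google-cloud-pubsub client (the project, topic, and endpoint names below are placeholders):
from google.cloud import pubsub_v1

# Placeholder names for illustration
project = "my-project"
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project, "my-push-sub")

subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": f"projects/{project}/topics/my-topic",
        # Push deliveries go to the HTTP-triggered Cloud Function's URL
        "push_config": {
            "push_endpoint": "https://REGION-PROJECT.cloudfunctions.net/hello_world"
        },
        # deliveryAttempt is only set when a dead-letter policy exists
        "dead_letter_policy": {
            "dead_letter_topic": f"projects/{project}/topics/my-dead-letter-topic",
            "max_delivery_attempts": 5,
        },
    }
)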
Then, you can access the delivery attempt as follows:
def hello_world(request):
    """Responds to any HTTP request.

    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    request_json = request.get_json()
    print(request_json["deliveryAttempt"])

Send emails by reading email address from S3

I am trying the following:
Read the email addresses from a CSV file in S3; the first column has the email address, the second column has the subject, and the third column has the body of the email.
Send an email with that subject and body to the email address read from S3.
I was able to read the file in S3 into a DataFrame using Lambda, but I am unable to send the email. Any ideas on how to do this using AWS services?
You can use the same Lambda function to send the emails through an SMTP server of your own, e.g. while parsing data from the S3 CSV file, send an email for each entry in the CSV.
#!/usr/bin/env python
import smtplib
from email.mime.text import MIMEText

sender = 'xx@xx.com'        # parsed data
receivers = ['yy@yy.com']   # parsed data
port = 1025

msg = MIMEText('email text')  # parsed data
msg['Subject'] = 'Test mail'
msg['From'] = sender
msg['To'] = ', '.join(receivers)

with smtplib.SMTP('localhost', port) as server:
    server.sendmail(sender, receivers, msg.as_string())
    print("email sent")
You can send emails from within a Lambda function by invoking the SES service. There is an example of creating a Lambda function (implemented in Java) that shows how to send an email message as part of a larger workflow created using AWS Step Functions. See this example:
Create AWS serverless workflows by using the AWS SDK for Java
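In Python, a minimal sketch of the same idea with boto3 might look like the following. The bucket, key, region, and sender address are placeholders, and the sender must be a verified SES identity:
import csv
import io

import boto3

# Hypothetical bucket/key names for illustration
BUCKET = "my-bucket"
KEY = "emails.csv"

s3 = boto3.client("s3")
ses = boto3.client("ses", region_name="eu-west-1")

def lambda_handler(event, context):
    # Read the CSV from S3: column 1 = recipient, column 2 = subject, column 3 = body
    obj = s3.get_object(Bucket=BUCKET, Key=KEY)
    rows = csv.reader(io.StringIO(obj["Body"].read().decode("utf-8")))

    for recipient, subject, body in rows:
        ses.send_email(
            Source="sender@example.com",  # must be a verified SES identity
            Destination={"ToAddresses": [recipient]},
            Message={
                "Subject": {"Data": subject},
                "Body": {"Text": {"Data": body}},
            },
        )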

SageMaker boto3 invoke_endpoint - I keep getting type errors for payload, using BlazingText model endpoint

Let me frame the issue. I have trained a blazingtext model and have an endpoint deployed.
Within my Notebook instance I can call model.predict and get inferences from the endpoint.
I am now trying to set up a Lambda and an API Gateway for the endpoint. I am having trouble figuring out what the payload is supposed to be for invoke_endpoint(EndpointName=mymodel, Body=payload).
I keep getting invalid payload format errors
This is what my payload looks like when testing the lambda
{"instances":"string of text"}
The documentation says the Body takes b'bytes' or file-like objects. I have tinkered around with IO with no luck. There are no good blogs or tutorials out there for this particular issue, only a bunch of videos going over the cookie-cutter examples.
import io
import os
import boto3
import json
import csv

# grab environment variables
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    data = json.loads(json.dumps(event))
    payload = data["instances"]
    print(data)
    #print(payload)

    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=payload.getvalue())
    #print(response)
    #result = json.loads(response['Body'].read().decode())
    #print(result)
    #pred = int(result['predictions'][0]['score'])
    #predicted_label = 'M' if pred == 1 else 'B'
    return
"errorMessage": "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (406) from model with message \"Invalid payload format\"
If your payload is what you describe, i.e.:
payload = {"instances":"string of text"}
then you can get it in the form of json string using:
json.dumps(payload)
# which gives:
'{"instances": "string of text"}'
If you want it as a byte array, then you can do as follows:
json.dumps(payload).encode()
# which gives:
b'{"instances": "string of text"}'

Making AWS API Gateway's response the Lambda function response

I'm trying to create a simple API Gateway in which, with a POST method to a certain endpoint, a Lambda function is executed.
Setting that up was easy enough, but I'm having some trouble with the request/response handling. I'm sending the following request to the API Gateway (I'm using Python 3.7).
import requests

payload = {
    "data": "something",
    "data2": "sometsadas"
}

response = requests.post('https://endpoint.com/test', params=payload)
That endpoint triggers a Lambda function when accessed. That function just returns the same event it received.
import json

def lambda_handler(event, context):
    # TODO implement
    return event
How can I make it so the return value of my lambda function is actually the response from the request? (Or at least a way in which the return value can be found somewhere inside the response)
It seems it was a problem with how the information is sent; JSON format is required. I solved it by doing the following in the code:
config_payload = {'data': 'someData'}
config_response = requests.post(endpointURL, data=json.dumps(config_payload))
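For the response side of the original question: with Lambda proxy integration, API Gateway expects the function to return a dict with a statusCode and a string body, which then becomes the HTTP response. A minimal sketch of a handler that echoes the request body back:
import json

def lambda_handler(event, context):
    # Parse the incoming request body (a JSON string under proxy integration)
    body = json.loads(event.get('body') or '{}')
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body)
    }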