Lambda1 > SNS > SQS > Lambda2: Receiving oddly formatted events - amazon-web-services

I have the following architecture on AWS:
A Lambda function (let's call it Lambda1), which publishes a message to...
An SNS topic, which is subscribed to by...
An SQS queue, which is subscribed to by...
Another Lambda function (let's call it Lambda2).
The event being received by Lambda2 looks like this:
{
"message": "{'Records': [{'messageId': 'REDACTED', 'receiptHandle': 'REDACTED', 'body': '{\\n \"Type\" : \"Notification\",\\n \"MessageId\" : \"REDACTED\",\\n \"SequenceNumber\" : \"10000000000000054000\",\\n \"TopicArn\" : \"arn:aws:sns:REDACTED\",\\n \"Subject\" : \"Blackout\",\\n \"Message\" : \"{\\\\\"channel\\\\\": \\\\\"REDACTED\\\\\", \\\\\"blackout\\\\\": {\\\\\"id\\\\\": 2452, \\\\\"name\\\\\": \\\\\"Name Goes Here\\\\\", \\\\\"location_id\\\\\": 2, \\\\\"reason\\\\\": \\\\\"Approaching capacity (9/1)\\\\\", \\\\\"start_date\\\\\": \\\\\"2022-08-05 00:00:00\\\\\", \\\\\"end_date\\\\\": \\\\\"2022-08-06 00:00:00\\\\\", \\\\\"blackout_scope\\\\\": \\\\\"product\\\\\", \\\\\"status\\\\\": \\\\\"pending create\\\\\", \\\\\"inventory_id\\\\\": 2444894}}\",\\n \"Timestamp\" : \"2022-08-03T19:40:15.540Z\",\\n \"UnsubscribeURL\" : \"https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=REDACTED\"\\n}', 'attributes': {'ApproximateReceiveCount': '3', 'SentTimestamp': '1659555615567', 'SequenceNumber': '18871590311294705152', 'MessageGroupId': '2452_create', 'SenderId': 'REDACTED', 'MessageDeduplicationId': 'REDACTED', 'ApproximateFirstReceiveTimestamp': '1659555615567'}, 'messageAttributes': {}, 'md5OfBody': '1be1a5ee5146faf13cb7cb5bb7f678d0', 'eventSource': 'aws:sqs', 'eventSourceARN': 'arn:aws:sqs:us-west-2:659966244640:BlackoutsWorkerSpotHeroQueueNew.fifo', 'awsRegion': 'us-west-2'}]}",
"levelname": "INFO",
"name": "REDACTED",
"asctime": "2022-08-03 20:46:58,732",
"trace_id": null
}
Note the body in the message. It has been JSON-stringified and then escaped (which is expected, right?). But the body contains its own nested Message key, which has also been JSON-stringified and then escaped. So the whole thing is a riot of backslashes.
Is this normal and expected? Or does it suggest a problem with the configuration of the various AWS resources? This is my first time implementing something like this.
Thanks!
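For reference, the nesting described above can be unwrapped one layer at a time: each SQS record's body is the SNS envelope serialized as JSON, and that envelope's Message field is the string Lambda1 published. The outermost layer in the log (single-quoted keys inside "message") looks like a Python dict converted with str() and then JSON-encoded again by the logger, so it is plausibly a logging artifact rather than something AWS added. A minimal Python sketch, assuming standard (non-raw) SNS delivery and that Lambda1 published a JSON string; the function name is illustrative:

```python
import json


def unwrap_sns_via_sqs(event):
    """Return the payloads Lambda1 published, one per SQS record.

    Assumes standard (non-raw) SNS delivery: each SQS body is the SNS
    notification envelope as a JSON string, and the envelope's "Message"
    is itself the JSON string that Lambda1 passed to SNS.
    """
    payloads = []
    for record in event["Records"]:
        envelope = json.loads(record["body"])             # layer 1: SNS envelope
        payloads.append(json.loads(envelope["Message"]))  # layer 2: Lambda1's payload
    return payloads


if __name__ == "__main__":
    # Fake event shaped like the one in the log above.
    payload = {"channel": "example", "blackout": {"id": 2452, "status": "pending create"}}
    envelope = {"Type": "Notification", "Subject": "Blackout", "Message": json.dumps(payload)}
    event = {"Records": [{"body": json.dumps(envelope, indent=1), "eventSource": "aws:sqs"}]}
    print(unwrap_sns_via_sqs(event))
    # prints [{'channel': 'example', 'blackout': {'id': 2452, 'status': 'pending create'}}]
```

If raw message delivery is enabled on the SNS subscription, the envelope disappears and only the second json.loads is needed.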

Related

PubSub messages not showing event type for Cloud storage notification

As mentioned in this Pub/Sub Notifications Attributes documentation, I should be able to retrieve attributes such as eventType for all notifications sent by Cloud Storage to a Pub/Sub topic.
However, in my case, I am not seeing any of the payload attributes when an object is added to or removed from a Cloud Storage bucket that has been configured to send notifications.
Below is the message I am getting when an object is added to a cloud storage bucket:
{
"kind": "storage#object",
"id": "coral-ethos-xxxx.appspot.com/part-00000-of-00001.avro/1642783080470217",
"selfLink": "https://www.googleapis.com/storage/v1/b/coral-ethos-xxxxx.appspot.com/o/part-00000-of-00001.avro",
"name": "part-00000-of-00001.avro",
"bucket": "coral-ethos-xxxxx.appspot.com",
"generation": "1642783080470217",
"metageneration": "1",
"contentType": "application/octet-stream",
"timeCreated": "2022-01-21T16:38:00.624Z",
"updated": "2022-01-21T16:38:00.624Z",
"storageClass": "STANDARD",
"timeStorageClassUpdated": "2022-01-21T16:38:00.624Z",
"size": "202521",
"md5Hash": "w0QzRMUCOHj42vpME2P/Ww==",
"mediaLink": "https://www.googleapis.com/download/storage/v1/b/coral-ethos-xxxxxx.appspot.com/o/part-00000-of-00001.avro?generation=1642783080470217&alt=media",
"contentLanguage": "en",
"crc32c": "XdhSwg==",
"etag": "CMmt0e+jw/UCEAE="
}
The eventType and notificationConfiguration attributes are not present in the above message.
Without these two attributes, I cannot tell whether the object was added or removed.
Below is the code I used to display the message:
ProjectSubscriptionName subscriptionName =
ProjectSubscriptionName.of(projectId, subscriptionId);
// Instantiate an asynchronous message receiver.
MessageReceiver receiver =
(PubsubMessage message, AckReplyConsumer consumer) -> {
// Handle incoming message, then ack the received message.
System.out.println("Id: " + message.getMessageId());
System.out.println("Data: " + message.getData().toStringUtf8());
consumer.ack();
};
Can someone let me know if I am missing something in configuring the storage bucket?
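Cloud Storage sends eventType, notificationConfig, bucketId, objectId and similar fields as Pub/Sub message attributes, separate from the message data; the data only holds the object resource shown above. The receiver above prints only getData(); in Java the attributes are available from message.getAttributesMap(). A minimal Python sketch of reading both parts, assuming the google-cloud-pubsub client, whose subscriber callback receives a message exposing .attributes, .data and .ack() (the helper name is hypothetical):

```python
import json


def describe_gcs_notification(attributes, data):
    """Summarize one Cloud Storage notification received through Pub/Sub.

    attributes: the Pub/Sub message attributes (str -> str mapping)
    data: the message payload as bytes; with payload format NONE it is empty
    """
    resource = json.loads(data.decode("utf-8")) if data else {}
    return {
        # e.g. OBJECT_FINALIZE (created or overwritten) or OBJECT_DELETE (removed)
        "eventType": attributes.get("eventType"),
        "notificationConfig": attributes.get("notificationConfig"),
        "bucket": attributes.get("bucketId", resource.get("bucket")),
        "object": attributes.get("objectId", resource.get("name")),
    }


def callback(message):
    """Subscriber callback in the shape used by google-cloud-pubsub."""
    print(describe_gcs_notification(dict(message.attributes), message.data))
    message.ack()
```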

S3 key not in S3 event notification

Hello, I currently have an event notification set up on my S3 bucket. This notification is sent to an SNS topic, then an SQS queue, and finally a Lambda.
My ultimate goal is for my Lambda to read the event notification JSON and parse out the bucket and key.
The problem is that I see the bucket name in the JSON but not the key when I print out the 'event' object using Python. How/where should I debug to figure out what is going on? I do remember seeing the key in the JSON in previous implementations.
The JSON looks like:
{
'Records': [{
'messageId': '15d42178-c59c-4f3a-8efa-cce8a20acd5b',
'receiptHandle': 'AQEBnPN7q4+jLFQfExOytZYH69w4kvI4ohjJGFqUqOAvCjRHMfbFFvgEeLVjonZ5q4GAYyzLzDSRQmZv3+YTvE3VYqKmU+Nt0rgX824LoMMkKKMuSWBT6c1a0X5dXRJRzFaOKjpniONRg5Gdm1V9I/7mW0x+Zfi0PXr5cQZXVA1NNUdJ4tIkwtpuC+Rh/dbGFQmAo6fQDuCnpzRW1NKGGda440t3ivtUQMvrniwY8ILKVoX9pnS1rAVgVPGBUo8mXyH9ec9p/Er9O9N5Kxc3xQE44MhHUygD1iJbRROBHG9m0Mj6qbKx4uI7S4KQVWRK8hHkxYFUtP4NzhzcGP1LfY91+zG4mNweGzQkfDbvn0LG9+6guxv9dW+uGz1c3f9My7272s+ABfksvfbNRgPSgwJecg==',
'body': '{\n "Type" : "Notification",\n "MessageId" : "78cadcbb-f349-5bae-b39b-85504866b186",\n "TopicArn" : "<topic arn>",\n "Subject" : "Amazon S3 Notification",\n "Message" : "{\\"Service\\":\\"Amazon S3\\",\\"Event\\":\\"s3:TestEvent\\",\\"Time\\":\\"2021-10-21T19:01:03.083Z\\",\\"Bucket\\":\\"<s3 bucket>\\",\\"RequestId\\":\\"TCJP8AZ6S75XXXPN\\",\\"HostId\\":\\"VYNq+Jh5Hkg+Vykp2RcIy9lSca7uJyhzLPfE8tcgnt3Je9kH0I+H3zvzvJkd6IvfZKZm2jYqu4Q=\\"}",\n "Timestamp" : "2021-10-21T19:01:03.285Z",\n "SignatureVersion" : "1",\n "Signature" : "EE9xsZx8hezxh8Yhyj8DLc+VSGYowl641kHgqr8tWq2msNwOBv4KEZoTtHJ/hdnfNYLEBsR7imsfv5ZrX7nKRKL2kR8xax57tcih7GRbifIuFyrs9wAhtcuclf2NJQG4eY9OrOHHxPN3fSvNI9xduPeBrxB2TAfbTcWq4AeN0C4KriV18J2dU28ecMJGtmqK0JM+2KLEuwQe/dyYiEnEnWu5EfGweDhYCRvmB1aUPRcW4s3yOHIckklmHhBLkbmufl1me/hdO7GEGa1ju8wJDF33hmmCCSE6M7ITl9niWICBtvWlFz1Md5OiswyriRyN4LZjmvEjzRZtNwy/qMkDYA==",\n "SigningCertURL" : "<cert url>",\n "UnsubscribeURL" : "<unsubscribe url>"\n}',
'attributes': {
'ApproximateReceiveCount': '34',
'SentTimestamp': <timestamp>,
'SenderId': <senderid>,
'ApproximateFirstReceiveTimestamp': <timestamp>
},
'messageAttributes': {},
'md5OfBody': <md5>,
'eventSource': 'aws:sqs',
'eventSourceARN': <queue-arn>,
'awsRegion': 'us-east-1'
}]
}
In your Amazon SNS subscription, enable Amazon SNS raw message delivery.
This passes the S3 event through in a cleaner form, with the SQS body containing a string version of the JSON from S3. You'll still need to parse it into an object (JSON.parse() in JavaScript, json.loads() in Python).
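Note that the Message inside the question's body is an s3:TestEvent, which S3 sends when a notification configuration is set up; it names the bucket but has no Records array and therefore no object key. The sketch below, in Python since the question prints the event from Python, accepts bodies with or without raw message delivery, skips test events, and URL-decodes keys (S3 URL-encodes object keys in event notifications). The function name is illustrative, not an AWS API:

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_event(event):
    """Return (bucket, key) pairs from S3 notifications delivered via SNS -> SQS."""
    pairs = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        # Without raw delivery, the body is an SNS envelope wrapping the S3 event.
        if body.get("Type") == "Notification" and "Message" in body:
            body = json.loads(body["Message"])
        # The test event has "Bucket" but no "Records", so it carries no key.
        if body.get("Event") == "s3:TestEvent":
            continue
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Keys are URL-encoded in notifications (a space arrives as '+').
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            pairs.append((bucket, key))
    return pairs
```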

Regex filtering of messages in SNS

Is there a way to filter messages based on Regex or substring in AWS SNS?
AWS Documentation for filtering messages mentions three types of filtering for strings:
Exact matching (whitelisting)
Anything-but matching (blacklisting)
Prefix matching
I want to filter messages based on substrings in the messages. For example,
I have an S3 event that sends a message to SNS when a new object is added to S3; the contents of the message are as below:
{
"Records": [
{
"s3": {
"bucket": {
"name": "images-bucket"
},
"object": {
"key": "some-key/more-key/filteringText/additionaldata.png"
}
}
}
]
}
I want to keep a message only if filteringText is present in the key field.
Note: the S3 notification service sends the entire message as text, so Records is not a JSON object but a string.
From what I've seen in the documentation, you can't do regex matches or substrings, but you can match prefixes and create your own attributes in the MessageAttributes field.
To do this, I send the S3 event to a simple Lambda that adds MessageAttributes and then publishes the message to SNS.
In effect, S3 -> Lambda -> SNS -> other consumers (with filtering).
The Lambda can do something like this (where you'll have to programmatically decide when to add the attribute):
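// Assumes the AWS SDK for JavaScript v2; the await must run inside an async
// Lambda handler, and payload and SNS_ARN are defined elsewhere.
const AWS = require('aws-sdk');
const sns = new AWS.SNS();

// Custom attribute that SNS subscription filter policies can match on
let messageAttributes = {
    myfilterkey: {DataType: "String", StringValue: "filteringText"}
};
let params = {
    Message: JSON.stringify(payload),
    MessageAttributes: messageAttributes,
    // With 'json', Message must be a JSON object with a top-level "default" key;
    // drop this line to publish the payload as a plain string.
    MessageStructure: 'json',
    TargetArn: SNS_ARN
};
await sns.publish(params).promise();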
Then, in the SNS subscription's filter policy, you can match on that attribute:
{"myfilterkey": ["filteringText"]}
It seems a little convoluted to put the Lambda in there, but I like the idea of being able to plug and unplug consumers from SNS on the fly and use filtering to determine who gets what.
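The "programmatically decide" step could look like the following Python (boto3) sketch, assuming a Lambda triggered directly by S3 and subscriptions using a filter policy such as {"myfilterkey": ["filteringText"]}. FILTER_ATTRIBUTE, FILTER_VALUE and build_publish_params are illustrative names; the returned dict is meant to be passed as boto3.client("sns").publish(**params):

```python
import json

FILTER_ATTRIBUTE = "myfilterkey"  # attribute name the filter policy matches on
FILTER_VALUE = "filteringText"    # substring to look for in object keys


def build_publish_params(s3_event, topic_arn):
    """Build keyword arguments for boto3's sns.publish(**params).

    The filter attribute is added only when some object key in the S3 event
    contains FILTER_VALUE, which is the substring check SNS cannot do itself.
    """
    keys = [r["s3"]["object"]["key"] for r in s3_event.get("Records", [])]
    params = {"TopicArn": topic_arn, "Message": json.dumps(s3_event)}
    if any(FILTER_VALUE in key for key in keys):
        params["MessageAttributes"] = {
            FILTER_ATTRIBUTE: {"DataType": "String", "StringValue": FILTER_VALUE}
        }
    return params
```

Messages published without the attribute do not match a filter policy that requires myfilterkey, so those subscribers never see them, while subscribers with no filter policy still receive everything.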

How do I index the transformed log records into AWS Elasticsearch?

TLDR
The lambda function is not able to index the firehose logs into the AWS managed ES due to an "encoding problem".
Actual Error Response
I do not get any error when I base64 encode a single logEvent from a firehose record and send the collected records to the AWS managed ES.
See the next section for more details.
The base64-encoded, compressed payload is sent to ES because the resulting JSON transformation is too big for ES to index (see this ES link).
I get the following error from the AWS managed ES:
{
"deliveryStreamARN": "arn:aws:firehose:us-west-2:*:deliverystream/*",
"destination": "arn:aws:es:us-west-2:*:domain/*",
"deliveryStreamVersionId": 1,
"message": "The data could not be decoded as UTF-8",
"errorCode": "InvalidEncodingException",
"processor": "arn:aws:lambda:us-west-2:*:function:*"
}
If the output record is not compressed, the body size is too long (as small as 14MB). Without compression and a simple base64 encoded payload, I get the following error in the Lambda logs:
{
"type": "mapper_parsing_exception",
"reason": "failed to parse",
"caused_by": {
"type": "not_x_content_exception",
"reason": "Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"
}
}
Description
I have CloudWatch logs that are buffered by size/interval and fed into a Kinesis Firehose. Firehose passes the logs to a Lambda function, which transforms each log into a JSON record that should then be sent on to the AWS managed Elasticsearch cluster.
The lambda function gets the following JSON structure:
{
"invocationId": "cf1306b5-2d3c-4886-b7be-b5bcf0a66ef3",
"deliveryStreamArn": "arn:aws:firehose:...",
"region": "us-west-2",
"records": [{
"recordId": "49577998431243709525183749876652374166077260049460232194000000",
"approximateArrivalTimestamp": 1508197563377,
"data": "some_compressed_data_in_base_64_encoding"
}]
}
The Lambda function then extracts .records[].data, decodes it from base64 and decompresses it, which results in the following JSON:
{
"messageType": "DATA_MESSAGE",
"owner": "aws_account_number",
"logGroup": "some_cloudwatch_log_group_name",
"logStream": "i-0221b6ec01af47bfb",
"subscriptionFilters": [
"cloudwatch_log_subscription_filter_name"
],
"logEvents": [
{
"id": "33633929427703365813575134502195362621356131219229245440",
"timestamp": 1508197557000,
"message": "Oct 16 23:45:57 some_log_entry_1"
},
{
"id": "33633929427703365813575134502195362621356131219229245441",
"timestamp": 1508197557000,
"message": "Oct 16 23:45:57 some_log_entry_2"
},
{
"id": "33633929427703365813575134502195362621356131219229245442",
"timestamp": 1508197557000,
"message": "Oct 16 23:45:57 some_log_entry_3"
}
]
}
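The decoding step described above can be sketched in Python as follows, assuming the Firehose records come from a CloudWatch Logs subscription (whose data is gzip-compressed JSON) and that each decoded payload looks like the one shown; the function name is illustrative:

```python
import base64
import gzip
import json


def decode_firehose_records(firehose_event):
    """Return (recordId, logs_payload) pairs for DATA_MESSAGE records.

    Each record's "data" is base64 text wrapping gzip-compressed JSON
    produced by a CloudWatch Logs subscription filter.
    """
    decoded = []
    for record in firehose_event["records"]:
        raw = gzip.decompress(base64.b64decode(record["data"]))
        payload = json.loads(raw)
        # CONTROL_MESSAGE payloads only check reachability and carry no log events.
        if payload.get("messageType") == "DATA_MESSAGE":
            decoded.append((record["recordId"], payload))
    return decoded
```

In a Firehose transformation every input recordId must still appear in the response, so skipped records would need to be returned too, for example with result Dropped.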
Each item in .logEvents[] is transformed into a JSON structure whose keys are the desired columns when searching logs in Kibana, something like this:
{
'journalctl_host': 'ip-172-11-11-111',
'process': 'haproxy',
'pid': 15507,
'client_ip': '172.11.11.111',
'client_port': 3924,
'frontend_name': 'http-web',
'backend_name': 'server',
'server_name': 'server-3',
'time_duration': 10,
'status_code': 200,
'bytes_read': 79,
'#timestamp': '1900-10-16T23:46:01.0Z',
'tags': ['haproxy'],
'message': 'HEAD / HTTP/1.1'
}
The transformed JSON documents are collected into an array, which is zlib-compressed and base64-encoded; that string becomes the data field of a new JSON payload returned as the final Lambda result:
{
"records": [
{
"recordId": "49577998431243709525183749876652374166077260049460232194000000",
"result": "Ok",
"data": "base64_encoded_zlib_compressed_array_of_transformed_logs"
}
]}
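For comparison, a Firehose transformation Lambda returns each record with recordId, result (Ok, Dropped or ProcessingFailed) and data, where data is the base64 encoding of the exact bytes to deliver. For an Elasticsearch destination those bytes are what gets indexed, and zlib-compressed bytes are not valid UTF-8 text, which is consistent with the InvalidEncodingException shown earlier. A minimal Python sketch of an uncompressed response, assuming one transformed document per input record (a simplification of the array-per-record approach above); the function names are illustrative:

```python
import base64
import json


def to_firehose_record(record_id, document):
    """Wrap one transformed document as a Firehose transformation result."""
    body = json.dumps(document, separators=(",", ":")).encode("utf-8")
    return {
        "recordId": record_id,
        "result": "Ok",
        # Plain UTF-8 JSON, base64-encoded; no extra compression layer.
        "data": base64.b64encode(body).decode("ascii"),
    }


def build_response(transformed):
    """transformed: iterable of (recordId, document) pairs."""
    return {"records": [to_firehose_record(rid, doc) for rid, doc in transformed]}
```

Without compression the response grows with the transformed output, so the Lambda response size limit discussed below still applies; that constraint is what the revised architecture sidesteps.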
Cloudwatch configuration
13 log entries (~4kb) can get transformed to about 635kb.
I have also decreased the thresholds for the awslogs agent, hoping that the batches of logs sent to the Lambda function would be small:
buffer_duration = 10
batch_count = 10
batch_size = 500
Unfortunately, during a burst the spike can exceed 2800 lines and 1MB in size.
When the resulting payload from the Lambda function is "too big" (~13MB of transformed logs), an error is logged in the Lambda CloudWatch logs: "body size is too long". There doesn't seem to be any indication of where this error comes from or whether there is a size limit on the Lambda function's response payload.
So, the AWS support folks have told me that the following limitations can't be mitigated to solve this flow:
lambda payload size
compressed firehose payload incoming into lambda which is directly proportional to the lambda output.
Instead, I have modified the architecture to the following:
Cloudwatch logs are backed up in S3 via Firehose.
S3 events are processed by the lambda function.
The Lambda function returns a success code if it transforms the logs and successfully bulk indexes them into ES.
If the Lambda function fails, a Dead Letter Queue (AWS SQS) is configured with a CloudWatch alarm. A sample CloudFormation snippet can be found here.
If there are SQS messages, one could manually invoke the Lambda function with those messages or set up an AWS Batch job to process the SQS messages with the Lambda function. However, be careful that the Lambda function doesn't fail over into the DLQ again. Check the Lambda CloudWatch logs to see why the message was not processed and was sent to the DLQ.
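The bulk-indexing step in this revised design could build its request body as in the Python sketch below, which produces the newline-delimited JSON that the Elasticsearch _bulk endpoint expects (an action line before each document and a trailing newline). The index name and helper names are illustrative, and sending the body (for example as a signed HTTP request) is left out. Splitting documents into chunks keeps each bulk request small:

```python
import json


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bulk_body(documents, index_name):
    """Serialize documents into an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(doc))                                 # document source
    # The bulk API requires a trailing newline; an empty batch yields "".
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    docs = [{"process": "haproxy", "status_code": 200}, {"process": "haproxy", "status_code": 503}]
    for batch in chunks(docs, 500):
        print(build_bulk_body(batch, "haproxy-logs"), end="")
```

The body would be POSTed to the domain's /_bulk endpoint with Content-Type: application/x-ndjson. The bulk response reports per-item failures through its "errors" flag, so raising an exception when that flag is true lets the failed invocation reach the DLQ described above.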

AWS IoT Rule results in empty Payload

My weather-station is publishing its status via MQTT to AWS IoT.
The message is published to the topic $aws/things/my-weather-station-001/shadow/update and looks like this:
{
"state": {
"reported": {
"temperature" : 22,
"humidity" : 70,
....
"wind" : 234,
"air" : 345
}
}
}
After the message is received, I have created a rule to store it in AWS DynamoDB. The rule's SELECT statement is:
SELECT state.reported.* FROM '$aws/things/+/shadow/update/accepted'
This works well while I am sending messages that contain the state.reported field.
However, "control" messages are sometimes sent to the topic $aws/things/weather-station-0001/shadow/update, telling the device to switch on an LED or some other part. These messages are usually sent by an app or a controlling server and look like this (note that instead of a reported field, they have desired):
{
"state": {
"desired": {
"led1" : "on",
"locked" : true
}
}
}
When these messages arrive, they ARE STILL processed by the rule and end up in the DynamoDB table as an empty {} payload.
Is there any way to force the rule to ignore messages that do not contain the state.reported element?
You can add a WHERE clause to your SQL statement. Try:
SELECT state.reported.* FROM '$aws/things/+/shadow/update/accepted' WHERE state.reported <> ''