I am attempting to load a simple transactions.txt file into an S3 bucket, where a Lambda function reads the file and populates DynamoDB tables for Customers and Transactions. This all works fine. However, I also have a Lambda function that is supposed to read new records from the Transactions table as they are inserted, sum up the transaction totals by customer, and insert them into another DynamoDB table, TransactionTotal.
My TotalNotifier Lambda function throws a "KeyError" regarding "NewImage". I believe the code is fine, and I have tried changing the stream type from 'New and old images' to just 'New image' for the Transactions table, but I still encounter the same error.
from __future__ import print_function
import json, boto3

# Connect to SNS
sns = boto3.client('sns')
alertTopic = 'HighBalanceAlert'
snsTopicArn = [t['TopicArn'] for t in sns.list_topics()['Topics'] if t['TopicArn'].endswith(':' + alertTopic)][0]

# Connect to DynamoDB
dynamodb = boto3.resource('dynamodb')
transactionTotalTableName = 'TransactionTotal'
transactionsTotalTable = dynamodb.Table(transactionTotalTableName)

# This handler is executed every time the Lambda function is triggered
def lambda_handler(event, context):
    # Show the incoming event in the debug log
    print("Event received by Lambda function: " + json.dumps(event, indent=2))
    # For each transaction added, calculate the new Transactions Total
    for record in event['Records']:
        customerId = record['dynamodb']['NewImage']['CustomerId']['S']
        transactionAmount = int(record['dynamodb']['NewImage']['TransactionAmount']['N'])
        # Update the customer's total in the TransactionTotal DynamoDB table
        response = transactionsTotalTable.update_item(
            Key={
                'CustomerId': customerId
            },
            UpdateExpression="add accountBalance :val",
            ExpressionAttributeValues={
                ':val': transactionAmount
            },
            ReturnValues="UPDATED_NEW"
        )
Here is a sample error from the CloudWatch log:
'NewImage': KeyError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 30, in lambda_handler
customerId = record['dynamodb']['NewImage']['CustomerId']['S']
KeyError: 'NewImage'
To elaborate on Oluwafemi's comment, you're likely experiencing this error when receiving a REMOVE event. Regardless of whether your stream is set to new and old images or just new images, you won't receive a NewImage on a REMOVE event, since there is no new image. Check out the example events in the AWS docs.
A check on the value of record['eventName'] should solve the issue.
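For example, here is a minimal sketch of that guard; it assumes, as in the question, that only INSERT and MODIFY records should be processed, since those are the only event types that carry a NewImage:
import json

def lambda_handler(event, context):
    for record in event['Records']:
        # Only INSERT and MODIFY records carry a NewImage; REMOVE records do not
        if record['eventName'] not in ('INSERT', 'MODIFY'):
            continue
        new_image = record['dynamodb']['NewImage']
        customer_id = new_image['CustomerId']['S']
        transaction_amount = int(new_image['TransactionAmount']['N'])
        # ...update the TransactionTotal table here, as in the original handler...
        print(json.dumps({'CustomerId': customer_id, 'TransactionAmount': transaction_amount}))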
I have a use case where I have to filter incoming data from Kinesis Firehose based on the type of the event. I should write only certain events to S3 and ignore the rest. I am using a Lambda to filter the records. I am using the following Python code to achieve this:
import base64
import json

def lambda_handler(event, context):
    # TODO implement
    output = []
    for record in event['records']:
        payload = base64.b64decode(record["data"])
        payload_json = json.loads(payload)
        event_type = payload_json["eventPayload"]["operation"]
        if event_type == "create" or event_type == "update":
            output_record = {
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(payload)}
            output.append(output_record)
        else:
            output_record = {
                'recordId': record['recordId'],
                'result': 'Dropped'}
            output.append(output_record)
        return {'records': output}
I am only trying to process "create" and "update" events and drop the rest. I got the sample code from the AWS docs and built on it from there.
This is giving the following error:
{"attemptsMade":1,"arrivalTimestamp":1653289182740,"errorCode":"Lambda.MissingRecordId","errorMessage":"One or more record Ids were not returned. Ensure that the Lambda function returns all received record Ids.","attemptEndingTimestamp":1653289231611,"rawData":"some data","lambdaArn":"arn:$LATEST"}
I am not able to work out what this error means or how to fix it.
Bug: the return statement needs to be outside of the for loop; this is the cause of the error. The function is processing multiple recordIds, but only one recordId is returned. Unindent the return statement.
The data key must also be included in output_record, even if the event is being dropped. You can base64-encode the original payload with no transformations.
Additional context: event['records'] and output must be the same length (length validation), and each dictionary in output must have a recordId key whose value equals a recordId value in a dictionary in event['records'] (recordId validation).
From AWS documentation:
The record ID is passed from Kinesis Data Firehose to Lambda during the invocation. The transformed record must contain the same record ID. Any mismatch between the ID of the original record and the ID of the transformed record is treated as a data transformation failure.
Reference: Amazon Kinesis Data Firehose Data Transformation
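Putting those fixes together, here is a minimal sketch of a corrected handler. The eventPayload/operation field names are taken from the question; the .decode('utf-8') is an addition here, since the Lambda response must be JSON-serializable and base64.b64encode returns bytes:
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        payload = base64.b64decode(record['data'])
        event_type = json.loads(payload)['eventPayload']['operation']

        if event_type in ('create', 'update'):
            result = 'Ok'
        else:
            result = 'Dropped'

        # Every record gets its recordId, a result, and the (untransformed) data back
        output.append({
            'recordId': record['recordId'],
            'result': result,
            'data': base64.b64encode(payload).decode('utf-8')
        })

    # The return sits outside the loop so all received record IDs are returned
    return {'records': output}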
Hi Stack Overflow, I'm trying to conditionally put an item into a DynamoDB table. The DynamoDB table has the following attributes.
ticker - Partition Key
price_date - Sort Key
price - Attribute
Every minute I'm calling an API which gives me a minute-by-minute list of dictionaries for all stock prices within the day so far. However, the data I receive from the API can sometimes be behind by a minute or two. I don't particularly want to overwrite all the records within the DynamoDB table every time I get new data. To achieve this I've tried to create a conditional expression to only use put_item when there is a match on ticker but there is a new price_date.
I've created a simplification of my code below to better illustrate my problem.
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('stock-intraday')

data = [
    {'ticker': 'GOOG', 'price_date': '2021-10-08T9:30:00.000Z', 'price': 100},
    {'ticker': 'GOOG', 'price_date': '2021-10-08T9:31:00.000Z', 'price': 101}
]

for item in data:
    dynamodb_response = table.put_item(
        Item=item,
        ConditionExpression=Attr("ticker").exists() & Attr("price_date").not_exists())
However, when I run this code I get this error...
What is wrong with my conditional expression?
Found an answer to my own problem. DynamoDB was throwing an error because my code WAS working; it just needed some minor changes.
There needed to be a try/except block, and since the partition key is already evaluated, only price_date needed to be included in the condition expression.
import boto3
from boto3.dynamodb.conditions import Attr
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('stock-intraday')

data = [
    {'ticker': 'GOOG', 'price_date': '2021-10-08T9:30:00.000Z', 'price': 100},
    {'ticker': 'GOOG', 'price_date': '2021-10-08T9:31:00.000Z', 'price': 101}]

for item in data:
    try:
        dynamodb_response = table.put_item(
            Item=item,
            ConditionExpression=Attr("price_date").not_exists())
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            pass
My Requirement
I want to create a CloudWatch-Metric from Athena query results.
Example
I want to create a metric like user_count for each day.
In Athena, I will write an SQL query like this
select date,count(distinct user) as count from users_table group by 1
In the Athena editor I can see the result, but I want to see these results as a metric in CloudWatch.
CloudWatch-Metric-Name ==> user_count
Dimensions ==> Date,count
If I have this CloudWatch metric and these dimensions, I can easily create a monitoring dashboard and send alerts.
Can anyone suggest a way to do this?
You can use CloudWatch custom widgets, see "Run Amazon Athena queries" in Samples.
It's somewhat involved, but you can use a Lambda for this. In a nutshell:
Setup your query in Athena and make sure it works using the Athena console.
Create a Lambda that:
Runs your Athena query
Pulls the query results from S3
Parses the query results
Sends the query results to CloudWatch as a metric
Use EventBridge to run your Lambda on a recurring basis
Here's an example Lambda function in Python that does step #2. Note that the Lambda function will need IAM permissions to run queries in Athena, read the results from S3, and then put a metric into CloudWatch.
import time
import boto3

query = 'select count(*) from mytable'
DATABASE = 'default'
bucket = 'BUCKET_NAME'
path = 'yourpath'

def lambda_handler(event, context):
    # Run query in Athena
    client = boto3.client('athena')
    output = "s3://{}/{}".format(bucket, path)

    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': output,
        }
    )

    # The S3 file name uses the QueryExecutionId, so
    # grab it here so we can pull the S3 file.
    qeid = response["QueryExecutionId"]

    # Occasionally Athena hasn't written the file
    # before the Lambda tries to pull it out of S3, so pause a few seconds.
    # Note: you are charged for the time the Lambda is running.
    # A more elegant but more complicated solution would try to get the
    # file first and then sleep.
    time.sleep(3)

    # Get the query result from S3.
    s3 = boto3.client('s3')
    objectkey = path + "/" + qeid + ".csv"

    # Load the object as a file
    file_content = s3.get_object(
        Bucket=bucket,
        Key=objectkey)["Body"].read()

    # Split the file into lines
    lines = file_content.decode().splitlines()

    # Get the second line in the file (the first data row)
    count = lines[1]

    # Remove double quotes
    count = count.replace("\"", "")

    # Convert the string to an int since CloudWatch wants a numeric value
    count = int(count)

    # Post the query result as a CloudWatch metric
    cloudwatch = boto3.client('cloudwatch')
    response = cloudwatch.put_metric_data(
        MetricData=[
            {
                'MetricName': 'MyMetric',
                'Dimensions': [
                    {
                        'Name': 'DIM1',
                        'Value': 'dim1'
                    },
                ],
                'Unit': 'None',
                'Value': count
            },
        ],
        Namespace='MyMetricNS'
    )
    return response
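For the EventBridge step, the simplest route is to create the scheduled rule and target in the console or CLI, but if you want to script it, here is a rough boto3 sketch; the rule name, schedule expression, and Lambda ARN below are placeholders to replace with your own:
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Placeholder values, substitute your own
rule_name = 'RunAthenaMetricLambda'
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:AthenaMetricLambda'

# Create (or update) a scheduled rule, e.g. once per day
rule = events.put_rule(Name=rule_name, ScheduleExpression='rate(1 day)')

# Point the rule at the Lambda function
events.put_targets(Rule=rule_name, Targets=[{'Id': '1', 'Arn': function_arn}])

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId='AllowEventBridgeInvoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn']
)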
I have created a Lambda that subscribes to a specific log group and gets triggered every time the log group is updated.
However, for some reason the Lambda gets triggered three times instead of just once. The Lambda is supposed to export log files to an S3 bucket, and since it's triggered three times it exports the same logs three times. My first thought was that the Lambda was timing out and therefore was triggered multiple times, but I've checked the logs and the execution is successful every time, and every execution has a unique RequestId.
Any thoughts about this? Any help is appreciated.
This is what my Lambda looks like:
import boto3
from datetime import timedelta, datetime

def lambda_handler(event, context):
    startTime = datetime.utcnow() - timedelta(hours=2)
    endTime = datetime.utcnow()

    cloudwatch = boto3.client('logs')
    response = cloudwatch.create_export_task(
        taskName='LogExport',
        logGroupName='/aws/lambda/logGroupName',
        fromTime=int(round(startTime.timestamp() * 1000)),
        to=int(round(endTime.timestamp() * 1000)),
        destination='s3Bucket')

    return {
        'status': 200,
        'body': 'Lambda executed successfully!'
    }
This can be considered a follow-up to this thread, but I need more help with moving things along. Hopefully someone can have a look over my attempts below and provide further guidance.
To summarize, I need a Cloud Function that:
Is triggered by a PubSub message being published to topic A (this can be done in the UI).
Reads a messy object change notification message from "push" PubSub topic A.
"Parses" it.
Publishes a message to PubSub topic B, with the original message ID as data, and other metadata (e.g. file name, size, time) as attributes.
1:
Example of a messy object change notification:
\n "kind": "storage#object",\n "id": "bucketcfpubsub/test.txt/1544681756538155",\n "selfLink": "https://www.googleapis.com/storage/v1/b/bucketcfpubsub/o/test.txt",\n "name": "test.txt",\n "bucket": "bucketcfpubsub",\n "generation": "1544681756538155",\n "metageneration": "1",\n "contentType": "text/plain",\n "timeCreated": "2018-12-13T06:15:56.537Z",\n "updated": "2018-12-13T06:15:56.537Z",\n "storageClass": "STANDARD",\n "timeStorageClassUpdated": "2018-12-13T06:15:56.537Z",\n "size": "1938",\n "md5Hash": "sDSXIvkR/PBg4mHyIUIvww==",\n "mediaLink": "https://www.googleapis.com/download/storage/v1/b/bucketcfpubsub/o/test.txt?generation=1544681756538155&alt=media",\n "crc32c": "UDhyzw==",\n "etag": "CKvqjvuTnN8CEAE="\n}\n
To clarify, is this a message with a blank "data" field, where all the information above is in attribute pairs (like "attribute name": "attribute data")? Or is it just a long string stuffed into the "data" field, with no "attributes"?
2:
In the above thread, a "pull" subscription is used. Is it better than using a "push" subscription? Push sample below:
def create_push_subscription(project_id,
                             topic_name,
                             subscription_name,
                             endpoint):
    """Create a new push subscription on the given topic."""
    # [START pubsub_create_push_subscription]
    from google.cloud import pubsub_v1

    # TODO project_id = "Your Google Cloud Project ID"
    # TODO topic_name = "Your Pub/Sub topic name"
    # TODO subscription_name = "Your Pub/Sub subscription name"
    # TODO endpoint = "https://my-test-project.appspot.com/push"

    subscriber = pubsub_v1.SubscriberClient()
    topic_path = subscriber.topic_path(project_id, topic_name)
    subscription_path = subscriber.subscription_path(
        project_id, subscription_name)

    push_config = pubsub_v1.types.PushConfig(
        push_endpoint=endpoint)

    subscription = subscriber.create_subscription(
        subscription_path, topic_path, push_config)

    print('Push subscription created: {}'.format(subscription))
    print('Endpoint for subscription is: {}'.format(endpoint))
    # [END pubsub_create_push_subscription]
Or do I need further code after this to receive messages?
Also, doesn't this create a new subscriber every time the Cloud Function is triggered by a pubsub message being published? Should I add a subscription delete code at the end of the CF, or are there more efficient ways to do this?
3:
Next, to parse the message, I have this sample code that pulls out a few attributes as follows:
def summarize(message):
    # [START parse_message]
    data = message.data
    attributes = message.attributes

    event_type = attributes['eventType']
    bucket_id = attributes['bucketId']
    object_id = attributes['objectId']
Will this work with my above notification in 1:?
4:
How do I separate the topic_name? Steps 1 and 2 use topic A, while this step is to publish into topic B. Is it as simple as rewriting the topic_name in the below code example?
# TODO topic_name = "Your Pub/Sub topic name"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_name)

for n in range(1, 10):
    data = u'Message number {}'.format(n)
    # Data must be a bytestring
    data = data.encode('utf-8')
    # Add two attributes, origin and username, to the message
    publisher.publish(
        topic_path, data, origin='python-sample', username='gcp')

print('Published messages with custom attributes.')
Source where I got most of the sample code from (besides the above thread): python-docs-samples. Will adapting and stringing the above code samples together produce useful code? Or will I still be missing stuff like "import ****"?
You should not attempt to manually create a Subscriber running in Cloud Functions. Instead, follow the documentation here for setting up a Cloud Function which will be called with all messages sent to a given topic by passing the --trigger-topic command line parameter.
To address some of your other concerns:
"Should I add a subscription delete code at the end of the CF" - Subscriptions are long-lived resources corresponding to a specific backlog of messages. If the subscription is created and deleted at the end of the Cloud Function, messages sent while it does not exist will not be received.
"How do I separate the topic_name" - The 'topic_name' in this example refers to the last part of a string formatted like projects/project_id/topics/topic_name, which will appear in the Cloud console for your topic after it has been created.
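As a rough illustration only (not official sample code), a background Cloud Function deployed with --trigger-topic on topic A could parse the notification and republish to topic B along these lines. PROJECT_ID and TOPIC_B are placeholders, and this assumes the object metadata arrives base64-encoded in the message's data field with eventType and similar fields as attributes:
import base64
import json
from google.cloud import pubsub_v1

PROJECT_ID = 'your-project-id'   # placeholder
TOPIC_B = 'topic-b'              # placeholder

publisher = pubsub_v1.PublisherClient()
topic_b_path = publisher.topic_path(PROJECT_ID, TOPIC_B)


def handle_gcs_notification(event, context):
    # 'data' holds the base64-encoded object metadata JSON shown in the question
    notification = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    attributes = event.get('attributes') or {}

    # Republish to topic B: original Pub/Sub message ID as the data,
    # selected metadata as attributes (attribute values must be strings)
    publisher.publish(
        topic_b_path,
        data=context.event_id.encode('utf-8'),
        objectName=notification.get('name', ''),
        size=notification.get('size', ''),
        timeCreated=notification.get('timeCreated', ''),
        eventType=attributes.get('eventType', '')
    )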