AWS Lambda gets executed 3 times - amazon-web-services

I have created a Lambda that subscribes to a specific log group and gets triggered everytime the log group is updated.
However, for some reason the Lambda gets triggered three times instead of just one. The Lambda is supposed to export log files to a S3 Bucket, and since it's triggered three times it exports the same logs three times. My first thought was that the Lambda was timing out and therefor was triggered multiple times but I've checked the logs and the execution is successful every time, and every execution has a unique RequestId.
Any thoughts about this? Any help is appreciated.
This is what my Lambda looks like:
import boto3
from datetime import timedelta, datetime
def lambda_handler(event, context):
startTime = datetime.utcnow() - timedelta(hours = 2)
endTime = datetime.utcnow()
cloudwatch = boto3.client('logs')
response = cloudwatch.create_export_task(
taskName = 'LogExport',
logGroupName = '/aws/lambda/logGroupName',
fromTime = int(round(startTime.timestamp() * 1000)),
to = int(round(endTime.timestamp() * 1000)),
destination='s3Bucket')
return {
'status': 200,
'body': 'Lambda executed succesfully!'
}

Related

GCP Cloud Tasks: shorten period for creating a previously created named task

We are developing a GCP Cloud Task based queue process that sends a status email whenever a particular Firestore doc write-trigger fires. The reason we use Cloud Tasks is so a delay can be created (using scheduledTime property 2-min in the future) before the email is sent, and to control dedup (by using a task-name formatted as: [firestore-collection-name]-[doc-id]) since the 'write' trigger on the Firestore doc can be fired several times as the document is being created and then quickly updated by backend cloud functions.
Once the task's delay period has been reached, the cloud-task runs, and the email is sent with updated Firestore document info included. After which the task is deleted from the queue and all is good.
Except:
If the user updates the Firestore doc (say 20 or 30 min later) we want to resend the status email but are unable to create the task using the same task-name. We get the following error:
409 The task cannot be created because a task with this name existed too recently. For more information about task de-duplication see https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/create#body.request_body.FIELDS.task.
This was unexpected as the queue is empty at this point as the last task completed succesfully. The documentation referenced in the error message says:
If the task's queue was created using Cloud Tasks, then another task
with the same name can't be created for ~1hour after the original task
was deleted or executed.
Question: is there some way in which this restriction can be by-passed by lowering the amount of time, or even removing the restriction all together?
The short answer is No. As you've already pointed, the docs are very clear regarding this behavior and you should wait 1 hour to create a task with same name as one that was previously created. The API or Client Libraries does not allow to decrease this time.
Having said that, I would suggest that instead of using the same Task ID, use different ones for the task and add an identifier in the body of the request. For example, using Python:
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
import datetime
def create_task(project, queue, location, payload=None, in_seconds=None):
client = tasks_v2.CloudTasksClient()
parent = client.queue_path(project, location, queue)
task = {
'app_engine_http_request': {
'http_method': 'POST',
'relative_uri': '/task/'+queue
}
}
if payload is not None:
converted_payload = payload.encode()
task['app_engine_http_request']['body'] = converted_payload
if in_seconds is not None:
d = datetime.datetime.utcnow() + datetime.timedelta(seconds=in_seconds)
timestamp = timestamp_pb2.Timestamp()
timestamp.FromDatetime(d)
task['schedule_time'] = timestamp
response = client.create_task(parent, task)
print('Created task {}'.format(response.name))
print(response)
#You can change DOCUMENT_ID with USER_ID or something to identify the task
create_task(PROJECT_ID, QUEUE, REGION, DOCUMENT_ID)
Facing a similar problem of requiring to debounce multiple instances of Firestore write-trigger functions, we worked around the default Cloud Tasks task-name based dedup mechanism (still a constraint in Nov 2022) by building a small debounce "helper" using Firestore transactions.
We're using a helper collection _syncHelper_ to implement a delayed throttle for side effects of write-trigger fires - in the OP's case, send 1 email for all writes within 2 minutes.
In our case we are using Firebease Functions task queue utils and not directly interacting with Cloud Tasks but thats immaterial to the solution. The key is to determine the task's execution time in advance and use that as the "dedup key":
async function enqueueTask(shopId) {
const queueName = 'doSomething';
const now = new Date();
const next = new Date(now.getTime() + 2 * 60 * 1000);
try {
const shouldEnqueue = await getFirestore().runTransaction(async t=>{
const syncRef = getFirestore().collection('_syncHelper_').doc(<collection_id-doc_id>);
const doc = await t.get(syncRef);
let data = doc.data();
if (data?.timestamp.toDate()> now) {
return false;
}
await t.set(syncRef, { timestamp: Timestamp.fromDate(next) });
return true;
});
if (shouldEnqueue) {
let queue = getFunctions().taskQueue(queueName);
await queue.enqueue({
timestamp: next.toISOString(),
},
{ scheduleTime: next }); }
} catch {
...
}
}
This will ensure a new task is enqueued only if the "next execution" time has passed.
The execution operation (also a cloud function in our case) will remove the sync data entry if it hasn't been changed since it was executed:
exports.doSomething = functions.tasks.taskQueue({
retryConfig: {
maxAttempts: 2,
minBackoffSeconds: 60,
},
rateLimits: {
maxConcurrentDispatches: 2,
}
}).onDispatch(async data => {
let { timestamp } = data;
await sendYourEmailHere();
await getFirestore().runTransaction(async t => {
const syncRef = getFirestore().collection('_syncHelper_').doc(<collection_id-doc_id>);
const doc = await t.get(syncRef);
let data = doc.data();
if (data?.timestamp.toDate() <= new Date(timestamp)) {
await t.delete(syncRef);
}
});
});
This isn't a bullet proof solution (if the doSomething() execution function has high latency for example) but good enough for 99% of our use cases.

Lambda function not triggered by dynamodb update: Key Error

I am attempting to load a simple transactions.txt table into a S3 bucket where a Lambda function reads the file and populates DynamoDB tables for Customers and Transactions. This all works fine. However, I also have a Lambda function that is supposed to read the Transactions table as they populate the table and sum up the transaction totals by customer and insert them into another DynamoDB table--TransactionTotal.
My TotalNotifier Lambda function throws a "KeyError" regarding a "New Image". I believe the code is fine, and I have tried changing the type of Streams from 'New and Old' to just 'New' for the Transactions table and still encounter same error.
from __future__ import print_function
import json, boto3
# Connect to SNS
sns = boto3.client('sns')
alertTopic = 'HighBalanceAlert'
snsTopicArn = [t['TopicArn'] for t in sns.list_topics()['Topics'] if t['TopicArn'].endswith(':' + alertTopic)][0]
# Connect to DynamoDB
dynamodb = boto3.resource('dynamodb')
transactionTotalTableName = 'TransactionTotal'
transactionsTotalTable = dynamodb.Table(transactionTotalTableName);
# This handler is executed every time the Lambda function is triggered
def lambda_handler(event, context):
# Show the incoming event in the debug log
print("Event received by Lambda function: " + json.dumps(event, indent=2))
# For each transaction added, calculate the new Transactions Total
for record in event['Records']:
customerId = record['dynamodb']['NewImage']['CustomerId']['S']
transactionAmount = int(record['dynamodb']['NewImage']['TransactionAmount']['N'])
# Update the customer's total in the TransactionTotal DynamoDB table
response = transactionsTotalTable.update_item(
Key={
'CustomerId': customerId
},
UpdateExpression="add accountBalance :val",
ExpressionAttributeValues={
':val': transactionAmount
},
ReturnValues="UPDATED_NEW"
)
Here is a sample error from the CloudWatch log:
'NewImage': KeyError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 30, in lambda_handler
customerId = record['dynamodb']['NewImage']['CustomerId']['S']
KeyError: 'NewImage'
To elaborate on Oluwafemi's comment, you're likely experiencing this error when receiving a REMOVE event. Regardless of whether your stream is new and old images, or just new, you won't receive a NEW_IMAGE on a REMOVE event, since there is no new image. Check out the example events on aws docs.
A check on the value of record['eventName'] should solve the issue.

list automated RDS snapshots created today and copy to other region using boto3

We are building an automated DR cold site on other region, currently are working on retrieving a list of RDS automated snapshots created today, and passed them to another function to copy them to another AWS region.
The issue is with RDS boto3 client where it returned a unique format of date, making filtering on creation date more difficult.
today = (datetime.today()).date()
rds_client = boto3.client('rds')
snapshots = rds_client.describe_db_snapshots(SnapshotType='automated')
harini = "datetime("+ today.strftime('%Y,%m,%d') + ")"
print harini
print snapshots
for i in snapshots['DBSnapshots']:
if i['SnapshotCreateTime'].date() == harini:
print(i['DBSnapshotIdentifier'])
print (today)
despite already converted the date "harini" to the format 'SnapshotCreateTime': datetime(2015, 1, 1), the Lambda function still unable to list out the snapshots.
The better method is to copy the files as they are created by invoking a lambda function using a cloud watch event.
See step by step instruction:
https://geektopia.tech/post.php?blogpost=Automating_The_Cross_Region_Copy_Of_RDS_Snapshots
Alternatively, you can issue a copy for each snapshot regardless of the date. The client will raise an exception and you can trap it like this
# Written By GeekTopia
#
# Copy All Snapshots for an RDS Instance To a new region
# --Free to use under all conditions
# --Script is provied as is. No Warranty, Express or Implied
import json
import boto3
from botocore.exceptions import ClientError
import time
destinationRegion = "us-east-1"
sourceRegion = 'us-west-2'
rdsInstanceName = 'needbackups'
def lambda_handler(event, context):
#We need two clients
# rdsDestinationClient -- Used to start the copy processes. All cross region
copies must be started from the destination and reference the source
# rdsSourceClient -- Used to list the snapshots that need to be copied.
rdsDestinationClient = boto3.client('rds',region_name=destinationRegion)
rdsSourceClient=boto3.client('rds',region_name=sourceRegion)
#List All Automated for A Single Instance
snapshots = rdsSourceClient.describe_db_snapshots(DBInstanceIdentifier=rdsInstanceName,SnapshotType='automated')
for snapshot in snapshots['DBSnapshots']:
#Check the the snapshot is NOT in the process of being created
if snapshot['Status'] == 'available':
#Get the Source Snapshot ARN. - Always use the ARN when copying snapshots across region
sourceSnapshotARN = snapshot['DBSnapshotArn']
#build a new snapshot name
sourceSnapshotIdentifer = snapshot['DBSnapshotIdentifier']
targetSnapshotIdentifer ="{0}-ManualCopy".format(sourceSnapshotIdentifer)
targetSnapshotIdentifer = targetSnapshotIdentifer.replace(":","-")
#Adding a delay to stop from reaching the api rate limit when there are large amount of snapshots -
#This should never occur in this use-case, but may if the script is modified to copy more than one instance.
time.sleep(.2)
#Execute copy
try:
copy = rdsDestinationClient.copy_db_snapshot(SourceDBSnapshotIdentifier=sourceSnapshotARN,TargetDBSnapshotIdentifier=targetSnapshotIdentifer,SourceRegion=sourceRegion)
print("Started Copy of Snapshot {0} in {2} to {1} in {3} ".format(sourceSnapshotIdentifer,targetSnapshotIdentifer,sourceRegion,destinationRegion))
except ClientError as ex:
if ex.response['Error']['Code'] == 'DBSnapshotAlreadyExists':
print("Snapshot {0} already exist".format(targetSnapshotIdentifer))
else:
print("ERROR: {0}".format(ex.response['Error']['Code']))
return {
'statusCode': 200,
'body': json.dumps('Opearation Complete')
}
The code below will take automated snapshots created today.
import boto3
from datetime import date, datetime
region_src = 'us-east-1'
client_src = boto3.client('rds', region_name=region_src)
date_today = datetime.today().strftime('%Y-%m-%d')
def get_db_snapshots_src():
response = client_src.describe_db_snapshots(
SnapshotType = 'automated',
IncludeShared=False,
IncludePublic=False
)
snapshotsInDay = []
for i in response["DBSnapshots"]:
if i["SnapshotCreateTime"].strftime('%Y-%m-%d') == date.isoformat(date.today()):
snapshotsInDay.append(i)
return snapshotsInDay

Trying to disable all the Cloud Watch alarms in one shot

My organization is planning for a maintenance window for the next 5 hours. During that time, I do not want Cloud Watch to trigger alarms and send notifications.
Earlier, when I had to disable 4 alarms, I have written the following code in AWS Lambda. This worked fine.
import boto3
import collections
client = boto3.client('cloudwatch')
def lambda_handler(event, context):
response = client.disable_alarm_actions(
AlarmNames=[
'CRITICAL - StatusCheckFailed for Instance 456',
'CRITICAL - StatusCheckFailed for Instance 345',
'CRITICAL - StatusCheckFailed for Instance 234',
'CRITICAL - StatusCheckFailed for Instance 123'
]
)
But now, I was asked to disable all the alarms which are 361 in number. So, including all those names would take a lot of time.
Please let me know what I should do now?
Use describe_alarms() to obtain a list of them, then iterate through and disable them:
import boto3
client = boto3.client('cloudwatch')
response = client.describe_alarms()
names = [[alarm['AlarmName'] for alarm in response['MetricAlarms']]]
disable_response = client.disable_alarm_actions(AlarmNames=names)
You might want some logic around the Alarm Name to only disable particular alarms.
If you do not have the specific alarm arns, then you can use the logic in the previous answer. If you have a specific list of arns that you want to disable, you can fetch names using this:
def get_alarm_names(alarm_arns):
names = []
response = client.describe_alarms()
for i in response['MetricAlarms']:
if i['AlarmArn'] in alarm_arns:
names.append(i['AlarmName'])
return names
Here's a full tutorial: https://medium.com/geekculture/terraform-structure-for-enabling-disabling-alarms-in-batches-5c4f165a8db7

Handling EC2 Description Rate Limiting In Boto3 Lambda?

I'm creating a Lambda function with the intent of backing up my EC2 instances with their snapshots. However, I noticed reading the boto documentation the call to ec2.describe_instances is rate limited with MaxResults/NextToken. How can I combine the two of these to safely iterate through the list 50 at a time? Below is my work in progress:
import boto3
import datetime
import time
ec2 = boto3.client('ec2')
def lambda_handler(event, context):
try:
print("Creating snapshots on " + str(datetime.datetime.today()) + ".")
maxResults = 50
schedulers = ec2.describe_instances(Filters=[{'Name':'tag:GL-sub-purpose', 'Values':[Schedule]}], MaxResults=maxResults)
nextToken = schedulers['NextToken']
totalSchedulers = len(schedulers)
while totalSchedulers == maxResults:
schedulers = ec2.describe_instances(Filters=[{'Name':'tag:GL-sub-purpose', 'Values':[Schedule]}], MaxResults=maxResults, NextToken=nextToken)
nextToken = result['NextToken']
totalSchedulers = len(schedulers)
print("Performing backup on " + str(len(schedulers)) + " schedules.")
successful = []
failed = []
for s in schedulers:
#[...] More operations here, done 50 at a time.
I'm not really sure if I'm using the MaxResults/NextToken parameters correctly or efficiently here. Is this the best way to achieve my desired result/am I on the right track?
Just iterate through until NextToken is not returned. Here is a sample code to iterate through a batch of instances. Change it to suit your needs.
import boto3
ec2 = boto3.client('ec2')
insts = ec2.describe_instances(MaxResults=50)
while True:
#
# Process Instances (insts)
#
if 'NextToken' not in insts: break
next_token = insts['NextToken']
insts = ec2.describe_instances(MaxResults=50, NextToken=next_token)