Delete message from SQS DLQ - amazon-web-services

How do I delete a message whose MessageId is known from an SQS DLQ that has around 10k messages?
I tried to do it using a Lambda function, but the maximum number of messages that can be received per call is only 10.
What are the best ways to do this?

It seems your question is about just one message, but in any case, you can use the deleteMessageBatch method. In a Node server, you would have something like this:
const AWS = require("aws-sdk");
const messageIds = ["1", "2", "3"];
const removeNotificationFromQueueData = messageIds.map((id) => {
  return {
    ReceiptHandle: id,
    Id: id,
  };
});
const sqs = new AWS.SQS();
sqs
  .deleteMessageBatch({
    QueueUrl: "http://myQueueUrlEndpoint",
    Entries: removeNotificationFromQueueData,
  })
  .promise();

Another way is to use a Python script:
#!/usr/bin/env python
import boto3

def delete_msg_id(queue_url, message_id):
    """Delete a message based on its message ID.

    :param queue_url: URL of the SQS queue to read.
    :param message_id: message ID to be deleted
    """
    sqs_client = boto3.client("sqs")
    while True:
        resp = sqs_client.receive_message(
            QueueUrl=queue_url, AttributeNames=["All"], MaxNumberOfMessages=10, WaitTimeSeconds=1
        )
        if "Messages" not in resp:
            # Queue drained (or remaining messages are currently invisible)
            break
        for msg in resp["Messages"]:
            if msg["MessageId"] == message_id:
                sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
                return

if __name__ == '__main__':
    delete_msg_id('my_sqs_url', 'my_msg_id')
Reference: Moving Big Number Of Messages Between SQS Queues

Related

subFolder is empty when using a Google IoT Core gateway and Pub/Sub

I have a device publishing through a gateway on the events topic (/devices/<dev_id>/events/motion) to Pub/Sub. It's landing in Pub/Sub correctly, but subFolder is just an empty string.
On the gateway I'm publishing using the code below. f"mb.{device_id}" is the device_id (not the gateway ID), and attribute could be anything - motion, temperature, etc.
def report(self, device_id, attribute, value):
    topic = f"/devices/mb.{device_id}/events/{attribute}"
    timestamp = datetime.utcnow().timestamp()
    client.publish(topic, json.dumps({"v": value, "ts": timestamp}))
And this is the cloud function listening on the PubSub queue.
def iot_to_bigtable(event, context):
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    timestamp = payload.get("ts")
    value = payload.get("v")
    if not timestamp or value is None:
        raise BadDataException()
    attributes = event.get("attributes", {})
    device_id = attributes.get("deviceId")
    registry_id = attributes.get("deviceRegistryId")
    attribute = attributes.get("subFolder")
    if not device_id or not registry_id or not attribute:
        raise BadDataException()
A sample of the event in Pub/Sub:
{
  '@type': 'type.googleapis.com/google.pubsub.v1.PubsubMessage',
  attributes: {
    deviceId: 'mb.26727bab-0f37-4453-82a4-75d93cb3f374',
    deviceNumId: '2859313639674234',
    deviceRegistryId: 'mb-staging',
    deviceRegistryLocation: 'europe-west1',
    gatewayId: 'mb.42e29cd5-08ad-40cf-9c1e-a1974144d39a',
    projectId: 'mb-staging',
    subFolder: ''
  },
  data: 'eyJ2IjogImxvdyIsICJ0cyI6IDE1OTA3NjgzNjcuMTMyNDQ4fQ=='
}
Why is subFolder empty? Based on the docs I was expecting it to be the attribute (i.e. motion or temperature)
This issue has nothing to do with Cloud IoT Core. It is instead caused by how Pub/Sub handles failed messages. It was retrying messages from ~12 hours ago that had failed (and didn't have an attribute).
You fix this by purging the Subscription in Pub/Sub.
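For reference, purging can be done from the console, via gcloud, or programmatically by seeking the subscription to the current time. A minimal sketch, assuming the google-cloud-pubsub client library and illustrative project/subscription IDs:
# CLI equivalent: gcloud pubsub subscriptions seek SUBSCRIPTION --time="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
from datetime import datetime, timezone
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# Hypothetical project and subscription IDs - replace with your own.
subscription_path = subscriber.subscription_path("mb-staging", "iot-events-sub")

# Seeking a subscription to "now" acknowledges every message published before that
# point, which is effectively the same as purging the subscription.
subscriber.seek(request={"subscription": subscription_path, "time": datetime.now(timezone.utc)})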

SQS deleting automatically messages after receiving them by Lambda

I have an SQS queue that triggers a Lambda function. The Lambda function just receives the message and puts it in DynamoDB.
It works fine, but the problem is that I noticed the message is deleted from SQS without me adding any delete() statement in my code.
But it's clearly mentioned that the message should be manually deleted by the consumer, otherwise it will be put back into SQS.
What's going on here?
I want to handle the situation where there is a problem with the processing, in which case the message should reappear in SQS so another Lambda can try to process it.
Here is my Lambda code:
import json
import time
import boto3

def lambda_handler(event, context):
    message_id = event['Records'][0]['messageId']
    message_receipt_handle = event['Records'][0]['receiptHandle']
    message_body = event['Records'][0]['body']

    print('Message received:')
    print(message_body)
    print('Processing message ...')

    dynamo_db = boto3.client('dynamodb')
    response_db = dynamo_db.put_item(
        TableName='sqs-test-sbx',
        Item={
            'id': {
                'S': message_id,
            },
            'Message': {
                'S': message_body,
            }
        }
    )
    print('dynamodb response:')
    print(response_db)

    # Simulate some processing ...
    time.sleep(10)
    print('Message processed')

    return {
        'statusCode': 200,
        'message_id': message_id,
        'message_body': message_body,
        'event': json.dumps(event)
    }
That is normal behavior when you trigger the Lambda directly from SQS:
https://docs.aws.amazon.com/en_gb/lambda/latest/dg/with-sqs.html
When your function successfully processes a batch, Lambda deletes its messages from the queue.
You only need to delete the message yourself when you fetch messages from SQS on your own, for instance from an EC2 instance.
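For that self-managed polling case, here is a minimal boto3 sketch of the receive/process/delete cycle (the queue URL is a placeholder and process() is a hypothetical function). If processing raises before delete_message is called, the message simply becomes visible again after the visibility timeout and can be retried:
import boto3

sqs = boto3.client('sqs')
queue_url = 'https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue'  # placeholder

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get('Messages', []):
    process(msg['Body'])  # hypothetical processing; let exceptions propagate on failure
    # Delete only after successful processing - otherwise the message
    # reappears once its visibility timeout expires.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])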

How to pull specific information out of a alarm event in Lambda

I set up a CPU alarm for an EC2 instance that triggers an SNS topic whose endpoint is a Lambda function. The Lambda function then sends me an email and a Slack message telling me that an instance is in the alarm state and exactly which instance it came from. I have the email and Slack working, and now I just need to get the instance ID from the event that my Lambda received from the alarm.
I get the following event in the Lambda function. I want to pull out just the instance ID from it, which in this example would be "i-07db9e2f61d100". It is located in "Dimensions".
How about also pulling out the "AlarmName" (which would be "cpu-mon" in this example)?
Here is all the data in the event I receive:
{'Records': [{'EventSource': 'aws:sns', 'EventVersion': '1.0', 'EventSubscriptionArn': 'arn:aws:sns:us-east-2:Alarm-test:db99f3fe-1c4b', 'Sns': {'Type': 'Notification', 'MessageId': '9921c85a-6f59-50c0', 'TopicArn': 'arn:aws:sns:us-east-2:4990:Alarm-test', 'Subject': 'ALARM: "cpu-mon" in US East (Ohio)', 'Message': '{"AlarmName":"cpu-mon","AlarmDescription":"Alarm when CPU exceeds 70 percent","AWSAccountId":"000000000","NewStateValue":"ALARM","NewStateReason":"Threshold Crossed: 2 out of the last 2 datapoints [99.8333333333333 (26/08/19 19:19:00), 99.1803278688525 (26/08/19 19:18:00)] were greater than the threshold (70.0) (minimum 2 datapoints for OK -> ALARM transition).","StateChangeTime":"2019-08-26T19:20:52.350+0000","Region":"US East (Ohio)","OldStateValue":"OK","Trigger":{"MetricName":"CPUUtilization","Namespace":"AWS/EC2","StatisticType":"Statistic","Statistic":"AVERAGE","Unit":"Percent","Dimensions":[{"value":"i-07db9e2f61d100","name":"InstanceId"}],"Period":60,"EvaluationPeriods":2,"ComparisonOperator":"GreaterThanThreshold","Threshold":70.0,"TreatMissingData":"","EvaluateLowSampleCountPercentile":""}}', 'Timestamp': '2019-08-26T19:20:52.403Z', 'SignatureVersion': '1', 'Signature': 'UeWhS==', 'SigningCertUrl': 'https://sns.us-east-2.amazonaws.com/SimpleNotificationService-63f9.pem', 'UnsubscribeUrl': 'https://sns.us-east-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-2:49:Alarm-test:dfe-1c4b-4db9', 'MessageAttributes': {}}}]}
Here is my Lambda function (python) -
# Sends Slack and text message
import json
import subprocess
import boto3

session = boto3.Session(
    region_name="us-east-1"
)
sns_client = session.client('sns')

def lambda_handler(event, context):
    print("THIS IS THE EVENT - " + str(event))
    data = json.dumps({'text': str(event)})

    # Send text alerts
    alertNumbers = ["1-xxx-xxx-xxxx"]

    # Send text message
    for i in range(len(alertNumbers)):
        sns_client.publish(
            PhoneNumber=alertNumbers[i],
            Message=msg,
            MessageAttributes={
                'AWS.SNS.SMS.SenderID': {
                    'DataType': 'String',
                    'StringValue': 'SENDERID'
                },
                'AWS.SNS.SMS.SMSType': {
                    'DataType': 'String',
                    'StringValue': 'Promotional'
                }
            }
        )

    # Send Slack message
    subprocess.call([
        'curl',
        '-X', 'POST',
        '-H', 'Content-type: application/json',
        '--data', data,
        'https://hooks.slack.com/services/000000'
    ])
Thanks for any help!
You simply need to access the data of the event and put it where you want it.
Inside your lambda_handler add this as the first line:
message = json.loads(event['Records'][0]['Sns']['Message'])
Now the SNS message is available as message. Getting the AlarmName is as simple as message['AlarmName'], and the instance ID is at message['Trigger']['Dimensions'][0]['value'].
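Put together, a minimal sketch of the handler portion (variable names are illustrative):
import json

def lambda_handler(event, context):
    # The CloudWatch alarm payload arrives as a JSON string inside the SNS record.
    message = json.loads(event['Records'][0]['Sns']['Message'])
    alarm_name = message['AlarmName']                           # e.g. "cpu-mon"
    instance_id = message['Trigger']['Dimensions'][0]['value']  # e.g. "i-07db9e2f61d100"
    print(f"{alarm_name} fired for {instance_id}")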

boto3: can't find queue that was immediately created before

I create an SQS queue in boto3 and immediately look for it via sqs.list_queues, but it doesn't return anything.
When I enter the SQS queue name into the console, it doesn't return anything until I enter it a second time.
So does this mean I need to call list_queues twice? Why is this happening? Why doesn't AWS return a queue that was created immediately before?
sqs = boto3.client('sqs')
myQ = sqs.create_queue(QueueName='just_created')
response = sqs.list_queues(
    QueueNamePrefix='just_created'
)
response does not contain the usual array of QueueUrls
Just like many AWS services, the SQS control plane is eventually consistent, meaning that it takes a while for data to propagate across the systems.
If you need the URL of the queue you just created, you can find it in the return value of the create_queue call.
The following operation creates an SQS queue named MyQueue.
response = client.create_queue(
    QueueName='MyQueue',
)
print(response)
Expected Output:
{
    'QueueUrl': 'https://queue.amazonaws.com/012345678910/MyQueue',
    'ResponseMetadata': {
        '...': '...',
    },
}

Best way to move messages off DLQ in Amazon SQS?

What is the best practice to move messages from a dead letter queue back to the original queue in Amazon SQS?
Would it be
Get message from DLQ
Write message to queue
Delete message from DLQ
Or is there a simpler way?
Also, will AWS eventually have a tool in the console to move messages off the DLQ?
Here is a quick hack. This is definitely not the best or recommended option.
Set the main SQS queue as the DLQ for the actual DLQ, with Maximum Receives set to 1.
View the content in the DLQ (this will move the messages to the main queue, since it is now the DLQ for the actual DLQ).
Remove the setting so that the main queue is no longer the DLQ of the actual DLQ.
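If you prefer to set that up programmatically rather than in the console, a minimal boto3 sketch of the first step could look like this (queue URL and ARN are placeholders; remember to remove the reverse RedrivePolicy again afterwards):
import json
import boto3

sqs = boto3.client('sqs')
dlq_url = 'https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue-dlq'  # placeholder
main_queue_arn = 'arn:aws:sqs:eu-west-1:123456789012:my-queue'             # placeholder

# Temporarily make the main queue the "DLQ" of the actual DLQ with maxReceiveCount 1,
# so merely receiving/viewing the DLQ's messages pushes them back to the main queue.
sqs.set_queue_attributes(
    QueueUrl=dlq_url,
    Attributes={
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': main_queue_arn,
            'maxReceiveCount': '1',
        })
    }
)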
On Dec 1 2021, AWS released the ability to redrive messages from a DLQ back to the source queue (or a custom queue).
With dead-letter queue redrive to source queue, you can simplify and enhance your error-handling workflows for standard queues.
Source:
Introducing Amazon Simple Queue Service dead-letter queue redrive to source queues
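The console feature is point-and-click; if you want to trigger the same redrive programmatically, newer SDK versions expose a StartMessageMoveTask API (start_message_move_task in boto3). A hedged sketch with a placeholder ARN, assuming a recent boto3:
import boto3

sqs = boto3.client('sqs')

# The source must be a queue that is configured as a dead-letter queue.
response = sqs.start_message_move_task(
    SourceArn='arn:aws:sqs:eu-west-1:123456789012:my-queue-dlq',  # placeholder
    # DestinationArn is optional; when omitted, messages are moved back to their original source queue(s).
    # MaxNumberOfMessagesPerSecond=50,  # optional rate limit
)
print(response['TaskHandle'])  # handle usable with cancel_message_move_task / list_message_move_tasks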
There are a few scripts out there that do this for you:
npm / nodejs based: http://github.com/garryyao/replay-aws-dlq
# install
npm install replay-aws-dlq;
# use
npx replay-aws-dlq [source_queue_url] [dest_queue_url]
go based: https://github.com/mercury2269/sqsmover
# compile: https://github.com/mercury2269/sqsmover#compiling-from-source
# use
sqsmover -s [source_queue_url] -d [dest_queue_url]
You don't need to move the messages, because that approach comes with so many other challenges: duplicate messages, recovery scenarios, lost messages, de-duplication checks, etc.
Here is the solution we implemented.
Usually, we use the DLQ for transient errors, not for permanent errors, so we took the approach below (a minimal sketch of it follows after this list).
Read the message from the DLQ like a regular queue.
Benefits:
- Avoids duplicate message processing
- Better control over the DLQ - for example, I added a check to process only when the regular queue is completely processed.
- Ability to scale the process up based on the messages in the DLQ
Then follow the same code that the regular queue is following. This is more reliable in case the job is aborted or the process is terminated while processing (e.g. instance killed or process terminated).
Benefits:
- Code reusability
- Error handling
- Recovery and message replay
Extend the message visibility so that no other thread processes the messages.
Benefit:
- Avoids processing the same record by multiple threads.
Delete the message only on either a permanent error or success.
Benefit:
- Keeps processing as long as we are getting transient errors.
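A minimal boto3 sketch of that DLQ-consumer approach, assuming a placeholder queue URL and a hypothetical process() function that reuses the regular queue's processing code (PermanentError is likewise an illustrative exception type):
import boto3

sqs = boto3.client('sqs')
dlq_url = 'https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue-dlq'  # placeholder

while True:
    resp = sqs.receive_message(QueueUrl=dlq_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    if 'Messages' not in resp:
        break
    for msg in resp['Messages']:
        # Extend visibility so no other worker picks the message up while we process it.
        sqs.change_message_visibility(QueueUrl=dlq_url,
                                      ReceiptHandle=msg['ReceiptHandle'],
                                      VisibilityTimeout=300)
        try:
            process(msg['Body'])   # hypothetical: the same code the regular queue consumer runs
            done = True            # success
        except PermanentError:     # hypothetical exception for non-retryable failures
            done = True            # permanent error: give up on this message
        except Exception:
            done = False           # transient error: leave it to become visible again and retry
        if done:
            sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=msg['ReceiptHandle'])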
I wrote a small Python script to do this, using the boto3 lib:
conf = {
    "sqs-access-key": "",
    "sqs-secret-key": "",
    "reader-sqs-queue": "",
    "writer-sqs-queue": "",
    "message-group-id": ""
}

import boto3
client = boto3.client(
    'sqs',
    aws_access_key_id=conf.get('sqs-access-key'),
    aws_secret_access_key=conf.get('sqs-secret-key')
)

while True:
    messages = client.receive_message(QueueUrl=conf['reader-sqs-queue'], MaxNumberOfMessages=10, WaitTimeSeconds=10)
    if 'Messages' in messages:
        for m in messages['Messages']:
            print(m['Body'])
            ret = client.send_message(QueueUrl=conf['writer-sqs-queue'], MessageBody=m['Body'], MessageGroupId=conf['message-group-id'])
            print(ret)
            client.delete_message(QueueUrl=conf['reader-sqs-queue'], ReceiptHandle=m['ReceiptHandle'])
    else:
        print('Queue is currently empty or messages are invisible')
        break
You can get this script at this link.
This script can move messages between any arbitrary queues, and it supports FIFO queues as well, since you can supply the message_group_id field.
That looks like your best option. There is a possibility that your process fails after step 2. In that case you'll end up copying the message twice, but your application should be handling re-delivery of messages (or not care) anyway.
Here is a small multi-threaded script that does this:
import boto3
import sys
import queue
import threading

work_queue = queue.Queue()

sqs = boto3.resource('sqs')

from_q_name = sys.argv[1]
to_q_name = sys.argv[2]
print("From: " + from_q_name + " To: " + to_q_name)

from_q = sqs.get_queue_by_name(QueueName=from_q_name)
to_q = sqs.get_queue_by_name(QueueName=to_q_name)

def process_queue():
    while True:
        messages = work_queue.get()
        bodies = list()
        for i in range(0, len(messages)):
            bodies.append({'Id': str(i + 1), 'MessageBody': messages[i].body})
        to_q.send_messages(Entries=bodies)
        for message in messages:
            print("Copied " + str(message.body))
            message.delete()
        work_queue.task_done()

for i in range(10):
    t = threading.Thread(target=process_queue)
    t.daemon = True
    t.start()

while True:
    messages = list()
    for message in from_q.receive_messages(
            MaxNumberOfMessages=10,
            VisibilityTimeout=123,
            WaitTimeSeconds=20):
        messages.append(message)
    if not messages:
        break
    work_queue.put(messages)
    work_queue.join()
The DLQ comes into play only when the original consumer fails to consume a message successfully after various attempts. We do not want to delete the message, since we believe we can still do something with it (maybe attempt to process it again, log it, or collect some stats), and we do not want to keep encountering this message again and again, blocking the ability to process other messages behind it.
The DLQ is nothing but just another queue. This means we would need to write a consumer for the DLQ that ideally runs less frequently (compared to the original queue), consumes from the DLQ, produces the message back into the original queue, and deletes it from the DLQ - if that's the intended behavior and we think the original consumer is now ready to process it again. It should be OK if this cycle continues for a while, since we now also get an opportunity to manually inspect things, make the necessary changes, and deploy another version of the original consumer without losing the message (within the message retention period, of course - which is 4 days by default).
It would be nice if AWS provided this capability out of the box, but I don't see it yet - they're leaving this to the end user to use in the way they feel appropriate.
There is another way to achieve this without writing a single line of code.
Consider that your actual queue name is SQS_Queue and its DLQ is SQS_DLQ.
Now follow these steps:
Set SQS_Queue as the DLQ of SQS_DLQ. Since SQS_DLQ is already the DLQ of SQS_Queue, both now act as the DLQ of the other.
Set the max receive count of SQS_DLQ to 1.
Now read messages from the SQS_DLQ console. Since the message receive count is 1, it will send all the messages to its own DLQ, which is your actual SQS_Queue.
We use the following script to redrive messages from the src queue to the tgt queue:
filename: redrive.py
usage: python redrive.py -s {source queue name} -t {target queue name}
'''
This script is used to redrive message in (src) queue to (tgt) queue

The solution is to set the Target Queue as the Source Queue's Dead Letter Queue.
Also set Source Queue's redrive policy, Maximum Receives to 1.
Also set Source Queue's VisibilityTimeout to 5 seconds (a small period)
Then read data from the Source Queue.
Source Queue's Redrive Policy will copy the message to the Target Queue.
'''
import argparse
import json
import boto3

sqs = boto3.client('sqs')


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-s', '--src', required=True,
                        help='Name of source SQS')
    parser.add_argument('-t', '--tgt', required=True,
                        help='Name of targeted SQS')
    args = parser.parse_args()
    return args


def verify_queue(queue_name):
    queue_url = sqs.get_queue_url(QueueName=queue_name)
    return True if queue_url.get('QueueUrl') else False


def get_queue_attribute(queue_url):
    queue_attributes = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=['All'])['Attributes']
    print(queue_attributes)
    return queue_attributes


def main():
    args = parse_args()
    for q in [args.src, args.tgt]:
        if not verify_queue(q):
            print(f"Cannot find {q} in AWS SQS")

    src_queue_url = sqs.get_queue_url(QueueName=args.src)['QueueUrl']

    target_queue_url = sqs.get_queue_url(QueueName=args.tgt)['QueueUrl']
    target_queue_attributes = get_queue_attribute(target_queue_url)

    # Set the Source Queue's Redrive policy
    redrive_policy = {
        'deadLetterTargetArn': target_queue_attributes['QueueArn'],
        'maxReceiveCount': '1'
    }
    sqs.set_queue_attributes(
        QueueUrl=src_queue_url,
        Attributes={
            'VisibilityTimeout': '5',
            'RedrivePolicy': json.dumps(redrive_policy)
        }
    )
    get_queue_attribute(src_queue_url)

    # read all messages
    num_received = 0
    while True:
        try:
            resp = sqs.receive_message(
                QueueUrl=src_queue_url,
                MaxNumberOfMessages=10,
                AttributeNames=['All'],
                WaitTimeSeconds=5)
            num_message = len(resp.get('Messages', []))
            if not num_message:
                break
            num_received += num_message
        except Exception:
            break
    print(f"Redrive {num_received} messages")

    # Reset the Source Queue's Redrive policy
    sqs.set_queue_attributes(
        QueueUrl=src_queue_url,
        Attributes={
            'VisibilityTimeout': '30',
            'RedrivePolicy': ''
        }
    )
    get_queue_attribute(src_queue_url)


if __name__ == "__main__":
    main()
An AWS Lambda solution worked well for us.
Detailed instructions:
https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:303769779339:applications~aws-sqs-dlq-redriver
Github: https://github.com/honglu/aws-sqs-dlq-redriver.
Deployed with a click and another click to start the redrive!
Here is also a script (written in TypeScript) to move messages from one AWS queue to another. Maybe it will be useful for someone.
import {
  SQSClient,
  ReceiveMessageCommand,
  DeleteMessageBatchCommand,
  SendMessageBatchCommand,
} from '@aws-sdk/client-sqs'

const AWS_REGION = 'eu-west-1'
const AWS_ACCOUNT = '12345678901'

const DLQ = `https://sqs.${AWS_REGION}.amazonaws.com/${AWS_ACCOUNT}/dead-letter-queue`
const QUEUE = `https://sqs.${AWS_REGION}.amazonaws.com/${AWS_ACCOUNT}/queue`

const loadMessagesFromDLQ = async () => {
  const client = new SQSClient({region: AWS_REGION})

  const command = new ReceiveMessageCommand({
    QueueUrl: DLQ,
    MaxNumberOfMessages: 10,
    VisibilityTimeout: 60,
  })

  const response = await client.send(command)

  console.log('---------LOAD MESSAGES----------')
  console.log(`Loaded: ${response.Messages?.length}`)
  console.log(JSON.stringify(response, null, 4))

  return response
}

const sendMessagesToQueue = async (entries: Array<{Id: string, MessageBody: string}>) => {
  const client = new SQSClient({region: AWS_REGION})

  const command = new SendMessageBatchCommand({
    QueueUrl: QUEUE,
    Entries: entries.map(entry => ({...entry, DelaySeconds: 10})),
    // [
    //   {
    //     Id: '',
    //     MessageBody: '',
    //     DelaySeconds: 10
    //   }
    // ]
  })

  const response = await client.send(command)

  console.log('---------SEND MESSAGES----------')
  console.log(`Send: Successful - ${response.Successful?.length}, Failed: ${response.Failed?.length}`)
  console.log(JSON.stringify(response, null, 4))
}

const deleteMessagesFromQueue = async (entries: Array<{Id: string, ReceiptHandle: string}>) => {
  const client = new SQSClient({region: AWS_REGION})

  const command = new DeleteMessageBatchCommand({
    QueueUrl: DLQ,
    Entries: entries,
    // [
    //   {
    //     "Id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    //     "ReceiptHandle": "someReceiptHandle"
    //   }
    // ]
  })

  const response = await client.send(command)

  console.log('---------DELETE MESSAGES----------')
  console.log(`Delete: Successful - ${response.Successful?.length}, Failed: ${response.Failed?.length}`)
  console.log(JSON.stringify(response, null, 4))
}

const run = async () => {
  const dlqMessageList = await loadMessagesFromDLQ()

  if (!dlqMessageList || !dlqMessageList.Messages) {
    console.log('There are no messages in the DLQ')
    return
  }

  const sendMsgList: any = dlqMessageList.Messages.map(msg => ({ Id: msg.MessageId, MessageBody: msg.Body}))
  const deleteMsgList: any = dlqMessageList.Messages.map(msg => ({ Id: msg.MessageId, ReceiptHandle: msg.ReceiptHandle}))

  await sendMessagesToQueue(sendMsgList)
  await deleteMessagesFromQueue(deleteMsgList)
}

run()
P.S. The script has room for improvement, but it might still be useful.
Here is a simple Python script you can use from the CLI to do the same, depending only on boto3.
usage
python redrive_messages __from_queue_name__ __to_queue_name__
code
import sys
import boto3

sqs = boto3.resource('sqs')

def redrive_messages(from_queue_name: str, to_queue_name: str):
    # initialize the queues
    from_queue = sqs.get_queue_by_name(QueueName=from_queue_name)
    to_queue = sqs.get_queue_by_name(QueueName=to_queue_name)

    # begin querying for messages
    should_check_for_more = True
    messages_processed = []
    while should_check_for_more:
        # grab the next message
        messages = from_queue.receive_messages(MaxNumberOfMessages=1)
        if len(messages) == 0:
            should_check_for_more = False
            break
        message = messages[0]

        # requeue it
        to_queue.send_message(MessageBody=message.body, DelaySeconds=0)

        # let the queue know that the message was processed successfully
        messages_processed.append(message)
        message.delete()

    print(f'requeued {len(messages_processed)} messages')

if __name__ == '__main__':
    from_queue_name = sys.argv[1]
    to_queue_name = sys.argv[2]
    redrive_messages(from_queue_name, to_queue_name)