I'm using Amazon SQS for my application in a producer/consumer context. I want to enable queue level logging where I can see items put on the queue and removed from it later. How can I do that?
I have read the following:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/logging-using-cloudtrail.html
However, that doesn't suffice for my use case. Are we not allowed to do this with AWS queues?
What you're trying to achieve is not possible with just SQS. Possible solutions include:
Implement a middleware API between your producer and the SQS queue, and log the producer's requests at the API level.
Use Kinesis instead of SQS. Kinesis lets you replay and analyze records created in the last 24 hours.
Implement logging in the consumer.
Use a Lambda function that (triggered by a CloudWatch Event Rule) reads the SQS queue once a minute, logs the records, and puts them in another SQS queue for later processing by the consumer.
Use a different type of queue that allows logging. For example, Redis has the MONITOR command for that.
In addition to Sergey Kovalev's answer, Lambda functions can now be triggered by SQS events.
You simply:
select the SQS queue you want as the event source for your Lambda function
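A minimal sketch of what such a logging function might look like in Python, assuming the standard SQS event shape (a "Records" list whose entries carry "messageId" and "body"). The logger name "sqs_audit" and the return value are illustrative choices, not anything SQS requires. Keep in mind that with an SQS trigger the messages are removed from the queue once the function returns successfully, so this function would either have to be the consumer itself or forward the messages somewhere else.

```python
import logging

logger = logging.getLogger("sqs_audit")  # hypothetical logger name
logger.setLevel(logging.INFO)


def handler(event, context):
    """Log every SQS record delivered in this invocation."""
    records = event.get("Records", [])
    for record in records:
        # "messageId" and "body" are fields of each SQS record in the event.
        logger.info(
            "received message id=%s body=%s",
            record.get("messageId"),
            record.get("body"),
        )
    # Returning normally signals success for the whole batch.
    return {"logged": len(records)}
```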
I understand your pain. Even I had the issue where SQS was not behaving as expected and I was looking for logs to understand the problem.
SQS doesn't publish logs. All SQS APIs are synchronous, so the client gets the appropriate response to each call.
The solutions mentioned above are workarounds.
Among them, logging at the producer and consumer might not help much. In my case I did have logging at the producer and consumer, but what exactly SQS ran into, and when, was still not visible.
In my application we are using an SQS queue to hold messages to be processed by another module. SQS doesn't send a notification when a message arrives, and I don't want my application to check the queue at a fixed interval. So I'm trying to use a Lambda trigger to make an HTTP request to my module and make it poll messages from SQS when a message arrives.
The problem is that SQS deletes the delivered messages if there is no error in the Lambda function (as far as I know). Forcing an error just to keep the messages in the queue can't be right. So I need a way to keep messages in SQS after the Lambda was triggered.
Maybe I should move the code that processes the message into the Lambda function, but I'm looking for ways to keep it where it is.
Could anyone give some guidance?
Thanks in advance
SQS is designed so that each message on a queue is handled by a single consumer, so what you are seeing is the intended behavior.
However, there is a solution available for this exact scenario but it will require you to update your architecture.
The solution is to use a fanout architecture.
You would instead publish to an SNS topic that your SQS queue is subscribed to. Then create additional SQS queues, subscribed to the same topic, as parallel channels (one per Lambda function).
Add each Lambda function as the consumer of its own SQS queue, each with its own processing.
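One practical detail of this setup: when a queue is subscribed to an SNS topic without raw message delivery, the SQS message body is a JSON envelope and the published text sits in its "Message" field. A rough sketch of unwrapping it inside one of the per-queue Lambda functions (the function name extract_payloads is just for illustration):

```python
import json


def extract_payloads(event):
    """Return the originally published messages from an SQS-triggered event."""
    payloads = []
    for record in event.get("Records", []):
        # The SQS body holds the SNS notification as a JSON document.
        envelope = json.loads(record["body"])
        # With raw message delivery disabled, the payload is under "Message".
        payloads.append(envelope["Message"])
    return payloads
```

If raw message delivery is enabled on the subscription, the body is the published text itself and no unwrapping is needed.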
I have an AWS Lambda function that reads from an SQS queue. The Lambda logic is basically to read one message off the queue, process it, and then delete it. The code that reads the message is something like:
ReceiveMessageRequest messageRequest =
new ReceiveMessageRequest(queueUrl).withWaitTimeSeconds(5).withMaxNumberOfMessages(1);
Now my question is: what is the best way to trigger this Lambda, and how does it scale? For instance, if there are 1000 messages in the queue, will there be 1000 Lambdas running together, since in my case one Lambda can read only one message off the queue?
Any pointers on best practices around this kind of design?
Right now your best option is probably to set up an AWS CloudWatch event rule that calls the Lambda function at the interval you need.
Here is a sample app from AWS to do just that:
https://github.com/awslabs/aws-serverless-sqs-event-source
I do believe that AWS will eventually support SQS as an event type for AWS Lambda, which should make this even easier, but for now your best choice is probably a version of the code I linked above.
We can now use SQS messages to trigger AWS Lambda functions. It is no longer necessary to run a message polling service or create an SQS to SNS mapping.
Further details:
https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
AWS added native support in June 2018: https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
There are probably a few ways to do this, but I found this guide to be fairly helpful when I tried to implement the same sort of functionality you are describing in Node.js. One downside to this strategy is that you can only poll the queue every 60s.
The basic workflow would look something like this:
Set up a CloudWatch Alarm that gets triggered when the queue has a certain number of messages.
The CloudWatch alarm then posts to SNS
The SNS message triggers a Lambda scale() function
The scale() function updates a configuration record in a DynamoDB table that sets the number of worker processes needed
You then have a main CloudWatch Schedule that invokes a worker() function every 60s
The worker() function reads configuration from DynamoDB to determine how many concurrent processes are needed, based on the queue size.
Worker() then invokes the appropriate number of process() functions
Process() function consumes messages from SQS, performs your main application logic, and then removes the item from the queue.
You can find an example of what the scaling functions would look like in Node.js here
I have used this solution in a production environment for almost a year without any issues, even with thousands of messages in the queue. If you cut out the scaling portion, it will only process one message at a time.
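The worker() step boils down to turning a queue depth into a number of process() invocations. A hedged sketch of that calculation in Python (the guide referenced above is in Node.js); the name workers_needed and the defaults of 10 messages per worker and a cap of 50 workers are made-up values for illustration only:

```python
import math


def workers_needed(queue_depth, messages_per_worker=10, max_workers=50):
    """Return how many process() invocations to start for a given queue depth."""
    if queue_depth <= 0:
        return 0
    # Round up so a partial batch still gets a worker, then apply the cap.
    return min(max_workers, math.ceil(queue_depth / messages_per_worker))
```

With these assumed defaults, a queue of 95 messages would start 10 workers, and anything above 500 messages would be capped at 50.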
We plan to use the AWS SQS service to queue events created by a web service and then use several workers to process those events. Each event must be processed only once. According to the AWS SQS documentation, a standard queue can "occasionally" deliver duplicate messages but has unlimited throughput. A FIFO queue will not deliver duplicate messages but has a throughput limit of 300 API calls per second (with batchSize=10, equivalent to 3000 messages per second). Our current peak-hour traffic is only 80 messages per second, so both are fine in terms of throughput. But when I started to use a FIFO queue, I found that I need to do extra work, such as providing the extra parameters "MessageGroupId" and "MessageDeduplicationId" or enabling the "ContentBasedDeduplication" setting. So I am not sure which one is the better solution. We just need the messages not to be duplicated. We don't need them to be FIFO.
Solution #1:
Use an AWS SQS FIFO queue. For each message, generate a UUID for the "MessageGroupId" and "MessageDeduplicationId" parameters.
Solution #2:
Use an AWS SQS FIFO queue with "ContentBasedDeduplication" enabled. For each message, generate a UUID for "MessageGroupId".
Solution #3:
Use an AWS SQS standard queue with AWS ElastiCache (either Redis or Memcached). For each message, the "MessageId" field will be saved in the cache server and checked for duplication later on. If it exists, the message has already been processed. (By the way, how long should the "MessageId" be kept in the cache server? The AWS SQS documentation does not mention how far back a message could be duplicated.)
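To make the FIFO options more concrete, here is a rough sketch of the parameters a producer would build per message. The keys are the SendMessage parameter names; the helper name fifo_send_params is hypothetical, and deriving the deduplication ID from a SHA-256 hash of the body is an explicit stand-in for what "ContentBasedDeduplication" does on the service side. Note that with a content hash, two distinct events with identical bodies sent within the deduplication interval would be treated as duplicates.

```python
import hashlib
import uuid


def fifo_send_params(queue_url, body):
    """Build SendMessage parameters for a FIFO queue (sketch only)."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # A fresh group ID per message means no ordering is enforced between messages.
        "MessageGroupId": str(uuid.uuid4()),
        # Same body -> same deduplication ID.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
```

For Solution #1 as written, the deduplication ID would be a UUID instead; that only protects against the same send call being retried with the same ID, not against the producer creating the same event twice.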
You are making your systems complicated with SQS.
We have moved to Kinesis Streams, and it works flawlessly. Here are the benefits we have seen:
Order of Events
Trigger an Event when data appears in stream
Deliver in Batches
Leave the responsibility to handle errors to the receiver
Go Back with time in case of issues
Buggier Implementation of the process
Higher performance than SQS
Hope it helps.
My first question would be: why is it so important that you don't get duplicate messages? An ideal solution would be to use a standard queue and design your workers to be idempotent. For example, if the messages contain something like a task ID and you store each completed task's result in a database, ignore those whose task ID already exists in the DB.
Don't use receipt handles for application-side deduplication, because those change every time a message is received. In other words, SQS doesn't guarantee the same receipt handle for duplicate messages.
If you insist on deduplication, then you have to use a FIFO queue.
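As a sketch of the idempotent-worker idea, assuming each message carries a task ID: let the database's primary key reject duplicates, and treat a rejected insert as "already done". The example uses an in-memory SQLite database purely so it is self-contained; the table layout, function names, and the upper-casing "work" are all placeholders.

```python
import sqlite3


def make_store():
    """Create an in-memory results table keyed by task ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (task_id TEXT PRIMARY KEY, result TEXT)")
    return conn


def process_once(conn, task_id, payload):
    """Process a task unless its ID is already stored; return True if work was done."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO results (task_id, result) VALUES (?, ?)",
                (task_id, payload.upper()),  # placeholder for the real work
            )
    except sqlite3.IntegrityError:
        # The primary key already exists, so this is a duplicate delivery.
        return False
    return True
```

In a real worker the side effects of the task would need to happen inside the same transaction (or be idempotent themselves) for this to hold.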
I am working with PHP technology.
I have my program that will write message to Amazon SQS.
Can anybody tell me how I can use the Lambda service to get data from SQS and push it into MySQL? The Lambda function should be triggered whenever a new record is added to the queue.
Can somebody share the steps or code that will help me to get through with this task?
There isn't any official way to link SQS and Lambda at the moment. Have you looked into using an SNS topic instead of an SQS queue?
Agree with Mark B.
Ways to get events over to Lambda:
Use SNS: http://docs.aws.amazon.com/sns/latest/dg/sns-lambda.html
Use SNS->SQS and have the Lambda launched by the SNS notification; just use it to load whatever is in the SQS queue.
Use Kinesis.
Alternatively, have the Lambda run by a cron job to read SQS. This depends on the latency you need. If you require messages to be processed immediately, this is not the solution, because you would be running the Lambda all the time.
Important note on using SQS: you are charged when you query even if no messages are waiting. So do not poll rapidly, even in your Lambdas. It is easy to run up a huge bill doing nothing. This is also a good reason to set up CloudWatch on the account to monitor usage and charges.
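To get a feel for the numbers, a back-of-the-envelope sketch that counts ReceiveMessage calls per day for a single poller on an idle queue. It deliberately contains no prices; the function name is made up. The 20-second figure corresponds to the maximum WaitTimeSeconds for long polling, where each call on an empty queue takes roughly that long to return.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def receive_calls_per_day(seconds_between_calls):
    """Count the calls one poller makes per day when the queue stays empty."""
    return SECONDS_PER_DAY // seconds_between_calls


tight_loop = receive_calls_per_day(1)    # one short poll per second
long_poll = receive_calls_per_day(20)    # long polling with a 20 s wait
```

Under these assumptions the tight loop issues 20 times as many billable requests as the long poller while receiving exactly the same (zero) messages.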
I have the following infrastructure:
I have an EC2 instance with a NodeJS+Express process listening on a port for messages (process 1). Every time the process receives a message it sends it to an SQS queue. Then I have another process in the same machine reading the queue using long polling (process 2). When it finds a message in the queue it inserts the data in a MariaDB database sitting on an RDS instance.
(Just to clarify, messages are generated by users, they send a chunk of data which can contain arbitrary information to the endpoint where the process 1 is listening)
Now I want to put the process that reads the SQS (process 2) in a Lambda function so that the process that writes to the queue and the one that reads from the queue are completely independent. The problem is that I don't know if this is possible.
I know that Lambda functions are invoked in response to an event, and the events supported at the moment are S3, SNS, SES, DynamoDB, Kinesis, Cognito, CloudWatch and CloudFormation but NOT SQS.
I was thinking of using SNS notifications to invoke the Lambda function, so that every time a message is pushed to the queue an SNS notification is fired and invokes the Lambda function. But after playing with it a bit I've realised that it is not possible to create an SNS notification from SQS; it's only possible to write SNS notifications to the queue.
Right now I'm a bit stuck because I don't know how to continue. I have the feeling that it is not possible to create this infrastructure due to the current limitations in the AWS services. Is there another way to do what I want or am I at a dead end?
Just to extend my question with some research I've done, this GitHub repo shows how to read an SQS queue from a Lambda function, but the Lambda function works only if it is fired from the command line:
https://github.com/robinjmurphy/sqs-to-lambda
In the readme, the author mentions the following:
Update: Lambda now supports SNS notifications as an event source,
which makes this hack entirely unnecessary for SNS notifications. You
might still find it useful if you like the idea of using a Lambda
function to process jobs on an SQS queue.
But I think this doesn't solve my problem. An SNS notification can invoke the Lambda function, but I don't see how I can create a notification when a message is received in the SQS queue.
Thanks
There are a couple of strategies that can be used to connect the dots, either asynchronously or with a run-sleep-run loop, to keep the data flowing between SNS, SQS and Lambda.
Strategy 1: Have a Lambda function listen to SNS and process it in real time. [Please note that an SQS queue can subscribe to an SNS topic, which may be helpful for logging / auditing / retry handling.]
Strategy 2: Given that your data is sourced into an SQS queue, you can try two Lambda functions [Feeder & Worker].
Feeder would be a scheduled Lambda function whose job is to take items
from SQS (if any) and publish them to an SNS topic (and keep doing so on every scheduled run).
Worker would be subscribed to the SNS topic and would do the actual data processing.
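A minimal sketch of the Worker side, assuming the standard SNS event shape in which each record carries the published text under record["Sns"]["Message"]. The process function here is a hypothetical placeholder for the real data processing.

```python
def process(message):
    """Hypothetical placeholder for the actual data processing."""
    return message.strip().lower()


def worker_handler(event, context):
    """Handle an SNS-triggered invocation and process each notification."""
    results = []
    for record in event.get("Records", []):
        # For SNS event sources the payload is under the "Sns" key.
        results.append(process(record["Sns"]["Message"]))
    return results
```

The Feeder would be the mirror image: read from SQS on a schedule, publish each body to the topic, and delete the message from the queue only after the publish succeeds.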
We can now use SQS messages to trigger AWS Lambda functions. It is no longer necessary to run a message polling service or create an SQS to SNS mapping.
Further details:
https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
AWS SQS is one of the oldest products of Amazon, which only supported polling (long and short) up until June 2018. As mentioned in this answer, AWS SQS now supports the feature of triggering lambda functions on new message arrival in SQS. A complete tutorial for this is provided in this document.
I used to tackle this problem using different mechanisms, and given below are some approaches you can use.
You can develop a simple polling application in Lambda, and use AWS CloudWatch to invoke it every 5 minutes or so. You can make this near real-time by using CloudWatch Events to invoke the Lambda at short intervals. Use this tutorial or this tutorial for this purpose. (This could cost more on Lambda invocations.)
You can consider SQS redundant if you need neither to persist the messages nor to guarantee the order of delivery. You can use AWS SNS (Simple Notification Service) to directly invoke a Lambda function and do whatever processing is required. Use this tutorial for this purpose. This will happen in real time. The main drawback is the limit on the number of Lambdas that can run concurrently per region. Please read this and understand the limitation before following this approach. Note that standard SNS topics do not guarantee the order of delivery. SNS can also call an HTTP endpoint directly, which can then store the message in your DB.
I had a similar situation (and now have a working solution deployed). I addressed it in the following manner:
publishing events to SNS, which then get fanned out to Lambda and SQS.
NOTE: This is not applicable to the events that have to be processed in a certain order.
There are some gotchas (with possible solutions), such as:
race condition: the Lambda might get invoked before the message is deposited into the queue
the distributed nature of an SQS queue may lead to no messages being returned even though there is a message (see note1).
The solution in both cases would be to long-poll the SQS queue, but this does make your Lambda bill more expensive.
note1
Short poll is the default behavior where a weighted random set of machines is sampled on a ReceiveMessage call. This means only the messages on the sampled machines are returned. If the number of messages in the queue is small (less than 1000), it is likely you will get fewer messages than you requested per ReceiveMessage call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response; in which case you should repeat the request.
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html
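The "repeat the request" advice from the note can be sketched as a small retry loop. To keep the example self-contained, receive is any zero-argument callable that returns a list of messages (in practice it would wrap a ReceiveMessage call); the function name, attempt count and delay are arbitrary illustrative values.

```python
import time


def receive_with_retry(receive, attempts=3, delay_seconds=0.0):
    """Call receive() until it returns messages or the attempts run out."""
    for attempt in range(attempts):
        messages = receive()
        if messages:
            return messages
        # An empty short-poll response does not prove the queue is empty.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return []
```

Long polling avoids most of this, since the service then waits for a message to arrive instead of returning an empty response straight away.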
We had some similar requirements so we ended up building a library and open sourcing it to help with SQS to Lambda async. I'm not sure if this fills your particular set of requirements, but thought it might be worth a look: https://read.iopipe.com/sqs-lambda-teaming-up-92c4096be49c