My django app allows users to send messages to each other, and I pool some of the recent messages together and send them in an email using celery and redis.
Every time a user sends a message, I add a Message to the db and then trigger an async task to pool that user's messages from the last 60 seconds and send them as an email.
tasks.pushMessagePool.apply_async(args = (fromUser,), countdown = 60)
If the user sends 5 messages in the next 60 seconds, then my assumption is that 5 tasks will be created, but only the first task should send the email and the other 4 tasks should do nothing. I implemented a simple locking mechanism to make sure that messages are only considered a single time and to ensure db locking.
@shared_task
def pushMessagePool(fromUser, ignore_result=True):
    lockCode = randint(0,10**9)
    data.models.Messages.objects.filter(fromUser = fromUser, locked=False).update(locked=True, lockCode = lockCode)
    M = data.models.Messages.objects.filter(fromUser = fromUser, lockCode = lockCode)
    sendEmail(M,lockCode)
With this setup, I still get occasional (~10%) duplicates. The duplicates will fire within 10ms of each other, and they have different lockCodes.
Why doesn't this locking mechanism work? Does celery refer to an old DB snapshot? That wouldn't make any sense.
@Djangojack, here's a similar issue, but for SQS. I'm not sure if it applies to Redis too:
When creating your SQS queue you need to set the Default Visibility
timeout to some time that's greater than the max time you expect a
task to run. This is the time SQS will make a message invisible to all
other consumers after delivering to one consumer. I believe the
default is 30 seconds. So, if a task takes more than 30 seconds, SQS
will deliver the same message to another consumer because it assumes
the first consumer died and did not complete the task.
From a comment by @gustavo-ambrozio on this answer.
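If the same kind of redelivery is what's happening with the Redis broker, Celery exposes an analogous knob through the broker transport options. A minimal sketch only, with an illustrative project name and broker URL:

# celery.py -- a sketch; the project name and broker URL are illustrative.
from celery import Celery

app = Celery('myproject', broker='redis://localhost:6379/0')

# Celery's Redis (and SQS) transports redeliver any message whose visibility
# timeout expires before the task is acknowledged, so keep this larger than
# your longest countdown/ETA to avoid duplicate task runs.
app.conf.broker_transport_options = {'visibility_timeout': 3600}  # seconds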
Related
I have an application that may need to send hundreds of thousands of messages each run of my program using SQS. The program takes 1-2 hours/run and I run it 5-10 times/day. So that's roughly 1 million messages/day.
I want to do it fast. Is my best approach to:
1. Send each with its own send-message call, but send them in another thread so my main thread doesn't pause?
2. Use send-message-batch, which lets me send 10 messages at a time?
3. OMG. Why am I sending so many messages? Why not write them all into a big object, save the object in S3, and then send a pointer to the object with SQS?
My messages are the stdout and stderr of programs that are running in a distributed system. So the problem with #3 above is that I won't get the output of the program until the batching happens. I suppose that I could batch up every 60 seconds.
I'm sure that this has come up for other people. Is there a clever way to do this in the AWS SQS API that I am missing?
Kinesis is not an option in my environment.
We are currently sending the messages from python programs running on Apache Spark workers (about 2000 cores/cluster) and other monitoring systems, across about 5-20 clusters. The messages go to a lambda server. The problem is that some of the nodes send a few thousand messages within the course of 10-20 seconds.
We tried using Spark itself to collect this information, storing it in an RDD, saving that RDD in S3, and so on. The problem with that approach was that we didn't get real-time monitoring, and we added several hours to processing time. (We're not entirely sure why it added so much time, but it's possible that Spark ended up re-computing some RDDs because some stuff would no longer fit in RAM or on the spill disks.)
We solved this problem three ways:
We created a work queue with a consumer running in a separate thread. The consumer received messages from the workers and sent them off in batches of 10. If no message was received within a few seconds, the queue was flushed.
The full code is here: https://github.com/uscensusbureau/DAS_2020_Redistricting_Production_Code/blob/5e619a4b719284ad6af91e85e0548077ce3bfed7/source/programs/dashboard.py
The relevant class is below.
#
# We use a worker running in another thread to collect SQS messages
# and send them asynchronously to SQS in batches of 10 (or when WATCHER_TIME expires).
#
def sqs_queue():
    return boto3.resource('sqs',
                          config = botocore.config.Config(
                              proxies={'https':bcc_https_proxy().replace("https://","")}
                          )).Queue(das_sqs_url())

SQS_MAX_MESSAGES=10  # SQS allows sending up to 10 messages at a time
WATCHER_TIME=5       # how long to collect SQS messages before sending them
EXIT='exit'          # token message to send when program exits

# SQS Worker. Collects messages in a local queue and sends them to SQS in batches.
class SQS_Client(metaclass=Singleton):
    """SQS_Client class is a singleton.
    This uses a python queue to batch up messages that are sent to the AWS queue.
    We batch up to 10 messages at a time, but send every message within 5 seconds.
    """
    def __init__(self):
        """Set up the singleton by:
        - getting a handle to the SQS queue through the BCC proxy.
        - Creating the python queue for batching the requests to the SQS queue.
        - Creating a background thread to flush the queue every 5 seconds.
        """
        # Set the default
        if TRY_SQS_SECOND not in os.environ:
            os.environ[TRY_SQS_SECOND]=YES
        self.sqs_queue = sqs_queue()  # queue to send this to SQS
        self.pyqueue = queue.Queue()  # producer/consumer queue used by dashboard.py
        self.worker = threading.Thread(target=self.watcher, daemon=True)
        self.worker.start()
        atexit.register(self.terminate)

    def flush(self, timeout=0.0):
        """Flush the pyqueue. Can be called from the main thread or the watcher thread.
        While there are messages in the queue, grab up to 10, then send them to the sqs_queue.
        Returns last message processed, which may be EXIT.
        The watcher repeatedly calls flush() until it receives an EXIT.
        """
        entries = []
        msg = None
        t0 = time.time()
        while True:
            try:
                msg = self.pyqueue.get(timeout=timeout, block=True)
            except queue.Empty as e:
                break
            if msg==EXIT:
                break
            msg['Id'] = str( len( entries ))
            entries.append(msg)
            if len(entries)==SQS_MAX_MESSAGES:
                break
            if time.time() - t0 > timeout:
                break
        if entries:
            # Send the 1-10 messages.
            # If this fails, just save them in S3.
            try:
                if os.getenv(TRY_SQS_SECOND)==YES:
                    self.sqs_queue.send_messages(Entries=entries)
                    entries = []
            except botocore.exceptions.ClientError as err:
                logging.warning("Cannot send by SQS; sending by S3")
                os.environ[TRY_SQS_SECOND]=NO
            if entries:
                assert os.getenv(TRY_SQS_SECOND)==NO  # should have only gotten here if we failed above
                for entry in entries:
                    send_message_s3(entry['MessageBody'])
        return msg

    def watcher(self):
        """Repeatedly call flush().
        If the flush gets EXIT, it returns EXIT and we exit.
        """
        while True:
            if self.flush(timeout=WATCHER_TIME)==EXIT:
                return

    def queue_message(self, *, MessageBody, **kwargs):
        self.pyqueue.put({'MessageBody':MessageBody})

    def terminate(self):
        """Tell the watcher to exit"""
        self.flush()
        self.pyqueue.put(EXIT)
However, we were still unsatisfied with this, as emptying the SQS queue was also slow and there was poor visibility into the queues.
We developed a system that used S3 as a message queue: the producer creates objects under a given bucket and prefix with a random suffix, and the consumer lists and then removes them. Different consumers used different prefixes of the random string.
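A minimal sketch of that S3-as-a-queue pattern with boto3; the bucket, prefix, and handle() function are illustrative, not our production code:

import json
import uuid
import boto3

s3 = boto3.client('s3')
BUCKET, PREFIX = 'my-monitoring-bucket', 'mq/'   # illustrative names

def produce(message):
    # Each message becomes one object with a random key under the shared prefix.
    s3.put_object(Bucket=BUCKET, Key=PREFIX + uuid.uuid4().hex,
                  Body=json.dumps(message).encode())

def consume(batch=100):
    # List a slice of pending keys, process them, then delete what was handled.
    # (The real consumers each owned a different range of the random suffix so
    # they didn't step on each other; that partitioning is omitted here.)
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=batch)
    for obj in resp.get('Contents', []):
        body = s3.get_object(Bucket=BUCKET, Key=obj['Key'])['Body'].read()
        handle(json.loads(body))
        s3.delete_object(Bucket=BUCKET, Key=obj['Key'])

def handle(message):
    print(message)   # stand-in for real processing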
We implemented a traditional system using HTTP REST, with the python server running under mod_wsgi. This was the most performant.
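That REST endpoint can be as small as a single WSGI callable. A rough sketch only, not the production code; handle_message() is a stand-in:

import json

def handle_message(message):
    # Stand-in for the real work (store the message, forward it, update a dashboard, ...).
    print(message)

def application(environ, start_response):
    # Accept one monitoring message per POST; reject anything else.
    if environ.get('REQUEST_METHOD') != 'POST':
        start_response('405 Method Not Allowed', [('Content-Type', 'text/plain')])
        return [b'POST only\n']
    length = int(environ.get('CONTENT_LENGTH') or 0)
    handle_message(json.loads(environ['wsgi.input'].read(length) or b'{}'))
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'OK\n']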
Every day, I have a CRON task run which populates an SQS queue with a number of tasks that need to be completed. So (for example) at 9AM every morning, an empty queue will receive ~100 messages that need to be processed.
I would like a new worker to be spun up every second until the queue is empty. If any task fails, it's put at the back of the queue to be re-run.
For example, if each task takes up to 1.5 seconds to complete:
after 1 second, 1 worker will have started message A
after 2 seconds, 1 worker may still be running message A and 1 worker will have started running message B
after 100 seconds, 1 worker may still be running message XX and 1 worker will pick up message B again because it failed previously
after 101 seconds, no more workers are spawned until the next morning
Is there any way to have this type of infrastructure configured within AWS lambda?
One way, though I'm not convinced it's optimal:
A Lambda that's triggered by a CloudWatch Event (say every second, or every 10 seconds, depending on your rate limit). It polls SQS to receive (at most) N messages, then "fans out" to another Lambda function with each message.
Some pseudo code:
# Lambda 1 (scheduled by CloudWatch Event, e.g. CRON)
def handle_cron(event, context):
    # in order to get more messages, we might have to receive several times (loop)
    for message in queue.receive_messages(MaxNumberOfMessages=10):
        # Note: the 'Event' InvocationType means we don't wait for the response!
        lambda_client.invoke(FunctionName="foo", Payload=message.body, InvocationType='Event')
and
# Lambda 2 (triggered only by the invoke in Lambda 1)
def handle_message(event, context):
    # handle message
    pass
Seems to me you would be better off publishing your messages to SNS instead of SQS, and then having your lambda functions subscribe to the SNS topic.
Let Lambda worry about how many 'instances' it needs to spin up in response to the load.
Here is one blog post on this method, but google may help you find one that is closer to your actual use case.
https://aws.amazon.com/blogs/mobile/invoking-aws-lambda-functions-via-amazon-sns/
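For the publishing side, a minimal boto3 sketch (the topic ARN is a placeholder) would be something like:

import json
import boto3

sns = boto3.client('sns')
TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:my-topic'   # placeholder

def publish(message):
    # Each published message invokes the subscribed Lambda; Lambda decides
    # how many concurrent instances to spin up based on the incoming rate.
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(message))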
Why not just have a Lambda function that starts polling sqs at 9am, getting one message at a time and sleeping for a second between each message? Dead letter queues can handle retries. Stop execution after not receiving a message from SQS after x seconds.
It is a unique case where you don't actually want parallel processing.
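A rough sketch of that single-poller Lambda with boto3; the queue URL, idle limit, and process() are placeholders:

import time
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/daily-tasks'  # placeholder
IDLE_LIMIT = 60   # "x seconds" without a message before we stop

def handler(event, context):
    # Scheduled for 9AM; pulls one message per second until the queue stays empty.
    last_seen = time.time()
    while time.time() - last_seen < IDLE_LIMIT:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=1)
        for msg in resp.get('Messages', []):
            process(msg['Body'])        # stand-in for the real task
            # Delete only on success; failed messages become visible again and
            # can be retried or routed to a dead letter queue.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])
            last_seen = time.time()
        time.sleep(1)

def process(body):
    print(body)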
I have a large number of messages in an AWS SQS queue. These messages will be pushed to it constantly by another source. There is no predictable pattern for how often those messages will be pushed to the queue. Currently, I keep polling SQS every second and checking if there are any messages available. Is there a better way of handling this, like receiving a notification from SQS or SNS that some messages are available, so that I only request SQS when needed instead of constantly polling?
The way to do what you want is to use long polling - rather than constantly poll every second, you open a request that stays open until it either times out or a message comes into the queue. Take a look at the documentation for ReceiveMessageRequest
ReceiveMessageRequest req = new ReceiveMessageRequest()
.withWaitTimeSeconds(Integer.valueOf(20)); // set long poll timeout to 20 sec
// set other properties on the request as well
ReceiveMessageResult result = amazonSQS.receiveMessage(req);
A common usage pattern for this is to have a background thread running the long poll and pushing the results into an internal queue (such as LinkedBlockingQueue or an ExecutorService) for a worker thread to read from.
PS. Don't forget to call deleteMessage once you're done processing the result so you don't end up receiving it again.
You can also use the worker functionality in AWS Elastic Beanstalk. It allows you to build a worker to process each message, and when you use Elastic Beanstalk to deploy it to an EC2 instance, you can define it as subscribed to a specific queue. Then each message will be POST to the worker, without your need to call receive-message on it from the queue.
It makes your system wiring much easier, as you can also have auto scaling rules that will allow you to spawn multiple workers to handle more messages in time of peak load, and scale down back to a single worker, when the load is low. It will also delete the message automatically, if you respond with OK from your worker.
See more information about it here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
You could also have a look at Shoryuken and the property delay:
delay: 25 # The delay in seconds to pause a queue when it's empty
To be honest, we use delay: 0 here; the cost of SQS is low:
First 1 million Amazon SQS Requests per month are free
$0.50 per 1 million Amazon SQS Requests per month thereafter ($0.00000050 per SQS Request)
A single request can have from 1 to 10 messages, up to a maximum total payload of 256KB.
Each 64KB ‘chunk’ of payload is billed as 1 request. For example, a single API call with a 256KB payload will be billed as four requests.
You will probably spend less than 10 dollars monthly polling messages every second 24x7 in a single host.
One of the advantages of Shoryuken is that it fetches in batches, so it saves some money compared with fetch-per-message solutions.
I am very new to AWS SQS queues and I am currently playing around with python and boto.
Now I am able to read messages from SQS by polling consecutively.
The script is as follows:
while 1:
    m = q.read(wait_time_seconds=10)
    if m:
        print m
How do I make this script constantly listen for new additions to the queue without using a while loop?
Is there a way to write a Python consumer for SQS that doesn't have to poll periodically for new messages?
Not really... that's how SQS works. If a message arrives during the wait, it will be returned almost immediately.
This is not the inefficient operation that it seems like.
If you increase your timeout to the max allowed 20 seconds, then, worst case, you will generate no more than about 3 x 60 x 24 x 30 = 129,600 "empty" polls per month... × $0.00000050 per poll = $0.0648. (The first 1,000,000 requests are billed at $0.)
Note that during the timeout, if a new message arrives, it will return almost immediately, not wait the full 20 sec.
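For reference, the q.read(wait_time_seconds=10) call in the question is already long polling; in boto3 the same loop, with the 20-second maximum, looks roughly like this (the queue URL is a placeholder):

import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue'  # placeholder

while True:
    # The call blocks for up to 20 seconds and returns early as soon as a message arrives.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        print(msg['Body'])
        # Delete after successful processing so the message isn't redelivered.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])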
We are developing an app that needs to handle large email queues. We have planned to store emails in an SQS queue and use SES to send them, but we're a bit confused about how to actually handle and process the queue. Should I use a cron job to regularly read the SQS queue and send emails? What would be the best way to trigger the script that will be sending email from our app?
Using SQS with SES is a great way to handle this. If something goes wrong while emailing the request will still be on the queue and will be processed next time around.
I just use a cron job that starts my queue processing/email sending job once an hour. The job runs for an hour as a simple loop:
while i've been running < 1 hour:
    if there's a message in the queue:
        process the message
        delete the message from the queue
I set the WaitTimeSeconds parameter to the maximum (20 seconds) so that the check for a new message will wait a while for a new message if necessary so that the job isn't hitting AWS every few milliseconds. Otherwise, I could put a sleep statement of some kind in the loop.
The reason I run for just an hour is that the job might encounter some error that kills it, or have a memory leak, or some other unanticipated problem. This way any queued email requests will still get handled the next time the job is started.
If you want, you can start the job every fifteen minutes so you'll always have four worker processes handling queue requests. If one of them dies for some reason, you'll still be processing with the other three.
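A minimal sketch of one such hourly worker with boto3; the queue URL and email fields are placeholders, not my actual job:

import time
import boto3

sqs = boto3.client('sqs')
ses = boto3.client('ses')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/email-queue'  # placeholder

def run_for_one_hour():
    started = time.time()
    while time.time() - started < 3600:
        # Long poll so the loop isn't hitting AWS every few milliseconds.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for msg in resp.get('Messages', []):
            ses.send_email(
                Source='noreply@example.com',                         # placeholder sender
                Destination={'ToAddresses': ['user@example.com']},    # would come from the message
                Message={'Subject': {'Data': 'Your messages'},
                         'Body': {'Text': {'Data': msg['Body']}}})
            # Delete only after the send succeeded, so failed requests stay on the queue.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])

if __name__ == '__main__':
    run_for_one_hour()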