task queue in Appengine (using NDB) stopping another function from updating data - python-2.7

cred_query = credits_tbl.query(ancestor=user_key).fetch(1)
for q in cred_query:
    q.total_credits = q.total_credits + credits_bought
    q.put()
I have a task running which is constantly updating a user's total_credits in the credits table.
While that task runs, the user can also buy additional credits at any point (as shown in the code above) to add to the total. However, when they try to do so, it does not update the total_credits in the credits table.
I guess I don't understand the 'strongly consistent' model of App Engine (using NDB) as well as I thought.
Do you know why this happens?
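For what it's worth, the symptom matches a read-modify-write race: the background task and the purchase handler both read a total, add to it, and write it back, so the last put() wins regardless of consistency. Below is a minimal sketch of a transactional version of the purchase, assuming the entity layout implied above (credits_tbl and user_key come from the question):

from google.appengine.ext import ndb

@ndb.transactional
def add_credits(user_key, credits_bought):
    # An ancestor query is allowed inside a transaction; concurrent
    # transactions on the same entity group retry instead of silently
    # overwriting each other.
    for q in credits_tbl.query(ancestor=user_key).fetch(1):
        q.total_credits += credits_bought
        q.put()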

Related

Celery: dynamic queues at object level

I'm writing a Django app for making polls which uses Celery to keep the voting system under control. Right now I have two queues, default and polls; the first has its concurrency set to 8 and the second to 1.
$ celery multi start -A myproject.celery default polls -Q:default default -Q:polls polls -c:default 8 -c:polls 1
Celery routes:
CELERY_ROUTES = {
    'polls.tasks.option_add_vote': {
        'queue': 'polls',
    },
    'polls.tasks.option_subtract_vote': {
        'queue': 'polls',
    },
}
Task:
from django.db import IntegrityError, transaction

from myproject.celery import app  # assumption: the Celery app lives here
from polls.models import Option

@app.task(bind=True)  # bind=True so self.retry() below works
def option_add_vote(self, pk):
    """
    Updates the given option and its poll, increasing the vote number by 1.
    """
    option = Option.objects.get(pk=pk)
    try:
        with transaction.atomic():
            option.vote_quantity += 1
            option.save()
            option.poll.total_votes += 1
            option.poll.save()
    except IntegrityError as exc:
        raise self.retry(exc=exc)
The option_add_vote task updates the option and its poll's vote count, adding 1 to the previous value. So, to avoid concurrency problems, I set the polls queue concurrency to 1. This allows the system to handle thousands of vote requests successfully.
The problem, as far as I can imagine, will be a bottleneck as the system grows.
So, I was thinking about some kind of dynamic queues, where all vote requests for the options of a certain poll are routed to a dedicated queue. I think this would make the system more reliable and fast.
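For concreteness, the routing half of that idea could look roughly like the sketch below. This is only an illustration: the route_for_task signature follows Celery 3.x routers, the poll_id keyword argument is an assumption (the task above only receives the option pk), and you would still need workers consuming each polls-<id> queue, which is the hard part.

class PerPollRouter(object):
    """Route vote tasks for a given poll to that poll's own queue (sketch)."""

    def route_for_task(self, task, args=None, kwargs=None):
        if task in ('polls.tasks.option_add_vote',
                    'polls.tasks.option_subtract_vote'):
            poll_id = (kwargs or {}).get('poll_id')  # assumed extra kwarg
            if poll_id is not None:
                return {'queue': 'polls-{}'.format(poll_id)}
        return None  # fall back to the default routing

Registering it would be something like CELERY_ROUTES = (PerPollRouter(),).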
What do you think? How can I make it?
EDIT1:
I got a new idea thanks to Paul and Plahcinski. I'm storing the votes as objects in their own model (a user-option relationship). When someone votes for an option, an object of this model is created, allowing me to count how many votes an option has. This frees the system from the voting-concurrency problem, so it can run in parallel.
I'm thinking about using CELERYBEAT_SCHEDULE to cron a task that updates poll options based on the result of Vote.objects.filter(option_id=pk).count(). Maybe I could execute it every hour, or do partial updates for those options that are getting new votes...
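That schedule could be declared roughly like this (a sketch; sync_vote_counts is an invented task name):

from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'sync-vote-counts': {
        'task': 'polls.tasks.sync_vote_counts',  # hypothetical task
        'schedule': timedelta(hours=1),
    },
}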
But how do I give the clients updated options in real time?
As Plahcinski says, I can keep a cached value for my options in Redis (or any other memcached-like system?) and use it to temporarily store these values, giving any new request the cached value.
How can I mix this with my standard values in Django models? Could anyone give me some code references or hints?
Am I on the right track, or am I making mistakes?
What I would do is move your increments out of the database and into Redis, and use the database model as the cached value. Have a celery beat task that updates recently incremented Redis keys into your database.
http://redis.io/commands/INCR
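A minimal sketch of that pattern, assuming redis-py and a poll:votes:<option_id> key scheme of our own invention:

import redis
from django.db.models import F

from myproject.celery import app  # assumption: the Celery app lives here
from polls.models import Option

r = redis.StrictRedis()

def record_vote(option_id):
    # INCR is atomic in Redis, so concurrent votes never clobber each other.
    r.incr('poll:votes:{}'.format(option_id))

@app.task
def flush_votes_to_db():
    # Run periodically via celery beat: move the Redis counters into the model.
    for key in r.scan_iter('poll:votes:*'):
        option_id = int(key.rsplit(':', 1)[1])
        delta = int(r.getset(key, 0))  # read and reset atomically
        if delta:
            Option.objects.filter(pk=option_id).update(
                vote_quantity=F('vote_quantity') + delta)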
What about just having a simple model that stores -1/+1 vote integers, and a Celery task that reconciles those with the FK object for atomic transactions and updates?
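A rough illustration of that suggestion (the model and task names are invented):

from django.db import models
from django.db.models import Sum

from myproject.celery import app  # assumption: the Celery app lives here
from polls.models import Option

class VoteDelta(models.Model):
    # One row per vote (+1 or -1), written without touching the Option row.
    option = models.ForeignKey('polls.Option')
    delta = models.SmallIntegerField()

@app.task
def reconcile_votes(option_id):
    # Aggregate the deltas and write the result back in a single UPDATE.
    total = (VoteDelta.objects.filter(option_id=option_id)
             .aggregate(s=Sum('delta'))['s'] or 0)
    Option.objects.filter(pk=option_id).update(vote_quantity=total)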

Is there any way to define task quota in celery?

I have these requirements:
I have a few heavy, resource-consuming tasks: exporting different reports that require big, complex queries and subqueries.
There are a lot of users.
I have built the project in Django and queue tasks using Celery.
I want to restrict users so that only 10 of their reports run per minute. The idea is that they can submit hundreds of requests in 10 minutes, but I want Celery to execute only 10 tasks at a time for each user, so that every user gets their turn.
Is there any way Celery can do this?
Thanks
Celery has a setting to control the rate limit of a task (http://celery.readthedocs.org/en/latest/userguide/tasks.html#Task.rate_limit), meaning the number of tasks that can run in a given time frame.
You could set this to '100/m' (one hundred per minute), meaning your system allows 100 such tasks per minute. It's important to notice that the setting is not per user: it applies to the task type as a whole within the time frame.
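For example, applied to the reporting task from the question (a sketch; export_report is an invented name):

from celery import shared_task

@shared_task(rate_limit='10/m')  # at most 10 of this task per minute, per worker
def export_report(user_id):
    # ... run the big report queries here ...
    pass

Note that rate_limit is enforced per worker instance, so the effective system-wide rate scales with the number of workers.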
Have you thought about this approach instead of limiting per user?
In order to have a rate limit per task-and-user pair, you will have to build it yourself. I think (though I'm not sure) you could use a task router or a signal, depending on your needs.
Routers (http://celery.readthedocs.org/en/latest/userguide/routing.html#routers) let you route tasks to a specific queue by applying some logic.
Signals (http://celery.readthedocs.org/en/latest/userguide/signals.html) let you execute code at a few well-defined points of the task's scheduling cycle.
An example of the router's logic could be:
def route_for_task(task, args, kwargs):
    # get_task_qty, LIMIT_FOR_A and LIMIT_FOR_B are your own bookkeeping.
    if task == 'A':
        user_id = args[0]  # in this task the user_id is the first arg
        qty = get_task_qty('A', user_id)
        if qty > LIMIT_FOR_A:
            return None
    elif task == 'B':
        user_id = args[2]  # in this task the user_id is the third arg
        qty = get_task_qty('B', user_id)
        if qty > LIMIT_FOR_B:
            return None
    return {'queue': 'default'}
With the approach above, every time a task starts you would increment the counter for the user_id/task_type pair by one somewhere (for example in Redis), and every time a task finishes you would decrement that value in the same place.
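A sketch of that bookkeeping with Celery signals and Redis (the key scheme is our own, and we assume user_id is each task's first positional argument):

import redis
from celery.signals import task_prerun, task_postrun

r = redis.StrictRedis()

@task_prerun.connect
def track_start(task_id=None, task=None, args=None, **extra):
    # Increment the running counter for this task type/user pair.
    r.incr('running:{}:{}'.format(task.name, args[0]))

@task_postrun.connect
def track_finish(task_id=None, task=None, args=None, **extra):
    r.decr('running:{}:{}'.format(task.name, args[0]))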
That seems kind of complex and hard to maintain, with a few failure points, in my opinion.
Another approach, which I think could fit, is to implement some kind of 'distributed semaphore' (similar to a distributed lock) per user and task, so that each task which needs to limit the number of running instances could use it.
The idea is: every time a task that needs concurrency control starts, it checks whether a resource is available; if not, it just returns.
You could picture the idea like this:
from celery import shared_task

@shared_task
def my_task_A(user_id, arg1, arg2):
    # SemaphoreManager stands in for whatever distributed-semaphore
    # implementation you choose; it is not a real library.
    resource_key = 'my_task_A_{}'.format(user_id)
    if not SemaphoreManager.is_available_resource(resource_key):
        return  # no resource available, so abort
    # the resource could still be acquired by someone else just before us
    if not SemaphoreManager.acquire(resource_key):
        return
    try:
        pass  # execute your code here
    finally:
        SemaphoreManager.release(resource_key)
It's hard to say which approach you should take, because that depends on your application.
Hope it helps you!
Good luck!

copying rather than modifying a job (APScheduler)

I'm writing a database-driven application with APScheduler (v3.0.0). Especially during development, I find myself frequently wanting to command a scheduled job to start running now without affecting its subsequent schedule.
It's possible to do this at job creation time, of course:
def dummy_job(arg):
    pass

sched.add_job(dummy_job, trigger='interval', hours=3, args=(None,))  # recurring
sched.add_job(dummy_job, trigger=None, args=(None,))  # one-shot, runs now
However, if I already have a job scheduled with an interval or date trigger...
>>> sched.print_jobs()
Jobstore default:
    job1 (trigger: interval[3:00:00], next run at: 2014-08-19 18:56:48 PDT)
... there doesn't seem to be a good way to tell the scheduler "make a copy of this job which will start right now." I've tried sched.reschedule_job(trigger=None), which schedules the job to start right now, but removes its existing trigger.
There's also no obvious, simple way to duplicate a job object while preserving its args and any other stateful properties. The interface I'm imagining is something like this:
sched.dup_job(id='job1', new_id='job2')
sched.reschedule_job('job2', trigger=None)
Clearly, APScheduler already contains an internal mechanism to copy job objects since repeated calls to get_job don't return the same object (that is, (sched.get_job(id) is sched.get_job(id))==False).
Has anyone else come up with a solution here? I'm thinking of posting a suggestion on the developers' site if not.
As you've probably figured out by now, that phenomenon is caused by the job stores instantiating jobs on the fly based on data loaded from the back end. To run a copy of a job immediately, this should do the trick:
job = sched.get_job(id)
# With no trigger given, the copy is scheduled to run once, right away.
sched.add_job(job.func, args=job.args, kwargs=job.kwargs)

Getting task_ids for all tasks created with celery chord

My goal is to retrieve all the task_ids from a django celery chord call so that I can revoke the tasks later if needed. However, I cannot figure out the correct method to retrieve the task ids. I execute the chord as:
c = chord((loadTask.s(i) for i in range(0, num_lines, CHUNK_SIZE)), finalizeTask.si())
task_result = c.delay()
# get task_ids
I examined the task_result's children variable, but it is None.
I can manually create the chord semantics using a group and another task, as follows, and retrieve the associated task ids, but I don't like breaking up the call. When this code is run within a task as subtasks, it can cause the main task to hang if the group is revoked before the finalize task begins.
g = group((loadTask.s(i) for i in range(0, num_lines, CHUNK_SIZE)))
task_result = g.delay()
storeTaskIds(task_result.children)
task_result.get()
task_result2 = self.finalizeTask.delay()
storeTaskIds([task_result2.task_id])
Any thoughts would be appreciated!
I'm trying to do something similar; I was hoping I could just revoke the chord with one call and everything within it would be recursively revoked for me.
You could make a chord out of the group and your finalizeTask to keep from breaking up the calls.
I realize this is coming two months after you asked, but maybe it'll help someone and maybe I should just get the task ids of everything in my group.
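Building on that suggestion, and reusing the names from the question: in more recent Celery versions the callback's AsyncResult keeps a reference to the header group, so something like this may work (a sketch; the result graph varies between Celery releases, which is why children was None above):

c = chord((loadTask.s(i) for i in range(0, num_lines, CHUNK_SIZE)),
          finalizeTask.si())
result = c.delay()

# result is the AsyncResult of the finalize callback; result.parent
# should be the GroupResult holding one AsyncResult per header task.
ids = [r.id for r in result.parent.results]
ids.append(result.id)  # keep the callback's id too, for a later revoke()
storeTaskIds(ids)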

SharePoint List item BRIEFLY appears as edited by 'System Account' after item.update

I have a shopping cart like application running on SharePoint 2007.
I'm running a very standard update procedure on a list item:
using (SPWeb web = site.OpenWeb())
{
    web.AllowUnsafeUpdates = true;
    SPList list = web.Lists["Quotes"];
    SPListItem item = list.GetItemById(_id);
    item["Title"] = _quotename;
    item["RecipientName"] = _quotename;
    item["RecipientEmail"] = recipientemail;
    item["IsActive"] = true;
    item.Update();
    site.Dispose();
}
The item updates properly; however, it briefly appears as modified by System Account. If I wait a second and refresh the page, it shows up again as modified by the CurrentUser.
This is an issue because on Page_Load I retrieve the item that is marked as Active AND listed as modified by the CurrentUser. This means that as a user updates his list, when the postback finishes it looks like he has no active items.
Is it web.AllowUnsafeUpdates? That is necessary because I was getting a security error before.
What am I missing?
First off, it's not AllowUnsafeUpdates. That simply allows items to be modified from your code.
It's a bit hard to tell what's going on without understanding more of the flow of your application. I would suggest, though, that using Modified By to associate an item with a user may not be a great idea. As you have discovered, it means any modification by the system, or even potentially an administrator, will break that link.
I would store the current user in a custom field. That should solve your problem and would be a safer design choice.
There could be some other code running in event receivers and updating the item. Event receivers run in the context of the system account, so if you update the item from an event receiver, the Modified By field will show just that: the system account modified the item.