Adding a view counter to objects using Django Celery

Not that it matters, but this is a follow-up to this question.
I want to keep a count of how many times each object in my database has been viewed. Let's say we have a Person model, with several instances. We want to keep a counter of how many times each instance has been viewed in the Person model, using Django Celery to avoid waiting for the database writes.
At the moment I'm doing this:
from celery.decorators import task

class Person(models.Model):
    name = models.CharField(max_length=50)

class Stats(models.Model):
    person = models.ForeignKey('Person', unique=True)
    views = models.ManyToManyField('StatEvent')  # relation used by addView below

    @task
    def addView(self):
        se = StatEvent()
        se.save()
        self.views.add(se)

class StatEvent(models.Model):
    date = models.DateTimeField(auto_now_add=True)
Then, every time the view that lists a page of persons is called, I get all persons and update the statistics like this:
person.get_stats().addView.delay(person.get_stats())
after which I return the list of persons to be displayed in the browser. I thought the statistics would be updated asynchronously, but there is a clear, long delay before the page is displayed, which is confirmed by a print statement shown for each addition in the Celery console window: the page is only rendered once the last statistic has been updated.
How do I ensure that the user doesn't wait for the database update to finish?
Update
I thought it might have something to do with there not being enough worker processes to process each person separately, so I instead wrote a function that accepts a list of persons as a parameter and used that as the task to be executed. So, there is only one task in the queue:
@task(ignore_result=True)
def addViews(persons):
    for person in persons:
        stats = person.get_stats()  # was listing.get_stats(), a stray name
        se = StatEvent()
        se.save()
        stats.views.add(se)
However, when printing to the console, like this:
print "adding"
print tasks.addClicks(persons)
print "done"
there is a clear delay between the "adding" and "done" steps, and the returned value of the function is None.

Turns out my original suspicion about there not being enough workers was correct. My new function that put everything in one task solved the problem; I had also been missing the .delay in that last tasks.addClicks(persons) call.
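For reference, a minimal sketch of the fixed dispatch, assuming the bulk task above lives in a tasks module (.delay() queues the work and returns at once, so the view never waits for the writes):

tasks.addViews.delay(persons)  # returns an AsyncResult immediately; the worker does the DB writes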

Related

Search 2 tables simultaneously using flask-executor

I have 2 large postgres tables which have an index so that I can perform a full text search on each.
Typically, they look like:
class Post_1(db.Model):
    query_class = PostQuery
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String)
    content = db.Column(db.Text)
    datestamp = db.Column(db.Float)
    search_vector = db.Column(TSVectorType('title', 'content'))
and
class Post_2(db.Model):
    query_class = PostQuery
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String)
    content = db.Column(db.Text)
    datestamp = db.Column(db.Float)
    search_vector = db.Column(TSVectorType('title', 'content'))
In my flask application, to get the documents which have a specific keyword in one of the tables, I would do:
Post_1.query.search(keyword).\
    order_by(Post_1.datestamp.desc()).limit(1)
Since I want to run the same search simultaneously on both tables, I wanted to use flask-executor and wrote the following code:
from flask_executor import Executor

executor = Executor(app)
futures = []
keyword = "covid"

future = executor.submit(Post_1.query.search(keyword).\
    order_by(Post_1.datestamp.desc()).limit(1))
futures.append(future)

future = executor.submit(Post_2.query.search(keyword).\
    order_by(Post_2.datestamp.desc()).limit(1))
futures.append(future)
This does not work and I get the following error:
RuntimeError: This decorator can only be used at local scopes when a request context is on the stack. For instance within view functions.
Could anyone help me please?
The error you're getting is because flask-executor is intended to run tasks inside view functions - that is, as part of a request from a user. You're running your code outside of a view (i.e. outside of a scope that would normally be in place when your user is interacting with your application).
Do you need to do this, or is this simply part of a test? If you do something like this, just to test it out:
@app.route('/test')
def testroute():
    future = executor.submit(Post_1.query.search(keyword).\
        order_by(Post_1.datestamp.desc()).limit(1))
    futures.append(future)
    future = executor.submit(Post_2.query.search(keyword).\
        order_by(Post_2.datestamp.desc()).limit(1))
    futures.append(future)
Then you should no longer get the error about running outside of a request context, because the code will be running as part of a request (i.e. inside a view function).
As a side note, the SQLAlchemy tasks you're submitting aren't callable - they're not function objects. Executors, whether the one created by Flask-Executor or the vanilla ones you can get via concurrent.futures, expect you to pass a "callable". I suspect your code still wouldn't work unless it was something like:
query = Post_1.query.search(keyword).\
    order_by(Post_1.datestamp.desc()).limit(1).all
future = executor.submit(query)
Notice the lack of brackets at the end: I want to pass the callable object itself, not the result it would return.
The executor would then "call" the object that had been passed, which is equivalent to running:
Post_1.query.search(keyword).\
    order_by(Post_1.datestamp.desc()).limit(1).all()
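To actually consume what comes back, you wait on the futures; flask-executor returns futures that follow the standard concurrent.futures interface, so something like this sketch should work:

# .result() blocks until that particular search has finished,
# so this gathers the newest matching post from each table
results = [future.result() for future in futures]
post_1_hit, post_2_hit = results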

Django when looping over a queryset, when does the db read happen?

I am looping through my database and updating all my Company objects.
for company in Company.objects.filter(updated=False):
    driver.get(company.company_url)
    company.adress = driver.find_element_by_id("address").text
    company.visited = True
    company.save()
My problem is that it's taking too long, so I wanted to run another instance of this same code, but I'm curious when the actual db reads happen. If company.visited gets changed to True while this loop is running, will it still be visited by this loop? What if I added a second check for visited? I don't want to start a second loop if the first instance isn't going to recognize the work of the second instance:
for company in Company.objects.filter(updated=False):
    if company.visited:
        continue
    driver.get(company.company_url)
    company.adress = driver.find_element_by_id("address").text
    company.visited = True
    company.save()
Company.objects.filter(updated=False) translates to an ordinary SQL query:
SELECT * FROM appName_company WHERE updated is false
This SQL query is executed when you start iterating through Company objects. It's executed only once. The second server will not recognize the work of the first one, because they both will go through the same Company objects.
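To make the evaluation timing concrete, here is a small sketch; QuerySets are lazy, and they also cache their results once fully evaluated:

qs = Company.objects.filter(updated=False)  # builds the query; no SQL runs yet
for company in qs:                          # the SELECT executes here, once
    pass
companies = list(qs)  # no second query: results were cached on the first evaluation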
Lock rows to avoid race conditions using atomic transactions and select_for_update():
from django.db import transaction

for company in Company.objects.filter(updated=False):
    with transaction.atomic():
        # re-fetch the row with a lock, and assign it so the check
        # below sees the fresh, locked state
        company = Company.objects.select_for_update().get(id=company.id)
        if company.visited:
            continue
        driver.get(company.company_url)
        company.adress = driver.find_element_by_id("address").text
        company.visited = True
        company.save()
You can run this code on multiple servers. Each Company will be processed just once.
If you need to execute this code regularly, I highly recommend using Celery. Dispatch a task for each company, and let multiple workers do the work in parallel:
from celery import shared_task
from django.db import transaction

@shared_task
def dispatch_tasks():
    for company in Company.objects.filter(updated=False):
        process_company.delay(company.id)

@shared_task
@transaction.atomic
def process_company(company_id):
    company = Company.objects.select_for_update().get(id=company_id)
    if company.visited:
        return  # was `continue`, which is invalid outside a loop
    driver.get(company.company_url)
    company.adress = driver.find_element_by_id("address").text
    company.visited = True
    company.save()
Edit: oh, I see that you've tagged the question with the sqlite tag. I recommend switching to PostgreSQL, as SQLite is really bad at concurrency. My answer should work with SQLite, but locks may slow down the database.

Is it possible for a queryset filter to not work properly? Django 1.11.12

I ran the code below... and somehow a couple of campaigns that were not active were passed into update_existing_campaigns. I am a little mind-blown as to how this happened... it was also only a couple of inactive campaigns that were passed in, not all of them. How is this possible?
all_campaigns = AdCampaign.objects.all()
active_campaigns = AdCampaign.objects.filter(active=True)
update_existing_campaigns(active_campaigns)

def update_existing_campaigns(active_campaigns):
    # only update active campaigns that haven't already been run today;
    # used to handle failures in the middle of a job
    active_campaigns_to_update = active_campaigns.filter(last_auto_update__lt=datetime.date.today())
    for active_campaign in active_campaigns_to_update:
        # do something

How do I build a queryset in django retrieving threads in which a user has posted?

I'm making a threaded forum app using django-mptt. Everything is up and running, but I have trouble building one specific queryset.
I want to retrieve posts which:
are root nodes
are posted by current_user or have a descendant posted by current_user.
What I have so far is this:
Post.objects.filter(Q(user=current_user) | Q( )).exclude(parent__gt=0)
In my second Q I need something to tell whether the current_user has posted one of the post's descendants. Anyone know if it's even possible?
I don't think you can do this in one query. Here's how to do it in two:
thread_ids = Post.objects.filter(user=current_user).values_list('tree_id', flat=True)
posts = Post.objects.filter(tree_id__in=thread_ids, level=0)
This gets the MPTT tree id of every thread which the user has posted in. Then it gets the root node of each of these threads.
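Wired into a view, that might look like this sketch (the view and template names are made up; the current user comes from request.user):

from django.shortcuts import render

def my_threads(request):
    # ids of every MPTT tree the user has posted in
    thread_ids = Post.objects.filter(user=request.user).values_list('tree_id', flat=True)
    # root posts (level 0) of those trees
    posts = Post.objects.filter(tree_id__in=thread_ids, level=0)
    return render(request, 'forum/my_threads.html', {'posts': posts})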

How do I deal with this race condition in django?

This code is supposed to get or create an object and update it if necessary. The code is in production use on a website.
In some cases - when the database is busy - it will throw the exception "DoesNotExist: MyObj matching query does not exist".
# Model:
class MyObj(models.Model):
    thing = models.ForeignKey(Thing)
    owner = models.ForeignKey(User)
    state = models.BooleanField()

    class Meta:
        unique_together = (('thing', 'owner'),)

# Update or create myobj
@transaction.commit_on_success
def create_or_update_myobj(owner, thing, state):
    try:
        myobj, created = MyObj.objects.get_or_create(owner=owner, thing=thing)
    except IntegrityError:
        # Will sometimes throw "DoesNotExist: MyObj matching query does not exist"
        myobj = MyObj.objects.get(owner=owner, thing=thing)
    myobj.state = state
    myobj.save()
I use an innodb mysql database on ubuntu.
How do I safely deal with this problem?
This could be an off-shoot of the same problem as here:
Why doesn't this loop display an updated object count every five seconds?
Basically get_or_create can fail - if you take a look at its source, you'll see what it does: get, if that fails: save plus some trickery, if that still fails: get again, and if that still fails: give up and raise.
This means that if there are two simultaneous threads (or processes) running create_or_update_myobj, both trying to get_or_create the same object, then:

- the first thread tries to get it - but it doesn't yet exist,
- so the thread tries to create it, but before the object is created...
- ...the second thread tries to get it - and this obviously fails,
- now, because of the default AUTOCOMMIT=OFF for the MySQLdb database connection and the REPEATABLE READ isolation level, both threads have frozen their views of the MyObj table,
- subsequently, the first thread creates its object and returns it gracefully, but...
- ...the second thread cannot create anything, as that would violate the unique constraint,
- and, what's funny, a subsequent get on the second thread doesn't see the object created in the first thread, due to the frozen view of the MyObj table.
So, if you want to safely get_or_create anything, try something like this:
@transaction.commit_on_success
def my_get_or_create(...):
    try:
        obj = MyObj.objects.create(...)
    except IntegrityError:
        transaction.commit()
        obj = MyObj.objects.get(...)
    return obj
Edited on 27/05/2010
There is also a second solution to the problem - using the READ COMMITTED isolation level instead of REPEATABLE READ. But it's less tested (at least in MySQL), so there might be more bugs/problems with it - but at least it allows tying views to transactions, without committing in the middle.
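In recent Django versions you can set this per connection; the MySQL backend accepts an isolation_level option in settings (a sketch showing only the relevant keys):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        # ... NAME, USER, PASSWORD, etc. ...
        'OPTIONS': {
            'isolation_level': 'read committed',  # instead of MySQL's default REPEATABLE READ
        },
    },
}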
Edited on 22/01/2012
Here are some good blog posts (not mine) about MySQL and Django, related to this question:
http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
http://www.no-ack.org/2011/05/broken-transaction-management-in-mysql.html
Your exception handling is masking the error. You should pass a value for state in get_or_create(), or set a default in the model and database.
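A sketch of that suggestion: get_or_create() accepts a defaults dict that is applied only on the insert path, so state is never missing when the row is created:

myobj, created = MyObj.objects.get_or_create(
    owner=owner, thing=thing,
    defaults={'state': state},  # used only if the row has to be created
)
if not created:
    myobj.state = state
    myobj.save()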
One (dumb) way might be to catch the error and simply retry once or twice after waiting a small amount of time. I'm not a DB expert, so there might be a signaling solution.
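A sketch of that retry idea (the wrapper name and back-off values are made up for illustration):

import time
from django.db import IntegrityError

def get_or_create_with_retry(owner, thing, attempts=3):
    for attempt in range(attempts):
        try:
            obj, _ = MyObj.objects.get_or_create(owner=owner, thing=thing)
            return obj
        except (IntegrityError, MyObj.DoesNotExist):
            time.sleep(0.1 * (attempt + 1))  # brief back-off before retrying
    return MyObj.objects.get(owner=owner, thing=thing)  # final attempt; may still raise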
Since 2012, Django has had select_for_update, which locks rows until the end of the transaction.
To avoid race conditions in Django + MySQL under the default circumstances:
- REPEATABLE READ in MySQL
- READ COMMITTED in Django
you can use this:
with transaction.atomic():
    instance = YourModel.objects.select_for_update().get(id=42)
    instance.evolve()
    instance.save()
The second thread will wait for the first one (the lock), and only once the first is done will the second read the data it saved, so it will work on updated data.
Then together with get_or_create:
def select_for_update_or_create(...):
    instance = YourModel.objects.filter(
        ...
    ).select_for_update().first()
    if instance is None:
        instance = YourModel.objects.create(...)
    return instance
The function must be called inside a transaction block, otherwise you will get this from Django:
TransactionManagementError: select_for_update cannot be used outside of a transaction
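A call site would then look something like this (the keyword arguments are placeholders for whatever lookup you need):

with transaction.atomic():
    instance = select_for_update_or_create(owner=owner, thing=thing)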
Also sometimes it's good to use refresh_from_db()
In a case like:
instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.attr = response.something
you'd like to see:
instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.refresh_from_db() # 3
instance.attr = response.something
and that line # 3 will greatly reduce the time window for possible race conditions, and thus the chance of one occurring.