Data synchronization issue with Django - django

I have the following models.py:
class BagOfApples(models.Model):
    quantity = models.PositiveSmallIntegerField(default=0)
Let's create a “bag of apples” object and put an “apple” in it:
>>> from myapp import models
>>>
>>> models.BagOfApples().save()
>>>
>>> bag1 = models.BagOfApples.objects.get(pk = 1)
>>> bag2 = models.BagOfApples.objects.get(pk = 1)
>>>
>>> bag1.quantity += 1
>>> bag1.save()
>>>
>>> bag1.quantity
1
>>> bag2.quantity
0
Is there a way to automatically reload the data in the bag2 variable?
In my real-world application, things are a bit more complicated than that, and the same objects can be modified in different parts of the code after being retrieved by different database queries. This is not even intentional (for instance, this problem occurs to me because Django caches results for OneToOne or ForeignKey relationships).
It would be really helpful to have some kind of manager (maybe a middleware can do this?) that keeps track of the identical objects.
In this example, the manager would automatically detect that bag2 is actually bag1, and would not execute a second query but simply return bag1.
If there is no good way to do this, any other helpful tips on better application design would be appreciated.

The key is to understand when your query is actually executed. The model object will not repeat the query each time you want to access one of its model fields.
To avoid keeping around objects with stale data, I would make sure to query only when I actually need the data. In your example, I'd hold on to the information that you know is not going to change (the pk value), and do a get when you need the object with fresh values.
from myapp import models
models.BagOfApples().save()
bag1 = models.BagOfApples.objects.get(pk = 1)
bag1.quantity += 1
bag1.save()
print(bag1.quantity)  # will print 1
bag2 = models.BagOfApples.objects.get(pk=1)
print(bag2.quantity)  # will print 1 too
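As for the identity-tracking manager the question asks about, here is a minimal sketch of the idea in plain Python with the stdlib sqlite3 module rather than the Django ORM. The Bag and IdentityMap names are made up for illustration and are not Django APIs; the only point is that one shared object is handed out per primary key, so the second lookup returns the first object instead of a stale copy. In Django itself, calling refresh_from_db() on an instance (available since Django 1.8) reloads its fields from the database.

```python
import sqlite3


class Bag:
    """Plain-Python stand-in for a model instance (not a Django model)."""

    def __init__(self, pk, quantity):
        self.pk = pk
        self.quantity = quantity


class IdentityMap:
    """Hypothetical cache that hands back one shared object per primary key."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}

    def get(self, pk):
        # Query the database only the first time a given pk is requested.
        if pk not in self._cache:
            row = self.conn.execute(
                "SELECT id, quantity FROM bag WHERE id = ?", (pk,)
            ).fetchone()
            if row is None:
                raise LookupError("no bag with pk=%r" % (pk,))
            self._cache[pk] = Bag(*row)
        return self._cache[pk]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bag (id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO bag (quantity) VALUES (0)")

bags = IdentityMap(conn)
bag1 = bags.get(1)
bag2 = bags.get(1)  # served from the cache, no second query
bag1.quantity += 1
```

Because bag1 and bag2 are the same object here, the change is visible through both names. The trade-off is that such a cache has to be cleared at some point (for example per request), otherwise it goes stale relative to the database.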

Related

Django - Get Value Only from queryset

When the docs discuss values() and values_list(), or any query for that matter, they always require that you know what you are looking for, i.e.,
>>> Entry.objects.values_list('headline', flat=True).get(pk=1)
'First entry'
What about the situation where you need a value from a specific field, whether on this model or a foreign key, but you don't know the pk or the value in the specified field, and you don't care, you just need whatever is there. How do you query for it?
Alternatively, if I use this example from the docs:
>>> Entry.objects.values_list('id', flat=True).order_by('id')
<QuerySet [1, 2, 3, ...]>
Could I add slice notation to the end of the query? But even then, I might not know in advance which slice I need. In other words, how to dynamically get a value from a specified field without knowing in advance what it or its pk is? Thx.
Depending on your scenario (here, a simple query) you have many options. One is to use a variable as the field name and set that variable dynamically:
>>> field='headline'
>>> Entry.objects.values_list(field, flat=True).get(pk=1)
'First entry'
>>> field='body'
>>> Entry.objects.values_list(field, flat=True).get(pk=1)
'First entry body'
To slice the results, use Python slice syntax. The bounds are start and stop positions, which Django translates into OFFSET and LIMIT:
Entry.objects.all()[offset:offset + limit]
>>> field='headline'
>>> Entry.objects.values_list(field, flat=True)[5:10]
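To make the slice arithmetic concrete, here is a small sketch on an ordinary Python list. The page helper and the sample data are invented for illustration; a QuerySet slice such as [5:10] follows the same start/stop convention, so it skips 5 rows and returns at most 5.

```python
def page(items, offset, limit):
    """Return `limit` items starting at position `offset`."""
    return items[offset:offset + limit]


headlines = ["entry %d" % i for i in range(20)]  # hypothetical data

second_page = page(headlines, offset=5, limit=5)  # same rows as headlines[5:10]
first_only = page(headlines, offset=0, limit=1)   # "whatever is there" first
```

With a real QuerySet the slice should be combined with order_by(), since without an explicit ordering the database is free to return rows in any order.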

How can I manually modify models retrieved from the database in Django?

I wish to do something such as the following:
people = People.objects.filter(date=date)
person = people[0]
person['salary'] = 45000
The last line results in an error:
object does not support item assignment
To debug something like this I always find it easier to start with something working and modify line by line until something breaks.
I want to modify the object for rendering in the template. If I try:
person.salary = 45000
There is no error but trying
print(person.salary)
Immediately afterwards results in the original value being printed. Update:
In my code I was actually doing:
people[0].salary = 45000
Which doesn't work. For some reason
person = people[0]
person.salary = 45000
Does work. I thought the two pieces of code would be exactly the same
person is a model instance, not a dictionary, so set the attribute and then save it:
person.salary = 45000
person.save()
You should read How to work with models.
Looking at the IDs, it seems that people[0] and person are different objects: each people[0] lookup gives you a new instance rather than the one you already assigned to a variable:
In [11]: people = People.objects.filter(salary=100)
In [12]: person = people[0]
In [13]: person.salary = 5000
In [14]: print person.salary
5000
In [15]: people[0].salary
Out[15]: 100
In [16]: id(people[0])
Out[16]: 35312400
In [17]: id(person)
Out[17]: 35313104
So, let's look at what happens in depth.
You know that in Django QuerySets are evaluated only when you need their results (lazy evaluation). To quote the Django documentation:
Slicing. As explained in Limiting QuerySets, a QuerySet can be sliced,
using Python’s array-slicing syntax. Slicing an unevaluated QuerySet
usually returns another unevaluated QuerySet, but Django will execute
the database query if you use the “step” parameter of slice syntax,
and will return a list. Slicing a QuerySet that has been evaluated
(partially or fully) also returns a list.
In particular, looking at the 'django.db.models.query' source code,
def __getitem__(self, k):
    """
    Retrieves an item or slice from the set of results.
    """
    # some stuff here ...
    if isinstance(k, slice):
        qs = self._clone()
        if k.start is not None:
            start = int(k.start)
        else:
            start = None
        if k.stop is not None:
            stop = int(k.stop)
        else:
            stop = None
        qs.query.set_limits(start, stop)
        return k.step and list(qs)[::k.step] or qs
    qs = self._clone()
    qs.query.set_limits(k, k + 1)
    return list(qs)[0]
You can see that indexing or slicing calls the __getitem__ method.
For an integer index, self._clone() creates a fresh copy of the QuerySet, the query is executed, and a newly built object is returned, so every people[0] is a different instance. This is the reason you are getting different results.
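The effect can be reproduced without Django. The sketch below uses two made-up classes, Person and LazyPeople, where LazyPeople imitates an unevaluated QuerySet by building a new object on every index lookup. It is only a toy model of the behaviour described above, not Django's implementation.

```python
class Person:
    def __init__(self, salary):
        self.salary = salary


class LazyPeople:
    """Hypothetical stand-in for an unevaluated QuerySet."""

    def __init__(self, rows):
        self._rows = rows  # pretend these rows live in the database

    def __getitem__(self, index):
        # Each lookup "re-runs the query" and builds a brand-new object.
        return Person(**self._rows[index])


people = LazyPeople([{"salary": 100}])

people[0].salary = 45000   # modifies a throwaway object
lost = people[0].salary    # a fresh object again, still the stored value

person = people[0]         # keep a reference to one object
person.salary = 45000
kept = person.salary       # the change is visible on that same object
```

Evaluating the query into a list first, for example people = list(People.objects.filter(date=date)), avoids the surprise, because a list returns the same object every time you index it.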
The object-relational mapping that Django models provide hides the fact that you are interacting with a DB by offering an object-oriented interface for retrieving and manipulating data.
Unfortunately, the ORM abstraction is not perfect: there are various cases where ORM semantics do not match intuition. In these cases you need to investigate what is going on in the underlying SQL layer to figure out the cause of the trouble.
Your problem arises from the fact that:
people = People.objects.filter(date=date)
does not execute any SQL query.
people[0]
executes SELECT a, b, c, ... FROM T WHERE filter. If you then modify the resulting object by calling:
people[0].salary = 45000
the modification won't be saved to the DB, because the save() method was not called. The next call to people[0] again executes a SQL query, which does not return the unsaved modification.
When you encounter problems like this, Django Debug Toolbar can greatly help to identify which statements execute what SQL queries.

Is it possible to use Django F() expressions for atomic updates together with full_clean()?

I've just discovered that fields assigned with Django F() expressions fail to validate. I modified an example from the Django docs:
>>> product = Product.objects.get(name='Venezuelan Beaver Cheese')
>>> product.number_sold = F('number_sold') + 1
>>> product.full_clean() # My addition.
>>> product.save()
And I'm getting: ValidationError: {'number_sold': [u"'(+: (DEFAULT: ), 0)' value must be an integer."]}. Indeed, number sold is not an integer, but an instance of django.db.models.expressions.ExpressionNode.
Is there a way around this? All my models extend a base class that automatically calls full_clean() on each save, and I would really like to keep this base class while still being able to use atomic updates.
Your problem is that you are trying to save a non-integer value in an integer field.
The field number_sold expects an integer in your case.
I would assume that F('number_sold') references an empty field.
Try replacing F('number_sold') + 1 with something hardcoded such as 1.
If that works, let me know and I can supply logic to avoid your error.
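To illustrate what the question describes, here is a toy model in plain Python. The F, Expression and clean_fields names below are stand-ins invented for this sketch and are not Django's code. The value assigned to the field is an expression object that the database evaluates later, so an integer check fails unless that field is skipped. Django's real full_clean() accepts an exclude argument listing field names to skip, which may be one way for a base class to combine validation with F() updates.

```python
class Expression:
    """Hypothetical stand-in for a deferred database expression."""

    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value


class F:
    """Toy version of an F() reference; not Django's implementation."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Expression(self.name, "+", other)


def clean_fields(values, exclude=()):
    """Check that every non-excluded value is an int; return errors by field."""
    errors = {}
    for field, value in values.items():
        if field in exclude:
            continue
        if not isinstance(value, int):
            errors[field] = "value must be an integer"
    return errors


values = {"number_sold": F("number_sold") + 1, "stock": 3}

errors_all = clean_fields(values)                               # expression rejected
errors_skipped = clean_fields(values, exclude=["number_sold"])  # field skipped
```

The cost of skipping the field is that its new value is not validated in Python at all; any constraint on it has to be enforced by the database.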

How to 'bulk update' with Django?

I'd like to update a table with Django - something like this in raw SQL:
update tbl_name set name = 'foo' where name = 'bar'
My first result is something like this - but that's nasty, isn't it?
list = ModelClass.objects.filter(name = 'bar')
for obj in list:
    obj.name = 'foo'
    obj.save()
Is there a more elegant way?
Update:
Django 2.2 added a bulk_update method.
Old answer:
Refer to the following django documentation section
Updating multiple objects at once
In short you should be able to use:
ModelClass.objects.filter(name='bar').update(name="foo")
You can also use F objects to do things like incrementing rows:
from django.db.models import F
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
See the documentation.
However, note that:
This won't use ModelClass.save method (so if you have some logic inside it won't be triggered).
No django signals will be emitted.
You can't perform an .update() on a sliced QuerySet, it must be on an original QuerySet so you'll need to lean on the .filter() and .exclude() methods.
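The caveats above follow from the fact that update() is sent to the database as a single UPDATE statement, without loading any objects into Python. A rough equivalent with the stdlib sqlite3 module, using the table name from the question and some made-up rows, looks like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tbl_name (name) VALUES (?)",
    [("bar",), ("bar",), ("baz",)],  # hypothetical sample rows
)

# One statement changes every matching row; no per-row Python code runs,
# which is why save() overrides and signals are never involved.
cursor = conn.execute(
    "UPDATE tbl_name SET name = ? WHERE name = ?", ("foo", "bar")
)
updated = cursor.rowcount  # number of rows the statement changed

names = [row[0] for row in conn.execute("SELECT name FROM tbl_name ORDER BY id")]
```

Django's update() similarly returns a row count, so the result can be used to check whether anything matched.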
Consider using django-bulk-update found here on GitHub.
Install: pip install django-bulk-update
Implement: (code taken directly from projects ReadMe file)
import random
from bulk_update.helper import bulk_update
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    r = random.randrange(4)
    person.name = random_names[r]
bulk_update(people)  # updates all columns using the default db
Update: As Marc points out in the comments, this is not suitable for updating thousands of rows at once, though it is suitable for smaller batches of tens to hundreds of rows. The batch size that is right for you depends on your CPU and query complexity. This tool is more like a wheelbarrow than a dump truck.
Django 2.2 version now has a bulk_update method (release notes).
https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update
Example:
# get a pk: record dictionary of existing records
updates = YourModel.objects.filter(...).in_bulk()
....
# do something with the updates dict
....
if hasattr(YourModel.objects, 'bulk_update') and updates:
    # Use the new method
    YourModel.objects.bulk_update(updates.values(), [list the fields to update], batch_size=100)
else:
    # The old & slow way
    with transaction.atomic():
        for obj in updates.values():
            obj.save(update_fields=[list the fields to update])
If you want to set the same value on a collection of rows, you can use the update() method combined with any query term to update all rows in one query:
some_list = ModelClass.objects.filter(some condition).values('id')
ModelClass.objects.filter(pk__in=some_list).update(foo=bar)
If you want to update a collection of rows with different values depending on some condition, you can in best case batch the updates according to values. Let's say you have 1000 rows where you want to set a column to one of X values, then you could prepare the batches beforehand and then only run X update-queries (each essentially having the form of the first example above) + the initial SELECT-query.
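The batching idea above can be sketched with the stdlib sqlite3 module. The item table, the status column and the targets mapping are all hypothetical; the point is that rows are grouped by their target value first, so the number of UPDATE statements equals the number of distinct values rather than the number of rows.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO item (id, status) VALUES (?, ?)",
    [(i, "new") for i in range(1, 7)],
)

# Hypothetical target value for each primary key.
targets = {1: "a", 2: "b", 3: "a", 4: "c", 5: "b", 6: "a"}

# Group primary keys by the value they should receive.
batches = defaultdict(list)
for pk, value in targets.items():
    batches[value].append(pk)

# One UPDATE per distinct value instead of one per row.
queries = 0
for value, pks in batches.items():
    placeholders = ",".join("?" * len(pks))
    conn.execute(
        "UPDATE item SET status = ? WHERE id IN (%s)" % placeholders,
        [value, *pks],
    )
    queries += 1

statuses = [row[0] for row in conn.execute("SELECT status FROM item ORDER BY id")]
```

In Django terms each loop iteration would correspond to one filter(pk__in=pks).update(...) call.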
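If every row requires a unique value, plain save() or update() calls cannot avoid one query per row (bulk_update, added in Django 2.2 and covered in other answers here, is the exception). Perhaps look into other architectures like CQRS/Event sourcing if you need performance in this latter case.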
Here is a useful article I found on this question:
https://www.sankalpjonna.com/learn-django/running-a-bulk-update-with-django
The inefficient way
model_qs = ModelClass.objects.filter(name='bar')
for obj in model_qs:
    obj.name = 'foo'
    obj.save()
The efficient way
ModelClass.objects.filter(name = 'bar').update(name="foo") # for single value 'foo' or add loop
Using bulk_update
update_list = []
model_qs = ModelClass.objects.filter(name='bar')
for model_obj in model_qs:
    model_obj.name = "foo"  # or whatever the value is; for simplicity I'm using foo only
    update_list.append(model_obj)
ModelClass.objects.bulk_update(update_list, ['name'])
Using an atomic transaction
from django.db import transaction
with transaction.atomic():
    model_qs = ModelClass.objects.filter(name='bar')
    for obj in model_qs:
        obj.name = 'foo'
        obj.save()
To update with same value we can simply use this
ModelClass.objects.filter(name = 'bar').update(name='foo')
To update with different values
obj_list = ModelClass.objects.filter(name='bar')
obj_to_be_update = []
for obj in obj_list:
    obj.name = "Dear " + obj.name
    obj_to_be_update.append(obj)
ModelClass.objects.bulk_update(obj_to_be_update, ['name'], batch_size=1000)
bulk_update does not call save() on each object and does not send the pre_save or post_save signals. Instead, we collect all the objects to be updated in a list and write them in batched queries.
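As far as I understand it, bulk_update writes each batch as one UPDATE whose new values are chosen per row with a CASE expression. The sketch below shows that general shape with the stdlib sqlite3 module; the person table and its rows are made up, and the SQL Django actually generates may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy")],  # hypothetical rows
)

# Per-row new values, computed in Python as in the loop above.
new_names = {
    pk: "Dear " + name
    for pk, name in conn.execute("SELECT id, name FROM person")
}

# One UPDATE with a CASE branch per row, instead of one UPDATE per row.
cases = " ".join("WHEN ? THEN ?" for _ in new_names)
id_marks = ",".join("?" for _ in new_names)
params = [p for pair in new_names.items() for p in pair] + list(new_names)
conn.execute(
    "UPDATE person SET name = CASE id %s END WHERE id IN (%s)" % (cases, id_marks),
    params,
)

names = [row[0] for row in conn.execute("SELECT name FROM person ORDER BY id")]
```

The statement grows with the number of rows, which is one reason a batch_size is worth setting for large updates.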
By contrast, update() returns the number of rows matched by the query:
update_counts = ModelClass.objects.filter(name='bar').update(name="foo")
You can refer to this link for more information on bulk update and create:
Bulk update and Create

Can Django do nested queries and exclusions

I need some help putting together this query in Django. I've simplified the example here to just cut right to the point.
class MyModel(models.Model):
    created = models.DateTimeField()
    user = models.ForeignKey(User)
    data = models.BooleanField()
The query I'd like to create in English would sound like:
Give me every record that was created yesterday for which data is False where in that same range data never appears as True for the given user
Here's an example input/output in case that wasn't clear.
Table Values
ID  Created   User   Data
1   1/1/2010  admin  False
2   1/1/2010  joe    True
3   1/1/2010  admin  False
4   1/1/2010  joe    False
5   1/2/2010  joe    False
Output Queryset
1   1/1/2010  admin  False
3   1/1/2010  admin  False
What I'm looking to do is to exclude record #4. The reason for this is because in the given range "yesterday", data appears as True once for the user in record #2, therefore that would exclude record #4.
In a sense, it almost seems like there are 2 queries taking place. One to determine the records in the given range, and one to exclude records which intersect with the "True" records.
How can I do this query with the Django ORM?
You don't need a nested query. You can generate a list of bad users' PKs and then exclude records containing those PKs in the next query.
bad = list(set(MyModel.objects.filter(data=True).values_list('user', flat=True)))
# list(set(list_object)) will remove duplicates
# not needed but might save the DB some work
rs = MyModel.objects.filter(datequery).exclude(user__pk__in=bad)
# might not need the pk in user__pk__in - try it
You could condense that down into one line but I think that's as neat as you'll get. 2 queries isn't so bad.
Edit: You might want to read the docs on this:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#in
It makes it sound like it auto-nests the query (so only one query fires in the database) if it's like this:
bad = MyModel.objects.filter(data=True).values('user')
rs = MyModel.objects.filter(datequery).exclude(user__pk__in=bad)
But MySQL doesn't optimise this well so my code above (2 full queries) can actually end up running a lot faster.
Try both and race them!
looks like you could use:
from django.db.models import F
MyModel.objects.filter(datequery).filter(data=False).filter(data = F('data'))
F object available from version 1.0
Please, test it, I'm not sure.
Thanks to lazy evaluation, you can break your query up into a few different variables to make it easier to read. Here is some ./manage.py shell play time in the style that Oli already presented.
> from django.db import connection
> connection.queries = []
> target_day_qs = MyModel.objects.filter(created='2010-1-1')
> bad_users = target_day_qs.filter(data=True).values('user')
> result = target_day_qs.exclude(user__in=bad_users)
> [r.id for r in result]
[1, 3]
> len(connection.queries)
1
You could also say result.select_related() if you wanted to pull in the user objects in the same query.
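For reference, the single nested query corresponds roughly to a NOT IN subquery at the SQL level. Here is a sketch with the stdlib sqlite3 module using the sample data from the question. The table and column names are assumptions chosen for the example, not the names Django would generate, and the data = 0 condition is added to match the English description of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel ("
    "id INTEGER PRIMARY KEY, created TEXT, username TEXT, data INTEGER)"
)
conn.executemany(
    "INSERT INTO mymodel VALUES (?, ?, ?, ?)",
    [
        (1, "2010-01-01", "admin", 0),
        (2, "2010-01-01", "joe", 1),
        (3, "2010-01-01", "admin", 0),
        (4, "2010-01-01", "joe", 0),
        (5, "2010-01-02", "joe", 0),
    ],
)

# Rows from the target day with data False, excluding users who have
# any data True row on that same day.
sql = """
SELECT id FROM mymodel
WHERE created = ? AND data = 0
  AND username NOT IN (
      SELECT username FROM mymodel WHERE created = ? AND data = 1
  )
ORDER BY id
"""
day = "2010-01-01"
result_ids = [row[0] for row in conn.execute(sql, (day, day))]
```

Record 4 is dropped because joe has a True row (record 2) on the same day, and record 5 is dropped because it falls outside the day.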