I need to create a LeaderboardEntry if it does not exist. If it exists, it should be updated with the new value (current + new). I want to achieve this in a single query. How can I do that?
My current code looks like this (2 queries):
reward_amount = 50
LeaderboardEntry.objects.get_or_create(player=player)
LeaderboardEntry.objects.filter(player=player).update(golds=F('golds') + reward_amount)
PS: The default value of "golds" is 0.
You can save one query hit by using defaults:
reward_amount = 50
leader_board, created = LeaderboardEntry.objects.get_or_create(
    player=player,
    defaults={
        "golds": reward_amount,
    },
)
if not created:
    leader_board.golds += reward_amount
    leader_board.save(update_fields=["golds"])
I think your problem is with the get_or_create() method: it returns a tuple of two values, (object, created), so you have to receive them in your code as follows:
reward_amount = 50
entry, __ = LeaderboardEntry.objects.get_or_create(player=player)
entry.golds += reward_amount
entry.save()
It will work better than your current code, though it won't avoid making two queries. Of course, the save() method will hit your database again.
You can solve this with update_or_create:
LeaderboardEntry.objects.update_or_create(
    player=player,
    defaults={
        'golds': F('golds') + reward_amount,
    },
)
EDIT:
Sorry, F expressions in update_or_create are not yet supported.
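If you want to stay close to a single query in the common case, one workaround (a sketch, not the only option; it assumes player is unique on LeaderboardEntry, otherwise the IntegrityError fallback never fires) is to attempt the atomic UPDATE first and only create the row when nothing was updated:
from django.db import IntegrityError, transaction
from django.db.models import F

reward_amount = 50

# update() returns the number of rows it touched, so in the common
# case this is the only query that runs.
updated = LeaderboardEntry.objects.filter(player=player).update(
    golds=F('golds') + reward_amount
)
if not updated:
    try:
        # No existing row: create one with the reward as its initial value.
        with transaction.atomic():
            LeaderboardEntry.objects.create(player=player, golds=reward_amount)
    except IntegrityError:
        # A concurrent request created the row first; fall back to the update.
        LeaderboardEntry.objects.filter(player=player).update(
            golds=F('golds') + reward_amount
        )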
I have about a million Comment records that I want to update based on that comment's body field. I'm trying to figure out how to do this efficiently. Right now, my approach looks like this:
update_list = []
qs = Comment.objects.filter(word_count=0)
for comment in qs:
    model_obj = Comment.objects.get(id=comment.id)
    model_obj.word_count = len(model_obj.body.split())
    update_list.append(model_obj)
Comment.objects.bulk_update(update_list, ['word_count'])
However, this hangs and seems to time out in my migration process. Does anybody have suggestions on how I can accomplish this?
It's not easy to determine the memory footprint of a Django object, but an absolute minimum is the amount of space needed to store all of its data. My guess is that you may be running out of memory and page-thrashing.
You probably want to work in batches of, say, 1000 objects at a time. Use QuerySet slicing, which returns another queryset. Try something like:
BATCH_SIZE = 1000
start = 0
base_qs = Comment.objects.filter(word_count=0)
while True:
    batch_qs = base_qs[start:start + BATCH_SIZE]
    start += BATCH_SIZE
    if not batch_qs.exists():
        break
    update_list = []
    for comment in batch_qs:
        # No need to re-fetch each row; the queryset already loaded it.
        comment.word_count = len(comment.body.split())
        update_list.append(comment)
    Comment.objects.bulk_update(update_list, ['word_count'])
    print(f'Processed batch starting at {start}')
Each trip around the loop will free the space occupied by the previous trip when it replaces batch_qs and update_list. The print statement will allow you to watch it progress at a hopefully acceptable, regular rate!
Warning - I have never tried this. I'm also wondering whether slicing and filtering will play nice with each other or whether one should use
base_qs = Comment.objects.all()
...
while True:
    batch_qs = base_qs[start:start + BATCH_SIZE]
    ....
    for comment in batch_qs.filter(word_count=0):
so that you are slicing your way through rows of the entire DB table and retrieving, from each slice, the subset that needs updating. This feels "safer". Anybody know for sure?
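For what it's worth, Django will refuse that second variant: filtering after slicing raises "Cannot filter a query once a slice has been taken." A safer pattern (a sketch, assuming the model is Comment with an integer primary key) is keyset pagination on the pk, so rows that drop out of the word_count=0 filter as they are updated cannot shift later batches:
BATCH_SIZE = 1000
last_pk = 0
while True:
    # Filter first, then slice: order by pk and resume after the last
    # pk processed, so already-updated rows cannot shift later batches.
    batch = list(
        Comment.objects.filter(word_count=0, pk__gt=last_pk)
        .order_by('pk')[:BATCH_SIZE]
    )
    if not batch:
        break
    for comment in batch:
        comment.word_count = len(comment.body.split())
    Comment.objects.bulk_update(batch, ['word_count'])
    last_pk = batch[-1].pk
    print(f'Processed up to pk {last_pk}')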
Is there a way in Django to achieve the following in one DB hit (Debug Toolbar shows 2 queries)?
q = SomeModel.objects.filter(name=name).order_by(some_field)
if q.count() == 0:
    q = SomeModel.objects.all().order_by(some_field)
I want to check if there are objects with a given name. If yes, then return them. If not, return all objects. All done in one query.
I've checked Subquery, Q, conditional expressions but still don't see how to fit it into one query.
Ok, much as I resisted (I still think it's premature optimization), curiosity got the better of me. This is not pretty but does the trick:
from django.db.models import Q, Exists

name_qset = SomeModel.objects.filter(name=name)
q_func = Q(name_exists=True, name=name) | Q(name_exists=False)
q = SomeModel.objects.annotate(
    name_exists=Exists(name_qset)
).filter(q_func).order_by(some_field)
Tried it out and definitely only one query. Interesting to see if it is actually appreciably faster for large datasets...
Your best bet is to use .exists(); otherwise your code is fine:
q = SomeModel.objects.filter(name=name).order_by(some_field)
if not q.exists():
    q = SomeModel.objects.all().order_by(some_field)
In our Django project, there is a view which creates multiple objects (from 5 to even 100). The problem is that the creation phase takes a very long time.
I don't know why that is, but I suppose it could be because for n objects there are n database lookups and commits.
For example, 24 objects take 67 seconds.
I want to speed up this process.
There are two things I think may be worth considering:
To create these objects in one query, so only one commit is executed.
To create a ThreadPool and create these objects in parallel.
This is the part of the view which causes problems (we use Postgres on localhost, so the connection is not a problem):
@require_POST
@login_required
def heu_import(request):
    ...
    ...
    product = Product.objects.create(user=request.user, name=name, manufacturer=manufacturer, category=category)
    product.groups.add(*groups)
    occurences = []
    counter = len(urls_xpaths)
    print 'Occurences creating'
    start = datetime.now()
    eur_currency = Currency.objects.get(shortcut='eur')
    for url_xpath in urls_xpaths:
        counter -= 1
        print counter
        url = url_xpath[1]
        xpath = url_xpath[0]
        occ = Occurence.objects.create(product=product, url=url, xpath=xpath, active=True if xpath else False, currency=eur_currency)
        occurences.append(occ)
    print 'End'
    print datetime.now() - start
    ...
    return render(request, 'main_app/dashboard/new-product.html', context)
Output:
Occurences creating
24
.
.
.
0
End
0:01:07.727000
EDIT:
I tried putting the for loop inside a with transaction.atomic(): block, but it seems to help only a bit (47 seconds instead of 67).
EDIT2:
I'm not sure, but it seems that the SQL queries themselves are not the problem.
Please use bulk_create for inserting multiple objects.
occurences = []
for url_xpath in urls_xpaths:
    counter -= 1
    print counter
    url = url_xpath[1]
    xpath = url_xpath[0]
    occurences.append(Occurence(product=product, url=url, xpath=xpath, active=True if xpath else False, currency=eur_currency))
Occurence.objects.bulk_create(occurences)
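A usage note (hedged, since it depends on your model): bulk_create inserts the whole list in a single query, but it does not call each object's save() method and does not send pre_save/post_save signals, so anything Occurence does in save() will be skipped. For very large lists you can also cap the statement size with the batch_size argument:
# Insert in chunks of 500 rows per INSERT statement.
Occurence.objects.bulk_create(occurences, batch_size=500)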
I have a function that is passed the id of an object of my Image class. I need the id of the next object in the model. Currently, I am doing it in the least efficient way possible, as I need to fetch all the objects to do this. My current implementation is:
def get_next_id(curr_id):
    result = []
    Image_list = Image.objects.all()
    total = Image.objects.all().count()
    for i in range(len(Image_list)):
        result.append(Image_list[i].id)
    index_curr = result.index(curr_id)
    if index_curr == total:
        new_index = 0
    else:
        new_index = index_curr + 1
    return Image_list[new_index]
I would be grateful if someone could provide a better way, or make this one more efficient. Thank you.
I would suggest something like this:
from django.db.models import Min

def get_next_id(curr_id):
    try:
        ret = Image.objects.filter(id__gt=curr_id).order_by("id")[0:1].get().id
    except Image.DoesNotExist:
        ret = Image.objects.aggregate(Min("id"))['id__min']
    return ret
This does not take care of the special case where the table is empty, but then you should not have a valid curr_id in the first place if the table is empty. It also does not protect against passing nonsensical values as curr_id.
What this does is get the first id which is greater than the current one. The [0:1] slice limits the data returned from the database to the first record: in effect, the database performs the slice rather than Python. If there is no id greater than the current one, it gets the lowest one.
In response to your comment about how to do it in reverse:
from django.db.models import Max

def get_prev_id(curr_id):
    try:
        ret = Image.objects.filter(id__lt=curr_id).order_by("-id")[0:1].get().id
    except Image.DoesNotExist:
        ret = Image.objects.aggregate(Max("id"))['id__max']
    return ret
The changes are:
Use id__lt, and order by -id.
Use Max rather than Min for the aggregate, and use the id__max key rather than id__min to get the value.
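If you are on a reasonably recent Django, the same technique reads a little more cleanly with first(), which returns None instead of raising when nothing matches (a sketch of the same wraparound logic, not a different algorithm):
from django.db.models import Min

def get_next_id(curr_id):
    # Fetch only the id column of the first row past curr_id.
    next_id = (Image.objects.filter(id__gt=curr_id)
               .order_by("id")
               .values_list("id", flat=True)
               .first())
    if next_id is None:
        # Wrap around to the lowest id in the table.
        next_id = Image.objects.aggregate(Min("id"))['id__min']
    return next_id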
I want to concatenate two querysets obtained from two different models, and I can do it using itertools like this:
ci = ContributorImage.objects.all()
pf = Portfolio.objects.all()
cpf = itertools.chain(ci,pf)
But the real issue is paginating the results. If I pass an iterator (cpf, our concatenated queryset) to Paginator, p = Paginator(cpf, 10), it works, but it fails when retrieving the first page, page1 = p.page(1), with an error that says:
TypeError: object of type 'itertools.chain' has no len()
What can I do in a case like this?
itertools.chain() returns an iterator. The Paginator class needs an object implementing __len__ (iterators, of course, do not support it, since the size of the collection is not known in advance).
Your problem could be resolved in a number of ways (including using list to evaluate the generator, as you mention), however I recommend taking a look at the QuerySetChain mentioned in this answer:
https://stackoverflow.com/a/432666/119071
I think it fits your problem exactly. Also take a look at the comments on that answer - they are really enlightening :)
I know it's late, but since I encountered this error myself, I'll answer this question.
You should build a list of the objects:
ci = ContributorImage.objects.all()
pf = Portfolio.objects.all()
cpf = itertools.chain(ci,pf)
cpf_list = list(cpf)
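For completeness, the materialized list then works with Paginator, since a list supports len() (a short usage sketch of the code above):
from django.core.paginator import Paginator

p = Paginator(cpf_list, 10)  # a list has a length, so this works
page1 = p.page(1)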