Django queryset to match all related objects - django

Let's say I have a ForeignKey from Coconut to Swallow (i.e., a swallow has carried many coconuts, but each coconut has been carried by only one swallow). Now let's say that I have a ForeignKey from HuskSegment to Coconut.
Now, I have a list of husk segments and I want to find out whether ALL of these have been gripped by a particular swallow.
I can use Swallow.objects.filter(coconuts_carried__husk_segments__in=husk_segment_list) to show that this swallow has gripped at least one husk segment in the list. Now, how can I show that every husk segment that the swallow has ever carried is in this list?

I can use Swallow.objects.filter(coconuts_carried__husk_segments__in=husk_segment_list) to show that this swallow has gripped at least one husk segment in the list.
No, this is wrong, this gives you a list of swallows which have carried at least one husk segment from *husk_segment_list*.
If I've understood right, we are talking about checking for a specific swallow.
So, from your description I guess your models look something like this:
class Swallow(models.Model):
    name = models.CharField(max_length=100)

class Coconut(models.Model):
    swallow = models.ForeignKey(Swallow, related_name='coconuts_carried')

class HuskSegment(models.Model):
    coconut = models.ForeignKey(Coconut, related_name='husk_segments')
If you already have the husk segment list you need to check against the swallow's segments, there's no reason to resolve it in a query. Get the swallow's segments and check whether they are a superset of your husk segment list.
So we have:
# husk_segment_list = [<HuskSegment>, <HuskSegment>, <HuskSegment>, ...]
husk_segments_set = set(husk.pk for husk in husk_segment_list)

whitey = Swallow.objects.get(name='Neochelidon tibialis')
wh_segments_set = set(
    HuskSegment.objects
    .filter(coconut__in=whitey.coconuts_carried.all())
    .values_list('id', flat=True)
)

whitey_has_carried_all = wh_segments_set.issuperset(husk_segments_set)
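If you'd rather push the check to the database instead of building sets in Python, the same test ("every segment in the list was carried by this swallow") can be phrased as a single count query. This is only a sketch, assuming the models above and that `whitey` and `husk_segment_list` exist as in the previous snippet:

```python
# Sketch: count how many distinct segments from the list this swallow
# has actually carried; if that equals the size of the list, the
# swallow's carried segments are a superset of the list.
husk_pks = set(husk.pk for husk in husk_segment_list)

carried_count = (
    HuskSegment.objects
    .filter(coconut__swallow=whitey, pk__in=husk_pks)
    .count()
)
whitey_has_carried_all = (carried_count == len(husk_pks))
```

This trades a potentially large transfer of ids for one aggregate query, which may matter if the swallow has carried many segments.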

See the docs on queries spanning multi-valued relationships -- you should chain filter calls.
A simple way to go would be something like
queryset = Swallow.objects.all()
for coconut in coconuts:
    queryset = queryset.filter(coconuts_carried=coconut)
A fancy way to do this in one line using reduce (on Python 3, import it from functools) would be
from functools import reduce

reduce(lambda q, c: q.filter(coconuts_carried=c), coconuts, Swallow.objects.all())

If I understood your altered question correctly, you should be able to compare the swallow's coconuts_carried set with the list of coconuts that have been carried.
see docs
I'm not entirely sure that this is what you want. I guess it depends on whether you know beforehand which swallow you want to check, or whether you want to check against all swallows; in the latter case there may be a better solution.

Related

Efficiently updating a large number of records based on a field in that record using Django

I have about a million Comment records that I want to update based on that comment's body field. I'm trying to figure out how to do this efficiently. Right now, my approach looks like this:
update_list = []
qs = Comments.objects.filter(word_count=0)
for comment in qs:
    model_obj = Comments.objects.get(id=comment.id)
    model_obj.word_count = len(model_obj.body.split())
    update_list.append(model_obj)
Comment.objects.bulk_update(update_list, ['word_count'])
However, this hangs and seems to time out in my migration process. Does anybody have suggestions on how I can accomplish this?
It's not easy to determine the memory footprint of a Django object, but an absolute minimum is the amount of space needed to store all of its data. My guess is that you may be running out of memory and page-thrashing.
You probably want to work in batches of, say, 1000 objects at a time. Use Queryset slicing, which returns another queryset. Try something like
BATCH_SIZE = 1000
start = 0
base_qs = Comments.objects.filter(word_count=0)
while True:
    batch_qs = base_qs[start:start + BATCH_SIZE]
    start += BATCH_SIZE
    if not batch_qs.exists():
        break
    update_list = []
    for comment in batch_qs:
        # no need to re-fetch each row with .get(); the iterated
        # object is already usable
        comment.word_count = len(comment.body.split())
        update_list.append(comment)
    Comment.objects.bulk_update(update_list, ['word_count'])
    print(f'Processed batch starting at {start}')
Each trip around the loop will free the space occupied by the previous trip when it replaces batch_qs and update_list. The print statement will allow you to watch it progress at a hopefully acceptable, regular rate!
Warning - I have never tried this. I'm also wondering whether slicing and filtering will play nice with each other or whether one should use
base_qs = Comments.objects.all()
...
while True:
    batch_qs = base_qs[start:start + BATCH_SIZE]
    ...
    for comment in batch_qs.filter(word_count=0):
so you are slicing your way though rows in the entire DB table and retrieving a subset of each slice that needs updating. This feels "safer". Anybody know for sure?
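For what it's worth, another common pattern sidesteps the slicing-vs-filtering question entirely: stream the rows with iterator(), which keeps memory flat, and flush bulk_update in fixed-size batches. This is only a sketch, assuming the Comment model from the question has id, body and word_count fields:

```python
# Sketch: stream matching rows with a server-side cursor and write
# them back in batches, so neither the objects nor the update list
# ever hold more than BATCH_SIZE rows in memory.
BATCH_SIZE = 1000

def update_word_counts():
    batch = []
    rows = (Comment.objects
            .filter(word_count=0)
            .only('id', 'body')          # fetch just the fields we need
            .iterator(chunk_size=BATCH_SIZE))
    for comment in rows:
        comment.word_count = len(comment.body.split())
        batch.append(comment)
        if len(batch) >= BATCH_SIZE:
            Comment.objects.bulk_update(batch, ['word_count'])
            batch = []
    if batch:                            # flush the final partial batch
        Comment.objects.bulk_update(batch, ['word_count'])
```

Because the queryset is evaluated once up front, updating word_count mid-loop doesn't shift which rows the iteration visits.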

How to filter a Django queryset, after it is modified in a loop

I have recently been working with Django, and it has been confusing me a lot (although I also like it).
The problem I am facing right now is when I am looping, and in the loop modifying the queryset, in the next loop the .filter is not working.
So let's take the following simplified example:
I have a dictionary that is made from the queryset like this
animal_dict = {'chicken': 6,
               'cows': 7,
               'fish': 1,
               'sheep': 2}
The queryset is called self.animals
for key in animal_dict:
    if animal_dict[key] < 3:
        remove_animal = max(animal_dict, key=animal_dict.get)
        remove = self.animals.filter(animal=remove_animal)[-2:]
        self.animals = self.animals.difference(remove)
        animal_dict[remove_animal] = animal_dict[remove_animal] - 2
What I am trying to do is as follows: my goal is that there needs to be a balance under the animals. So since there are not enough fish, 2 of the animals with the highest n have to go (cows). And then in the second loop - since there are not enough sheep, 2 of the animals with the highest n have to go again (chicken).
Now the first time it loops (with fish), the .filter does exactly as it should. However, when I loop a second time (for sheep), remove = self.animals.filter(animal=remove_animal)[-2:] gives me output that is not in line with the animal filter. When I print remove in the second loop, it returns a list of all different animals (instead of just one).
After the loops, the dict should look like this:
{'chicken': 4,
 'cows': 5,
 'fish': 1,
 'sheep': 2}
This is because first the cow count will go down by 2, since it is the max, and then chicken will go down by 2, as it is then the max.
I am definitely missing some Django logic here, but to me this seems very strange. I hope the question is well-understood, else happy to clarify further.
As others have pointed out, every time you call self.animals.filter you are making a request to the database, and this should not be done in a loop.
It isn't very clear what you are trying to achieve, but it seems like you want the number of each type of animal to be (almost?) the same number, and that the only operation you can perform on it is reducing the number of animals.
It's always better to avoid loops if you can.
If you want them all to be the same number, and you can only reduce the number of animals you have, then the best solution would be
fewest_number = min(self.animals.values())
self.animals = {animal: fewest_number for animal in self.animals.keys()}
If you want to, say, allow a tolerance of +1 above the fewest animal:
fewest_number = min(self.animals.values()) + 1
self.animals = {animal: fewest_number for animal in self.animals.keys()}
If you can increase the number of each type of animal, then you could find the average:
average_number = sum(self.animals.values()) / len(self.animals)
self.animals = {animal: average_number for animal in self.animals.keys()}
Answering my own question here. Apparently it didn't work because only count(), order_by(), values(), values_list() and slicing are allowed on a union queryset. You can't filter a union queryset, and the same applies to .difference().
More about this here: Django: Filter a Queryset made of unions not working
Because in the second loop the queryset has become a combined (difference) queryset, the filter call simply doesn't work. The strange thing is that this doesn't raise an error; it simply doesn't filter, which makes it hard to detect.
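One way to keep filter() working is to never build a combined queryset at all: collect the primary keys to drop and apply a single exclude() at the end. This is only a sketch, assuming self.animals is an ordinary queryset of a model with an animal field, as in the question:

```python
# Sketch: track removals as a set of pks instead of calling
# .difference(); exclude() returns an ordinary queryset, so later
# .filter() calls keep working.
pks_to_remove = set()
for animal, count in animal_dict.items():
    if count < 3:
        crowded = max(animal_dict, key=animal_dict.get)
        doomed = (self.animals
                  .filter(animal=crowded)
                  .exclude(pk__in=pks_to_remove)
                  .values_list('pk', flat=True)[:2])
        pks_to_remove.update(doomed)
        animal_dict[crowded] -= 2

self.animals = self.animals.exclude(pk__in=pks_to_remove)
```

Note this takes the first two matching rows rather than the last two; negative slicing would need an explicit order_by() to be meaningful anyway.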

list of lists of dictionaries?

I need to create a structure, in my mind similar to an array of linked lists (where a python list = array and dictionary = linked list). I have a list called blocks, and this is something like what I am looking to make:
blocks[0] = {dictionary},{dictionary},{dictionary},...
blocks[1] = {dictionary},{dictionary},{dictionary},...
etc..
Currently I build the blocks as such:
blocks = []
blocks.append([])
blocks.append([])
blocks.append([])
blocks.append([])
I know that must look ridiculous. I just cannot see in my head what that just made, which is part of my problem. I assign to a block from a different list of dictionary items. Here is a brief overview of how a single block is created...
hold = {}
hold['file'] = file
hold['count'] = count
hold['mass'] = mass_lbs
mg1.append(hold)
# this append can happen several times to mg1

blocks[i].append(mg1[j])
# where i is the index of the block I want to append to, and j is the list
# index of whichever dictionary item of mg1 I want to grab
The reason I want these four main indices in blocks is so that I have shorter code with just the one list instead of block1 block2 block3 block4, which would just make the code way longer than it is now.
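For the record, the four inner lists can be created in one line, which may make the structure easier to picture. A small sketch (the dictionary contents here are made-up values):

```python
# Four independent empty lists. Avoid [[]] * 4, which would create
# four references to the SAME list.
blocks = [[] for _ in range(4)]

hold = {'file': 'data.txt', 'count': 3, 'mass': 12.5}
blocks[0].append(hold)

print(blocks[0][0]['count'])  # -> 3
print(len(blocks))            # -> 4
```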
Okay, going off of what was discussed in the comments, you're looking for a simple way to create a structure that is a list of four items where each item is a list of dictionaries, and all the dictionaries in one of those lists have the same keys but not necessarily the same values. However, if you know exactly what keys each dictionary will have and that never changes, then it might be worth it to consider making them classes that wrap dictionaries and have each of the four lists be a list of objects. This would be easier to keep in your head, and a bit more Pythonic in my opinion. You also gain the advantage of ensuring that the keys in the dictionary are static, plus you can define helper methods. And by emulating the methods of a container type, you can still use dictionary syntax.
class BlockA:
    def __init__(self):
        self.dictionary = {'file': None, 'count': None, 'mass': None}

    def __len__(self):
        return len(self.dictionary)

    def __getitem__(self, key):
        return self.dictionary[key]

    def __setitem__(self, key, value):
        if key in self.dictionary:
            self.dictionary[key] = value
        else:
            raise KeyError(key)

    def __repr__(self):
        return str(self.dictionary)

block1 = BlockA()
block1['file'] = "test"

block2 = BlockA()
block2['file'] = "other test"
Now, you've got a guarantee that all instances of your first block object will have the same keys and no additional keys. You can make similar classes for your other blocks, or some general class, or some mix of the two using inheritance. Now to make your data structure:
blocks = [ [block1, block2], [], [], [] ]
print(blocks) # Or "print blocks" if you're not using Python 3.x
blocks[0][0]['file'] = "some new file"
print(blocks)
It might also be worthwhile to have a class for this blocks container, with specific methods for adding blocks of each type and accessing blocks of each type. That way you wouldn't trip yourself up with accidentally adding the wrong kind of block to one of the four lists or similar issues. But depending on how much you'll be using this structure, that could be overkill.

filter output of subprocess.check_output

I'm trying to match values of a list to a regex pattern. If the particular value within the list matches, I'll append it to a different list of dicts. If the above mentioned value does not match, I want to remove the value from the list.
import subprocess

def list_installed():
    rawlist = subprocess.check_output(['yum', 'list', 'installed']).splitlines()
    #print rawlist
    for each_item in rawlist:
        if "[\w86]" or \
           "noarch" in each_item:
            print each_item  # additional stuff here to append to the list of dicts
            # i haven't done the appending part yet
            # the list of dicts will be returned at the end of this funct
        else:
            remove(each_item)

list_installed()
The end goal is to eventually be able to do something similar to:
nifty_module.tellme(installed_packages[3]['version'])
nifty_module.dosomething(installed_packages[6])
Note to gnu/linux users going wtf:
This will eventually grow into a larger sysadmin frontend.
Despite the lack of an actual question in your post, I'll make a couple of comments.
You have a problem here:
if "[\w86]" or "noarch" in each_item:
It's not interpreted the way you think of it and it always evaluates to True. You probably need
if "[\w86]" in each_item or "noarch" in each_item:
Also, I'm not sure what you are doing, but in case you expect that Python will do regex matching here: it won't. If you need that, look at re module.
remove(each_item)
I don't know how remove is implemented, but it probably won't work if you expect it to remove the element from rawlist: it won't be able to access the list defined inside list_installed. I'd advise using rawlist.remove(each_item) instead, but not in this case, because you are iterating over rawlist. You need to rethink the procedure a little (for example, create another list and append the needed elements to it instead of removing).
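As a sketch of both suggestions together: compile a pattern with the re module and build a new list of the matching lines rather than removing while iterating. The pattern and sample lines below are made up for illustration, and it's written in Python 3 syntax (the question's code is Python 2):

```python
import re

# Match package lines whose name section mentions x86 or noarch.
pattern = re.compile(r'x86|noarch')

def filter_installed(lines):
    # Build a fresh list of matches instead of mutating the input
    # while iterating over it.
    return [line for line in lines if pattern.search(line)]

sample = [
    'foo.x86_64    1.0-1    installed',
    'bar.noarch    2.1-3    installed',
    'baz.src       0.9-2    installed',
]
filter_installed(sample)  # keeps the x86_64 and noarch lines, drops baz.src
```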

Django schedule: Difference between event.get_occurrence() and event.get_occurrences()?

Based on django-schedule. I can't find the guy who made it.
Pardon me if I'm missing something, but I've been trying to get an event's occurrences, preferably for a given day.
When I use event.get_occurrence(date), it always returns nothing. But when I use event.get_occurrences(before_date, after_date), suddenly the occurrences on the previously attempted date show up.
Why won't this work with just one datetime object?
This difference is probably in the actual design of these two methods. Frankly, get_occurrence is rather flawed in general. A method like this should always return something, even if it's just None, but there are scenarios where it doesn't return at all. Namely, if your event doesn't have an rrule, and the date you passed to get_occurrence isn't the same as your event's start, then no value is returned.
There's not really anything that can be done about that. It's just flawed code.
Based on the above comment, especially for the case when the event doesn't return an occurrence, the snippet below can force the retrieval of an occurrence when you are sure that it exists:
from dateutil.relativedelta import relativedelta

def custom_get_occurrence(event, start_date):
    occurrence = event.get_occurrence(start_date)
    if occurrence is None:
        occurrences = event.get_occurrences(
            start_date, start_date + relativedelta(months=3))
        result = [x for x in occurrences if x.start == start_date]
        occurrence = result[0] if result else None
    return occurrence
The above code resolves the issue that occurs when the default get_occurrence doesn't return a result.