I am quite new to Google App Engine. I know the Google datastore is not SQL, but I am trying to get many-to-many relationship behaviour in it. As you can see below, I have Gif entities and Tag entities, and I want my application to search Gif entities by related tag. Here is what I have done:
class Gif(ndb.Model):
    author = ndb.UserProperty()
    link = ndb.StringProperty(indexed=False)

class Tag(ndb.Model):
    name = ndb.StringProperty()

class TagGifPair(ndb.Model):
    tag_id = ndb.IntegerProperty()
    gif_id = ndb.IntegerProperty()

    @classmethod
    def search_gif_by_tag(cls, tag_name):
        query = Tag.query(Tag.name == tag_name)
        # I am stuck here ...
Is this the correct way to start? If so, how can I finish it? If not, how should I do it?
You can use repeated properties: https://developers.google.com/appengine/docs/python/ndb/properties#repeated. The sample in the link uses tags on an entity as its example, but for your exact use case it will look like:
class Gif(ndb.Model):
    author = ndb.UserProperty()
    link = ndb.StringProperty(indexed=False)
    # you store an array of Tag keys here; you could also just make this
    # a StringProperty(repeated=True)
    tag = ndb.KeyProperty(repeated=True)

    @classmethod
    def get_by_tag(cls, tag_name):
        # a query on a repeated property works the same as on a single value
        return cls.query(cls.tag == ndb.Key(Tag, tag_name)).fetch()

# we will put the tag_name as its key.id()
# you only really need this entity if you want to keep records of your tags;
# you could simply keep the tags as strings instead
class Tag(ndb.Model):
    gif_count = ndb.IntegerProperty(indexed=False)
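For completeness, here is a minimal usage sketch of this layout; the tag name and link values are made up for illustration:

# create a tag whose key id is the tag name, then a gif referencing it
tag_key = ndb.Key(Tag, 'cats')
Tag(key=tag_key, gif_count=1).put()
Gif(link='http://example.com/cat.gif', tag=[tag_key]).put()

# fetch all gifs carrying that tag
matches = Gif.get_by_tag('cats')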
Maybe you want to use a list? I would do something like this if you only need to search gifs by tags. I'm using db since I'm not familiar with ndb.
class Gif(db.Model):
    author = db.UserProperty()
    link = db.StringProperty(indexed=False)
    tags = db.StringListProperty(indexed=True)
Then query like this:
Gif.all().filter('tags =', tag).fetch(1000)
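To illustrate, a minimal sketch of how saving and searching would look with this model; the link and tag values are made up:

# store a gif with plain string tags
gif = Gif(link='http://example.com/cat.gif', tags=['cats', 'funny'])
gif.put()

# find every gif tagged 'cats'
matches = Gif.all().filter('tags =', 'cats').fetch(1000)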
There are different ways of doing many-to-many relationships. Using list properties is one way. The limitation to keep in mind with list properties is that there's a limit to the number of indexes per entity, and a limit to the total entity size. This means there's a limit to the number of items in the list (depending on whether you hit the index count or the entity size first). See the bottom of this page: https://developers.google.com/appengine/docs/python/datastore/overview
If you believe the number of references will stay within this limit, this is a good way to go. Considering that a Gif isn't going to have thousands of tags, this is probably the right way.
The other way is to have an intermediate entity that has reference properties to both sides of your many-to-many. This method will let you scale much higher, but because of all the extra entity writes and reads, this is much more expensive.
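For reference, a hedged sketch of the intermediate-entity approach with the models above; the entity and helper names are illustrative, not from the original post:

class TagGif(ndb.Model):
    # one entity per (tag, gif) pair
    tag = ndb.KeyProperty(kind='Tag')
    gif = ndb.KeyProperty(kind='Gif')

def gifs_for_tag(tag_key):
    # two round trips: find the pairs, then batch-get the gifs
    pairs = TagGif.query(TagGif.tag == tag_key).fetch()
    return ndb.get_multi([pair.gif for pair in pairs])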
Related
Assuming the following example model:
# models.py
class event(models.Model):
    location = models.CharField(max_length=10)
    type = models.CharField(max_length=10)
    date = models.DateTimeField()
    attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using the Django ORM. According to the Django aggregation documentation, we can get something close to this by putting values() before the annotation:
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is what I actually need.
Another approach I tried was to use distinct, i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model objects, they should all be unique. It seems highly unlikely that you would have two events on the same date, in the same location, of the same type. With that assumption, let's begin. (As a formatting note, class names tend to start with capital letters to differentiate classes from variables or instances.)
from django.db.models import Max

# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))

# Make an empty list to store the objects you want.
results_list = []

# Then iterate through 'results', looking up the objects
# you want and populating the list.
for r in results:
    result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
    results_list.append(result)

# Now you have a list of objects that you can do whatever you want with.
You might have to look up the exact output of Max('date'), but this should get you on the right path.
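For instance, under the uniqueness assumption above, each object in the list is a full Event, so the attendance field the question asked for is directly available (a small illustrative loop):

for e in results_list:
    print(e.location, e.type, e.date, e.attendance)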
I'm implementing search functionality with an option of looking for a record by matching multiple tables and multiple fields in these tables.
Say I want to find a Customer by his/her first or last name, or by the ID of a placed Order, which is stored in a different model than Customer.
The easy scenario, which I have already implemented, is that the user types only a single word into the search field. I then use Django Q objects to query the Order model, using direct field references or related_query_name references, like:
result = Order.objects.filter(
    Q(customer__first_name__icontains=user_input)
    | Q(customer__last_name__icontains=user_input)
    | Q(order_id__icontains=user_input)
).distinct()
Piece of cake, no problems at all.
But what if the user wants to narrow the search and types multiple words into the search field?
Example: the user has typed Bruce and got a whole lot of records back.
Now he/she wants to be more specific and adds the customer's last name, so the search becomes Bruce Wayne; after splitting it into separate parts I have Bruce and Wayne. Obviously I don't want to search the Order model in this case, because order_id is a single word that is sufficient to find the customer on its own, so I drop it from the query entirely.
Now I'm trying to match the customer by both first AND last name. I also want to handle the scenario where the order of the provided words is random, so that both Bruce Wayne and Wayne Bruce work; I still have the customer's full name, but the positions of the first and last name aren't fixed.
And this is the question I'm looking for an answer to: how do I build a query that searches multiple fields of a model without knowing which of the search words belongs to which table?
I'm guessing the solution is trivial and there's surely an elegant way to create such a dynamic query, but I can't think of one.
You can dynamically OR a variable number of Q objects together to achieve your desired search. The approach below makes it trivial to add or remove fields you want to include in the search.
from functools import reduce
from operator import or_

from django.db.models import Q

fields = (
    'customer__first_name__icontains',
    'customer__last_name__icontains',
    'order_id__icontains',
)

parts = []
terms = ["Bruce", "Wayne"]  # produce this from your search input field
for term in terms:
    for field in fields:
        parts.append(Q(**{field: term}))

query = reduce(or_, parts)
result = Order.objects.filter(query).distinct()
The use of reduce combines the Q objects by ORing them together. Credit for that part goes to this answer.
The solution I came up with is rather complex, but it works exactly the way I wanted:
from functools import reduce
from operator import and_, or_

from django.db.models import Q

search_keys = user_input.split()
if len(search_keys) > 1:
    # multi-word input: match first AND last name, in either order
    first_name_set = set()
    last_name_set = set()
    for key in search_keys:
        first_name_set.add(Q(customer__first_name__icontains=key))
        last_name_set.add(Q(customer__last_name__icontains=key))
    query = reduce(and_, [reduce(or_, first_name_set), reduce(or_, last_name_set)])
else:
    # single-word input: match any of the fields, including order_id
    search_fields = [
        Q(customer__first_name__icontains=user_input),
        Q(customer__last_name__icontains=user_input),
        Q(order_id__icontains=user_input),
    ]
    query = reduce(or_, search_fields)

result = Order.objects.filter(query).distinct()
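To make the behaviour concrete, for user_input = "Bruce Wayne" the multi-word branch builds the equivalent of the following filter (written out by hand for illustration):

# (first name contains Bruce OR first name contains Wayne)
# AND (last name contains Bruce OR last name contains Wayne)
query = (
    (Q(customer__first_name__icontains='Bruce') | Q(customer__first_name__icontains='Wayne'))
    & (Q(customer__last_name__icontains='Bruce') | Q(customer__last_name__icontains='Wayne'))
)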
I've got Django 1.8.5 and Python 3.4.3, and I'm trying to create a subquery that constrains my main data set, but the subquery itself (I think) needs a join in it. Or maybe there is a better way to do it.
Here's a trimmed down set of models:
class Lot(models.Model):
    lot_id = models.CharField(max_length=200, unique=True)

class Lot_Country(models.Model):
    lot = models.ForeignKey(Lot)
    country = CountryField()

class Discrete(models.Model):
    discrete_id = models.CharField(max_length=200, unique=True)
    master_id = models.ForeignKey(Inventory_Master)
    location = models.ForeignKey(Location)
    lot = models.ForeignKey(Lot)
I am filtering on various attributes of Discrete (which is discrete supply) and I want to go "up" through Lot and over to Lot_Country, meaning: "I only want to get rows from Discrete if the Lot associated with that row has an entry in Lot_Country for my appropriate country (let's say US)."
I've tried something like this:
oklots=list(Lot_Country.objects.filter(country='US'))
But, first of all, that gives me the str representation back, which I don't really want (I changed it to return lot_id instead, but that's a hack).
What's the best way to constrain Discrete through Lot and over to Lot_Country? In SQL I would just join in the subquery (or even in the main query; maybe that's what I need? I guess I don't know how to join up to a parent and then down into that parent's other child...)
Thanks in advance for your help.
I'm not sure what you mean by "it gives me the str back"; Lot_Country.objects.filter(country='US') returns a queryset. Of course, if you print it in your console, you will see a string.
I also think your models need refactoring. The way you have currently defined them, you can associate multiple Lot_Countrys with one Lot, but a country can only be associated with one lot.
If I understand your general model correctly, that isn't what you want: you want to associate multiple Lots with one Lot_Country. To do that you need to reverse the foreign key relationship (i.e., put it inside Lot).
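A minimal sketch of that refactor might look like this; note that the field name lot_country is illustrative and has to match the lookup used below:

class Lot_Country(models.Model):
    country = CountryField()

class Lot(models.Model):
    lot_id = models.CharField(max_length=200, unique=True)
    lot_country = models.ForeignKey(Lot_Country)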
Then, for fetching all the Discrete lots that are in a given country, you would do:
discretes_in_us = Discrete.objects.filter(lot__lot_country__country='US')
Which will give you a queryset of all Discretes whose Lot is in the US.
These are simplified models to demonstrate my problem:
class User(models.Model):
    username = models.CharField(max_length=30)
    total_readers = models.IntegerField(default=0)

class Book(models.Model):
    author = models.ForeignKey(User)
    title = models.CharField(max_length=100)

class Reader(models.Model):
    user = models.ForeignKey(User)
    book = models.ForeignKey(Book)
So, we have Users, Books and Readers (Users who have read a Book). Thus, Reader is basically a many-to-many relationship between Book and User.
Now let's say the current user reads a book. I'd like to update the number of total readers for all books by this book's author:
# get the book (as an example pk=1)
book = Book.objects.get(pk=1)
# save Reader object for this user and this book
Reader(user=request.user, book=book).save()
# count and save the total number of readers for this author in all his books
book.author.total_readers = Reader.objects.filter(book__author=book.author).count()
book.author.save()
By doing so, Django creates a LEFT OUTER JOIN query for PostgreSQL and we get the expected result. However, the database tables are huge and this has become a bottleneck.
In this example, we could simply increase the total_readers by one on each view, instead of actually counting the database rows. However, this is just a simplified model structure and we cannot do this in reality here.
What I can do, is creating another field in the Reader model called book_author_id. Thus, I denormalize data and can count the Reader objects without having PostgreSQL making the LEFT OUTER JOIN with the User table.
Finally, here's my question: Is it possible to create some sort of database index, so that PostgreSQL handles this denormalization automatically? Or do I really have to create this additional model field and redundantly store the author's PK in there?
EDIT - to point out the essential question: I got several great answers, which work for a lot of scenarios. However, they don't solve this actual problem. The only thing I'd like to know, is if it's possible to have PostgreSQL handle such a denormalization automatically - e.g. by creating some sort of database index.
Sometimes, this query can serve better:
book.author.total_readers = Reader.objects.filter(book__in=Book.objects.filter(author=book.author)).count()
That will generate a query with a subquery; sometimes it will have better performance than a query with a join. You can even go further and end up making two separate queries:
book.author.total_readers = Reader.objects.filter(book_id__in=Book.objects.filter(author=book.author).values_list('id', flat=True)).count()
That will generate two queries: one will retrieve the list of all book IDs for that author, and the second will retrieve the count of Reader rows for books with an ID in that list.
A good solution may also be to create a batch task that runs, for example, once per hour and counts up all the reads, but that way the count won't be refreshed live.
You can also create a Celery task that runs just after a Reader is created to generate the new value for the author. That way you won't have a long response time, and the delay between creating a read and counting it won't be long.
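A hedged sketch of that Celery approach; the task name is made up, and it assumes a standard Celery setup alongside the models from the question:

from celery import shared_task

@shared_task
def update_total_readers(author_id):
    # recompute the denormalized count outside the request cycle
    author = User.objects.get(pk=author_id)
    author.total_readers = Reader.objects.filter(book__author=author).count()
    author.save(update_fields=['total_readers'])

# call it right after saving the new Reader:
# update_total_readers.delay(book.author_id)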
It's always way better to solve bottlenecks of this sort with good design and maybe a little caching, rather than duplicating data in the way you suggest. The total_readers field is data you should generate, not record.
class User(models.Model):
    username = models.CharField(max_length=30)

    @property
    def total_readers(self):
        # caching_client is a stand-in for your cache backend
        # (e.g. django.core.cache.cache)
        cached_value = caching_client.get("readers_" + self.username, None)
        if cached_value is None:
            cached_value = self.readers()
            caching_client.set("readers_" + self.username, cached_value)
        return cached_value

    def readers(self):
        return Reader.objects.filter(book__author=self).count()
There are libraries that do the caching via decorators, but I felt this was a pattern you would benefit from seeing written out expressly. You can also attach a TTL to the cache so that the value can't be wrong for longer than the TTL. You can also regenerate the cache upon creation of a Reader object.
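For the regenerate-on-creation idea, a hedged sketch using a Django post_save signal; the handler name is illustrative and caching_client is the same stand-in as above:

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=Reader)
def refresh_reader_count(sender, instance, created, **kwargs):
    # recompute and store the count whenever a new Reader appears
    if created:
        author = instance.book.author
        caching_client.set("readers_" + author.username, author.readers())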
You might actually get some mileage out of declaring an m2m and defining through relationships, but I have no experience with that.
I have a small Django project to learn with (it's a web UI for the RANCID backup software) and I've run into a problem.
The model for the app defines Devices and DeviceGroups. Each Device is a member of a group and has a couple of state flags (Enabled, Successful) to indicate whether it is operating correctly. Here are the relevant bits:
class DeviceGroup(models.Model):
    group_name = models.CharField(max_length=60, unique=True)

class Device(models.Model):
    hostname = models.CharField(max_length=60, unique=True)
    enabled = models.BooleanField(default=True)
    device_group = models.ForeignKey(DeviceGroup)
    last_was_success = models.BooleanField(default=False, editable=False)
I have a summary table on the front 'dashboard' page that shows a list of all the groups and, for each group, how many devices are in it. I'd like to also show the number of active devices and the number of failing (i.e. not last_was_success) devices per group. The plain device count is already available through the ForeignKey field.
This seems like the kind of thing that annotate is for, but not quite. And actually, I'm not sure how I'd do it with raw SQL either. Most likely as three queries and some lookup afterwards, or subqueries.
So - is it possible 'nicely' in Django? Or alternatively, how do you do the joining up again in the Template or View? The object passed into the template is simply:
device_groups = DeviceGroup.objects.order_by('group_name')
currently, and I don't think I can just add extra fields onto the queryset results "manually", can I? i.e. it's not a dict or similar.
I think you must use
device_groups = DeviceGroup.objects.all().order_by('group_name')
or
device_groups = DeviceGroup.objects.filter(condition).order_by('group_name')
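Beyond that, the per-group counts the question asks about can be expressed with conditional aggregation; here is a hedged sketch, assuming Django 1.8+ where Case/When is available (the annotation names are made up):

from django.db.models import Case, Count, IntegerField, Sum, When

device_groups = DeviceGroup.objects.order_by('group_name').annotate(
    device_count=Count('device'),
    active_count=Sum(Case(When(device__enabled=True, then=1),
                          default=0, output_field=IntegerField())),
    failing_count=Sum(Case(When(device__last_was_success=False, then=1),
                           default=0, output_field=IntegerField())),
)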