Sorting by distance with a related ManyToMany field - django

I have these two models.
class Store(models.Model):
    coords = models.PointField(null=True,blank=True)
    objects = models.GeoManager()
class Product(models.Model):
    stores = models.ManyToManyField(Store, null=True, blank=True)
    objects = models.GeoManager()
I want to get the products sorted by distance to a point. If the stores field in Product were a ForeignKey, I would do this, and it works:
pnt = GEOSGeometry('POINT(5 23)')
Product.objects.distance(pnt, field_name='stores__coords').order_by('distance')
But since the field is a ManyToMany field it breaks with
ValueError: <django.contrib.gis.db.models.fields.PointField: coords> is not in list
I kind of expected this, because it's not clear which of the stores it should use to calculate the distance, but is there any way to do this?
I need the list of products ordered by distance to a specific point.

Just an idea, maybe this would work for you, this should take only two database queries (due to how prefetch works). Don't judge harshly if it doesn't work, I haven't tried it:
class Store(models.Model):
    coords = models.PointField(null=True,blank=True)
    objects = models.GeoManager()
class Product(models.Model):
    stores = models.ManyToManyField(Store, null=True, blank=True, through='ProductStore')
    objects = models.GeoManager()
class ProductStore(models.Model):
    product = models.ForeignKey(Product)
    store = models.ForeignKey(Store)
    objects = models.GeoManager()
then:
pnt = GEOSGeometry('POINT(5 23)')
ps = ProductStore.objects.distance(pnt, field_name='store__coords').order_by('distance').prefetch_related('product')
for p in ps:
    p.product ... # do whatever you need with it

This is how I solved it, but I don't really like this solution; I think it is very inefficient. There should be a better way with GeoDjango, so until I find one I probably won't be using this. Here's what I did.
I added a new method to the product model
class Product(models.Model):
    stores = models.ManyToManyField(Store, null=True, blank=True)
    objects = models.GeoManager()
    def get_closest_store_distance(self, point):
        # Annotate this product's stores with their distance to the point, nearest first
        sorted_stores = self.stores.distance(point).order_by('distance')
        if sorted_stores.count() > 0:
            store = sorted_stores[0]
            return store.distance.m
        return 99999999 # If no store, return very high distance
Then I can sort this way
def sort_products(self, obj_list, lat, lng):
    pt = 'POINT(%s %s)' % (lng, lat)
    srtd = sorted(obj_list, key=lambda obj: obj.get_closest_store_distance(pt))
    return srtd
Any better solutions or ways to improve this one are very welcome.
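A rough, database-free illustration of the same idea, where each product is ranked by its nearest store. The haversine helper, the dictionary layout and the sample coordinates are all assumptions made up for this sketch; none of them are GeoDjango APIs.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points (degrees)."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def closest_store_distance(store_coords, lon, lat):
    """Distance to the nearest store, or infinity if the product has no stores."""
    return min(
        (haversine_m(s_lon, s_lat, lon, lat) for s_lon, s_lat in store_coords),
        default=math.inf,
    )

# Hypothetical data: product name -> list of (lon, lat) store coordinates
products = {
    "far": [(50.0, 60.0)],
    "near": [(5.0, 23.5), (40.0, 40.0)],
    "nowhere": [],
}

# Rank products by the distance from POINT(5 23) to their nearest store
ranked = sorted(products, key=lambda name: closest_store_distance(products[name], 5.0, 23.0))
```

Using math.inf for products without stores avoids picking an arbitrary "very high" sentinel such as 99999999.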

I will take "distance from a product to a point" to be the minimum distance from the point to a store with that product. I will take the output to be a list of (product, distance) for all products sorted by distance ascending. (A comment by someone who placed a bounty indicated they sometimes also want (product,distance,store) sorted by distance then store within product.)
Every model has a corresponding table. The fields of the model are the columns of the table. Every model/table should have a fill-in-the-(named-)blanks statement where its records/rows are the ones that make a true statement.
Store(coords,...) // store [store] is at [coords] and ...
Product(product,store,...) // product [product] is stocked by store [store] and ...
Since Product has store(s) as a ManyToManyField, it already is a "ProductStore" table of products and stocking stores, and Store already is a "StoreCoord" table of stores and their coordinates.
You can mention any object's fields in a query filter() for a model with a ManyToManyField.
The SQL for this is simple:
select p.product, min(distance(s.coord,[pnt])) as distance
from Store s join Product p
on s.store=p.store
group by p.product
order by distance
It should be straightforward to map this to a query. However, I am not familiar enough with Django to give you exact code now.
from django.db.models import F, Min
q = Product.objects.all()
.filter(store__product=F('product'))
...
.annotate(distance=Min('coord.distance([pnt])'))
...
.order_by('distance')
The Min() is an example of aggregation.
You may also be helped by explicitly making a subquery.
It is also possible to query this through the raw interface. However, the names above are not right for a Django raw query. E.g. the table names will by default be APPL_store and APPL_product, where APPL is your application name. Also, distance is not your PointField operator; you must give the right distance function. But you should not need to query at the raw level.
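To see the grouping described above (minimum store distance per product) actually run, here is a small sketch using Python's built-in sqlite3 module. The table and column names (store, product_store), the planar dist function and the sample rows are assumptions for illustration only; a spatial database would supply its own distance function, and Django's real table names differ.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register a planar stand-in for a spatial distance function (an assumption)
conn.create_function("dist", 4, lambda x1, y1, x2, y2: math.hypot(x1 - x2, y1 - y2))

conn.executescript("""
    CREATE TABLE store (id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE product_store (product TEXT, store_id INTEGER REFERENCES store(id));
    INSERT INTO store VALUES (1, 0, 0), (2, 3, 4), (3, 6, 8);
    INSERT INTO product_store VALUES ('apple', 2), ('apple', 3), ('pear', 1), ('plum', 3);
""")

px, py = 0.0, 0.0  # the reference point
rows = conn.execute(
    """
    SELECT ps.product, MIN(dist(s.x, s.y, ?, ?)) AS distance
    FROM product_store ps JOIN store s ON s.id = ps.store_id
    GROUP BY ps.product
    ORDER BY distance
    """,
    (px, py),
).fetchall()
```

Each product appears once, paired with the distance to its closest store, and the rows come back nearest first.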

Related

Bin a queryset using Django?

Let's say we have the following simplistic models:
class Category(models.Model):
    name = models.CharField(max_length=264)
    def __str__(self):
        return self.name
    class Meta:
        verbose_name_plural = "categories"
class Status(models.Model):
    name = models.CharField(max_length=264)
    def __str__(self):
        return self.name
    class Meta:
        verbose_name_plural = "status"
class Product(models.Model):
    title = models.CharField(max_length=264)
    description = models.CharField(max_length=264)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    price = models.DecimalField(max_digits=10, decimal_places=2) # decimal_places is required
    status = models.ForeignKey(Status, on_delete=models.CASCADE)
My aim is to get some statistics, like total products, total sales, average sales etc, based on which price bin each product belongs to.
So, the price bins could be something like 0-100, 100-500, 500-1000, etc.
I know how to use pandas to do something like that:
Binning column with python pandas
I am searching for a way to do this with the Django ORM.
One of my thoughts is to convert the queryset into a list, apply a function to get the appropriate price bin, and then do the statistics.
Another thought, which I am not sure how to implement, is the same as the one above but applying the bin function only to the field in the queryset I am interested in.
There are three pathways I can see.
First is composing the SQL you want directly and sending it to your database through your model's manager: .objects.raw("[sql goes here]"). This answer shows how to define a group with a simple function on the content; something like that could work:
SELECT FLOOR(grade/5.00)*5 As Grade,
COUNT(*) AS [Grade Count]
FROM TableName
GROUP BY FLOOR(Grade/5.00)*5
ORDER BY 1
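The FLOOR trick gives equal-width bins. For the uneven bins in the question (0-100, 100-500, 500-1000), a CASE expression can assign the label instead. Below is a minimal sketch using Python's built-in sqlite3 module; the product table, its columns and the sample prices are made up for illustration, and the last bin is left open-ended as '500+'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (title TEXT, price REAL);
    INSERT INTO product VALUES
        ('a', 20), ('b', 80), ('c', 100), ('d', 450), ('e', 900);
""")

# Label each row with its bin, then aggregate per bin
rows = conn.execute("""
    SELECT CASE
               WHEN price < 100 THEN '0-100'
               WHEN price < 500 THEN '100-500'
               ELSE '500+'
           END AS price_bin,
           COUNT(*) AS n,
           SUM(price) AS total,
           AVG(price) AS average
    FROM product
    GROUP BY price_bin
    ORDER BY MIN(price)
""").fetchall()
```

The same statement could be passed to a raw query once the table and column names are adjusted to the real schema.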
Second is that there is no reason you can't move the queryset (with .values() or .values_list()) into a pandas dataframe or similar and then bin it, as you mentioned. There is probably a bit of an efficiency loss in getting the queryset into a dataframe and then processing it, but I am not sure that it would certainly or always be bad. If it's easier to compose and maintain, that might be fine.
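A sketch of that second pathway in plain Python rather than pandas, assuming the prices have already been pulled out of the queryset (for example with values_list('price', flat=True)). The bin edges, labels and sample prices are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from decimal import Decimal

# Upper-exclusive bin edges: [0, 100), [100, 500), [500, 1000), [1000, inf)
EDGES = [Decimal(100), Decimal(500), Decimal(1000)]
LABELS = ["0-100", "100-500", "500-1000", "1000+"]

def price_bin(price):
    """Return the label of the bin that a price falls into."""
    return LABELS[bisect_right(EDGES, price)]

def bin_stats(prices):
    """Group prices by bin and compute count, total and average per bin."""
    grouped = defaultdict(list)
    for price in prices:
        grouped[price_bin(price)].append(price)
    return {
        label: {"count": len(vals), "total": sum(vals), "average": sum(vals) / len(vals)}
        for label, vals in grouped.items()
    }

prices = [Decimal("20.00"), Decimal("80.00"), Decimal("100.00"), Decimal("450.00")]
stats = bin_stats(prices)
```

bisect_right places a price that sits exactly on an edge into the higher bin, so 100 counts as "100-500"; swap in bisect_left if the lower bin should win.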
The third way I would try (which I think is what you really want) is chaining .annotate() to label points with the bin they belong in, and the aggregate count function to count how many are in each bin. This is more advanced ORM work than I've done, but I think you'd start looking at something like the docs section on conditional aggregation. I've adapted this slightly to create the 'price_class' column first, with annotate.
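from django.db.models import Count, F, Q
from django.db.models.functions import Floor # available since Django 2.2
Product.objects.annotate(price_class=Floor(F('price') / 100)).aggregate(
    class_zero=Count('pk', filter=Q(price_class=0)),
    class_one=Count('pk', filter=Q(price_class=1)),
    class_two=Count('pk', filter=Q(price_class=2)), # etc etc
)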
I'm not sure whether that floor call is going to work, and you may need ExpressionWrapper to push price_class into the right type of output_field. All the best.

Custom SQL for GeoDjango on ForeignKey

I have the following models:
class UserProfile(models.Model):
    user = models.OneToOneField(User)
    location = models.PointField(blank=True, null=True, srid=CONSTANTS.SRID)
    objects = models.GeoManager()
class Item(models.Model):
    owner = models.ForeignKey(UserProfile)
    objects = models.GeoManager()
Now I need to sort the Items by distance to some point:
p = Point(12.5807203, 50.1250706)
Item.objects.all().distance(p, field='owner__location')
But that throws me an error:
TypeError: ST_Distance output only available on GeometryFields.
From the question "GeoDjango GeoQuerySet.distance() results in 'ST_Distance output only available on GeometryFields' when specifying a reverse relationship in field_name" I can see there is already a ticket for this.
Now, I don't like the solution proposed in that question, since that way I would lose the distances.
So I was thinking that I could achieve this by making a custom sql query. I know that this:
UserProfile.objects.distance(p)
will produce something like this:
SELECT (ST_distance_sphere("core_userprofile"."location",ST_GeomFromEWKB('\x0101000020e6100000223fd12b5429294076583c5002104940'::bytea))) AS "distance", "core_userprofile"."id", "core_userprofile"."user_id", "core_userprofile"."verified", "core_userprofile"."avatar_custom", "core_userprofile"."city", "core_userprofile"."location", "core_userprofile"."bio" FROM "core_userprofile"
So my question is: is there an easy way to manually construct such a query that would sort the items by distance?
Since the geometry you're measuring distance to is on UserProfile, it makes sense to query for UserProfile objects and then handle each Item object they own. (The distance is the same for all items owned by a profile.)
For example:
all_profiles = UserProfile.objects.all()
for profile in all_profiles.distance(p).order_by('distance'):
    for item in profile.item_set.all():
        process(item, profile.distance)
You may be able to make this more efficient with prefetch_related:
all_profiles = UserProfile.objects.all()
all_profiles = all_profiles.prefetch_related('item_set') # we'll need these
for profile in all_profiles.distance(p).order_by('distance'):
    for item in profile.item_set.all(): # items already prefetched
        process(item, profile.distance)
If it's important for some reason to query directly for Item objects, try using extra:
items = Item.objects.all()
items = items.select_related('owner')
distance_select = "st_distance_sphere(core_userprofile.location, ST_GeomFromEWKT('%s'))" % p.wkt
items = items.extra({'distance': distance_select})
items = items.order_by('distance')
Raw queries are another option, which let you get model objects from a raw SQL query:
items = Item.objects.raw("SELECT core_item.* FROM core_item JOIN core_userprofile ...")

Embed product-variance logic into Django models

I wonder how I would model my Product model to auto-create variants of a Product based on its variant parts (in a way that the admin app would also understand).
My Products have:
Colors
Sizes
and can probably get more features in the future.
How would I model my Product class to generate all variants of the Product?
Say I would create a new Product in Colors Red Blue Green and in Sizes XS S M L XL.
class Product(models.Model):
    name = models.CharField(max_length=200)
class Color(models.Model):
    product = models.ForeignKey(Product)
    name = models.CharField(max_length=200)
class Size(models.Model):
    product = models.ForeignKey(Product)
    name = models.CharField(max_length=200)
class FutureVariant(models.Model):
    product = models.ForeignKey(Product)
    name = models.CharField(max_length=200)
    # etc.
Now I need a smart method that auto-creates every color-size-[FUTURE VARIANT] combination for that product.
So I would tell Django:
Create new Product
In the colors Red Blue Green
In the sizes XS S M L XL
And the Product class would go and produce Products with all possible combinations in the products_product table.
I'm almost sure that this has design flaws. But I'm just curious how to put this logic in the ORM, rather than writing weird procedural code, which would probably go against the DRY principle.
In database logic I would think of something like this:
PRODUCTS
- id
- name
PRODUCTS_VARIANTS_COLORS
- id
- name
- html_code
PRODUCTS_VARIANTS_SIZES
- id
- name
PRODUCTS_VARIANTS_TABLES
- table_name
- table_id
PRODUCTS_VARIANTS
- product_id
- variant_table
- variant_id
This way I could make endless variant tables, as long as I register them in my PRODUCTS_VARIANTS_TABLES and store their names as relevant. PRODUCTS_VARIANTS would hold all the variants of the product, including combinations of them all. I am also aiming to have a selection phase where the user can choose (in an HTML checkbox list) which variants they do and don't want.
The problem (I think) is that this would not really comply with a logic in the ORM.
I don't know if you are asking about alternatives or just looking to make your way work, but what about splitting a product from its attributes?
So instead of having separate models for attributes, you just have an Attribute model. This way you are future-proofing your database so you can easily add more attributes (like if you have products with a height and width instead of just color or size).
class AttributeBase(models.Model):
    label = models.CharField(max_length=255) # e.g. color, size, shape, etc.
    ...
class Attribute(models.Model):
    base = models.ForeignKey('AttributeBase', related_name='attributes')
    value = models.CharField(max_length=255) # e.g. red, L, round, etc.
    internal_value = models.CharField(max_length=255, null=True, blank=True) # other values you may need e.g. #ff0000, etc.
    ...
class ProductAttribute(Attribute):
    product = models.ForeignKey('Product', related_name='attributes')
It now becomes very easy to create all attributes for a product...
class Product(models.Model):
    ...
    def add_all_attributes(self):
        for attribute in Attribute.objects.all():
            self.attributes.add(attribute)
Now when you call product.add_all_attributes(), that product will contain every attribute. You can even make it add only the attributes of a certain AttributeBase:
    def add_all_attributes_for_base(self, label):
        base = AttributeBase.objects.get(label=label)
        for attribute in base.attributes.all():
            self.attributes.add(attribute)
You could write something as:
class Product(models.Model):
    @classmethod
    def create_variants(cls):
        # compute all possible combinations
        combinations = ...
        for combination in combinations:
            Product.objects.create(**combination)
Creating all the combinations would indeed happen through registering the possible variants and their possible values.
Note that the ORM is there to help you map Django objects to database records; it doesn't help you produce the database records (read: Django models) that you wish to save.
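One way the "compute all possible combinations" step could look, using itertools.product from the standard library. The variant_options dictionary and the all_combinations helper are hypothetical names for this sketch and are not part of either answer.

```python
from itertools import product

def all_combinations(variant_options):
    """Yield one dict per combination of the given option values."""
    names = list(variant_options)
    for values in product(*(variant_options[name] for name in names)):
        yield dict(zip(names, values))

# Hypothetical registry of variant types and their possible values
variant_options = {
    "color": ["Red", "Blue", "Green"],
    "size": ["XS", "S", "M", "L", "XL"],
}

combinations = list(all_combinations(variant_options))
```

Registering a future variant type is then just one more key in the dictionary; the number of combinations is the product of the list lengths (3 x 5 = 15 here), so it grows quickly as variant types are added.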

Django: prefetch_related results ordered by a field of an intermediary table

How can I prefetch_related objects in Django and order them by a field in an intermediary table?
Here are the models I'm working with:
class Node(models.Model):
    name = models.CharField(max_length=255)
    edges = models.ManyToManyField('self', through='Edge', symmetrical=False)
class Edge(models.Model):
    from_node = models.ForeignKey(Node, related_name='from_node')
    to_node = models.ForeignKey(Node, related_name='to_node')
    weight = models.FloatField(default=0)
Given a node, I'd like to prefetch all of the related nodes, ordered by weight.
When I use this query:
n = Node.objects.prefetch_related('to_node').order_by('edge__weight').get(name='x')
the order_by has no effect.
Edit:
My best answer so far
n = Node.objects.get(name='x')
edges = Edge.objects.filter(from_node=n).prefetch_related('to_node').order_by('weight')
Then instead of iterating n.edges (as I'd prefer), I iterate edges.to_node
Nowadays, you can also use the Prefetch class to achieve this:
https://docs.djangoproject.com/en/1.10/ref/models/querysets/#django.db.models.Prefetch
Or, if you want to do this all the time as a default, you can look into the meta ordering on the intermediary table, something like:
class SomeThroughModel(models.Model):
    order = models.IntegerField("Order", default=0, blank=False, null=False)
    ...
    class Meta:
        ordering = ['order'] # order is the field holding the order
Just a conceptual idea (written from memory).
The problem is that the order_by refers to the Node model.
However, there is a way to work around this:
Node.objects.get(name='x').edges.extra(select={'weight':'%s.weight' % Edge._meta.db_table}).order_by('weight')
This will force the ORM to:
Add 'weight' field, which would normally be omitted.
Order the results by it.
The number of queries should be the same as if prefetch_related had worked: one to get the Node, a second to get the related nodes.
Unfortunately this is not a very 'clean' solution, as we need to use _meta.
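For reference, the result that the extra() call is after corresponds to a join through the edge table with the weight selected and used for ordering. Here is a sketch with Python's built-in sqlite3 module; the table names, column names and sample rows are assumptions and will not match what Django generates for these models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edge (
        from_node_id INTEGER REFERENCES node(id),
        to_node_id INTEGER REFERENCES node(id),
        weight REAL DEFAULT 0
    );
    INSERT INTO node VALUES (1, 'x'), (2, 'a'), (3, 'b'), (4, 'c');
    INSERT INTO edge VALUES (1, 2, 0.9), (1, 3, 0.1), (1, 4, 0.5), (2, 3, 0.0);
""")

# Related nodes of 'x', with the edge weight selected and used for ordering
rows = conn.execute(
    """
    SELECT target.name, edge.weight
    FROM edge
    JOIN node AS source ON source.id = edge.from_node_id
    JOIN node AS target ON target.id = edge.to_node_id
    WHERE source.name = ?
    ORDER BY edge.weight
    """,
    ("x",),
).fetchall()
```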
Not that clean though..
# Untested code
n = Node.objects.get(name="x")
# This would return the to_node IDs ordered by weight
n.edges.filter(from_node=n).values_list('to_node', flat=True).order_by('weight')

Performing a Django Query on a Model, But Ending Up with a QuerySet for That Model's ManyToManyField

I have a third party Django App (Satchmo) which has a model called Product which I make extensive use of in my Django site.
I want to add the ability to search for products via color. So I have created a new model called ProductColor. This model looks roughly like this...
class ProductColor(models.Model):
products = models.ManyToManyField(Product)
r = models.IntegerField()
g = models.IntegerField()
b = models.IntegerField()
name = models.CharField(max_length=32)
When a store product's data is loaded into the site, the product's color data is used to create a ProductColor object which will point to that Product object. The plan is to allow a user to search for a product by searching a color range.
I can't seem to figure out how to put this query into a QuerySet. I can make this...
# If the color ranges look something like this...
r_range, g_range, b_range = ((3,130),(0,255),(0,255))
# Then my query looks like
colors_in_range = ProductColor.objects.prefetch_related('products')
if r_range:
    colors_in_range = colors_in_range.filter(
        Q(r__gte=r_range[0])
        & Q(r__lte=r_range[1]) # both bounds must hold; OR would match every row
    )
if g_range:
    colors_in_range = colors_in_range.filter(
        Q(g__gte=g_range[0])
        & Q(g__lte=g_range[1])
    )
if b_range:
    colors_in_range = colors_in_range.filter(
        Q(b__gte=b_range[0])
        & Q(b__lte=b_range[1])
    )
So I end up with a QuerySet which contains all of the ProductColor objects in that color range. I could then build a list of Products by accessing the products ManyToMany attribute of each ProductColor object.
What I really need is a valid QuerySet of Products. This is because there is going to be other logic which is performed on these results and it needs to operate on a QuerySet object.
So my question is how can I build the QuerySet that I really want? And failing that, is there an efficient way to re-build the QuerySet (preferably without hitting the database again)?
If you want to get a Product queryset you have to filter the Product objects and filter via the reverse relation for product color:
products = Product.objects.filter(productcolor__r__gte=x).distinct()
You can use the range field lookup:
"You can use range anywhere you can use BETWEEN in SQL, for dates, numbers and even characters."
Your query:
r_range, g_range, b_range = ((3,130),(0,255),(0,255))
products = Product.objects.filter(productcolor__r__range=r_range,
                                  productcolor__g__range=g_range,
                                  productcolor__b__range=b_range).distinct()
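A note on why the two bounds of a range have to be combined with AND: a value lies inside a range only if it clears the lower bound and stays under the upper bound at the same time. Joining the two comparisons with OR accepts every value, since any number is either at least the lower bound or at most the upper bound. A plain-Python sketch with made-up colors (the in_range helper is hypothetical):

```python
def in_range(value, bounds):
    """Inclusive range test, the same semantics as SQL BETWEEN."""
    low, high = bounds
    return low <= value <= high

r_range, g_range, b_range = ((3, 130), (0, 255), (0, 255))

# Hypothetical (r, g, b) colors
colors = {"dark red": (120, 10, 10), "bright red": (250, 0, 0), "black": (0, 0, 0)}

# All three channels must fall inside their ranges
matches = [
    name
    for name, (r, g, b) in colors.items()
    if in_range(r, r_range) and in_range(g, g_range) and in_range(b, b_range)
]

# With OR between the two bounds, nothing is ever excluded
or_matches = [name for name, (r, _, _) in colors.items() if r >= r_range[0] or r <= r_range[1]]
```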