Arel: active relation from Arel::SelectManager with join - ruby-on-rails-4

Suppose we have a Rails 4.2.x app with two tables, posts and authors, and we want to use Arel to get the posts written by an author with name == 'Karl'.
(In this case we could be happy with Active Record joins but this is just to keep the example simple.)
posts = Arel::Table.new :posts
authors = Arel::Table.new :authors
my_query = posts.project(Arel.star)
.join(authors)
.on(posts[:author_id].eq(authors[:id]))
.where(authors[:name].eq('Karl'))
> my_query.class
=> Arel::SelectManager
Now we could get back an array (of class Array) of posts by doing:
> Post.find_by_sql my_query
[master] Post Load (3.1ms) SELECT * FROM "posts" INNER JOIN "authors"
ON "posts"."author_id" = "authors"."id"
WHERE "authors"."name" = 'Karl'
=> [#<Post:0x005612815ebdf8
id: 7474,
...
]
So we do get an array of posts, not an active record relation:
> Post.find_by_sql(my_query).class
=> Array
Injecting the manager into Post.where doesn't work either:
> Post.where my_query
=> #<Post::ActiveRecord_Relation:0x2b13cdc957bc>
> Post.where(my_query).first
ActiveRecord::StatementInvalid: PG::SyntaxError:
ERROR: subquery must return only one column
SELECT "posts".* FROM "posts"
WHERE ((SELECT * FROM "posts" INNER JOIN "authors" ON "posts"."author_id" = "authors"."id" WHERE "authors"."name" = 'Karl'))
ORDER BY "posts"."id" ASC LIMIT 1
I think I must be missing something. In short: how do you get an active record relation from a select manager like my_query above (or from another select manager accomplishing the same thing)?

You can't get an ActiveRecord::Relation from an Arel::SelectManager, nor from an SQL string. There are two ways to load data through ActiveRecord:
The first is to do all the query logic in Arel. In this case you can't use any ActiveRecord::Relation methods, but Arel offers equivalent functionality. In your example you could set a limit through Arel:
my_query.take(10)
The other way is to use Arel inside ActiveRecord::Relation methods. You can rewrite your query like this:
posts = Arel::Table.new :posts
authors = Arel::Table.new :authors
join = posts.join(authors).
on(posts[:author_id].eq(authors[:id])).
join_sources
my_query = Post.
joins(join).
where(authors[:name].eq('Karl'))
> my_query.class
=> Post::ActiveRecord_Relation
In this case you can use my_query like any other ActiveRecord::Relation.

Related

Convert a raw self join sql to Django orm code (no internal foreign key)

I have Article and Tag models with a many-to-many relation through another model, ArticleTag. For a given tag such as "Health", I want to find how many articles it shares with each other tag (e.g. 4 articles with tag "Covid", 0 articles with tag "Infection", etc.).
I can perform a self join on ArticleTag with some Where conditions and Group By clause and get the desired result:
SELECT tag.title, COUNT(*) as co_occurrences
FROM app_articletag as t0
INNER JOIN app_articletag t1 on (t0.article_id = t1.article_id)
INNER JOIN app_tag tag on (tag.id = t1.tag_id)
WHERE t0.tag_id = 43 and t1.tag_id != 43
GROUP BY t1.tag_id, tag.title
However I want to stay away from raw queries as much as possible and work with Django QuerySet APIs.
I've seen other threads about self joins, but their models all have a foreign key to themselves.
Here are my Django models:
class Tag(Model):
    ...
class Article(Model):
    tags = models.ManyToManyField(Tag, through='ArticleTag', through_fields=('article', 'tag'))
class ArticleTag(Model):
    tag = models.ForeignKey(Tag, on_delete=models.CASCADE)
    article = models.ForeignKey(Article, on_delete=models.CASCADE)
One approach is something like this, given t is the Health tag:
ArticleTag.objects.values(
"tag__name"
).annotate(
articles_with_health=Count(
"pk", filter=Q(article__articletag__tag=t)
)
).exclude(tag=t)
This should return a result like:
[
{'tag__name': 'Infection', 'articles_with_health': 0},
{'tag__name': 'Covid', 'articles_with_health': 4}
]
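To check what the raw self join from the question returns, here is a minimal sketch that runs it with Python's built-in sqlite3 instead of Django. The table names follow the question's SQL; the sample rows and the helper names co_occurrences and build_sample_db are made up for illustration.

```python
import sqlite3


def co_occurrences(conn, tag_id):
    """Count, per other tag, the articles it shares with the tag `tag_id`."""
    sql = """
        SELECT tag.title, COUNT(*) AS co_occurrences
        FROM app_articletag AS t0
        INNER JOIN app_articletag AS t1 ON t0.article_id = t1.article_id
        INNER JOIN app_tag AS tag ON tag.id = t1.tag_id
        WHERE t0.tag_id = ? AND t1.tag_id != ?
        GROUP BY t1.tag_id, tag.title
        ORDER BY tag.title
    """
    return conn.execute(sql, (tag_id, tag_id)).fetchall()


def build_sample_db():
    """Hypothetical data: tag 43 is 'Health', as in the question's query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE app_tag (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE app_articletag (
            id INTEGER PRIMARY KEY, article_id INTEGER, tag_id INTEGER
        );
        INSERT INTO app_tag (id, title)
        VALUES (43, 'Health'), (1, 'Covid'), (2, 'Infection');
        INSERT INTO app_articletag (article_id, tag_id) VALUES
            (10, 43), (10, 1),
            (11, 43), (11, 1),
            (12, 43),
            (13, 2);
    """)
    return conn


conn = build_sample_db()
rows = co_occurrences(conn, 43)
print(rows)
```

Note that with inner joins a tag that never appears together with the given tag ('Infection' in this sample) is simply missing from the result, whereas the annotate/Count approach above reports it with a count of 0.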

Elegant way of fetching multiple objects in custom order

What's an elegant way for fetching multiple objects in some custom order from a DB in django?
For example, suppose you have a few products, each with its name, and you want to fetch three of them to display in a row on your website page, in some fixed custom order. Suppose the names of the products which you want to display are, in order: ["Milk", "Chocolate", "Juice"]
One could do
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
products = [
unordered_products.filter(name="Milk")[0],
unordered_products.filter(name="Chocolate")[0],
unordered_products.filter(name="Juice")[0],
]
And the post-fetch ordering part could be improved to use a name-indexed dictionary instead:
ordered_product_names = ["Milk", "Chocolate", "Juice"]
products_by_name = dict((x.name, x) for x in unordered_products)
products = [products_by_name[name] for name in ordered_product_names]
But is there a more elegant way? e.g., convey the desired order to the DB layer somehow, or return the products grouped by their name (aggregation seems to be similar to what I want, but I want the actual objects, not statistics about them).
You can fetch your products in a custom order with a single ORM query (which executes only one SQL query):
ordered_products = Product.objects.filter(
name__in=['Milk', 'Chocolate', 'Juice']
).annotate(
order=Case(
When(name='Milk', then=Value(0)),
When(name='Chocolate', then=Value(1)),
When(name='Juice', then=Value(2)),
output_field=IntegerField(),
)
).order_by('order')
Update
Note
Speaking of an "elegant way" (and best practice), I think the extra method (proposed by @Satendra) should definitely be avoided.
The official Django documentation says this about extra:
Warning
You should be very careful whenever you use extra(). Every time you
use it, you should escape any parameters that the user can control by
using params in order to protect against SQL injection attacks.
Please read more about SQL injection protection.
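To make the warning concrete, here is a small sketch of the difference between interpolating a value into the SQL text and passing it as a bound parameter. It uses Python's built-in sqlite3 rather than Django, and the product table and the malicious input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO product (name) VALUES (?)",
    [("Milk",), ("Chocolate",), ("Juice",)],
)

# Hypothetical user-controlled value that tries to widen the WHERE clause.
user_input = "Milk' OR '1'='1"

# Unsafe: the value becomes part of the SQL text, so the quote inside it
# closes the string literal and the OR condition is executed as SQL.
unsafe_sql = "SELECT name FROM product WHERE name = '%s' ORDER BY id" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is sent separately as a bound parameter and is only ever
# compared as a plain string.
safe_rows = conn.execute(
    "SELECT name FROM product WHERE name = ? ORDER BY id", (user_input,)
).fetchall()

print(unsafe_rows)  # every product matches
print(safe_rows)    # no product has that literal name
```

The params and select_params arguments of extra() play the role of the bound parameter here.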
Optimized version
If you want to handle more items with only one query, you can change my first query and use the flexibility of the Django ORM, as suggested by @Shubhanshu in his answer:
products = ['Milk', 'Chocolate', 'Juice']
ordered_products = Product.objects.filter(
name__in=products
).order_by(Case(
*[When(name=n, then=i) for i, n in enumerate(products)],
output_field=IntegerField(),
))
The output of this command will be similar to this:
<QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>]>
And the SQL generated by the ORM will be like this:
SELECT "id", "name"
FROM "products"
WHERE "name" IN ('Milk', 'Chocolate', 'Juice')
ORDER BY CASE
WHEN "name" = 'Milk' THEN 0
WHEN "name" = 'Chocolate' THEN 1
WHEN "name" = 'Juice' THEN 2
ELSE NULL
END ASC
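If you want to try the generated SQL outside Django, the following sketch builds the same ORDER BY CASE expression for an arbitrary list of names and runs it with Python's built-in sqlite3. The products table mirrors the SQL above; the sample rows and the helper name fetch_in_order are assumptions made for illustration.

```python
import sqlite3


def fetch_in_order(conn, names):
    """Return (id, name) rows for `names`, in the order the names are given."""
    if not names:
        return []
    # One placeholder per name for the IN list ...
    placeholders = ", ".join("?" for _ in names)
    # ... and one "WHEN name = ? THEN <position>" clause per name.
    whens = " ".join(f"WHEN name = ? THEN {pos}" for pos in range(len(names)))
    sql = (
        f"SELECT id, name FROM products WHERE name IN ({placeholders}) "
        f"ORDER BY CASE {whens} ELSE NULL END ASC"
    )
    # Parameters are bound in order of appearance: IN list first, then CASE.
    return conn.execute(sql, list(names) + list(names)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("Juice",), ("Bread",), ("Milk",), ("Chocolate",)],
)

ordered = fetch_in_order(conn, ["Milk", "Chocolate", "Juice"])
print([name for _, name in ordered])
```

Only the integer positions are formatted into the SQL text; the names themselves are always bound as parameters.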
When there is no relation between the objects that you are fetching and you still wish to fetch (or arrange) them in certain (custom) order, you may try doing this:
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
product_order = ["Milk", "Chocolate", "Juice"]
preserved = Case(*[When(name=name, then=pos) for pos, name in enumerate(product_order)])
ordered_products = unordered_products.order_by(preserved)
Hope it helps!
Try adding this to the model's Meta class:
class Meta:
    ordering = ('name', 'related__name', )
This returns your records ordered by the specified fields,
so: chocolate, chocolate blue, chocolate white, juice green, juice XXX, milk, milky, milk YYYY should keep that order when you fetch them.
Creating a QuerySet from a list while preserving order
This means the order of output QuerySet will be same as the order of list used to filter it.
The solution is more or less the same as @PaoloMelchiorre's answer.
But if there are more items, let's say 1000 products in product_names, you don't have to write out more conditions in Case by hand; you can use the extra method of QuerySet:
product_names = ["Milk", "Chocolate", "Juice", ...]
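# One "WHEN name=%s THEN <position>" clause per name; the names themselves are passed
# through select_params so the database driver quotes them (no manual interpolation).
clauses = ' '.join('WHEN name=%%s THEN %d' % i for i in range(len(product_names)))
ordering = 'CASE %s END' % clauses
queryset = Product.objects.filter(name__in=product_names).extra(
    select={'ordering': ordering}, select_params=product_names, order_by=('ordering',))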
# Output: <QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>,...]>

Django - joining multiple tables (models) and filtering out based on their attribute

I'm new to django and ORM in general, and so have trouble coming up with query which would join multiple tables.
I have 4 Models that need joining - Category, SubCategory, Product and Packaging, example values would be:
Category: 'male'
SubCategory: 'shoes'
Product: 'nikeXYZ'
Packaging: 'size_36: 1'
Each model has a FK to the model above it (i.e. SubCategory has a category field, etc.).
My question is - how can I filter Product given a Category (e.g. male) and only show products which have Packaging attribute available set to True? Obviously I want to minimise the hits on my database (ideally do it with 1 SQL query).
I could do something along these lines:
available = Product.objects.filter(packaging__available=True)
subcategories = SubCategory.objects.filter(category_id=<id_of_male>)
products = available.filter(subcategory_id__in=subcategories)
but then that requires 2 hits on database at least (available, subcategories) I think. Is there a way to do it in one go?
try this:
lookup = {'packaging__available': True, 'subcategory__category_id__in': ['ids of males']}
product_objs = Product.objects.filter(**lookup)
You can query across relations with _set, with chained double underscores (__) that follow the foreign keys, or by building a list of ids first.
I think this should work but it's not tested:
Product.objects.filter(packaging__available=True, subcategories__category_id__in=[id_of_male])
It isn't tested. The plural subcategories assumes a related_name of that spelling; if Product itself holds the foreign key to SubCategory (as subcategory_id in the question suggests), use the forward lookup subcategory__category_id__in instead of subcategories.
Probably subcategory__category_id__in=[id_of_male] can be simplified to subcategory__category_id=id_of_male.
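For reference, a single filter() call that spans relations ends up as one SQL query with joins. The sketch below shows roughly what such a query looks like, run with Python's built-in sqlite3 rather than Django. The table layout, column names, sample rows and the helper name available_products are assumptions based on the question's description, not the asker's real schema.

```python
import sqlite3


def available_products(conn, category_id):
    """Products in `category_id` with at least one available packaging row."""
    sql = """
        SELECT DISTINCT product.id, product.name
        FROM product
        INNER JOIN subcategory ON subcategory.id = product.subcategory_id
        INNER JOIN packaging ON packaging.product_id = product.id
        WHERE subcategory.category_id = ? AND packaging.available = 1
        ORDER BY product.id
    """
    return conn.execute(sql, (category_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subcategory (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, subcategory_id INTEGER);
    CREATE TABLE packaging (
        id INTEGER PRIMARY KEY, product_id INTEGER, size TEXT, available INTEGER
    );
    INSERT INTO category VALUES (1, 'male'), (2, 'female');
    INSERT INTO subcategory VALUES (1, 'shoes', 1), (2, 'shoes', 2);
    INSERT INTO product VALUES (1, 'nikeXYZ', 1), (2, 'nikeABC', 1), (3, 'pumaF', 2);
    INSERT INTO packaging (product_id, size, available) VALUES
        (1, 'size_36', 1), (1, 'size_37', 1),
        (2, 'size_36', 0),
        (3, 'size_38', 1);
""")

male_products = available_products(conn, 1)
print(male_products)
```

The DISTINCT matters: a product with two available packagings would otherwise show up twice, which is also why .distinct() is often added to Django querysets that filter across a one-to-many relation.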

will_paginate "breaks" query result

I have a query which returns 3566 results, which is correct.
When I use paginate on it, the result count is 18, although in the console I can see that the query it runs is correct.
This is my controller:
def listcontractors
  @listcons = Contract.paginate(:page => params[:page], :per_page => 50).joins(:contractor)
    .select("contractors.id,name,ico,city,country,count(resultinfo_id)")
    .group("contractors.id,name,ico,city,country")
    .order("name")
end
This is the query I see in the console; when I run it in psql the result is correct:
(22.2ms) SELECT COUNT(*) AS count_all,
contractors.id,name,ico,city,country AS
contractors_id_name_ico_city_country FROM "contractors" INNER JOIN
"contracts" ON "contracts"."contractor_id" = "contractors"."id" GROUP
BY contractors.id,name,ico,city,country Contractor Load (30.8ms)
SELECT contractors.id,name,ico,city,country,count(resultinfo_id) as
count FROM "contractors" INNER JOIN "contracts" ON
"contracts"."contractor_id" = "contractors"."id" GROUP BY
contractors.id,name,ico,city,country ORDER BY name LIMIT 50 OFFSET
1050
when I remove .paginate part from the query, result is ok
my models are
class Contract < ActiveRecord::Base
belongs_to :resultinfo
belongs_to :contractor
end
class Contractor < ActiveRecord::Base
has_many :contracts
end
I tried to switch the query to Contractor.joins(:contracts), but the issue was the same: with paginate the result count is much lower than it should be.
Any idea why this happens?
Thanks
Thanks to gmcnaughton I came up with this solution:
ids = Contractor.order("name").pluck(:id)
@listcons = ids.paginate(:page => params[:page], :per_page => 50)
@groupedcons = Contractor.joins(:contracts)
  .where(id: @listcons)
  .select("contractors.id,name,ico,city,country,count(resultinfo_id)")
  .group("contractors.id,name,ico,city,country")
  .order("name")
I also had to add require 'will_paginate/array' to an initializer, because otherwise total_pages is undefined for an array.
Mixing paginate and group is tricky. paginate sets an OFFSET and LIMIT on the query, which get applied to the result of the GROUP BY -- rather than limiting what records will get grouped.
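The point about LIMIT and OFFSET applying to the grouped result can be reproduced in plain SQL. This is only a sketch: it uses Python's built-in sqlite3 instead of Rails, with made-up rows and a page size of 2, so that the behaviour is easy to see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contractors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contracts (id INTEGER PRIMARY KEY, contractor_id INTEGER);
    INSERT INTO contractors (id, name) VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO contracts (contractor_id) VALUES (1), (1), (1), (2), (2), (3);
""")

# LIMIT on a grouped query: the page holds 2 *groups*, each counted over
# all of its contracts.
grouped_page = conn.execute("""
    SELECT contractors.id, contractors.name, COUNT(*) AS contract_count
    FROM contractors
    INNER JOIN contracts ON contracts.contractor_id = contractors.id
    GROUP BY contractors.id, contractors.name
    ORDER BY contractors.name
    LIMIT 2 OFFSET 0
""").fetchall()

# LIMIT applied first, in a subquery: only 2 *contracts* are taken, and the
# grouping then sees just those rows.
limited_first = conn.execute("""
    SELECT contractor_id, COUNT(*) AS contract_count
    FROM (SELECT * FROM contracts ORDER BY id LIMIT 2 OFFSET 0)
    GROUP BY contractor_id
    ORDER BY contractor_id
""").fetchall()

print(grouped_page)
print(limited_first)
```

The two results differ both in how many rows a page contains and in the counts each row carries, so it matters which of the two you actually want to paginate.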
If you want to paginate through all the Contracts and then group each page of 50 results (one page at a time), try this:
def listcontractors
  # get one page of contract ids
  ids = Contract.paginate(:page => params[:page], :per_page => 50).pluck(:id)
  # group just the items in that page (the join is needed for the contractors columns)
  @listcons = Contract.joins(:contractor).where(id: ids)
    .select("contractors.id,name,ico,city,country,count(resultinfo_id)")
    .group("contractors.id,name,ico,city,country")
    .order("name")
end
Hope that helps!

Fast way to sort a model by count of child's child

I currently have the following models: MinorCategory > Product > Review
On a view, I show the 12 MinorCategories that have the most reviews. This view is very slow to respond, and I think it is a problem with how I do the query.
Here is my current code:
class MinorCategory < ActiveRecord::Base
has_many :products
has_many :reviews, through: :products
...
def count_reviews
self.reviews.count
end
...
end
class Review < ActiveRecord::Base
belongs_to :product, touch: true
...
end
class HomeController < ApplicationController
  @categories = MinorCategory.all.sort_by(&:count_reviews).reverse.take(12)
end
So that is basically it. In the view itself I go through each of the @categories and display a few things, but the query in the controller is what seems to be slow. From SkyLight:
SELECT COUNT(*) FROM "reviews" INNER JOIN "products" ON "reviews"."product_id" = "products"."id" WHERE "products"."minor_category_id" = ? ... avg 472ms
I am not good with sql or active record, and still pretty new to Ruby on Rails. I've spent a couple hours trying other methods, but I can not get them to work so I thought I would check here.
Thank you in advance to anybody that has a moment.
You need some basic SQL knowledge to better understand how database queries work, and how to take advantage of a DBMS. Using ActiveRecord is not an excuse to not learn some SQL.
That said, your query is very inefficient because you don't use the power of the database at all. It's a waste of resources both on the Ruby environment and on the database environment.
The only database query is
MinorCategory.all
which extracts all the records. This is insanely expensive, especially if you have a large number of categories.
Moreover, self.reviews.count is very inefficient because it triggers the N+1 query problem: one COUNT query per category.
Last but not least, the sorting and limiting are done in the Ruby environment, whereas you should really do them in the database.
You can easily obtain a more efficient query by taking advantage of the database's computation capabilities. You will need to join the tables together (reviews belong to products, so the join goes through products). The query should look like:
SELECT
  minor_categories.*, COUNT(reviews.id) AS reviews_count
FROM
  "minor_categories"
  INNER JOIN "products" ON "products"."minor_category_id" = "minor_categories"."id"
  INNER JOIN "reviews" ON "reviews"."product_id" = "products"."id"
GROUP BY
  minor_categories.id
ORDER BY
  reviews_count DESC
LIMIT 12
which in ActiveRecord translates as
categories = MinorCategory.select('minor_categories.*, COUNT(reviews.id) AS reviews_count').joins(:reviews).order('reviews_count DESC').group('minor_categories.id').limit(12)
You can access a single category count by using reviews_count
# take a category
category = categories[0]
category.reviews_count
Another approach that doesn't require a JOIN would be to cache the counter in the category table.
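To see the database-side counting, sorting and limiting in isolation, here is a sketch that runs a join from categories to reviews through products with Python's built-in sqlite3 rather than Rails. The table and column names follow the models in the question; the sample rows and the helper name top_categories are invented for illustration.

```python
import sqlite3


def top_categories(conn, limit):
    """Return (name, reviews_count) for the categories with the most reviews."""
    sql = """
        SELECT minor_categories.name, COUNT(reviews.id) AS reviews_count
        FROM minor_categories
        INNER JOIN products ON products.minor_category_id = minor_categories.id
        INNER JOIN reviews ON reviews.product_id = products.id
        GROUP BY minor_categories.id, minor_categories.name
        ORDER BY reviews_count DESC, minor_categories.id
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE minor_categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, minor_category_id INTEGER);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER);
    INSERT INTO minor_categories VALUES (1, 'Books'), (2, 'Games'), (3, 'Toys');
    INSERT INTO products VALUES (1, 1), (2, 2), (3, 2), (4, 3);
    -- Books: 3 reviews on one product; Games: 1 review on each of two products;
    -- Toys: one product without reviews.
    INSERT INTO reviews (product_id) VALUES (1), (1), (1), (2), (3);
""")

top_two = top_categories(conn, 2)
print(top_two)
```

Everything happens in one query, so the application never loads every category or issues a COUNT per category. With inner joins, a category without any reviews ('Toys' here) does not appear at all; a LEFT JOIN would be needed to list it with a count of 0.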