Loading data from associated model in same query in rails - ruby-on-rails-4

Basically, I have a Listing model, where each listing has a country id. I need the country name in my search results view. I know I can do @listing.country.name, but this performs an extra query for each listing in my search results. I'm using Thinking Sphinx, and in my controller I have
@listings = Listing.search(@ts_params).page(page_num).per(limit)
I have tried adding .includes(:countries) and variations thereof but no luck.
What's the best way to go about this? I want the country data to be fetched in the same query as the listings.
I have exactly the same issue with listing images - it is performing an extra query for every listing to find the image, when surely it can be done in one with joins.

Are you trying to eager load the associated model (to avoid an N + 1 query problem), or are you trying to load the associated model into fields on the parent model?
If it's the former, you're probably better off forgetting about :select and, instead of :joins, using:
ts_params[:sql][:include] = :countries, :listing_images
Now you should be able to call listing.countries and listing.listing_images to access child models, as normal.

Thinking Sphinx provides functionality to eager load associated entities, so for eager loading we don't need to add [:sql]. To eager load associated entities using Sphinx:
ts_params[:include] = [:country, :listing_image]

I managed to solve this using the :sql hash provided by Thinking Sphinx. I now have the following:
@ts_params[:sql][:joins] = "INNER JOIN countries ON countries.id = listings.country_id INNER JOIN listing_images ON listing_images.listing_id = listings.id"
@ts_params[:sql][:select] = "listings.*, countries.name as country_name, listing_images.image as image_name"
This is correctly retrieving the country name and image name, but I still have a bit of work to do in making it work with the images - I think that will deserve its own question!

Current v4 syntax is:
Article.search :sql => {:include => :user}
Ref: https://freelancing-gods.com/thinking-sphinx/v4/searching.html

Related

Best practice for using Django templates and connect it to database

I am trying to build a database for my website. There are currently three entries with different attributes in my database. I have not created these entries in order, but I have assigned a 'Chapter number' attribute which indicates the order 1,2,3.
I am now trying to inject this using 'context' and 'render' function in my views. I am using the method 'objects.all()' to add all objects to my context. I have a simple Html file where I am inserting the data from the database by looping over (a simple for loop) these added objects.
Now the output that is being generated (naturally) is that it is following the order in which I created the database. I am not sure how I can have the loop run in such a way that I get these chapters in correct order. Thank you for patiently reading my question. Any help will be appreciated.
You may use the order_by method, which is part of Django's QuerySet API:
https://docs.djangoproject.com/en/3.0/ref/models/querysets/
If you share some more information about your specific data, I can provide a more targeted example.
For orientation purposes, sorting queried objects by date would work as follows:
most_recent = Entry.objects.order_by('-timestamp')
You can sort by any field like so:
sorted_by_field = Entry.objects.order_by('custom_field')
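For the chapter case in the question, ordering by the chapter-number field should be all that is needed. The sketch below uses the standard-library sqlite3 module instead of Django so that it runs on its own. The table name chapter and the columns title and chapter_number are assumptions, since the question does not show the model. In Django the equivalent call would be along the lines of Chapter.objects.order_by('chapter_number').

```python
import sqlite3

# Hypothetical schema: the question does not show the real model, so the
# table and column names here are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chapter (id INTEGER PRIMARY KEY, title TEXT, chapter_number INTEGER)"
)

# Insert the rows out of order, as described in the question.
conn.executemany(
    "INSERT INTO chapter (title, chapter_number) VALUES (?, ?)",
    [("Third", 3), ("First", 1), ("Second", 2)],
)

# Without ORDER BY the database may return rows in any order (in practice
# often insertion order); with it, the order is guaranteed.
ordered_titles = [
    title
    for (title,) in conn.execute("SELECT title FROM chapter ORDER BY chapter_number")
]
print(ordered_titles)
```

Doing the ordering in the query rather than in the template keeps the for loop in the HTML file unchanged.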

Best index for a Django model when filtering on one field and ordering on another field

I use Django 2.2 linked to PostgreSQL and would like to optimise my database queries.
Given the following simplified model:
class Person(models.Model):
    name = models.CharField()
    age = models.IntegerField()
on which I have to do the following query, say,
Person.objects.filter(age__gt=20, age__lt=30).order_by('name')
What would be the best way to define the index in the model Meta field so as to optimise the query?
Which of these four options would be best?
class Meta:
    indexes = [models.Index(fields=['age', 'name']),
               models.Index(fields=['name', 'age']),
               models.Index(fields=['name']),
               models.Index(fields=['age'])]
Is it, for example, possible to prevent sorting when the query is done? Thank you.
This is really a postgres question, as much as a Django question, right?
I think there is a good chance that creating an index on your sort field will help with performance. But there are a lot of caveats, and if it's really important to you, you might want to do some testing focused on Postgres (i.e., just run some queries in psql and see what happens). Some caveats include:
- It might depend on which type of index Django creates for you.
- Postgres does not always use an index when running a query, but it should if you have the right index and the right query (and if there is enough data in the table to justify loading the index).
- It might matter how Django formats your SELECT.
I suggest you create your model and specify that you want the index. Then use Django Debug Toolbar to find out what SELECT query is really being run. Then open a dbshell with manage.py dbshell (which launches psql) and run EXPLAIN ANALYZE on that same SELECT. Assuming you can interpret the output, you will see for yourself whether your index is coming into play. Paste the EXPLAIN ANALYZE output here, if you like.
According to this Postgres documentation ORDER BY can be assisted by a btree index. The b-tree type of index is what Django will create for you by default.
So, why don't you try this:
class Meta:
    indexes = [models.Index(fields=['age', 'name'])]
Then go run an EXPLAIN ANALYZE in dbshell and see whether it worked.
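The same "create the index, then inspect the plan" workflow can be tried without a Postgres server. This is only a sketch, using the standard-library sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for Postgres's EXPLAIN ANALYZE. The two planners differ, so the result does not carry over to Postgres. The table name, index name and sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
# Composite index with the filter column first, mirroring fields=['age', 'name'].
conn.execute("CREATE INDEX idx_person_age_name ON person (age, name)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("name%d" % i, i % 80) for i in range(1000)],
)

query = "SELECT name, age FROM person WHERE age > 20 AND age < 30 ORDER BY name"

# Each plan row has four columns; the last one is the human-readable detail.
plan_details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for detail in plan_details:
    print(detail)

# True if the planner mentions the index anywhere in its plan.
index_used = any("idx_person_age_name" in detail for detail in plan_details)
print(index_used)
```

With a range filter on age, an index on (age, name) can narrow the rows, but the matching rows typically still need a separate sort by name. The plan output is the place to confirm that for your own data and database.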
# You should apply indexing on age, because you are searching for 'age' column data
indexes = [
    models.Index(fields=['age'])
]

How to retrieve values from Django ForeignKey -> ManyToMany fields?

I have a model (Realtor) with a ForeignKey field (BillingTier), which has a ManyToManyField (BillingPlan). For each logged in realtor, I want to check if they have a billing plan that offers automatic feedback on their listings. Here's what the models look like, briefly:
class Realtor(models.Model):
    user = models.OneToOneField(User)
    billing_tier = models.ForeignKey(BillingTier, blank=True, null=True, default=None)
class BillingTier(models.Model):
    plans = models.ManyToManyField(BillingPlan)
class BillingPlan(models.Model):
    automatic_feedback = models.BooleanField(default=False)
I have a permissions helper that checks the user permissions on each page load, and denies access to certain pages. I want to deny the feedback page if they don't have the automatic feedback feature in their billing plan. However, I'm not really sure the best way to get this information. Here's what I've researched and found so far, but it seems inefficient to be querying on each page load:
def isPermitted(user, url):
    premium = [t[0] for t in user.realtor.billing_tier.plans.values_list('automatic_feedback') if t[0]]
I saw some solutions which involved using filter (ManyToMany field values from queryset), but I'm equally unsure of using the query for each page load. I would have to get the billing tier id from the realtor: bt_id = user.realtor.billing_tier.id and then query the model like so:
BillingTier.objects.filter(id = bt_id).filter(plans__automatic_feedback=True).distinct()
I think the second option reads nicer, but I think the first would perform better because I wouldn't have to import and query the BillingTier model.
Is there a better option, or are these two the best I can hope for? Also, which would be more efficient for every page load?
As per the OP's invitation, here's an answer.
The core question is how to define an efficient permission check based on a highly relational data model.
The first variant involves building a Python list from evaluating a Django query set. The suspicion must certainly be that it imposes unnecessary computations on the Python interpreter. Although it's not clear whether that's tolerable if at the same time it allows for a less complex database query (a tradeoff which is hard to assess), the underlying DB query is not exactly simple.
The second approach involves fetching additional 1:1 data through relational lookups and then checking if there is any record fulfilling access criteria in a different, 1:n relation.
Let's have a look at them.
bt_id = user.realtor.billing_tier.id: This is required to get the hook for the following 1:n query. It is indeed highly inefficient in itself. It can be optimized in two ways.
As per Django: Access Foreign Keys Directly, it can be written as bt_id = user.realtor.billing_tier_id, because the foreign key value is already stored on the realtor row and need not be fetched via a relational lookup.
Assuming that the page in itself would only load a user object, Django can be told to fetch and cache relational data along with that via select_related. So if the page does not only fetch the user object but the required billing_tier_id as well, we have saved one additional DB hit.
BillingTier.objects.filter(id = bt_id).filter(plans__automatic_feedback=True).distinct() can be optimized using Django's exists, because that will reduce the work done both in the database and in the data traffic between the database and Python.
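Maybe even Django's prefetch_related can be used to fetch the 1:n data up front. Note that it issues one extra query per relation and joins the results in Python rather than merging everything into a single SQL query, so it's much more difficult to judge whether that pays. Could be worth a try.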
In any case, it's worth installing a package called Django Debug Toolbar, which will allow you to analyze how much time your implementation spends on database queries.
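To make the exists suggestion concrete, here is a minimal sketch of the kind of query it boils down to, written against the standard-library sqlite3 module so it runs without a Django project. The table and column names are assumptions modelled on the question's models, and has_automatic_feedback is a hypothetical helper. In Django the check would be along the lines of BillingTier.objects.filter(id=bt_id, plans__automatic_feedback=True).exists().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: a tier, a plan, and the many-to-many link table between them.
conn.executescript("""
    CREATE TABLE billing_plan (id INTEGER PRIMARY KEY, automatic_feedback INTEGER NOT NULL);
    CREATE TABLE billing_tier (id INTEGER PRIMARY KEY);
    CREATE TABLE billing_tier_plans (
        billing_tier_id INTEGER NOT NULL REFERENCES billing_tier(id),
        billing_plan_id INTEGER NOT NULL REFERENCES billing_plan(id)
    );
    INSERT INTO billing_plan (id, automatic_feedback) VALUES (1, 0), (2, 1);
    INSERT INTO billing_tier (id) VALUES (10), (20);
    INSERT INTO billing_tier_plans VALUES (10, 1), (20, 1), (20, 2);
""")


def has_automatic_feedback(conn, billing_tier_id):
    """Return True if any plan on the tier offers automatic feedback."""
    if billing_tier_id is None:  # the ForeignKey in the question is nullable
        return False
    row = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM billing_tier_plans AS tp
            JOIN billing_plan AS p ON p.id = tp.billing_plan_id
            WHERE tp.billing_tier_id = ? AND p.automatic_feedback = 1
        )
        """,
        (billing_tier_id,),
    ).fetchone()
    return bool(row[0])


print(has_automatic_feedback(conn, 10))
print(has_automatic_feedback(conn, 20))
```

The database can stop at the first matching row and only a single boolean travels back to Python, which is the saving the answer describes.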

Filter on a list of tags

I'm trying to select all the songs in my Django database whose tag is any of those in a given list. There is a Song model, a Tag model, and a SongTag model (for the many to many relationship).
This is my attempt:
taglist = ["cool", "great"]
tags = Tag.objects.filter(name__in=taglist).values_list('id', flat=True)
song_tags = SongTag.objects.filter(tag__in=list(tags))
At this point I'm getting an error:
DatabaseError: MultiQuery does not support keys_only.
What am I getting wrong? If you can suggest a completely different approach to the problem, it would be more than welcome too!
EDIT: I should have mentioned I'm using Django on Google AppEngine with django-nonrel
You shouldn't use an m2m relationship with AppEngine. NoSQL databases (and BigTable is one of them) generally don't support JOINs, and the programmer is expected to denormalize the data structure. This is a deliberate design decision: while your database will contain redundant data, your read queries will be much simpler (no need to combine data from 3 tables), which in turn makes the design of the DB server much simpler as well (of course this is done for the sake of optimization and scaling).
In your case you should probably get rid of the Tag and SongTag models, and just store the tag in the Song model as a string. I am of course assuming that the Tag model only contains id and name. If Tag in fact contains more data, you should still keep the Tag model, and the Song model should then contain both tag_id and tag_name. The idea, as I explained above, is to introduce redundancy for the sake of simpler queries.
Please, please let the ORM build the query for you:
song_tags = SongTag.objects.filter(tag__name__in = taglist)
You should try to use only one query, so that Django also generates only one query using a join.
Something like this should work:
Song.objects.filter(tags__name__in=taglist)
You may need to change some names from this example (most likely the tags in tags__name__in), see https://docs.djangoproject.com/en/1.3/ref/models/relations/.
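On a SQL backend, the single-query approach corresponds to one joined SELECT. The sketch below shows that shape with the standard-library sqlite3 module. The table and column names and the sample rows are assumptions, and it does not apply to App Engine with django-nonrel, where joins are unavailable as the first answer explains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: songs, tags, and the link table for the many-to-many relation.
conn.executescript("""
    CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE song_tag (song_id INTEGER, tag_id INTEGER);
    INSERT INTO song VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO tag VALUES (1, 'cool'), (2, 'great'), (3, 'dull');
    INSERT INTO song_tag VALUES (1, 1), (1, 2), (2, 3), (3, 2);
""")

taglist = ["cool", "great"]

# One "?" placeholder per tag keeps the values parameterized.
placeholders = ", ".join("?" for _ in taglist)
sql = (
    "SELECT DISTINCT song.title FROM song "
    "JOIN song_tag ON song_tag.song_id = song.id "
    "JOIN tag ON tag.id = song_tag.tag_id "
    "WHERE tag.name IN (%s) ORDER BY song.title" % placeholders
)
titles = [title for (title,) in conn.execute(sql, taglist)]
print(titles)
```

DISTINCT matters here: song A carries both tags and would otherwise be returned twice. The Django counterpart is to append .distinct() to the filtered queryset.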

Django create/alter tables on demand

I've been looking for a way to define database tables and alter them via a Django API.
For example, I'd like to be able to write some code which directly manipulates table DDL and allows me to define tables or add columns to a table on demand programmatically (without running a syncdb). I realize that django-south and django-evolution may come to mind, but I don't really think of these tools as meant to be integrated into an application and used by an end user... rather, they are utilities for upgrading your database tables. I'm looking for something where I can do something like:
class MyModel(models.Model):  # wouldn't run syncdb.. instead do something like below
    a = models.CharField()
    b = models.CharField()
model = MyModel()
model.create()  # this runs the create table (instead of a syncdb)
model.add_column(c = models.CharField())  # this would set a column to be added
model.alter()  # and this would apply the alter statement
model.del_column('a')  # this would set column 'a' for removal
model.alter()  # and this would apply the removal
This is just a toy example of how such an API would work, but the point is that I'd be very interested in finding out if there is a way to programmatically create and change tables like this. This might be useful for things such as content management systems, where one might want to dynamically create a new table. Another example would be a site that stores datasets of an arbitrary width, for which tables need to be generated dynamically by the interface or data imports. Does anyone know any good ways to dynamically create and alter tables like this?
(Granted, I know one can do direct SQL statements against the database, but that solution lacks the ability to treat the databases as objects)
Just curious as to if people have any suggestions or approaches to this...
You can try to interface with Django's own code that manages changes in the database. It is a bit limited (no ALTER, for example, as far as I can see), but you may be able to extend it. Here's a snippet from django.core.management.commands.syncdb.
for app in models.get_apps():
    app_name = app.__name__.split('.')[-2]
    model_list = models.get_models(app)
    for model in model_list:
        # Create the model's database table, if it doesn't already exist.
        if verbosity >= 2:
            print "Processing %s.%s model" % (app_name, model._meta.object_name)
        if connection.introspection.table_name_converter(model._meta.db_table) in tables:
            continue
        sql, references = connection.creation.sql_create_model(model, self.style, seen_models)
        seen_models.add(model)
        created_models.add(model)
        for refto, refs in references.items():
            pending_references.setdefault(refto, []).extend(refs)
            if refto in seen_models:
                sql.extend(connection.creation.sql_for_pending_references(refto, self.style, pending_references))
        sql.extend(connection.creation.sql_for_pending_references(model, self.style, pending_references))
        if verbosity >= 1 and sql:
            print "Creating table %s" % model._meta.db_table
        for statement in sql:
            cursor.execute(statement)
        tables.append(connection.introspection.table_name_converter(model._meta.db_table))
Take a look at connection.creation.sql_create_model. The creation object is created in the database backend relevant to the database you are using in your settings.py. All of them are under django.db.backends.
If you must have ALTER TABLE, I think you can create your own custom backend that extends an existing one and adds this functionality. Then you can interface with it directly through an ExtendedModelManager that you create.
Quickly off the top of my head..
Create a Custom Manager with the Create/Alter methods.
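As a rough illustration of the "object with create/alter methods" idea, here is a minimal sketch that issues the DDL directly through the standard-library sqlite3 module. DynamicTable and its methods are hypothetical names, not a Django API, and column types are passed as plain SQL type names rather than Django fields. Newer Django versions (1.7 and later) also expose a schema editor via connection.schema_editor(), which migrations use for this kind of work and may be a better starting point inside a Django project.

```python
import re
import sqlite3

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name):
    """Reject anything that is not a plain identifier; DDL cannot be parameterized."""
    if not _IDENTIFIER.match(name):
        raise ValueError("invalid identifier: %r" % name)
    return name


class DynamicTable:
    """Hypothetical helper that creates and alters one table at runtime."""

    def __init__(self, conn, name):
        self.conn = conn
        self.name = _check_identifier(name)

    def create(self, **columns):
        # columns maps column name -> SQL type name, e.g. a="TEXT"
        cols = ", ".join(
            "%s %s" % (_check_identifier(col), _check_identifier(sql_type))
            for col, sql_type in columns.items()
        )
        self.conn.execute("CREATE TABLE %s (%s)" % (self.name, cols))

    def add_column(self, column, sql_type):
        self.conn.execute(
            "ALTER TABLE %s ADD COLUMN %s %s"
            % (self.name, _check_identifier(column), _check_identifier(sql_type))
        )

    def column_names(self):
        # PRAGMA table_info returns one row per column; index 1 is the name.
        return [row[1] for row in self.conn.execute("PRAGMA table_info(%s)" % self.name)]


conn = sqlite3.connect(":memory:")
table = DynamicTable(conn, "my_model")
table.create(a="TEXT", b="TEXT")
table.add_column("c", "TEXT")
print(table.column_names())
```

Identifiers are validated before being interpolated because table and column names cannot be bound as query parameters, which matters if end users are allowed to choose them.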