I'm trying to implement full-text search with PostgreSQL and Django, so I've created a function search_client(text) that returns a list of clients. To call it from the database I use something like this:
SELECT * FROM search_client('something')
but I'm not sure how to call it from Django. I know I could do something like
from django.db import connection
cursor = connection.cursor()
cursor.execute("SELECT * FROM search_client(%s)", ["something"])
result = cursor.fetchall()  # a list of tuples, one per row
but that only returns a list of tuples of values, and I'd like a list of model objects, as I get with the filter() method.
Any ideas? Thanks for your time!
If your goal is a full-featured search engine, have a look at django-haystack. It rocks.
As for your question, the new (Django 1.2) raw method might work:
qs = MyModel.objects.raw("SELECT * FROM search_client('something')")
If you're using Django 1.2, you can use the raw() ORM method to execute custom SQL and still get back Django model instances. If you're not, you can still run the SQL via the extra() method on the default QuerySet and pass the results to a custom method that either fetches the real ORM records or builds new, temporary objects.
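For the fallback path of running the SQL yourself and building temporary objects, a minimal sketch is to read the column names from cursor.description and wrap each row in a namedtuple, along the lines of the dictfetchall/namedtuplefetchall recipes in Django's docs on custom SQL. To keep it runnable without a PostgreSQL server, this uses Python's built-in sqlite3 module and a hypothetical clients table as a stand-in for search_client(); the helper only relies on the standard DB-API description attribute.

```python
import sqlite3
from collections import namedtuple


def namedtuplefetchall(cursor):
    """Return all remaining rows from a DB-API cursor as namedtuples."""
    # cursor.description holds one 7-item sequence per column; item 0 is the name.
    Row = namedtuple("Row", [col[0] for col in cursor.description])
    return [Row(*row) for row in cursor.fetchall()]


# Stand-in data: an in-memory SQLite table instead of the PostgreSQL function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO clients (name) VALUES (?)",
                 [("something inc",), ("other ltd",)])

cursor = conn.cursor()
# Parameters are passed separately, never formatted into the SQL string.
cursor.execute("SELECT id, name FROM clients WHERE name LIKE ?", ("%something%",))
clients = namedtuplefetchall(cursor)
print(clients[0].name)
```

With Django the same helper can be given connection.cursor(), using %s placeholders instead of ?. If you need real model instances rather than lightweight rows, MyModel.objects.raw() is the better fit, as long as the query returns the model's primary key column.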
First, you probably don't want to do this. Do you have proof that your database function is actually faster?
Implement this in Python first. When you can prove that your Python implementation really is the slowest part of your transaction, then you can try a stored procedure.
Second, you have the extra method available in Django.
http://docs.djangoproject.com/en/1.2/ref/models/querysets/#django.db.models.QuerySet.extra
Note that compute-intensive database procedures are often slow.
Related
You can query Django's JSONField either by direct lookup or by using annotations. Now, I realize that if you annotate a field you can do all sorts of complex queries, but for a very basic query, which one is actually the preferred method?
Example: let's say I have a model like this:
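from django.db import models
from django.contrib.postgres.fields import JSONField  # on Django 3.1+ use models.JSONField
class Document(models.Model):
    data = JSONField()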
And then I store an object using the following command:
>>> Document.objects.create(data={'name': 'Foo', 'age': 24})
Now, the query I want is the most basic one: find all documents where data__name is 'Foo'. I can do this in two ways, with and without an annotation:
>>> from django.db.models.expressions import RawSQL
>>> Document.objects.filter(data__name='Foo')
>>> Document.objects.annotate(name = RawSQL("(data->>'name')::text", [])).filter(name='Foo')
So what exactly is the difference? And if I can make basic queries directly, why would I need to annotate, provided, of course, that I'm not going to make complex queries?
There is no reason whatsoever to use raw SQL for queries where you can use ORM syntax. For someone who is conversant in SQL but less experienced with Django's ORM, RawSQL might provide an easier path to a certain result than the ORM, which has its own learning curve.
There might be more complex queries where the ORM runs into problems or where it doesn't give you the exact SQL query that you need. It is in these cases that RawSQL comes in handy, although the ORM is getting more feature-complete with every release, with
Cast (since 1.10),
Window functions (since 2.0),
a constantly growing array of wrappers for database functions
the ability to define custom wrappers for database functions with Func expressions (since 1.8) etc.
They are interchangeable, so it's a matter of taste. I think Document.objects.filter(data__name='Foo') is better because:
It's easier to read
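It's portable: MariaDB and MySQL support JSON fields too (Django's models.JSONField supports them since 3.1), so the same lookup can run on both PostgreSQL and MariaDB, while the RawSQL expression uses PostgreSQL-specific syntax.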
As a general rule, don't use RawSQL: if you build the SQL string from user input instead of passing parameters, you can create SQL injection holes in your app.
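To make the security point concrete, here is a small sketch using Python's built-in sqlite3 module (not Django) and a hypothetical documents table. Formatting user input into the SQL text lets a crafted value change the query, while passing it as a bound parameter keeps it as plain data. RawSQL takes a params list (the [] in the example above) for exactly this reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO documents (name) VALUES (?)", [("Foo",), ("Bar",)])

user_input = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text, so the OR clause becomes SQL.
unsafe_sql = "SELECT name FROM documents WHERE name = '%s'" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver sends the input as a parameter, compared as a literal string.
safe_rows = conn.execute(
    "SELECT name FROM documents WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # prints: 2 0
```

The unsafe query returns every row because the injected OR condition is always true; the parameterized one finds no document literally named "nobody' OR '1'='1".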
In my project I want to find people whose birthday falls between two given days, and I'm hoping for a solution that doesn't impose any limitations on the queries.
I have found this solution, which seems efficient and suits my problem. But now I have a second problem: creating the function in the database through Django, because it must be portable and must also work with the test database. I could not find a proper way to define the function, and the index based on it, in Django.
In brief, I want to create the following function and index in the database using Django:
CREATE OR REPLACE FUNCTION indexable_month_day(date) RETURNS TEXT as $BODY$
SELECT to_char($1, 'MM-DD');
$BODY$ language 'sql' IMMUTABLE STRICT;
CREATE INDEX person_birthday_idx ON people (indexable_month_day(dob));
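The idea behind indexable_month_day() is that 'MM-DD' strings sort in calendar order regardless of the year, so a birthday range becomes a plain string range that the index can serve. A minimal pure-Python sketch of the same comparison, with hypothetical helper names; date.strftime('%m-%d') plays the role of to_char($1, 'MM-DD'), and the second branch handles a range that wraps around New Year:

```python
from datetime import date


def month_day(d):
    # Same text as PostgreSQL's to_char(d, 'MM-DD'), e.g. '03-07'.
    return d.strftime("%m-%d")


def birthday_between(dob, start, end):
    """True if dob's month and day fall in [start, end], ignoring the year."""
    md, lo, hi = month_day(dob), month_day(start), month_day(end)
    if lo <= hi:
        return lo <= md <= hi
    # The range wraps past Dec 31 (e.g. Dec 20 to Jan 10).
    return md >= lo or md <= hi


people = {"ann": date(1980, 12, 25), "bob": date(1992, 1, 5), "cid": date(1975, 6, 1)}
winter = [name for name, dob in people.items()
          if birthday_between(dob, date(2024, 12, 20), date(2025, 1, 10))]
print(winter)  # prints: ['ann', 'bob']
```

In SQL, the wrapped case would become WHERE indexable_month_day(dob) >= '12-20' OR indexable_month_day(dob) <= '01-10', where each half is a range condition on the indexed expression.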
To answer your question: using RunSQL, you can insert raw SQL into a migration. It looks like you should be able to put this SQL into a migration file, including both the function and the CREATE INDEX statement that depends on it. Running the migration would then create the custom index, and since Django builds the test database by running migrations, it would exist there too.
But don't do this: you should just use Django to index the dob field, i.e.
dob = models.DateField(db_index=True)
and use Django to write your queries as well.
I have the following function to determine who downloaded a certain book:
from django.utils.functional import cached_property
@cached_property
def get_downloader_info(self):
    return self.downloaders.select_related('user').values(
        'user__username', 'user__full_name')
Since I'm only using two fields, does it make sense to use .defer() on the remaining fields?
I tried to use .only(), but I get an error that some fields are not JSON serializable.
I'm open to all suggestions, if any, for optimizing this queryset.
Thank you!
Before you try every possible optimization, get hold of the SQL query generated by the ORM (you can print(queryset.query) to stdout or use something like django-debug-toolbar). Then run that query with EXPLAIN ANALYZE and find out what is actually slow about it. If the query is slow because a lot of data has to be transferred, then it makes a lot of sense to use only() or defer(). Using only() and defer() (or values()) improves performance only when you retrieve a lot of data; it does not make the database's job much easier (unless it really has to read a lot of data, of course).
Since you are using Django and PostgreSQL, you can open a psql session with manage.py dbshell and get query timings with \timing.
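On the serialization error from the question: values() yields plain dicts, which json.dumps() accepts, while only() and defer() still yield model instances, which it rejects by default. A Django-free sketch with a hypothetical Downloader stand-in class shows the difference:

```python
import json


class Downloader:
    """Hypothetical stand-in for a model instance returned by .only()."""

    def __init__(self, username, full_name):
        self.username = username
        self.full_name = full_name


def is_json_serializable(value):
    # json.dumps raises TypeError for objects it has no encoder for.
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


# Shape of what .values('user__username', 'user__full_name') yields: dicts.
rows = [{"user__username": "ann", "user__full_name": "Ann Lee"}]
# Shape of what .only(...) yields: objects.
objects = [Downloader("ann", "Ann Lee")]

print(is_json_serializable(rows), is_json_serializable(objects))  # prints: True False
```

With a real queryset you would still pass list(qs.values(...)) to json.dumps(), since the queryset object itself is not serializable.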
If I perform a prefetch_related('toppings') for a queryset, and I want to later filter(spicy=True) by fields in the related table, Django ignores the cached info and does a database query. I found that this is documented (under the Note box) and seems to happen for all forms of caching (select_related(), already evaluated querysets, etc.) when another filter() is performed.
However, is there some sort of super secret hidden time-saving shortcut to filter locally (using the cache and not hitting the database) without having to write the python code to loop the queryset (using list/dict comprehension, etc.)? Maybe something like a filter_locally(spicy=True)?
EDIT:
One of the reasons why a list/dict comprehension doesn't work well for me is that a list or dict does not have the queryset methods. In my case, the first-level M2M field, toppings, isn't the end goal: I need to check a second related M2M field (which I have already prefetched as well). While this is also possible with a list comprehension, it's much simpler to have something like filter_locally(spicy=True, origin__country='Spain') because:
it allows accessing many levels of related fields with minimal effort
it allows chaining other queryset methods
it's easier to read because it's consistent with the familiar filter()
it's easier to add this optimization to existing code that uses filter() without prefetching, with few changes.
But from the responses, Django has no such support :(
You have to write the python code to loop through the queryset (a list/dict comprehension is ideal). All the filter() code knows how to do is add filtering language to the SQL sent to the database. Filtering locally is a totally different problem than filtering remotely, so the solutions to those two separate problems won't be able to share any logic.
A list comprehension one-liner would be pretty straightforward, though; the syntax might not be much more complex than with filter().
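As a rough illustration of how close a local filter can get to filter(), here is a hypothetical filter_locally() helper (not a Django API). It walks __-separated attribute paths over already-loaded objects and treats anything with an .all() method, such as a prefetched related manager, as a to-many relation that matches if any related item matches. It supports only exact-value comparisons, and stand-in classes replace Django models so the sketch runs on its own.

```python
from types import SimpleNamespace


class Related:
    """Stand-in for a prefetched related manager: .all() returns cached items."""

    def __init__(self, *items):
        self._items = list(items)

    def all(self):
        return self._items


def _matches(obj, path, expected):
    """Follow an attribute path like ['origin', 'country'] and compare the end value."""
    if not path:
        return obj == expected
    if obj is None:
        return False
    if hasattr(obj, "all"):
        # To-many relation: match if any related object matches.
        return any(_matches(item, path, expected) for item in obj.all())
    return _matches(getattr(obj, path[0], None), path[1:], expected)


def filter_locally(objects, **lookups):
    """Hypothetical helper: keep objects matching every exact lookup."""
    return [obj for obj in objects
            if all(_matches(obj, key.split("__"), value)
                   for key, value in lookups.items())]


spain = SimpleNamespace(country="Spain")
italy = SimpleNamespace(country="Italy")
chili = SimpleNamespace(name="chili", spicy=True, origin=Related(spain))
jalapeno = SimpleNamespace(name="jalapeno", spicy=True, origin=Related(italy))
basil = SimpleNamespace(name="basil", spicy=False, origin=Related(spain, italy))
pizza = SimpleNamespace(name="diavola", toppings=Related(chili, jalapeno, basil))

spanish_spicy = filter_locally(pizza.toppings.all(), spicy=True, origin__country="Spain")
print([t.name for t in spanish_spicy])  # prints: ['chili']
```

One design difference worth knowing: each lookup here is checked independently, so two lookups through the same to-many relation may be satisfied by different related objects. That is closer to chained .filter() calls than to a single filter() call, where Django requires the same related object to satisfy both conditions.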
If you're filtering on a boolean, the list comprehension is pretty easy. You can also swap out the topping.spicy check for a string comparison or whatever else you need.
I would do something like:
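def get_spicy(qs):
    # Evaluating the queryset runs the main query plus one prefetch query.
    pizzas = list(qs)
    # toppings.all() is served from the prefetch cache, so no extra queries here.
    return [pizza for pizza in pizzas
            if any(topping.spicy for topping in pizza.toppings.all())]

spicy_pizzas = get_spicy(Pizza.objects.all().prefetch_related('toppings'))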
That returns each pizza that has at least one spicy topping. You can also replace any() with all() to require that every topping be spicy, and you can express a lot of pretty powerful queries with this syntax. I'm somewhat surprised that there is no easy way to do this in Django; it seems like many of these simple queries could be implemented in a generic manner.
The above code assumes a many-to-many relationship. It should be easy to adapt to a simple FK relationship such as a one-to-one or one-to-many.
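For a single-valued relation (a ForeignKey or OneToOneField), there is no .all() to iterate, so the comprehension just follows the attribute; select_related() is the usual way to have that related object loaded up front. A sketch with a hypothetical base relation, using stand-in objects instead of models so it runs without Django:

```python
from types import SimpleNamespace


def get_spicy_base(pizzas):
    # pizza.base is a single related object (hypothetical FK), so compare its field
    # directly. With Pizza.objects.select_related('base'), this access needs no extra query.
    return [pizza for pizza in pizzas if pizza.base is not None and pizza.base.spicy]


hot = SimpleNamespace(name="chili oil", spicy=True)
mild = SimpleNamespace(name="tomato", spicy=False)
pizzas = [SimpleNamespace(name="diavola", base=hot),
          SimpleNamespace(name="margherita", base=mild),
          SimpleNamespace(name="bianca", base=None)]

print([p.name for p in get_spicy_base(pizzas)])  # prints: ['diavola']
```

The None check covers a nullable foreign key, where accessing the related object yields None instead of raising.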
Hope this was helpful.
I'm interested in how lazy loading is achieved in frameworks like Django. When is the decision made to perform the join? And is there a way to force eager loading in Django? Are there times when you would need to force Django to eager load?
The general answer is that Django makes the decision to perform the query when you actually ask for some records. Most commonly this means iterating over the queryset (for record in queryset:) or using the list() built-in function to convert the queryset to a list.
See When QuerySets are evaluated for more specifics from the official docs.
It accomplishes this by defining a class called QuerySet (in django/db/models/query.py) whose special methods, such as __repr__, __getitem__ and __iter__, run the query only when the results are actually needed.
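A toy sketch of the same idea, using a hypothetical LazyQuery class that is far simpler than Django's QuerySet: building and refining the object does no work, and the first __iter__, __len__ or __getitem__ call runs the "query" once and caches the results for later calls. Unlike this toy, slicing an unevaluated Django QuerySet adds LIMIT/OFFSET to the SQL instead of loading everything.

```python
class LazyQuery:
    """Toy lazy query: filters are recorded, and the 'database' is hit on demand."""

    def __init__(self, fetch, predicates=()):
        self._fetch = fetch              # callable standing in for running SQL
        self._predicates = tuple(predicates)
        self._cache = None               # filled on first evaluation

    def filter(self, predicate):
        # Returns a new, still-unevaluated query with an empty cache.
        return LazyQuery(self._fetch, self._predicates + (predicate,))

    def _evaluate(self):
        if self._cache is None:
            rows = self._fetch()
            self._cache = [r for r in rows if all(p(r) for p in self._predicates)]
        return self._cache

    def __iter__(self):
        return iter(self._evaluate())

    def __len__(self):
        return len(self._evaluate())

    def __getitem__(self, index):
        return self._evaluate()[index]

    def __repr__(self):
        return f"<LazyQuery {self._evaluate()!r}>"


calls = []


def fetch_numbers():
    calls.append(1)  # count how often the "database" is queried
    return list(range(10))


q = LazyQuery(fetch_numbers).filter(lambda n: n % 2 == 0)  # nothing fetched yet
evens = list(q)   # first evaluation: one fetch
first = q[0]      # served from the cache
print(evens, first, len(calls))  # prints: [0, 2, 4, 6, 8] 0 1
```

Because filter() returns a fresh object with an empty cache, filtering an already-evaluated query fetches again, which mirrors why Django re-queries when you filter a queryset whose results are already cached.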
If you need to force eager loading, just call the built-in list() function on the queryset:
qs = SomeModel.objects.all()
ql = list(qs)
This call to list() will perform the DB query and load all of the objects into memory. It should be pretty rare that you need to do this, but one case is when you need to use the query results in more than one place in your templates. Converting to list and passing the list in your template context will perform the query only once instead of once for every place in your template you iterate.