Django get_FOO_display and distinct()

Django get_FOO_display and distinct() - django

I've seen answers to both halves of my question, but I can't work out how to marry the two.
I have a book model, and a translatedBook model.
The translatedBook has a langage set up as model choices in the usual way:
LANGUAGES = (
(u'it', u'Italian'),
(u'ja', u'Japanese'),
(u'es', u'Spanish'),
(u'zh-cn', u'Simplified Chinese'),
(u'zh-tw', u'Traditional Chinese'),
(u'fr', u'French'),
(u'el', u'Greek'),
(u'ar', u'Arabic'),
(u'bg', u'Bulgarian'),
(u'bn', u'Bengali'),
etc
I know that to get "Italian" I have to do translatedBook.get_language_display on a Book object.
But how do I get a list of distinct languages in their long format?
I've tried:
lang_avail = TargetText.objects.values('language').distinct().order_by('language')
lang_avail = TargetText.objects.distinct().order_by('language').values('language').
lang_avail = TargetText.objects.all().distinct('language').order_by('language')
but I can't seem to get what I want - which is a list like:
"English, Italian, Simplified Chinese, Spanish"
The final lang_avail listed above didn't return the list of 5, it returned the list of 355 (ie, # of books) with multiple repeats....
-- EDIT --
Daniel's answer almost got me there - as it turns out, that throws a "dicts are unhashable" error. Thanks to Axiak on the django irc, we use Daniel's solution with this line instead:
langs = TargetText.objects.values_list('language', flat=True).distinct().order_by('language')
and it works.

There isn't a built-in way. You could do something like this:
lang_dict = dict(LANGUAGES)
langs = TargetText.objects.values('language').distinct().order_by('language')
long_langs = [lang_dict[lang] for lang in langs]
which simply makes a dictionary out of the LANGUAGE choices and then looks up each language ID.

Related

Custom Django count filtering

A lot of websites will display:
"1.8K pages" instead of "1,830 pages"
or
"43.2M pages" instead of "43,200,123 pages"
Is there a way to do this in Django?
For example, the following code will generate the quantified amount of objects in the queryset (i.e. 3,123):
Books.objects.all().count()
Is there a way to add a custom count filter to return "3.1K pages" instead of "3,123 pages?
Thank you in advance!

First off, I wouldn't do anything that alters the way the ORM portion of Django works. There are two places this could be done, if you are only planning on using it in one place - do it on the frontend. With that said, there are many ways to achieve this result. Just to spout off a few ideas, you could write a property on your model that calls count then converts that to something a little more human readable for the back end. If you want to do it on the frontend you might want to find a JavaScript lib that could do the conversion.
I will edit this later from my computer and add an example of the property.
Edit: To answer your comment, the easier one to implement depends on your skills in python vs in JavaScript. I prefer python so I would probably do it in there somewhere on the model.
Edit2: I have wrote an example to show you how I would do a classmethod on a base model or on the model that you need these numbers on. I found a python package called humanize and I took its function that converts these to readable and modified it a bit to allow for thousands and took out some of the super large number conversion.
def readable_number(value, short=False):
# Modified from the package `humanize` on pypy.
powers = [10 ** x for x in (3, 6, 9, 12, 15, 18)]
human_powers = ('thousand', 'million', 'billion', 'trillion', 'quadrillion')
human_powers_short = ('K', 'M', 'B', 'T', 'QD')
try:
value = int(value)
except (TypeError, ValueError):
return value
if value < powers[0]:
return str(value)
for ordinal, power in enumerate(powers[1:], 1):
if value < power:
chopped = value / float(powers[ordinal - 1])
chopped = format(chopped, '.1f')
if not short:
return '{} {}'.format(chopped, human_powers[ordinal - 1])
return '{}{}'.format(chopped, human_powers_short[ordinal - 1])
class MyModel(models.Model):
#classmethod
def readable_count(cls, short=True):
count = cls.objects.all().count()
return readable_number(count, short=short)
print(readable_number(62220, True)) # Returns '62.2K'
print(readable_number(6555500)) # Returns '6.6 million'
I would stick that readable_number in some sort of utils and just import it in your models file. Once you have that, you can just stick that string wherever you would like on your frontend.
You would use MyModel.readable_count() to get that value. If you want it under MyModel.objects.readable_count() you will need to make a custom object manager for your model, but that is a bit more advanced.

counting occurrence of name in list of namedtuple (the name is in a nested tuple)

As the title says, i'm trying to count the occurrence of a name in a list of namedtuples, with the name i'm looking for in a nested tuple.
It is an assignment for school, and a big part of the code is given.
The structure of the list is as follows:
paper = namedtuple( 'paper', ['title', 'authors', 'year', 'doi'] )
for (id, paper_info) in Summaries.iteritems():
Summaries[id] = paper( *paper_info )
It was easy to get the number of unique titles for each year, since both 'title' and 'year' contain one value, but i can't figure out how to count the number of unique authors per year.
I don't expect you guys to give me the entire code or something, but if you could give me a link to a good tutorial about this subject this would help a lot.
I did google around a lot, but i cant find any helpful information!
I hope i'm not asking too much, first time i ask a question here.
EDIT:
Thanks for the responses so far. This is the code i have now:
authors = [
auth
for paper in Summaries.itervalues()
for auth in paper.authors
]
authors
The problem is, i only get a list of all the authors with this code. I want them linked to the year tough, so i can check the amount of unique authors for each year.

For keeping track of unique objects, I like using set. A set behaves like a mathematical set in that it can have at most one copy of any given thing in it.
from collections import namedtuple
# by convention, instances of `namedtuple` should be in UpperCamelCase
Paper = namedtuple('paper', ['title', 'authors', 'year', 'doi'])
papers = [
Paper('On Unicorns', ['J. Atwood', 'J. Spolsky'], 2008, 'foo'),
Paper('Discourse', ['J. Atwood', 'R. Ward', 'S. Saffron'], 2012, 'bar'),
Paper('Joel On Software', ['J. Spolsky'], 2000, 'baz')
]
authors = set()
for paper in papers:
authors.update(paper.authors) # "authors = union(authors, paper.authors)"
print(authors)
print(len(authors))
Output:
{'J. Spolsky', 'R. Ward', 'J. Atwood', 'S. Saffron'}
4
More compactly (but also perhaps less readably), you could construct the authors set by doing:
authors = set([author for paper in papers for author in paper.authors])
This may be faster if you have a large volume of data (I haven't checked), since it requires fewer update operations on the set.

If you don't want to use embeded type set() and want to understand the logic, use a list and if bifurcation.
If we don't use set() in senshin's code:
# authors = set()
# for paper in papers:
# authors.update(paper.authors) # "authors = union(authors, paper.authors)"
authors = []
for paper in papers:
for author in paper.authors:
if not author in authors:
authors.append(author)
You can get similar result as senshin's. I hope it helps.

Concatenate queryset in django

I want to concatenate two queryset obtained from two different models and i can do it using itertools like this:
ci = ContributorImage.objects.all()
pf = Portfolio.objects.all()
cpf = itertools.chain(ci,pf)
But the real fix is paginating results.If i pass a iterator(cpf, or our concatenated queryset) to Paginator function, p = Paginator(cpf, 10), it works as well but fails at retrieving first page page1 = p.page(1) with an error which says:
TypeError: object of type 'itertools.chain' has no len()
What can i do in case like this ?

The itertools.chain() will return a generator. The Paginator class needs an object implementing __len__ (generators, of course do not support it since the size of the collection is not known).
Your problem could be resolved in a number of ways (including using list to evaluate the generator as you mention) however I recommending taking a look at the QuerySetChain mentioned in this answer:
https://stackoverflow.com/a/432666/119071
I think it fits exactly to your problem. Also take a look at the comments of that answer - they are really enlightening :)

I know it's too late, but because I encountered this error, I would answer to this question.
you should return a list of objects:
ci = ContributorImage.objects.all()
pf = Portfolio.objects.all()
cpf = itertools.chain(ci,pf)
cpf_list = list(cpf)

How to make this django attribute name search better?

lcount = Open_Layers.objects.all()
form = SearchForm()
if request.method == 'POST':
form = SearchForm(request.POST)
if form.is_valid():
data = form.cleaned_data
val=form.cleaned_data['LayerName']
a=Open_Layers()
data = []
for e in lcount:
if e.Layer_name == val:
data = val
return render_to_response('searchresult.html', {'data':data})
else:
form = SearchForm()
else:
return render_to_response('mapsearch.html', {'form':form})
This just returns back if a particular "name" matches . How do to change it so that it returns when I give a search for "Park" , it should return Park1 , Park2 , Parking , Parkin i.e all the occurences of the park .

You can improve your searching logic by using a list to accumulate the results and the re module to match a larger set of words.
However, this is still pretty limited, error prone and hard to maintain or even harder to make evolve. Plus you'll never get as nice results as if you were using a search engine.
So instead of trying to manually reinvent the wheel, the car and the highway, you should spend some time setting up haystack. This is now the de facto standard to do search in Django.
Use woosh as a backend at first, it's going to be easier. If your search get slow, replace it with solr.
EDIT:
Simple clean alternative:
Open_Layers.objects.filter(name__icontains=val)
This will perform a SQL LIKE, adding %` for you.
This going to kill your database if used too often, but I guess this is probably not going to be an issue with your current project.
BTW, you probably want to rename Open_Layers to OpenLayers as this is the Python PEP8 naming convention.

Instead of
if e.Layer_name == val:
data = val
use
if val in e.Layer_name:
data.append(e.Layer_name)
(and you don't need the line data = form.cleaned_data)

I realise this is an old post, but anyway:
There's a fuzzy logic string comparison already in the python standard library.
import difflib
Mainly have a look at:
difflib.SequenceMatcher(None, a='string1', b='string2', autojunk=True).ratio()
more info here:
http://docs.python.org/library/difflib.html#sequencematcher-objects
What it does it returns a ratio of how close the two strings are, between zero and 1. So instead of testing if they're equal, you chose your similarity ratio.
Things to watch out for, you may want to convert both strings to lower case.
string1.lower()
Also note you may want to impliment your favourite method of splitting the string i.e. .split() or something using re so that a search for 'David' against 'David Brent' ranks higher.

Is there a way to filter a django queryset based on string similarity (a la python difflib)?

I have a need to match cold leads against a database of our clients.
The leads come from a third party provider in bulk (thousands of records) and sales is asking us to (in their words) "filter out our clients" so they don't try to sell our service to a established client.
Obviously, there are misspellings in the leads. Charles becomes Charlie, Joseph becomes Joe, etc. So I can't really just do a filter comparing lead_first_name to client_first_name, etc.
I need to use some sort of string similarity mechanism.
Right now I'm using the lovely difflib to compare the leads' first and last names to a list generated with Client.objects.all(). It works, but because of the number of clients it tends to be slow.
I know that most sql databases have soundex and difference functions. See my test of it in the update below - it doesn't work as well as difflib.
Is there another solution? Is there a better solution?
Edit:
Soundex, at least in my db, doesn't behave as well as difflib.
Here is a simple test - look for "Joe Lopes" in a table containing "Joseph Lopes":
with temp (first_name, last_name) as (
select 'Joseph', 'Lopes'
union
select 'Joe', 'Satriani'
union
select 'CZ', 'Lopes'
union
select 'Blah', 'Lopes'
union
select 'Antonio', 'Lopes'
union
select 'Carlos', 'Lopes'
)
select first_name, last_name
from temp
where difference(first_name+' '+last_name, 'Joe Lopes') >= 3
order by difference(first_name+' '+last_name, 'Joe Lopes')
The above returns "Joe Satriani" as the only match. Even reducing the similarity threshold to 2 doesn't return "Joseph Lopes" as a potential match.
But difflib does a much better job:
difflib.get_close_matches('Joe Lopes', ['Joseph Lopes', 'Joe Satriani', 'CZ Lopes', 'Blah Lopes', 'Antonio Lopes', 'Carlos Lopes'])
['Joseph Lopes', 'CZ Lopes', 'Carlos Lopes']
Edit after gruszczy's response:
Before writing my own, I looked for and found a T-SQL implementation of Levenshtein Distance in the repository of all knowledge.
In testing it, it still won't do a better matching job than difflib.
Which led me to research what algorithm is behind difflib. It seems to be a modified version of the Ratcliff-Obershelp algorithm.
Unhappily I can't seem to find some other kind soul who has already created a T-SQL implementation based on difflib's... I'll try my hand at it when I can.
If nobody else comes up with a better answer in the next few days, I'll grant it to gruszczy. Thanks, kind sir.

soundex won't help you, because it's a phonetic algorithm. Joe and Joseph aren't similar phonetically, so soundex won't mark them as similar.
You can try Levenshtein distance, which is implemented in PostgreSQL. Maybe in your database too and if not, you should be able to write a stored procedure, which will calculate the distance between two strings and use it in your computation.

It's possible with trigram_similar lookups since Django 1.10, see docs for PostgreSQL specific lookups and Full text search

As per the answer of andilabs you can use the Levenshtein function to create your custom function. Postgres doc indicates that the Levenshtein function is as follows:
levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int
andilabs answer can use the only second function. If you want a more advanced search with insertion/deletion/substitution costs, you can rewrite function like this:
from django.db.models import Func
class Levenshtein(Func):
template = "%(function)s(%(expressions)s, '%(search_term)s', %(ins_cost)d, %(del_cost)d, %(sub_cost)d)"
function = 'levenshtein'
def __init__(self, expression, search_term, ins_cost=1, del_cost=1, sub_cost=1, **extras):
super(Levenshtein, self).__init__(
expression,
search_term=search_term,
ins_cost=ins_cost,
del_cost=del_cost,
sub_cost=sub_cost,
**extras
)
And call the function:
from django.db.models import F
Spot.objects.annotate(
lev_dist=Levenshtein(F('name'), 'Kfaka', 3, 3, 1) # ins = 3, del = 3, sub = 1
).filter(
lev_dist__lte=2
)

If you need getting there with django and postgres and don't want to use introduced in 1.10 trigram-similarity https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/lookups/#trigram-similarity you can implement using Levensthein like these:
Extension needed fuzzystrmatch
you need adding postgres extension to your db in psql:
CREATE EXTENSION fuzzystrmatch;
Lets define custom function with wich we can annotate queryset. It just take one argument the search_term and uses postgres levenshtein function (see docs):
from django.db.models import Func
class Levenshtein(Func):
template = "%(function)s(%(expressions)s, '%(search_term)s')"
function = "levenshtein"
def __init__(self, expression, search_term, **extras):
super(Levenshtein, self).__init__(
expression,
search_term=search_term,
**extras
)
then in any other place in project we just import defined Levenshtein and F to pass the django field.
from django.db.models import F
Spot.objects.annotate(
lev_dist=Levenshtein(F('name'), 'Kfaka')
).filter(
lev_dist__lte=2
)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Django get_FOO_display and distinct() - django

Related

Custom Django count filtering

counting occurrence of name in list of namedtuple (the name is in a nested tuple)

Concatenate queryset in django

How to make this django attribute name search better?

Is there a way to filter a django queryset based on string similarity (a la python difflib)?

Categories

Resources