Sort django haystack results by datetime (whoosh backend) - django

I have a SearchIndex whose results I want sorted by a DateTimeField. However, when I run manage.py rebuild_index, I get a ValueError complaining about the datetime being a... datetime.
In case it matters, I use timezones and pytz, but for the sorting I want, timezones do not really matter, I just want the newest first kind of thing.
The Index
I have removed some irrelevant fields.
class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    publish_date = indexes.DateTimeField(model_attr='publish_date')
The view/url pair
posts_sqs = SearchQuerySet().order_by('-publish_date')
urlpatterns += patterns(
    'haystack.views',
    url(r'^search/$', search_view_factory(
        view_class=PostsSearchView,
        template='pages/search.html',
        searchqueryset=posts_sqs,
        form_class=ThreeCharMinSearchForm),  # a custom form
        name='pages.search'),
)
The rebuild_index Exception
ValueError: datetime.datetime(2015, 1, 23, 16, 31, 28, tzinfo=<UTC>) is not unicode or sequence
I have tried implementing prepare_publish_date methods that return strftime representations ('%Y %m %d %H %M' and '%d %m %Y %H %M') of both naive and aware datetimes, timetuples, and "epoch times" stored in a CharField instead of a DateTimeField, plus other variants I can't remember. All of them failed except the "epoch time" version, which was terribly slow.
As a last note, I use Python 2.7.8 and Django 1.6.10. Before I tried to do this sorting, the index was working nicely (even better than expected), so I am pretty sure the rest of the implementation is correct.
I understand that it is Whoosh that's expecting unicode, but I don't know what to do exactly. Any thoughts?

Thanks for the feedback everyone, because when no one answers your question on SO, the feedback basically is "Dude, what you're saying doesn't make sense, question your assumptions and check everything.".
So, in my case, there was a customized WhooshSearchBackend which was stripped down and did not account for DateTimeFields. For anyone stumbling upon this, Whoosh and Haystack can handle datetimes just fine. If they don't, check your setup.

Related

"T" character added to serialized date with Django wadofstuff serializer

I recently upgraded to Django 1.4 from 1.1. I have been running WadofStuff Django Serializers 1.0.0. After upgrade, I noticed that dates from my django model get serialized with a 'T' character inserted:
{"pk": 7, "model": "ao.message", "fields": {"content_file": "bar.wav", "date": "2012-07-04T10:58:46", "summary_file": "foo.wav"}}
What's up with that 'T'? Can/should it be removed? Is there a way to specify my desired output date format to the serializer (say, if I didn't want it to return with a 'T')?
Thanks
A single point in time can be represented by concatenating: a complete date expression, the letter T as a delimiter, and a valid time expression. For example, "2007-04-05T14:30" (Wikipedia).
For further details regarding this T insertion in datetime format as far as python is concerned, you may go to this link: "Python Datetime Representations". The first example specifically illustrates your problem and suggests its solutions too.
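For illustration, here is a small sketch using only Python's standard datetime module. It does not show how the WadofStuff serializer is configured (that is not covered here); it only shows where the 'T' comes from and two ways to get a different layout once you have a datetime in hand. The variable names are made up for the example.

```python
from datetime import datetime

stamp = datetime(2012, 7, 4, 10, 58, 46)

# Default ISO 8601 output puts 'T' between the date and the time.
iso_default = stamp.isoformat()

# isoformat() accepts a different separator character.
iso_space = stamp.isoformat(sep=' ')

# strftime() gives full control over the layout.
custom = stamp.strftime('%d/%m/%Y %H:%M')

# The 'T' form parses back to the same datetime.
parsed = datetime.strptime(iso_default, '%Y-%m-%dT%H:%M:%S')

print(iso_default)  # 2012-07-04T10:58:46
print(iso_space)    # 2012-07-04 10:58:46
print(custom)       # 04/07/2012 10:58
```

Since the 'T' form is valid ISO 8601 and round-trips cleanly, it is usually easier to leave the serialized value alone and reformat only when displaying it.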

Django-haystack (xapian) autocomplete giving incomplete results

I have a django site running django-haystack with xapian as a back end. I got my autocomplete working, but it's giving back weird results. The results coming back from the searchqueryset are incomplete.
For example, I have the following data...
['test', 'test 1', 'test 2']
And if I type in 't', 'te', or 'tes' I get nothing back. However, if I type in 'test' I get back all of the results, as would be expected.
I have something looking like this...
results = SearchQuerySet().autocomplete(auto=q).values('auto')
And my search index looks like this...
class FacilityIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    created = DateTimeField(model_attr='created')
    auto = EdgeNgramField(model_attr='name')
    def get_model(self):
        return Facility
    def index_queryset(self):
        return self.get_model().objects.filter(created__lte=datetime.datetime.now())
Any tips are appreciated. Thanks.
A bit late, but you need to check the min ngram size that is being indexed. It is most likely 4 chars, so it won't match on anything with fewer chars than that. I am not a Xapian user though, so I don't know how to change this configuration option for that backend.
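To make the effect of a minimum n-gram size concrete, here is a plain-Python sketch of edge n-gram indexing. It is not the Xapian backend's code: edge_ngrams, indexed_terms, and the max_size default are hypothetical, and the minimum of 4 is just the guess from the answer above.

```python
def edge_ngrams(word, min_size, max_size=15):
    """Return the prefixes of word with length between min_size and max_size."""
    upper = min(len(word), max_size)
    return [word[:n] for n in range(min_size, upper + 1)]


def indexed_terms(names, min_size):
    """Collect every edge n-gram produced for every word in names."""
    terms = set()
    for name in names:
        for word in name.split():
            terms.update(edge_ngrams(word, min_size))
    return terms


names = ['test', 'test 1', 'test 2']
terms_min4 = indexed_terms(names, min_size=4)  # assumed minimum of 4
terms_min2 = indexed_terms(names, min_size=2)  # smaller minimum for comparison

for query in ['t', 'te', 'tes', 'test']:
    print(query, query in terms_min4, query in terms_min2)
```

With a minimum of 4, the only term stored for this data is 'test', which would explain why 't', 'te', and 'tes' return nothing while 'test' returns everything.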

How to write a query to find a value in a JSON field in Django

I have a JSON field in my database that looks like this:
jsonfield = {'username':'chingo','reputation':'5'}
How can I write a query to find out whether a username exists? Something like:
username = 'chingo'
query = User.objects.get(jsonfield['username']=username)
I know the above query is wrong, but I wanted to know if there is a way to access it.
If you are using the django-jsonfield package, then this is simple. Say you have a model like this:
from django.db import models
from jsonfield import JSONField
class User(models.Model):
    jsonfield = JSONField()
Then to search for records with a specific username, you can just do this:
User.objects.get(jsonfield__contains={'username':username})
Since Django 1.9, you have been able to use PostgreSQL's native JSONField. This makes searching JSON very simple. In your example, this query would work:
User.objects.get(jsonfield__username='chingo')
If you have an older version of Django, or you are using the Django JSONField library for compatibility with MySQL or something similar, you can still perform your query.
In the latter situation, jsonfield will be stored as a text field and mapped to a dict when brought into Django. In the database, your data will be stored like this
{"username":"chingo","reputation":"5"}
Therefore, you can simply search the text. Your query in this situation would be:
User.objects.get(jsonfield__contains='"username":"chingo"')
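One caveat worth checking before relying on a text match: whether the substring is found depends on exactly how the library serialized the dict. The sketch below uses only Python's standard json module (not the JSONField library itself, whose settings I am not assuming) to show that the default separators insert a space after the colon, so a compact search string would miss.

```python
import json

record = {'username': 'chingo', 'reputation': '5'}

# Default separators are ', ' and ': ', so there is a space after each colon.
default_text = json.dumps(record)

# Compact separators produce the form shown in the answer above.
compact_text = json.dumps(record, separators=(',', ':'))

needle = '"username":"chingo"'

print(default_text)
print(needle in default_text)  # False
print(needle in compact_text)  # True
```

It is worth looking at a stored row in the database first and building the search string to match that exact layout.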
2019: As @freethebees points out it's now as simple as:
User.objects.get(jsonfield__username='chingo')
But as the doc examples mention you can query deeply, and if the json is an array you can use an integer to index it:
https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#querying-jsonfield
>>> Dog.objects.create(name='Rufus', data={
...     'breed': 'labrador',
...     'owner': {
...         'name': 'Bob',
...         'other_pets': [{
...             'name': 'Fishy',
...         }],
...     },
... })
>>> Dog.objects.create(name='Meg', data={'breed': 'collie', 'owner': None})
>>> Dog.objects.filter(data__breed='collie')
<QuerySet [<Dog: Meg>]>
>>> Dog.objects.filter(data__owner__name='Bob')
<QuerySet [<Dog: Rufus>]>
>>> Dog.objects.filter(data__owner__other_pets__0__name='Fishy')
<QuerySet [<Dog: Rufus>]>
Although this is for postgres, I believe it works the same in other DBs like MySQL
Postgres: https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#querying-jsonfield
MySQL: https://django-mysql.readthedocs.io/en/latest/model_fields/json_field.html#querying-jsonfield
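To make the path syntax easier to read, here is a plain-Python imitation of how a lookup such as owner__other_pets__0__name walks the nested data. This is only an illustration: follow_path is a hypothetical helper, and the real lookups are translated into SQL by the database backend rather than evaluated in Python like this.

```python
def follow_path(data, lookup):
    """Walk nested dicts/lists using a double-underscore separated path.

    Returns None when any step of the path is missing.
    """
    current = data
    for part in lookup.split('__'):
        try:
            if isinstance(current, list):
                current = current[int(part)]  # integer parts index into arrays
            else:
                current = current[part]       # other parts are dict keys
        except (KeyError, IndexError, TypeError, ValueError):
            return None
    return current


rufus = {
    'breed': 'labrador',
    'owner': {'name': 'Bob', 'other_pets': [{'name': 'Fishy'}]},
}
meg = {'breed': 'collie', 'owner': None}

print(follow_path(rufus, 'owner__other_pets__0__name'))  # Fishy
print(follow_path(meg, 'owner__name'))                   # None
```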
This usage is something of an anti-pattern. Its implementation will not perform consistently and is probably error-prone.
Normally, don't use a jsonfield when you need to look records up by its fields. Use what the RDBMS provides, or MongoDB (which internally operates on the faster BSON), as Daniel pointed out.
Because the JSON format is deterministic, you could achieve it with contains (a regex has issues when dealing with multiple '\' and is even slower). I don't think it's good to use username in this way, so I use name instead:
def make_cond(name, value):
    from django.utils import simplejson
    cond = simplejson.dumps({name:value})[1:-1] # remove '{' and '}'
    return ' ' + cond # avoid '\"'
User.objects.get(jsonfield__contains=make_cond(name, value))
It works as long as:
the jsonfield uses the same dump utility (simplejson here)
name and value are not too special (I don't know of any edge case so far; maybe someone could point one out)
your jsonfield data is not corrupt (unlikely, though)
Actually, I'm working on an editable jsonfield and thinking about whether to support such operations. The argument against is as said above: it feels like black magic.
If you use PostgreSQL, you can use raw SQL to solve the problem.
username = 'chingo'
WHERE_CLAUSE = "jsonfield::json->>'username' = %s"
User.objects.extra(where=[WHERE_CLAUSE], params=[username]).get()
Here extra() adds the condition to the WHERE clause of the query on the model's table, and params hands username to the database driver for quoting instead of formatting it into the SQL string.
Any method that treats JSON as plain text looks like a very bad approach.
I also think you need a better database schema.
Here is a way I found that will solve your problem:
search_filter = '"username":"{0}"'.format(username)
query = User.objects.get(jsonfield__contains=search_filter)
Hope this helps.
You can't do that. Use normal database fields for structured data, not JSON blobs.
If you need to search on JSON data, consider using a noSQL database like MongoDB.

django escapejs and simplejson

I'm trying to encode a Python array into json using simplejson.dumps:
In [30]: s1 = ['test', '<script>']
In [31]: simplejson.dumps(s1)
Out[31]: '["test", "<script>"]'
Works fine.
But I want to escape the strings first (using escapejs from Django) before calling simplejson.dumps:
In [35]: s_esc
Out[35]: [u'test', u'\\u003Cscript\\u003E']
In [36]: print simplejson.dumps(s_esc)
["test", "\\u003Cscript\\u003E"]
My problem is: I want the escaped string to be: ["test", "\u003Cscript\u003E"] instead of ["test", "\\u003Cscript\\u003E"]
I can use replace:
In [37]: print simplejson.dumps(s_esc).replace('\\\\', '\\')
["test", "\u003Cscript\u003E"]
But is this a good approach? I just want to escape the strings first before encoding them to json. So there will be no syntax errors when I use them in template.
Thanks. :)
simplejson 2.1.0 and later include a JSONEncoderForHTML encoder that does exactly what you want. To use it in your example:
>>> s1 = ['test', '<script>']
>>> simplejson.dumps(s1, cls=simplejson.encoder.JSONEncoderForHTML)
'["test", "\\u003cscript\\u003e"]'
I ran into this recently where I didn't have control over the code that was generating the data structures, so I couldn't escape the strings as they were being assembled. JSONEncoderForHTML solved the problem neatly at the point of output.
Of course, you'll need to have simplejson 2.1.0 or later. (Django used to come with an older version, and Django 1.5 deprecated django.utils.simplejson entirely.) If you can't upgrade for some reason, the JSONEncoderForHTML code is relatively small and could probably be pulled into earlier code or used with Python 2.6+'s json package -- though I haven't tried this myself.
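If simplejson isn't available, the same idea can be approximated with the standard json package. The sketch below is not simplejson's implementation: dumps_for_html is a hypothetical helper that serializes first and then replaces the HTML-sensitive characters with JSON unicode escapes. That is safe here because, in json.dumps output, those characters can only occur inside string literals.

```python
import json


def dumps_for_html(obj):
    """Serialize obj to JSON, then escape characters that are unsafe in HTML."""
    text = json.dumps(obj)
    return (text.replace('&', '\\u0026')
                .replace('<', '\\u003c')
                .replace('>', '\\u003e'))


encoded = dumps_for_html(['test', '<script>'])
print(encoded)  # ["test", "\u003cscript\u003e"]

# A JSON parser still decodes the escapes back to the original characters.
print(json.loads(encoded))
```

Because the escaping happens after serialization, each backslash is emitted once, which avoids the doubled "\\u003C" seen in the question.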
You're doing the operations in the wrong order. You should dump your data to a JSON string, then escape that string. You can do the escaping with the addslashes Django filter.

Django admin doesn't show translated enumerations in list view under Python 2.3

When using localized list of "choices" for a model field, the admin doesn't show the translated values in the list view.
Short example:
from django.utils.translation import ugettext_lazy as _
class OrderStates:
    STATES = (
        (STATE_NEW, _("New")),
        (STATE_CANCELLED, _("Cancelled")), )
class Order(models.Model):
    state = models.IntegerField(choices=OrderStates.STATES)
    # ..
class OrderAdmin(admin.ModelAdmin):
    list_display = [ 'id', 'state', 'address', 'user']
    # ..
admin.site.register(Order, OrderAdmin)
The localized versions of "New" and "Cancelled" show up correctly in the front-end and in the admin form when editing an order. But in the admin list view I get blank fields - regardless of the language I switch to, including English. Column names are fine.
This only happens with Python 2.3 (talk about niche questions). The choices display correctly everywhere with Python 2.5. I don't get any errors or warnings in either.
I tried using ugettext instead of ugettext_lazy for the options, which didn't work. ugettext_noop sort of works: it at least shows the original English versions instead of blank fields.
Am I doing something wrong or is this a bug?
This is probably a bug somewhere in Django, where force_unicode is not being called on the item correctly. The original code you pasted is correct. You don't mention which Django version you're using, so I'd recommend trying the latest 1.0.3 or 1.1 release to see if that happens to fix it; otherwise, check the ticket tracker to see if it has already been reported. (Note that if it hasn't been fixed yet, it probably won't be at all, since 1.1 is the last version to support Python 2.3.)
try using:
from django.utils.translation import gettext as _
Though, that may break if some of your translations use non-ascii values. Actually, this should have been fixed some time ago, see Ticket #5287.
Hope this helps.