Elasticsearch aggregation [python] - django

I indexed the post and community models:
post = Index('posts')
post.settings(
    number_of_shards=1,
    number_of_replicas=0
)
#post.doc_type
class PostDocument(DocType):
    community = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'description': fields.TextField(),
        'name': fields.StringField(),
    })
I want to search posts and aggregate them by community (that is, return the communities of the posts in the result).
I think I need an aggregation, but I had difficulty implementing it, and the documentation was not clear to me.
q = Q("multi_match", query=query, fields=['title', 'content'])
document.query(q)
document.aggs.bucket('per_tag', 'terms', field='community')

I think you need to change the aggregation to something like:
document.aggs.bucket('per_tag', 'terms', field='community.id')
This is because community is a complex object, and Elasticsearch can only aggregate on simple fields (such as keyword or integer).
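For reference, here is a rough sketch of what the raw request and response look like for this kind of search. It only builds plain dictionaries, so it runs without a cluster. The aggregation name per_community, the helper names, and the sample response are made up for illustration; the field path community.id assumes the mapping from the question.

```python
import json


def build_search_body(query_text, size=10):
    """Build a raw Elasticsearch body: full-text query plus a terms aggregation."""
    return {
        "size": size,
        "query": {
            "multi_match": {"query": query_text, "fields": ["title", "content"]}
        },
        "aggs": {
            # Aggregate on the integer sub-field, not on the object itself.
            "per_community": {"terms": {"field": "community.id"}}
        },
    }


def community_counts(response):
    """Map community id -> number of matching posts from the aggregation buckets."""
    buckets = response["aggregations"]["per_community"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical response fragment, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "per_community": {
            "buckets": [{"key": 3, "doc_count": 12}, {"key": 7, "doc_count": 4}]
        }
    }
}

body = build_search_body("django")
print(json.dumps(body, indent=2))
print(community_counts(sample_response))
```

Each bucket carries the community id as its key, so the community records themselves would still need to be loaded by id afterwards.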

Related

How to mix multiple querysets into one and re order them by time created?

I am learning Django and am still a beginner. For practice, I am trying to make a demo social media website. In my project, users can create groups, and then post and comment there. On the home page, I am trying to add a 'recent activities' section where a user can see recent activity on the website, like "John created a group 'Javascript', Tim posted a comment in 'Python', Sarah posted in 'CSS'". So far I have written some queries like:
groups = Group.objects.all().order_by('-created')[0:5]
posts = Post.objects.all().order_by('-created')[0:5]
comments = Comment.objects.all().order_by('-created')[0:5]
I want to mix them all into a single queryset and then order them by the time they were created. I know it's a silly question, but I have been stuck here since morning. Can you help me and show me the process, please?
You can combine these and order by the created field with:
from operator import attrgetter
groups = Group.objects.order_by('-created')[:5]
posts = Post.objects.order_by('-created')[:5]
comments = Comment.objects.order_by('-created')[:5]
all_items = sorted(
    [*groups, *posts, *comments],
    key=attrgetter('created'),
    reverse=True
)
Now all_items is a heterogeneous list containing different types of objects. This makes rendering a bit more complicated, since a Comment probably has different fields than a Post, for example.
You can also use the chain function from the itertools module to combine the querysets, and then sort them in reverse order using the created field as the key.
from itertools import chain
groups = Group.objects.order_by('-created')[:5]
posts = Post.objects.order_by('-created')[:5]
comments = Comment.objects.order_by('-created')[:5]
queryset = sorted(
    chain(groups, posts, comments),
    key=lambda instance: instance.created,
    reverse=True
)
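To make the rendering point concrete, here is a small self-contained sketch. The dataclasses are only stand-ins for the Django models (so it runs without a database), and recent_activity and describe are made-up helper names. It merges the three sources, sorts newest first, and renders each item according to its type.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain
from operator import attrgetter


# Stand-ins for the Django models; only the fields used here are modelled.
@dataclass
class Group:
    name: str
    created: datetime


@dataclass
class Post:
    title: str
    created: datetime


@dataclass
class Comment:
    text: str
    created: datetime


def recent_activity(*sources, limit=5):
    """Merge several iterables and return the newest `limit` items."""
    merged = sorted(chain(*sources), key=attrgetter("created"), reverse=True)
    return merged[:limit]


def describe(item):
    """Render one line per item; each type exposes different fields."""
    if isinstance(item, Group):
        return "group created: " + item.name
    if isinstance(item, Post):
        return "post: " + item.title
    return "comment: " + item.text


groups = [Group("Javascript", datetime(2024, 1, 1, 9))]
posts = [Post("CSS tips", datetime(2024, 1, 1, 11))]
comments = [Comment("Nice!", datetime(2024, 1, 1, 10))]

feed = recent_activity(groups, posts, comments, limit=2)
for item in feed:
    print(describe(item))
```

The result is a plain list, not a queryset, so further .filter() calls are not available on it.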

Django Rest Framework filtering a set of item to include only latest entry of each type

I have a list of objects with this kind of structure returned by my API:
SomeCustomModel => {
    itemId: "id",
    relatedItem: "id",
    data: {},
    created_at: "data string"
}
I want to return a list that contains only unique relatedItem ids, keeping for each one the entry that was created most recently.
I have written this and it seems to work:
id_tracker = {}
query_set = SomeCustomModel.objects.all()
for item in query_set:
    if item.relatedItem.id not in id_tracker:
        id_tracker[item.relatedItem.id] = 1
    else:
        query_set = query_set.exclude(id=item.id)
return query_set
This works, but I am wondering if there is a cleaner way of writing this using only Django aggregations.
I am using MySQL, so distinct("relatedItem") is not supported.
You should try to do this within SQL. You can use Subquery to accomplish this. Here's the example from the Django docs.
from django.db.models import OuterRef, Subquery
newest = Comment.objects.filter(post=OuterRef('pk')).order_by('-created_at')
Post.objects.annotate(newest_commenter_email=Subquery(newest.values('email')[:1]))
Unfortunately, I haven't found anything that can replace distinct() in a Django-esque manner. However, you could do something along the lines of:
list(set(map(lambda x: x['relatedItem_id'], query_set.order_by('created_at').values('relatedItem_id'))))
or
list(set(map(lambda x: x.relatedItem_id, query_set.order_by('created_at'))))
which are a bit more Pythonic.
However, you say that you want to return a list, yet your function returns a queryset. Which one is correct?
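If doing the de-duplication in Python is acceptable, a single pass that keeps the newest row per related item may be enough. This is only a sketch: it works on plain dictionaries such as those produced by query_set.values('id', 'relatedItem_id', 'created_at'), the helper name latest_per_related is made up, and it returns a list rather than a queryset.

```python
from datetime import datetime


def latest_per_related(rows):
    """Keep only the most recently created row for each relatedItem_id."""
    newest = {}
    for row in rows:
        key = row["relatedItem_id"]
        # Replace the stored row only when this one is strictly newer.
        if key not in newest or row["created_at"] > newest[key]["created_at"]:
            newest[key] = row
    # Newest first, to give the result a stable order.
    return sorted(newest.values(), key=lambda r: r["created_at"], reverse=True)


rows = [
    {"id": 1, "relatedItem_id": 10, "created_at": datetime(2024, 1, 1)},
    {"id": 2, "relatedItem_id": 10, "created_at": datetime(2024, 1, 3)},
    {"id": 3, "relatedItem_id": 20, "created_at": datetime(2024, 1, 2)},
]

latest = latest_per_related(rows)
print([row["id"] for row in latest])
```

Unlike the set-based one-liners, this keeps the whole row for each related item instead of only its id.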

query in query django

How can I make a query like this in Django?
I have a Site model that is related to the Topic model, and the Topic model is related to the Post model. I want to extract the posts of a site knowing only the site, not the topic. In addition, the posts have to start with the query string.
query = request.GET.get('query','')
iweb_obj = IWeb.objects.get(id=iweb_id)
topics = Topic.objects.filter(iweb=iweb_obj)
iweb_posts = []
for t in topics:
    posts = Post.objects.filter(topic=t)
    for p in posts:
        iweb_posts.append(p)
iweb_posts = iweb_posts.filter(content__istartswith=query)
I get an error saying that iweb_posts isn't a queryset, so I can't call filter on it. That is fairly obvious, but I have no idea how to make it work. I've heard that I can use filter(**kwargs), but I don't know how to use it.
Your logic looks a little funky since you're overwriting posts each time in the topic loop. You can accomplish what you need without loops and lists using only query set filters (I've added an __in filter, for example):
query = request.GET.get('query','')
iweb_obj = IWeb.objects.get(id=iweb_id)
topics = Topic.objects.filter(iweb=iweb_obj)
iweb_posts = Post.objects.filter(topic__in=topics).filter(content__istartswith=query)
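Since the question also asks about filter(**kwargs): the keyword arguments are just a dictionary of lookups, so they can be built up conditionally and unpacked into filter(). A minimal sketch, assuming Post has a foreign key named topic and Topic has a foreign key named iweb (as the code above suggests); build_post_filters is a made-up helper name.

```python
def build_post_filters(iweb_id, query=""):
    """Return lookup kwargs for Post.objects.filter(**kwargs)."""
    # Span the Post -> Topic -> IWeb relations in a single lookup.
    filters = {"topic__iweb_id": iweb_id}
    if query:
        # Only restrict by content when a search string was supplied.
        filters["content__istartswith"] = query
    return filters


filters = build_post_filters(42, "hello")
print(filters)

# In a view this would be used as (requires the Django models):
# iweb_posts = Post.objects.filter(**filters)
```

With the relation-spanning lookup, the separate Topic query is no longer needed.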

how to write a query to find a value in a json field in django

I have a JSON field in my database which looks like
jsonfield = {'username':'chingo','reputation':'5'}
How can I write a query to find out whether a username exists? Something like
username = 'chingo'
query = User.objects.get(jsonfield['username']=username)
I know the above query is wrong, but I wanted to know if there is a way to access it.
If you are using the django-jsonfield package, then this is simple. Say you have a model like this:
from jsonfield import JSONField
class User(models.Model):
    jsonfield = JSONField()
Then to search for records with a specific username, you can just do this:
User.objects.get(jsonfield__contains={'username':username})
Since Django 1.9, you have been able to use PostgreSQL's native JSONField. This makes searching JSON very simple. In your example, this query would work:
User.objects.get(jsonfield__username='chingo')
If you have an older version of Django, or you are using the Django JSONField library for compatibility with MySQL or something similar, you can still perform your query.
In the latter situation, jsonfield will be stored as a text field and mapped to a dict when brought into Django. In the database, your data will be stored like this:
{"username":"chingo","reputation":"5"}
Therefore, you can simply search the text. Your query in this situation would be:
User.objects.get(jsonfield__contains='"username":"chingo"')
2019: As @freethebees points out, it's now as simple as:
User.objects.get(jsonfield__username='chingo')
But as the doc examples mention, you can also query deeply, and if the JSON value is an array you can use an integer to index it:
https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#querying-jsonfield
>>> Dog.objects.create(name='Rufus', data={
...     'breed': 'labrador',
...     'owner': {
...         'name': 'Bob',
...         'other_pets': [{
...             'name': 'Fishy',
...         }],
...     },
... })
>>> Dog.objects.create(name='Meg', data={'breed': 'collie', 'owner': None})
>>> Dog.objects.filter(data__breed='collie')
<QuerySet [<Dog: Meg>]>
>>> Dog.objects.filter(data__owner__name='Bob')
<QuerySet [<Dog: Rufus>]>
>>> Dog.objects.filter(data__owner__other_pets__0__name='Fishy')
<QuerySet [<Dog: Rufus>]>
Although this is for Postgres, I believe it works the same way in other databases such as MySQL:
Postgres: https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#querying-jsonfield
MySQL: https://django-mysql.readthedocs.io/en/latest/model_fields/json_field.html#querying-jsonfield
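To get a feel for how a path such as owner__other_pets__0__name is read, here is a rough pure-Python model of the traversal. It is not Django's implementation (the real lookup is translated to SQL by the database backend), and walk_path is a made-up name. Each segment is treated as a dictionary key, or as a list index when the current value is a list.

```python
def walk_path(data, path):
    """Follow a double-underscore path through nested dicts and lists."""
    current = data
    for part in path.split("__"):
        if isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]   # integer segment indexes into a list
        elif isinstance(current, dict) and part in current:
            current = current[part]        # other segments are dictionary keys
        else:
            return None                    # path does not exist in this record
    return current


rufus = {
    "breed": "labrador",
    "owner": {"name": "Bob", "other_pets": [{"name": "Fishy"}]},
}
meg = {"breed": "collie", "owner": None}

print(walk_path(rufus, "owner__other_pets__0__name"))
print(walk_path(meg, "owner__name"))
```

This mirrors why Meg does not appear in the data__owner__name='Bob' result: her owner is null, so there is nothing to descend into.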
This usage is somewhat of an anti-pattern. Its implementation will not perform consistently, and it is probably error-prone.
Normally, don't use a jsonfield when you need to look things up by the fields inside it. Use what the RDBMS provides, or MongoDB (which internally operates on the faster BSON), as Daniel pointed out.
Because the JSON output is deterministic, you can achieve this by using contains (regex has issues when dealing with multiple '\' and is even slower). I don't think it's good to use username in this way, so I use name instead:
def make_cond(name, value):
    from django.utils import simplejson
    cond = simplejson.dumps({name:value})[1:-1] # remove '{' and '}'
    return ' ' + cond # avoid '\"'
User.objects.get(jsonfield__contains=make_cond(name, value))
It works as long as:
the jsonfield uses the same dump utility (simplejson here)
name and value are not too special (I don't know of any edge case so far; maybe someone could point one out)
your jsonfield data is not corrupt (unlikely though)
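The first condition matters more than it may seem. As a quick illustration using the standard library's json module (used here in place of simplejson, which is an assumption about your setup), the same dictionary serializes differently depending on the separators, and a fragment built with one style will not be found in text stored with the other. make_fragment is a made-up helper name.

```python
import json

record = {"username": "chingo", "reputation": "5"}

# Default separators put a space after ':' and ','.
default_text = json.dumps(record)
# Compact separators match the stored form shown in the earlier answer.
compact_text = json.dumps(record, separators=(",", ":"))


def make_fragment(name, value, **dump_kwargs):
    """Serialize one key/value pair and strip the surrounding braces."""
    return json.dumps({name: value}, **dump_kwargs)[1:-1]


compact_fragment = make_fragment("username", "chingo", separators=(",", ":"))
default_fragment = make_fragment("username", "chingo")

print(compact_fragment in compact_text)   # same style on both sides
print(compact_fragment in default_text)   # styles differ, substring not found
print(default_fragment in default_text)
```

So a text-based contains lookup is only reliable when the search fragment is produced with exactly the same serializer settings as the stored column.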
Actually, I'm working on an editable jsonfield and thinking about whether to support such operations. The argument against it is as stated above: it feels like black magic.
If you use PostgreSQL, you can use raw SQL to solve the problem.
username = 'chingo'
SQL_QUERY = "SELECT true FROM you_table WHERE jsonfield::json->>'username' = '%s'"
User.objects.extra(where=[SQL_QUERY % username]).get()
where you_table is the name of the table in your database.
Any method that treats JSON as plain text looks like a very bad approach.
So I also think that you need a better database schema.
Here is the way I have found out that will solve your problem:
search_filter = '"username":"{0}"'.format(username)
query = User.objects.get(jsonfield__contains=search_filter)
Hope this helps.
You can't do that. Use normal database fields for structured data, not JSON blobs.
If you need to search on JSON data, consider using a NoSQL database like MongoDB.

Importing django model methods in json

I am trying to output a set of database records in JSON as follows:
def json_dbtable(request, p):
    t = MyModel.objects.filter({some query})
    s = serializers.get_serializer("json")()
    re = s.serialize(t, ensure_ascii=False)
    return HttpResponse(re, mimetype="application/json")
However, one of the fields I'm trying to return needs to change if it is null. To remedy this, the model has a method that is exposed as a property, e.g.:
name = property(_get_useful_name)
So, to get to the crux of the question. How can I include this "name" property in my json serialization as well as the raw field data?
The short answer is no. The long answer is that you could serialize your MyModel instances yourself:
simplejson.dumps([{'pk': m.pk, 'name': m.name} for m in MyModel.objects.filter(...)])
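Here is a slightly fuller sketch of that do-it-yourself approach, using the standard library's json module and a plain class standing in for the Django model so that it runs on its own. The raw_name field and the fallback value are hypothetical; only the name = property(_get_useful_name) pattern comes from the question.

```python
import json


class MyModel:
    """Stand-in for the Django model; raw_name is a hypothetical nullable field."""

    def __init__(self, pk, raw_name):
        self.pk = pk
        self.raw_name = raw_name

    def _get_useful_name(self):
        # Fall back to a generated label when the stored value is null.
        return self.raw_name if self.raw_name is not None else "item-%d" % self.pk

    name = property(_get_useful_name)


def to_json(instances):
    """Serialize both the raw field and the computed property."""
    return json.dumps(
        [{"pk": m.pk, "raw_name": m.raw_name, "name": m.name} for m in instances]
    )


payload = to_json([MyModel(1, "Alice"), MyModel(2, None)])
print(payload)
```

In a view, the resulting string can be passed to HttpResponse in place of the serializer output.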
I have written a serialization framework for Python called any2any, which includes (de)serializers for Django and allows you to do that easily.
It will be much cleaner than the DIY way.
Hope that helps !