How to limit records of relations with include filter in LoopBack?

I want to query records from a specific model via the REST API of a LoopBack application. I also want to include related objects via the include filter.
This works fine, but returns ALL related objects. Is it possible to limit them, and also to order them by a field of the related objects?
Models:
- DEPARTMENT
Fields:
- id
- name
- ...
Relations -> hasMany: Messages
Relations -> hasMany: Members
- MESSAGE
Fields:
- id
- senderId
- body
- ...
- MEMBER
Fields:
- id
- email
- ...
Queries:
What I want to achieve is to query all departments with all their members, but only the last message, ordered by a specific field (created timestamp).
The first approach could be the plain query-string variant of a GET request:
http://loopback-server:3000/api/departments?filter[include]=members&filter[include]=messages
This will return all departments with all messages and all members. However, I would like to limit the number of returned messages to the last one (or the last 5, or whatever), sorted by a specific field of the MESSAGE model.
I also tried the JSONified query syntax:
http://loopback-server:3000/api/departments?filter={"include":{"relation": "messages","limit":1}}
Unfortunately the "limit" parameter is not applied to the messages relation here.
The following variant will return only the first department, meaning the limit parameter is applied to the departments model, not to the related model.
http://loopback-server:3000/api/departments?filter={"include":{"relation": "messages"},"limit":1}
Then I discovered the scope parameter and tried this:
http://loopback-server:3000/api/departments?filter={"include":{"relation": "messages","scope":{"limit":1, "skip":0}}}
This gives a really strange result: it omits all messages related to the departments, instead of returning one message per department (each has over 10), which is what I would expect. Removing the scope parameter shows that each department does indeed have many messages.
(I know that the parameters of a URL with all these special characters like {",:"} need to be URL-encoded. I leave them unencoded here for better readability.)
My question:
How can I achieve that query with a single request?

It's not possible to query relationships by their properties (yet). As for the limit, your last approach with the scope should be modified a little:
"scope":{{"include":{"relation": "messages","limit":1, "skip":0}}}
Here you can read about queries on relations by their properties:
https://github.com/strongloop/loopback/issues/517

I don't know what version you are on, but for LoopBack 3 you can do this:
include: [
  {
    relation: 'Messages', // include the messages relation
    scope: { // this is where you do a normal filter
      where: { <whatevercondition> },
      order: '<fieldname> <ASC/DESC>',
      limit: 1,
      include: {
        // yes, you can include a 3rd-level relation as well
      }
    }
  },
  {
    relation: 'Members', // include the Members relation
    scope: { // further filter the related model
      order: '<fieldname> <ASC/DESC>',
      limit: <whateverlimityouwant>
    }
  }
]
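As a usage sketch, the same filter could be passed to the LoopBack 3 Node API. The Department model name and the created timestamp field are assumptions taken from the question, so adjust them to your own model:
// Hypothetical LoopBack 3 Node API usage of the include filter shown above.
app.models.Department.find({
  include: [
    {
      relation: 'Messages',
      scope: { order: 'created DESC', limit: 1 } // newest message only; 'created' is an assumed field name
    },
    {
      relation: 'Members'
    }
  ]
}, function (err, departments) {
  if (err) throw err;
  departments.forEach(function (department) {
    // toJSON() resolves the included relations into plain properties
    console.log(department.toJSON());
  });
});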

Try this code:
`${urlApi}/user/?filter[limit]=${records_per_page}&filter[skip]=${(currentPage - 1) * records_per_page}`

Limit for inclusion scope works correctly when you have only one parent record.
If you want to select N parent records and include 1 related record in each of them, try my workaround: Limit for included records in Loopback4

Related

Is there a way to address nested properties in AWS DynamoDB for the purpose of a documentClient.query() call?

I am currently testing how to design a query for the AWS.DynamoDB.DocumentClient query() call, which takes params: DocumentClient.QueryInput and is used for retrieving a data collection from a table in DynamoDB.
The query seems simple and works fine when working with indexes of type String or Number only. What I am not able to write is a query that uses a valid index and filters on an attribute that is nested (see my data structure below).
I am using FilterExpression, where the filtering logic can be defined - and that seems to work fine in all cases except when trying to filter on a nested attribute.
The current parameters I am feeding the query with:
parameters {
  TableName: 'myTable',
  ProjectionExpression: 'HashKey, RangeKey, Artist, #SpecialStatus, Message, Track, Statistics',
  ExpressionAttributeNames: { '#SpecialStatus': 'Status' },
  IndexName: 'Artist-index',
  KeyConditionExpression: 'Artist = :ArtistName',
  ExpressionAttributeValues: {
    ':ArtistName': 'BlindGuadian',
    ':Track': 'Mirror Mirror'
  },
  FilterExpression: 'Track = :Track'
}
Data structure in DynamoDB's table:
{
  'Artist': 'Blind Guardian',
  ..
  'Track': 'Mirror Mirror',
  'Statistics': [
    {
      'Sales': 42,
      'WrittenBy': 'Kursch'
    }
  ]
}
Let's assume we want to filter all entries from the DB by using Artist in the KeyConditionExpression. We can achieve this by feeding Artist with :ArtistName. Now the question: how do I retrieve records that I can filter on WrittenBy, which is nested in Statistics?
To the best of my knowledge, we are not able to use any type other than String, Number or Binary for secondary indexes. I've been experimenting with secondary indexes and sort keys as well, but without luck.
I've tried documentClient.scan(); same story. Still no luck accessing nested attributes in a List (FilterExpression just won't accept it).
I am aware of the possibility of filtering the results on the application side once the records are retrieved (by Artist, for instance), but I am interested in filtering them in the FilterExpression.
If I understand your problem correctly, you'd like to create a query that filters on the value of a complex attribute (in this case, a list of objects).
You can filter on the contents of a list by indexing into the list:
var params = {
  TableName: "myTable",
  FilterExpression: "Statistics[0].WrittenBy = :writtenBy",
  ExpressionAttributeValues: {
    ":writtenBy": "Kursch"
  }
};
Of course, if you don't know the specific index, this won't really help you.
Alternatively, you could use the CONTAINS function to test if the object exists in your list. The CONTAINS function will require all the attributes in the object to match the condition. In this case, you'd need to provide Sales and WrittenBy, which probably doesn't solve your problem here.
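For illustration, a sketch of the CONTAINS approach in the same DocumentClient style (the whole Statistics entry has to be supplied as the operand, which is exactly the limitation mentioned above):
var params = {
  TableName: "myTable",
  FilterExpression: "contains(Statistics, :stat)",
  ExpressionAttributeValues: {
    // contains() only matches when an element of the list equals the whole object
    ":stat": { Sales: 42, WrittenBy: "Kursch" }
  }
};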
The shape of your data is making your access pattern difficult to implement, but that is often the case with DDB. You are asking DDB to support a query on a list of objects, where the object has a specific attribute with a specific value. As you've seen, this is quite tricky to do. As you know, getting the data model to correctly support your access patterns is critical to your success with DDB. It can also be difficult to get right!
A couple of ideas that would make your access pattern easier to implement:
Move WrittenBy out of the complex attribute and put it alongside the other top-level attributes. This would allow you to use a simple FilterExpression on the WrittenBy attribute.
If the WrittenBy attribute must stay within the Statistics list, make it stand alone (e.g. [{writtenBy: Kursch}, {Sales: 42},...]). This way, you'd be able to use the CONTAINS keyword in your search.
Create a secondary index with the WrittenBy field in either the PK or SK (whichever makes sense for your data model and access patterns); see the sketch below.
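A sketch of that third idea: assuming WrittenBy is duplicated as a top-level attribute (DynamoDB can only index top-level scalar attributes) and a hypothetical WrittenBy-index GSI is created on it, the query could then look like this:
var AWS = require('aws-sdk');
var documentClient = new AWS.DynamoDB.DocumentClient();

var params = {
  TableName: 'myTable',
  IndexName: 'WrittenBy-index', // hypothetical GSI with WrittenBy as its partition key
  KeyConditionExpression: 'WrittenBy = :writtenBy',
  ExpressionAttributeValues: {
    ':writtenBy': 'Kursch'
  }
};

documentClient.query(params, function (err, data) {
  if (err) console.error(err);
  else console.log(data.Items);
});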

How can you filter a Django query's joined tables then iterate the joined tables in one query?

I have table Parent, and a table Child with a foreign key to table Parent.
I want to run a query for all Parents with a child called Eric, and report Eric's age.
I run:
parents = Parents.objects.filter(child__name='Eric')
I then iterate over the queryset:
for parent in parents:
    print(f'Parent name {parent.name} child Eric age {parent.child.age}')
Clearly this doesn't work - I need to access child through the foreign key object manager, so I try:
for parent in parents:
    print(f'Parent name {parent.name}')
    for child in parent.child_set.all():
        print(f'Child Eric age {child.age}')
Django returns all children's ages, not just children named Eric.
I can repeat the filter conditions:
parents = Parents.objects.filter(child__name='Eric')
for parent in parents:
    print(f'Parent name {parent.name}')
    for child in parent.child_set.filter(name='Eric'):
        print(f'Child Eric age {child.age}')
But this means duplicate code (so risks future inconsistency when another dev makes a change to one not the other), and runs a second query on the database.
Is there a way of getting the matching records and iterating over them? Been Djangoing for years and can't believe I can't do this!
PS. I know that I can do Child.objects.filter(name='Eric').select_related('parent'). But what I would really like to do involves a second child table. So add to the above example a table Address with a foreign key to Parent. I want to get parents with children named Eric and addresses in Timbuktu, and iterate over all the Timbuktu addresses and all the little Erics. This is why I don't want to use Child's object manager.
This is the best I could come up with - three queries, repeating each filter.
from django.db.models import Prefetch

children = Children.objects.filter(name='Eric')
addresses = Address.objects.filter(town='Timbuktu')
parents = (
    Parent.objects
    .filter(child__name='Eric', address__town='Timbuktu')
    .prefetch_related(Prefetch('child_set', children))
    .prefetch_related(Prefetch('address_set', addresses))
)
The .values function gives you direct access to the recordset returned (thank you @Iain Shelvington):
parents_queryset_dicts = (
    Parent.objects
    .filter(child__name='Eric', address__town='Timbuktu')
    .values('id', 'name', 'child__id', 'address__id', 'child__age', 'address__whatever')
    .order_by('id', 'child__id', 'address__id')
)
Note though that this retrieves a Cartesian product of children and addresses, so our gain in reduced query count is slightly offset by double-sized result sets and de-duplication below. So I am starting to think two queries using Child.objects and Address.objects is superior - slightly slower but simpler code.
In my actual use case I have multiple, multi-table chains of foreign key joins, so am splitting the query to prevent the Cartesian join, but still making use of the .values() approach to get filtered, nested tables.
If you then want a hierarchical structure, e.g. for sending as JSON to the client, to produce:
parents = {
    parent_id: {
        'name': name,
        'children': {
            child_id: {
                'age': child_age
            },
        },
        'addresses': {
            address_id: {
                'whatever': address_whatever
            },
        },
    },
}
Run something like:
prev_parent_id = prev_child_id = prev_address_id = None
parents = {}
for parent in parents_queryset_dicts:
    if parent['id'] != prev_parent_id:
        parents[parent['id']] = {'name': parent['name'], 'children': {}, 'addresses': {}}
        prev_parent_id = parent['id']
    if parent['child__id'] != prev_child_id:
        parents[parent['id']]['children'][parent['child__id']] = {'age': parent['child__age']}
        prev_child_id = parent['child__id']
    if parent['address__id'] != prev_address_id:
        parents[parent['id']]['addresses'][parent['address__id']] = {'whatever': parent['address__whatever']}
        prev_address_id = parent['address__id']
This is dense code, and you no longer get access to any fields not explicitly extracted and copied in, including any nested ~_set querysets, and the de-duplication of the Cartesian product is not obvious to later developers. You can grab the queryset, keep it, then extract the .values, so you have both from the same, single database query. But often the three-query version with repeated filters is a bit cleaner, even if it is a couple of database queries less efficient:
children = Children.objects.filter(name='Eric')
addresses = Address.objects.filter(town='Timbuktu')
parents_queryset = (
    Parent.objects
    .filter(child__name='Eric', address__town='Timbuktu')
    .prefetch_related(Prefetch('child_set', children))
    .prefetch_related(Prefetch('address_set', addresses))
)

parents = {}
for parent in parents_queryset:
    parents[parent.id] = {'name': parent.name, 'children': {}, 'addresses': {}}
    for child in parent.child_set.all():  # this is implicitly filtered by the Prefetch
        parents[parent.id]['children'][child.id] = {'age': child.age}
    for address in parent.address_set.all():  # also implicitly filtered
        parents[parent.id]['addresses'][address.id] = {'whatever': address.whatever}
One last approach, which someone briefly posted and then deleted (I'd love to know why), is using annotate and F() objects. I have not experimented with this, but the SQL generated looks fine and it seems to run a single query without repeating the filters:
from django.db.models import F

parents = (
    Parent.objects.filter(child__name='Eric')
    .annotate(child_age=F('child__age'))
)
Pros and cons seem identical to .values() above, although .values() seems slightly more basic Django (so easier to read) and you don't have to duplicate field names (e.g. with the obfuscation above of child_age=child__age). Advantages might be the convenience of . accessors instead of ['field'], keeping hold of the lazy nested recordsets, etc. - although if you're counting the queries, you probably want things to fall over if you issue an accidental query per row.

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
class event(models.Model):
    location = models.CharField(max_length=10)
    type = models.CharField(max_length=10)
    date = models.DateTimeField()
    attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but it does not return the attendance field, which is what I need.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model objects, they should all be unique. It seems highly unlikely that you would have two events on the same date, in the same location, of the same type. So with that assumption, let's begin. (As a formatting note, class names tend to start with capital letters to differentiate classes from variables or instances.)
from django.db.models import Max

# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))

# Make an empty list to store the values you want.
results_list = []

# Then iterate through your 'results', looking up the objects
# you want and populating the list.
for r in results:
    result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
    results_list.append(result)

# Now you have a list of objects that you can do whatever you want with.
You might have to look up the exact output of Max('date'), but this should get you on the right path.

How to create a conversation inbox in Django

I have a Message class which has fromUser, toUser, text and createdAt fields.
I want to imitate a WhatsApp, iMessage or any SMS inbox, meaning I want to fetch the last message for each conversation.
I tried:
messages = Message.objects.order_by('createdAt').distinct('fromUser', 'toUser')
But this doesn't work because of the SELECT DISTINCT ON expressions must match initial ORDER BY expressions error.
I don't really understand what that means. I also tried:
messages = Message.objects.order_by('fromUser','toUser','createdAt').distinct('fromUser', 'toUser')
and such, but let me not blur the real topic here with apparently meaningless code pieces. How can I achieve this basic, or better said generally well-known, result?
Your second method is correct. From the Django docs:
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
For example, SELECT DISTINCT ON (a) gives you the first row for each value in column a. If you don’t specify an order, you’ll get some arbitrary row.
This means that you must include the same columns in your order_by() method that you want to use in the distinct() method. Indeed, your second query correctly includes the columns in the order_by() method:
messages = Message.objects.order_by('fromUser','toUser','createdAt').distinct('fromUser', 'toUser')
In order to fetch the latest record, you need to order the createdAt column in descending order. The way to specify this is to include a minus sign before the column name in the order_by() method (there is an example of this in the docs here). Here's the final form that you should use to get your list of messages in latest-first order:
messages = Message.objects.order_by('fromUser','toUser','-createdAt').distinct('fromUser', 'toUser')

Deduplication / matching in CouchDB?

I have documents in CouchDB. The schema looks like this:
userId
email
personal_blog_url
telephone
I assume two users are actually the same person as long as they have an identical
email or
personal_blog_url or
telephone
I have 3 views created, which basically map email/blog_url/telephone to userIds and then combine the userIds into a group under the same key, e.g.:
_view/by_email:
----------------------------------
key                   values
a_email@gmail.com     [123, 345]
b_email@gmail.com     [23, 45, 333]
_view/by_blog_url:
----------------------------------
key                   values
http://myblog.com     [23, 45]
http://mysite.com/ss  [2, 123, 345]
_view/by_telephone:
----------------------------------
key                   values
232-932-9088          [2, 123]
000-111-9999          [45, 1234]
999-999-0000          [1]
My questions:
How can I merge the results from the 3 different views into a final user table/view which contains no duplicates?
Or is it good practice at all to do such deduplication in CouchDB?
Or what would be a good way to do deduplication in Couch, then?
PS: In the final view, suppose that for all dupes we only keep the smallest userId.
Thanks.
Good question. Perhaps you could listen to _changes and, for each changed document, search the views you suggested (by_*) for the fields that should be unique to the real user.
Merge the views into one (emit different fields in one map):
function (doc) {
  // Emit each identifying field that exists on the document.
  if (doc.email) emit([1, doc.email], [doc._id]);
  if (doc.personal_blog_url) emit([2, doc.personal_blog_url], [doc._id]);
  if (doc.telephone) emit([3, doc.telephone], [doc._id]);
}
Merge the lists of ids in the reduce function.
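A minimal sketch of such a reduce function, assuming the map above emits arrays of ids as values (this sketch is not from the original answer):
function (keys, values, rereduce) {
  // values is a list of id arrays in both the reduce and rereduce phases,
  // so simply concatenating them works for both.
  var merged = [];
  values.forEach(function (ids) {
    merged = merged.concat(ids);
  });
  return merged;
}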
When a new doc arrives in the changes feed, you can query the view with keys=[[1, email], [2, personal_blog_url], ...] and merge the three lists. If the minimal id found is smaller than the changed doc's, update the changed doc's realId field; otherwise update the documents in the list with the changed doc's id.
I suggest using a separate document to store the { userId, realId } relation.
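As a rough sketch, that lookup could be a single view request with a keys parameter (the database, design document and view names here are hypothetical; group=true runs the reduce once per key, and the keys value would be URL-encoded in practice):
GET /users/_design/dedup/_view/identities?group=true&keys=[[1,"a_email@gmail.com"],[2,"http://myblog.com"],[3,"232-932-9088"]]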
You can't create new documents by just using a view. You'd need a task of some sort to do the actual merging.
Here's one idea.
Instead of creating 3 views, you could create one view (that indexes the data if it exists):
Key                 Values
---                 ------
[userId, 'phone']   777-555-1212
[userId, 'email']   username@example.com
[userId, 'url']     favorite.url.example.com
I wouldn't store anything else except the raw value, as you'd end up with lots of unnecessary duplication of data (if you stored the full object for example).
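A rough sketch of the map function that could produce that view (field names taken from the question's schema; this is an assumption, not code from the original answer):
function (doc) {
  // Emit one row per identifying attribute that exists on the document.
  if (doc.telephone) emit([doc.userId, 'phone'], doc.telephone);
  if (doc.email) emit([doc.userId, 'email'], doc.email);
  if (doc.personal_blog_url) emit([doc.userId, 'url'], doc.personal_blog_url);
}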
Then, to query, you could do something like:
...startkey=[userId]&endkey=[userId,{}]
That would give you all of the duplicate information as a series of docs for that user Id. You'd still need to parse it apart to see if there were duplicates. But, this way, the results would be nicely merged into a single CouchDB call.
Here's a nice example of using arrays as keys on StackOverflow.
You'd still probably load the original "user" document if it had other data that wasn't part of the de-duplication process.
Once discovered, you could consider cleaning up the data on the fly and prevent new duplicates from occurring as new data is entered into your application.