RavenDB map reduce, duplicate entries in reduce - mapreduce

I have created a map function in Raven that looks like this
from order in docs.WebOrderModels
from orderLine in order.OrderLines
where order.OrderStatus.OrderStatusId == 3
select new{
orderLine.Sku,
orderLine.Quantity
}
together with the following reduce
from result in results
group result by new {result.Sku, result.Quantity} into g
select new{
Sku = g.Key.Sku,
Quantity = g.Sum(x => x.Quantity)
}
Running this mostly works, except that I get duplicate entries for the Sku.
The same Sku number appears twice in the results.
When I look through the data there does not seem to be any difference other than the quantities per order object.
I have tried creating two new order objects to see if it happens when two order objects contain order lines for the same Sku number, but they are added together as I would expect.
I can't find any reason why the two entries are not reduced to one entry.

You are grouping the result with:
group result by new {result.Sku, result.Quantity} into g
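which produces one result entry per distinct (Sku, Quantity) pair, so the same Sku appears once for every different quantity.
Group by the Sku alone instead:
group result by result.Sku into g
Because the key is then a single value, the select uses Sku = g.Key rather than Sku = g.Key.Sku.
See: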
https://demo.ravendb.net/demos/csharp/static-indexes/map-reduce-index#step-4
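The effect of the grouping key can be reproduced outside RavenDB. The following is a minimal plain-Python sketch, not RavenDB code: the `mapped` entries and the `reduce_by` helper are made up for illustration and only mimic what the reduce step does with each key.

```python
from collections import defaultdict

# Hypothetical map output: one (Sku, Quantity) entry per order line.
mapped = [
    {"Sku": "A-100", "Quantity": 1},
    {"Sku": "A-100", "Quantity": 1},
    {"Sku": "A-100", "Quantity": 2},
    {"Sku": "B-200", "Quantity": 5},
]


def reduce_by(entries, key_fields):
    """Group entries by the given fields and sum Quantity within each group."""
    totals = defaultdict(int)
    for entry in entries:
        key = tuple(entry[field] for field in key_fields)
        totals[key] += entry["Quantity"]
    return dict(totals)


# Key (Sku, Quantity): "A-100" is split into one group per distinct quantity.
by_sku_and_quantity = reduce_by(mapped, ["Sku", "Quantity"])
# Key Sku only: every order line of a Sku lands in the same group.
by_sku = reduce_by(mapped, ["Sku"])
```

Order lines that happen to share a quantity still merge under the composite key, which would explain why the two test orders in the question were summed as expected.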

Related

Django Dynamically Calculate Average

I have multiple fields in my model, and I need to get the average of only the columns the user inputs
Could be
How can I do it dynamically?
I know I can do
mean = results.aggregate(Avg("student_score"))
This is one, I want to add multiple Avg statements dynamically
I tried making a loop as well to get all names and add all fields given by user one by one
eg - Avg('students'), Avg('playtime'), Avg('grade'), Avg('sales')
But I get
QuerySet.aggregate() received non-expression(s): <class 'django.db.models.aggregates.Avg'>('students'), <class 'django.db.models.aggregates.Avg'>('sales').
I've even tried raw query, but it needs a unique ID because of which that isn't working
Any workaround ideas?
I am using MySQL DB
aggregate() returns a single summary dictionary for the whole queryset. If you need one result per group, use values() with annotate(), like the following:
YourModel.objects.values("YOUR GROUP BY VALUES HERE").annotate(Avg('students'), Avg('playtime'), Avg('grade'), Avg('sales'))
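The "dynamic" part of the question comes down to building one aggregate per user-chosen field name. Below is a small sketch of that pattern in plain Python so it runs without a Django project. The `rows` data, `ALLOWED_FIELDS`, and `dynamic_averages` are hypothetical stand-ins. With the ORM, the same dictionary comprehension would build `Avg(name)` expression objects (not strings) and be unpacked into `aggregate(**...)`.

```python
from statistics import mean

# Hypothetical rows standing in for the model's data.
rows = [
    {"students": 10, "playtime": 2.0, "grade": 7, "sales": 100},
    {"students": 20, "playtime": 4.0, "grade": 9, "sales": 300},
]

# Only these column names may be requested by the user.
ALLOWED_FIELDS = {"students", "playtime", "grade", "sales"}


def dynamic_averages(rows, requested_fields):
    """Average only the requested columns, rejecting unknown names."""
    unknown = set(requested_fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # One key per field, mirroring Django's default "<field>__avg" alias.
    return {
        f"{name}__avg": mean(row[name] for row in rows)
        for name in requested_fields
    }


averages = dynamic_averages(rows, ["students", "sales"])
```

Checking the names against a fixed set before using them keeps arbitrary user input out of the query.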

Django - how to filter a queryset on multiple reverse lookup matches

I have two models: Order and OrderStatus.
Don't worry about Order, but OrderStatus has the following fields:
order = models.ForeignKey(Order)
status = models.CharField (choice that can be either ORDERED, IN_TRANSIT, or RECEIVED)
OrderStatuses are created when the Order changes status, so initially there's just an ORDERED status, then later an ORDERED and IN_TRANSIT status, then later an ORDERED, IN_TRANSIT, and RECEIVED status all exist as foreign keys to one Order. This is to keep track of timings, etc.
I want to find all Orders which have all three statuses. In other words, all orders that have been received and are valid because they have the other two statuses.
This is returning an empty set:
Order.objects.filter(Q(orderstatus__status=OrderStatus.ORDERED) &
Q(orderstatus__status=OrderStatus.IN_TRANSIT) &
Q(orderstatus__status=OrderStatus.RECEIVED))
... but this is working fine:
Order.objects.filter(orderstatus__status=OrderStatus.ORDERED)
.filter(orderstatus__status=OrderStatus.IN_TRANSIT)
.filter(orderstatus__status=OrderStatus.RECEIVED)
What's the difference here? Is there any way to simplify? I thought this was what Q objects are for.
The single filter() call requires all three conditions to hold for the same related OrderStatus row. A row has only one status, so nothing can match:
Order.objects.filter(Q(orderstatus__status=OrderStatus.ORDERED) &
Q(orderstatus__status=OrderStatus.IN_TRANSIT) &
Q(orderstatus__status=OrderStatus.RECEIVED))
With chained calls, each filter() is applied to the result of the previous one, and on a multi-valued relationship each call can match a different OrderStatus row, so orders with all three statuses are returned:
Order.objects.filter(orderstatus__status=OrderStatus.ORDERED)
.filter(orderstatus__status=OrderStatus.IN_TRANSIT)
.filter(orderstatus__status=OrderStatus.RECEIVED)
If you instead want the Order objects that have any of the statuses ORDERED, IN_TRANSIT, or RECEIVED, you can use an __in lookup:
Order.objects.filter(orderstatus__status__in=[OrderStatus.ORDERED, OrderStatus.IN_TRANSIT, OrderStatus.RECEIVED])
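The difference between the two forms is easier to see at the SQL level. The sketch below uses an in-memory SQLite database with simplified, made-up table names (`orders`, `order_status`). It is not the exact SQL Django emits, only the shape of it: one join with all conditions on the same row, versus one join per condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_status (order_id INTEGER, status TEXT);
    INSERT INTO orders (id) VALUES (1), (2);
    -- Order 1 has all three statuses, order 2 has only been ordered.
    INSERT INTO order_status VALUES
        (1, 'ORDERED'), (1, 'IN_TRANSIT'), (1, 'RECEIVED'), (2, 'ORDERED');
""")

# One join, every condition on the same status row (like a single filter()).
single_join = conn.execute("""
    SELECT o.id FROM orders o
    JOIN order_status s ON s.order_id = o.id
    WHERE s.status = 'ORDERED'
      AND s.status = 'IN_TRANSIT'
      AND s.status = 'RECEIVED'
""").fetchall()

# One join per condition (like chained filter() calls).
chained_joins = conn.execute("""
    SELECT o.id FROM orders o
    JOIN order_status s1 ON s1.order_id = o.id
    JOIN order_status s2 ON s2.order_id = o.id
    JOIN order_status s3 ON s3.order_id = o.id
    WHERE s1.status = 'ORDERED'
      AND s2.status = 'IN_TRANSIT'
      AND s3.status = 'RECEIVED'
""").fetchall()
```

The first query can never return a row because one status column cannot hold three values at once. The second finds order 1, since each alias is free to match a different status row.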

How do I use django's Q with django taggit?

I have a Result object that is tagged with "one" and "two". When I try to query for objects tagged "one" and "two", I get nothing back:
q = Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
print(len(q))
# prints 0, was expecting 1
Why does it not work with Q? How can I make it work?
The way django-taggit implements tagging is essentially through a ManytoMany relationship. In such cases there is a separate table in the database that holds these relations. It is usually called a "through" or intermediate model as it connects the two models. In the case of django-taggit this is called TaggedItem. So you have the Result model which is your model and you have two models Tag and TaggedItem provided by django-taggit.
When you make a query such as Result.objects.filter(Q(tags__name="one")) it translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has the name="one".
Trying to match two tag names translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has both name="one" AND name="two". You never have that, because a row holds only one value: it is either "one" or "two".
These details are hidden away from you in the django-taggit implementation, but this is what happens whenever you have a ManytoMany relationship between objects.
To resolve this you can:
Option 1
Query tag after tag evaluating the results each time, as it is suggested in the answers from others. This might be okay for two tags, but will not be good when you need to look for objects that have 10 tags set on them. Here would be one way to do this that would result in two queries and get you the result:
# get the IDs of the Result objects tagged with "one"
query_1 = Result.objects.filter(tags__name="one").values('id')
# use this in a second query to filter the ID and look for the second tag.
results = Result.objects.filter(pk__in=query_1, tags__name="two")
You could achieve this with a single query so you only have one trip from the app to the database, which would look like this:
from django.db.models import Exists, OuterRef
# create django subquery - this is not evaluated, but used to construct the final query
subquery = Result.objects.filter(pk=OuterRef('pk'), tags__name="one").values('id')
# perform a combined query using a subquery against the database
results = Result.objects.filter(Exists(subquery), tags__name="two")
This would only make one trip to the database. (Note: filtering on sub-queries requires django 3.0).
But you are still limited to two tags. If you need to check for 10 tags or more, the above is not really workable...
Option 2
Query the relationship table directly instead, and aggregate the results in a way that gives you the object IDs.
from django.contrib.contenttypes.models import ContentType
from django.db.models import Count
from taggit.models import TaggedItem
# django-taggit uses Content Types so we need to pick up the content type from cache
result_content_type = ContentType.objects.get_for_model(Result)
tag_names = ["one", "two"]
tagged_results = (
    TaggedItem.objects.filter(tag__name__in=tag_names, content_type=result_content_type)
    .values('object_id')
    .annotate(occurrence=Count('object_id'))
    .filter(occurrence=len(tag_names))
    .values_list('object_id', flat=True)
)
TaggedItem is the hidden table in the django-taggit implementation that contains the relationships. The above will query that table and aggregate all the rows that refer either to the "one" or "two" tags, group the results by the ID of the objects and then pick those where the object ID had the number of tags you are looking for.
This is a single query and at the end gets you the IDs of all the objects that have been tagged with both tags. It is also the exact same query regardless if you need 2 tags or 200.
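The count-and-compare idea behind Option 2 can be tried without Django. This is a rough sketch on an in-memory SQLite table. `tagged_item` is a simplified, hypothetical stand-in for django-taggit's real table (which stores a tag foreign key and a content type rather than the tag name), and it assumes each (object, tag) pair appears at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tagged_item (object_id INTEGER, tag_name TEXT);
    INSERT INTO tagged_item VALUES
        (1, 'one'), (1, 'two'),
        (2, 'one'),
        (3, 'two'), (3, 'three');
""")


def objects_with_all_tags(conn, tag_names):
    """Return IDs of objects that carry every tag in tag_names."""
    placeholders = ", ".join("?" for _ in tag_names)
    sql = f"""
        SELECT object_id FROM tagged_item
        WHERE tag_name IN ({placeholders})
        GROUP BY object_id
        HAVING COUNT(object_id) = ?
        ORDER BY object_id
    """
    # The tag names fill the IN list; the final parameter is the required count.
    params = [*tag_names, len(tag_names)]
    return [row[0] for row in conn.execute(sql, params)]


matches = objects_with_all_tags(conn, ["one", "two"])
```

Only object 1 has a matching row for each requested tag, so its group is the only one whose count equals the number of tags. The query text has the same shape whether two tags or many are passed in.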
Please review this and let me know if anything needs clarification.
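First of all, these two put both conditions in a single filter() call and mean the same thing (the first is only an illustration, since Python rejects a repeated keyword argument):
Result.objects.filter(tags__name="one", tags__name="two")
Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
The chained form is different, because each filter() call can match a different tag row:
Result.objects.filter(tags__name__in=["one"]).filter(tags__name__in=["two"])
I think the name field is a CharField, and no single record can be equal to "one" and "two" at the same time.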
In plain Python the condition looks like this. It is always false, which is why you are getting no results:
from random import choice
name = choice(["abtin", "shino"])
if name == "abtin" and name == "shino":
    print("never reached")
We use Q objects to implement OR and other complex queries.
In the example that works, you apply an AND across two Python objects (querysets). It is applied to any matching records, not necessarily to a single record that has both "one" and "two" as its tag.
PS: Why do you use the in filter?
q = Result.objects.filter(tags__name__in=["one"]).filter(tags__name__in=["two"])
Add .distinct() to remove duplicates if you expect more than one unique object.

Search a column for multiple words using Django queryset

I have an autocomplete box which needs to return results having input words. However the input words can be partial and located in different order or places.
Example:
The values in database column (MySQL)-
Expected quarterly sales
Sales preceding quarter
Profit preceding quarter
Sales 12 months
Now if user types quarter sales then it should return both of the first two results.
I tried:
column__icontains = term #searches only '%quarter sales%' and thus gives no results
column__search = term #searches any of complete words and also returns last result
**{'ratio_name__icontains':each_term for each_term in term.split()} #this searches only sales as it is the last keyword argument
Is there any trick via regex, or maybe something built into Django that I am missing, since this is a common pattern?
Search engines are better for this task, but you can still do it with basic code.
If you're looking for strings containing "A" and "B", you can
Model.objects.filter(string__contains='A').filter(string__contains='B')
or
Model.objects.filter(Q(string__contains='A') & Q(string__contains='B'))
But really, you'd be better going with a simple full text search engine with little configuration, like Haystack/Whoosh
The above answers using chained .filter() require entries to match ALL the filters.
For those wanting "inclusive or" or ANY type behaviour, you can use functools.reduce to chain together django's Q operator for a list of search terms:
from functools import reduce
from django.db.models import Q
list_of_search_terms = ["quarter", "sales"]
query = reduce(
lambda a, b: a | b,
(Q(column__icontains=term) for term in list_of_search_terms),
)
YourModel.objects.filter(query)
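The difference between the ALL and ANY behaviour can be checked against the four example values from the question. The following is a plain-Python sketch, not ORM code, and the `search` helper is hypothetical. With Django, the ALL variant would presumably be the same `reduce` call using `&` (or `operator.and_`) instead of `|`.

```python
# The example column values from the question.
names = [
    "Expected quarterly sales",
    "Sales preceding quarter",
    "Profit preceding quarter",
    "Sales 12 months",
]


def search(names, query, match=all):
    """Return names containing the query's words, case-insensitively.

    match=all requires every word (AND); match=any accepts any word (OR).
    """
    terms = query.lower().split()
    return [n for n in names if match(term in n.lower() for term in terms)]


and_results = search(names, "quarter sales")
or_results = search(names, "quarter sales", match=any)
```

With ALL semantics only the first two rows come back, which is what the autocomplete in the question expects. With ANY semantics all four rows match, because each contains at least one of the words.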

Django - QuerySet filter - combing 2 conditions

I have a model (Delivery) with 2 fields called name and to_date. I just need the object with a specific name and its maximum to_date.
Delivery.objects.filter(name__exact = 'name1').aggregate(Max('valid_to'))
The above query will return the maximum date. Is it possible to fetch the complete object?
To get a single object ordered by valid_to:
obj = Delivery.objects.filter(name='name1', to_date=my_date).order_by('-valid_to')[0]
Try this:
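from django.db.models import Max
# aggregate() returns a dict, so pull the value out by its alias
maximum_to_date = Delivery.objects.filter(name__exact='name1').aggregate(maximum_to_date=Max('valid_to'))['maximum_to_date']
result = Delivery.objects.filter(name__exact='name1', valid_to=maximum_to_date)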
Note that you need filter() in the second line, because two or more Deliveries might have the same valid_to value. In such case you can either accept them all, or e.g. take the one with the smallest ID, depending on what you need.
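The tie case mentioned above can be illustrated with plain Python. This is only a sketch over made-up in-memory records, not ORM code: it finds the maximum date for one name, collects every record that shares it, and then picks the smallest ID as one possible tie-break.

```python
from datetime import date

# Hypothetical records standing in for Delivery rows.
deliveries = [
    {"id": 1, "name": "name1", "valid_to": date(2020, 1, 31)},
    {"id": 2, "name": "name1", "valid_to": date(2020, 6, 30)},
    {"id": 3, "name": "name1", "valid_to": date(2020, 6, 30)},
    {"id": 4, "name": "name2", "valid_to": date(2021, 1, 1)},
]

# Step 1: restrict to the name, then find the maximum date.
candidates = [d for d in deliveries if d["name"] == "name1"]
latest = max(d["valid_to"] for d in candidates)

# Step 2: more than one record can share that maximum date.
tied = [d for d in candidates if d["valid_to"] == latest]

# Step 3: one way to settle the tie is to take the smallest ID.
chosen = min(tied, key=lambda d: d["id"])
```

Record 4 has a later date but a different name, which is why the name restriction has to apply before the maximum is taken.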