Mongoid 4 .or not working in complex query - ruby-on-rails-4

I know Mongoid 4 is still in beta and maybe I've found a bug, but I'm having a hard time understanding why the first query works and the second one returns nothing:
Product.or({sender_uid: params[:user_id]}, {receiver_uid: params[:user_id]})
Product.where({sender_uid: params[:user_id]}).or({receiver_uid: params[:user_id]})
This makes it hard to compose any complex queries, so any pointers would be appreciated.

See the following example:
Product 1: sender_uid = 1, receiver_uid = 2
Product 2: sender_uid = 2, receiver_uid = 1
Product 3: sender_uid = 1, receiver_uid = 2
params[:user_id] = 1
The first query returns ALL the products where sender_uid OR receiver_uid is equal to 1. That is Products 1, 2 and 3.
The second query first selects all products where sender_uid is 1, that is Product 1 and Product 3, and then, within that criteria, the products with receiver_uid = 1. Neither Product 1 nor Product 3 has a receiver with uid 1, so that's why you're getting nothing. What you are doing in the second query is something like:
Product.where(sender_uid: params[:user_id]).where(receiver_uid: params[:user_id])
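To make the difference concrete, here is a minimal plain-Python sketch of the two selections over the three sample products. It is not Mongoid code; the `products` list and the variable names are assumptions introduced only for illustration.

```python
# Sample data from the example above (plain dicts standing in for documents).
products = [
    {"id": 1, "sender_uid": 1, "receiver_uid": 2},
    {"id": 2, "sender_uid": 2, "receiver_uid": 1},
    {"id": 3, "sender_uid": 1, "receiver_uid": 2},
]
user_id = 1

# First query: sender_uid OR receiver_uid matches.
either = [
    p["id"] for p in products
    if p["sender_uid"] == user_id or p["receiver_uid"] == user_id
]

# Second query: the conditions are chained, so both must match.
both = [
    p["id"] for p in products
    if p["sender_uid"] == user_id and p["receiver_uid"] == user_id
]

print(either)  # [1, 2, 3]
print(both)    # []
```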
UPDATE:
In response to a comment:
Product.or({ product_id: 1 }, { product_id: 2, sender_uid: 2 })
As you can see, the or method receives two hashes of conditions. Each one works like a where query.

Related

How can I filter on max values using the Django ORM?

I have entries like these:
id  user   number
1   Peter  1
2   Jack   3
3   Kate   2
4   Carla  3
The name of my table is User, and I would like to get only the users with the highest number, but I don't always know that number in advance.
I thought of doing something like this:
max_users = User.objects.filter(number=3)
But that assumes I know the highest number is 3, which is not always the case. Could you help me, please?
Thank you very much!
Try the following snippet:
from django.db.models import Max
max_number = User.objects.aggregate(Max('number'))['number__max'] # Returns the highest number.
max_users = User.objects.filter(number=max_number) # Filter all users by this number.
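The two-step logic (find the maximum, then filter on it) can be checked without a database. Below is a minimal plain-Python sketch over the sample rows; the `users` list is a stand-in for the User table, not Django code.

```python
# Sample rows from the question, as (id, user, number) tuples.
users = [
    (1, "Peter", 1),
    (2, "Jack", 3),
    (3, "Kate", 2),
    (4, "Carla", 3),
]

# Step 1: find the highest number, whatever it happens to be.
max_number = max(number for _, _, number in users)

# Step 2: keep every user that has this number (there may be ties).
max_users = [name for _, name, number in users if number == max_number]

print(max_number)  # 3
print(max_users)   # ['Jack', 'Carla']
```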

Weird behavior in Django queryset union of values

I want to join the sum of related values from users with the users that do not have those values.
Here's a simplified version of my model structure:
class Answer(models.Model):
    person = models.ForeignKey(Person)
    points = models.PositiveIntegerField(default=100)
    correct = models.BooleanField(default=False)
class Person(models.Model):
    # irrelevant model fields
Sample dataset:
Person | Answer.Points
------ | ------
3 | 50
3 | 100
2 | 100
2 | 90
Person 4 has no answers and therefore no points.
With the query below, I can achieve the sum of points for each person:
people_with_points = Person.objects.\
    filter(answer__correct=True).\
    annotate(points=Sum('answer__points')).\
    values('pk', 'points')
<QuerySet [{'pk': 2, 'points': 190}, {'pk': 3, 'points': 150}]>
But, since some people might not have any related Answer entries, they will have 0 points and with the query below I use Coalesce to "fake" their points, like so:
people_without_points = Person.objects.\
    exclude(pk__in=people_with_points.values_list('pk')).\
    annotate(points=Coalesce(Sum('answer__points'), 0)).\
    values('pk', 'points')
<QuerySet [{'pk': 4, 'points': 0}]>
Both of these work as intended but I want to have them in the same queryset so I use the union operator | to join them:
everyone = people_with_points | people_without_points
Now, for the problem:
After this, the people without points have their points value turned into None instead of 0.
<QuerySet [{'pk': 2, 'points': 190}, {'pk': 3, 'points': 150}, {'pk': 4, 'points': None}]>
Does anyone have any idea why this happens?
Thanks!
I should mention that I can fix that by annotating the queryset again and coalescing the null values to 0, like this:
everyone.\
    annotate(real_points=Concat(Coalesce(F('points'), 0), Value(''))).\
    values('pk', 'real_points')
<QuerySet [{'pk': 2, 'real_points': 190}, {'pk': 3, 'real_points': 150}, {'pk': 4, 'real_points': 0}]>
But I wish to understand why the union does not work as I expected in my original question.
EDIT:
I think I got it. A friend instructed me to use django-debug-toolbar to check my SQL queries to investigate further on this situation and I found out the following:
Since it's a union of two queries, the second query annotation is somehow not considered and the COALESCE to 0 is not used. By moving that to the first query it is propagated to the second query and I could achieve the expected result.
Basically, I changed the following:
# Moved the "Coalesce" to the initial query
people_with_points = Person.objects.\
    filter(answer__correct=True).\
    annotate(points=Coalesce(Sum('answer__points'), 0)).\
    values('pk', 'points')
# Second query does not have it anymore
people_without_points = Person.objects.\
    exclude(pk__in=people_with_points.values_list('pk')).\
    values('pk', 'points')
# We will have the values with 0 here!
everyone = people_with_points | people_without_points
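For comparison, the same result (every person, with 0 for those without answers) can be obtained in a single query with a LEFT JOIN and COALESCE. This is a different technique from the queryset combination above, shown as a sketch with Python's built-in sqlite3 module rather than the Django ORM. The table and column names are assumptions, and all sample answers are assumed to be correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE answer (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),
        points INTEGER,
        correct INTEGER
    );
    INSERT INTO person (id) VALUES (2), (3), (4);
    -- Sample dataset from the question; every answer is assumed correct.
    INSERT INTO answer (person_id, points, correct) VALUES
        (3, 50, 1), (3, 100, 1), (2, 100, 1), (2, 90, 1);
""")

# The LEFT JOIN keeps people without answers; COALESCE turns their NULL sum into 0.
rows = conn.execute("""
    SELECT p.id, COALESCE(SUM(a.points), 0) AS points
    FROM person AS p
    LEFT JOIN answer AS a ON a.person_id = p.id AND a.correct = 1
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(rows)  # [(2, 190), (3, 150), (4, 0)]
```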

Filtering on annotations with max date in Django

I have three models in a Django project:
class Hardware(models.Model):
    inventory_number = models.IntegerField(unique=True,)
class Subdivision(models.Model):
    name = models.CharField(max_length=50,)
class Relocation(models.Model):
    hardware = models.ForeignKey('Hardware',)
    subdivision = models.ForeignKey('Subdivision',)
    # Pass the callable (date.today), not its result, so the default is evaluated each time a row is created
    relocation_date = models.DateField(verbose_name='Relocation Date', default=date.today)
An example of the 'Hardware_Relocation' table:
id  hardware  subdivision  relocation_date
1   1         1            01.01.2009
2   1         2            01.01.2010
3   1         1            01.01.2011
4   1         3            01.01.2012
5   1         3            01.01.2013
6   1         3            01.01.2014
7   1         3            01.01.2015  # hardware 1 is now in subdivision 3, because this relocation_date is the latest
I would like to write a filter that finds the hardware currently located in a given subdivision.
Here is the filter I am trying:
subdivision = Subdivision.objects.get(pk=1)
hardware_list = Hardware.objects.annotate(relocation__relocation_date=Max('relocation__relocation_date')).filter(relocation__subdivision = subdivision)
Now hardware_list contains hardware 1, but that is wrong, because hardware 1 is now in subdivision 3.
hardware_list should be empty in this example.
The following code also gives the wrong result (hardware_list contains hardware 1 for subdivision 1):
limit_date = datetime.datetime.now()
q1 = Hardware.objects.filter(relocation__subdivision=subdivision, relocation__relocation_date__lte=limit_date)
q2 = q1.exclude(~Q(relocation__relocation_date__gt=F('relocation__relocation_date')), ~Q(relocation__subdivision=subdivision))
hardware_list = q2.distinct()
Would it be better to use raw SQL?
This might work...
from django.db.models import F, Q
# Parentheses let the chained calls span several lines
hardware_list = (
    Hardware.objects
    .filter(relocation__subdivision=target_subdivision, relocation__relocation_date__lte=limit_date)
    .exclude(~Q(relocation__subdivision=target_subdivision), relocation__relocation_date__gt=F('relocation__relocation_date'))
    .distinct()
)
The idea is: give me all hardware that was relocated to the target subdivision on or before the limit date and that has NOT been relocated to another subdivision after that.
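The intended rule (a piece of hardware belongs to the subdivision of its latest relocation on or before the limit date) can be sketched in plain Python over the sample rows. This is not an ORM query; the `relocations` list and the helper name `hardware_in_subdivision` are hypothetical and exist only to illustrate the logic.

```python
from datetime import date

# Sample rows from the question: (hardware_id, subdivision_id, relocation_date).
relocations = [
    (1, 1, date(2009, 1, 1)),
    (1, 2, date(2010, 1, 1)),
    (1, 1, date(2011, 1, 1)),
    (1, 3, date(2012, 1, 1)),
    (1, 3, date(2013, 1, 1)),
    (1, 3, date(2014, 1, 1)),
    (1, 3, date(2015, 1, 1)),
]

def hardware_in_subdivision(rows, subdivision_id, limit_date):
    """Return ids of hardware whose latest relocation up to limit_date is to subdivision_id."""
    latest = {}  # hardware_id -> (relocation_date, subdivision_id)
    for hardware_id, sub_id, moved_on in rows:
        if moved_on > limit_date:
            continue  # ignore relocations after the limit date
        if hardware_id not in latest or moved_on > latest[hardware_id][0]:
            latest[hardware_id] = (moved_on, sub_id)
    return sorted(h for h, (_, sub) in latest.items() if sub == subdivision_id)

print(hardware_in_subdivision(relocations, 1, date(2015, 6, 1)))  # []
print(hardware_in_subdivision(relocations, 3, date(2015, 6, 1)))  # [1]
```

With an earlier limit date such as 2011-06-01, the same helper reports hardware 1 in subdivision 1, since that was its latest relocation at the time.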

cloudant index: count number of unique users per time period

A very similar post was made about this issue here. In cloudant, I have a document structure storing when users access an application, that looks like the following:
{"username":"one","timestamp":"2015-10-07T15:04:46Z"}---| same day
{"username":"one","timestamp":"2015-10-07T19:22:00Z"}---^
{"username":"one","timestamp":"2015-10-25T04:22:00Z"}
{"username":"two","timestamp":"2015-10-07T19:22:00Z"}
What I want is to count the number of unique users in a given time period. For example:
2015-10-07 = {"count": 2} two different users accessed on 2015-10-07
2015-10-25 = {"count": 1} one user accessed on 2015-10-25
2015 = {"count": 2} two different users accessed in 2015
This becomes tricky because, for example, on 2015-10-07 username one has two access records, but should only add 1 to the count of unique users.
I've tried:
function(doc) {
  var time = new Date(Date.parse(doc['timestamp']));
  emit([time.getUTCFullYear(),time.getUTCMonth(),time.getUTCDay(),doc.username], 1);
}
This suffers from several issues, which are highlighted by Jesus Alva who commented in the post I linked to above.
Thanks!
There's probably a better way of doing this, but off the top of my head ...
You could try emitting an index for each level of granularity:
function(doc) {
  var time = new Date(Date.parse(doc['timestamp']));
  var year = time.getUTCFullYear();
  var month = time.getUTCMonth()+1;
  var day = time.getUTCDate();
  // day granularity
  emit([year,month,day,doc.username], null);
  // year granularity
  emit([year,doc.username], null);
}
// reduce function - `_count`
Day query (2015-10-07):
inclusive_end=true&
start_key=[2015, 10, 7, "\u0000"]&
end_key=[2015, 10, 7, "\uefff"]&
reduce=true&
group=true
Day query result - your application code would count the number of rows:
{"rows":[
{"key":[2015,10,7,"one"],"value":2},
{"key":[2015,10,7,"two"],"value":1}
]}
Year query:
inclusive_end=true&
start_key=[2015, "\u0000"]&
end_key=[2015, "\uefff"]&
reduce=true&
group=true
Query result - your application code would count the number of rows:
{"rows":[
{"key":[2015,"one"],"value":3},
{"key":[2015,"two"],"value":1}
]}
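The counting that the application code would do can be cross-checked offline. Here is a minimal plain-Python sketch that groups the sample documents by period and counts distinct usernames; it does not talk to Cloudant, and the helper name `unique_users` is hypothetical.

```python
from datetime import datetime

# Sample documents from the question.
docs = [
    {"username": "one", "timestamp": "2015-10-07T15:04:46Z"},
    {"username": "one", "timestamp": "2015-10-07T19:22:00Z"},
    {"username": "one", "timestamp": "2015-10-25T04:22:00Z"},
    {"username": "two", "timestamp": "2015-10-07T19:22:00Z"},
]

def unique_users(documents, granularity):
    """Count distinct usernames per period; granularity 1 = year, 2 = month, 3 = day."""
    users_by_period = {}
    for doc in documents:
        ts = datetime.strptime(doc["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        key = (ts.year, ts.month, ts.day)[:granularity]
        # A set ignores repeated accesses by the same user in the same period.
        users_by_period.setdefault(key, set()).add(doc["username"])
    return {key: len(names) for key, names in users_by_period.items()}

print(unique_users(docs, 3))  # {(2015, 10, 7): 2, (2015, 10, 25): 1}
print(unique_users(docs, 1))  # {(2015,): 2}
```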

Linear Programming: How to implement with multiple constraints?

I’m trying to solve a linear programming model and need some help. I’m not a programming expert, but I can draw up the problem conceptually and am hoping for some help implementing it.
I’m looking into an asset allocation problem for an investment portfolio from a theoretical perspective, but for simplicity of this post I’m going to use generic terms.
I have a list of 500+ choices that all have an assigned cost and value add. My goal is to maximize the sum of the value add, given a constraint on how much I can spend. These 500 choices are divided into 5 categories and there are restrictions on how many choices I can have from each category.
Category 1 = 1
Category 2 = 1
Category 3 = 2 or 3
Category 4 = 1 or 2
Category 5 = 2
Category 3 + Category 4 = 4
I figure I’ll need a binary variable X attached to each choice, where 1 means I’m picking that choice and 0 means I’m not. In the end, 8 variables should have the value 1 and the rest 0, giving the maximum value add subject to the constraint on total cost.
I ultimately hope to be able to ask, for example, “what is the nth highest value”, so that instead of the maximum value add I can get the second highest value add, and so on.
Is this possible and what software/language would be best to do it? Thanks for your help!
Just to simplify writing everything down, let's assume you had 15 assets, with value added v_1, v_2, ..., v_15 and costs c_1, c_2, ..., c_15. Let's assume assets 1, 2, and 3 are in category 1, assets 4, 5, and 6 are in category 2, assets 7, 8, and 9 are in category 3, assets 10, 11, and 12 are in category 4, and assets 13, 14, and 15 are in category 5. Finally, let's assume a budget B.
We would create binary variables x_1, x_2, ..., x_15 to indicate whether we bought each asset. Now, the objective function of our integer program is:
max v_1*x_1 + v_2*x_2 + ... + v_15*x_15
Our budget constraint is:
c_1*x_1 + c_2*x_2 + ... + c_15*x_15 <= B
Exactly one choice from category 1:
x_1 + x_2 + x_3 = 1
Exactly one choice from category 2:
x_4 + x_5 + x_6 = 1
Either 2 or 3 choices from category 3:
x_7 + x_8 + x_9 >= 2
x_7 + x_8 + x_9 <= 3
Either 1 or 2 choices from category 4:
x_10 + x_11 + x_12 >= 1
x_10 + x_11 + x_12 <= 2
Exactly 2 choices from category 5:
x_13 + x_14 + x_15 = 2
Exactly 4 choices from categories 3 and 4 combined:
x_7 + x_8 + x_9 + x_10 + x_11 + x_12 = 4
Finally, you would specify all variables to be binary.
Note that the only adjustment you would need to your problem is to change the variables in each of these constraints to be the variables associated with each of your five categories.
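For the 15-asset illustration, the model is small enough to check by brute force. The sketch below enumerates all 2^15 binary assignments in plain Python and ranks the feasible ones, which also gives the nth best solution. The values, costs, and budget are made-up numbers, not data from the question, and enumeration will not scale to 500+ choices, where an integer programming solver would be needed.

```python
from itertools import product

# Hypothetical data for the 15 assets (index 0 corresponds to x_1).
values = [5, 3, 4, 6, 2, 7, 4, 5, 3, 6, 2, 4, 3, 5, 4]
costs = [4, 2, 3, 5, 1, 6, 3, 4, 2, 5, 1, 3, 2, 4, 3]
budget = 25

# Category -> asset indices, matching the grouping described above.
categories = {
    1: range(0, 3),
    2: range(3, 6),
    3: range(6, 9),
    4: range(9, 12),
    5: range(12, 15),
}

def picked(x, category):
    """Number of chosen assets in the given category."""
    return sum(x[i] for i in categories[category])

def feasible(x):
    """Check the budget constraint and every category constraint."""
    return (
        sum(c * xi for c, xi in zip(costs, x)) <= budget
        and picked(x, 1) == 1
        and picked(x, 2) == 1
        and 2 <= picked(x, 3) <= 3
        and 1 <= picked(x, 4) <= 2
        and picked(x, 5) == 2
        and picked(x, 3) + picked(x, 4) == 4
    )

def ranked_solutions():
    """All feasible assignments as (total value, x), best first."""
    solutions = [
        (sum(v * xi for v, xi in zip(values, x)), x)
        for x in product((0, 1), repeat=len(values))
        if feasible(x)
    ]
    solutions.sort(key=lambda s: s[0], reverse=True)
    return solutions

ranked = ranked_solutions()
best_value, best_x = ranked[0]
print("best:", best_value, best_x)
print("second best:", ranked[1][0], ranked[1][1])
```

Note that several different assignments can share the same total value, so the nth entry in `ranked` is the nth best solution, not necessarily the nth distinct value.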
All that remains would be to implement the model. There are a myriad of linear programming packages in all major languages; check out this survey for details. Since Stack Overflow is not a software recommendation site and you haven't really given any details about your situation (e.g. free vs. non-free solvers or the programming language you're using), I will refrain from suggesting a particular package.