Annotate with Count yields incorrect values with NullBooleanField - django

I am performing the following query with a few annotations:
from django.db.models import Case, Count, Sum, When

(AwardIssueProcess.objects.filter(grant__client=client, completed=completed_status)
.annotate(total_awarded=Sum('award__awardissuedactivity__units_awarded'),
          num_accepted=Count(Case(When(award__accepted=True, then=1))),
          num_rejected=Count(Case(When(award__accepted=False, then=1))),
          num_unaccepted=Count(Case(When(award__accepted=None, then=1)))))
However, num_unaccepted yields incorrect values. Firstly, if I have awards, the number is sometimes double what I expect. But if I remove
total_awarded=Sum('award__awardissuedactivity__units_awarded')
from the annotation, then the doubling problem goes away.
Secondly, num_unaccepted has a value of 1 when there are no awards at all. When there are awards, the value is correct (though not in all cases, due to the doubling problem I mentioned previously). For this second issue, I suspect it is evaluating the award itself to be None, but what I really want is for it to check whether the accepted field is None, and if the award doesn't exist, simply not count it. The accepted field is a NullBooleanField.
How should I be writing this differently to solve these two problems with num_unaccepted?
EDIT
I removed total_awarded=Sum('award__awardissuedactivity__units_awarded') from the annotation and created a separate function to get the total_awarded amount that I need. This addresses my first problem, but the second issue still remains.
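One way to address the remaining issue, sketched below (untested against the original models): make the When clause require that an award row actually exists before testing accepted for NULL. With a LEFT JOIN, a process that has no awards still produces one joined row whose award columns are all NULL, and award__accepted=None matches that row.

from django.db.models import Case, Count, When

# Sketch: only count an award as "unaccepted" when the award row exists
# but its accepted field is NULL. The award__isnull=False guard stops
# the all-NULL row from a LEFT JOIN with no awards from being counted.
(AwardIssueProcess.objects
 .filter(grant__client=client, completed=completed_status)
 .annotate(
     num_accepted=Count(Case(When(award__accepted=True, then=1))),
     num_rejected=Count(Case(When(award__accepted=False, then=1))),
     num_unaccepted=Count(Case(When(award__isnull=False,
                                    award__accepted__isnull=True,
                                    then=1))),
 ))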

Related

How to automatically feed a cell value from a range of values, based on its matching condition with other cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (a list I also have to extend from time to time), and about 8 predefined significance values that I have decided to relate to these 28 types of work.
As soon as I enter a type of work in cell 1, I want the adjacent cell 2 to be populated automatically with the significance value (from the range of 8 values) that I have predefined for it.
Every time I input a new or repeated occurrence of a type of work, the adjacent cell should be matched with its relevant significance value and populated automatically, in real time.
I know how to do it using IF, IFS, and IF_OR conditions, but I feel that, given the ever-expanding types of work and significance values, those formulas will become very long, complicated, and repetitive. I feel there's a more efficient way to achieve it. Also, I don't want the value to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try:
=XLOOKUP(D2,A2:A,B2:B)
Or the FILTER() function, like:
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables so that the second argument includes all the potential values and their significances.
This is the formula that worked for me (for anybody's reference):
I created another reference sheet stating the types of work and their significance. From that sheet, I'm using either VLOOKUP, FILTER, or XLOOKUP. I'm using Google Forms for inputting my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

In DynamoDB, how do I conditionally update a parameter if less than a value, but also execute if it does not yet exist?

I have a value I am trying to increment by 1 in a DynamoDB item. The first time this is executed the parameter does not exist, so it should be set to one. However, I want the update to fail once the value reaches a specific threshold, so that it can only be updated so many times.
I am using the correct Expected syntax, but I can't figure out a way to do this without a double update: one to set the value if it's not there, and one to increment it if it is there and below the threshold. Is this possible in a single update command?
Please note, this is not the same as simply wanting a value to be set to 1 and counted up from there, as suggested in the comments below. Instead, a single expression must handle the property being absent or a number, keep counting up to a certain threshold, and then fail and not allow an update once the threshold is met.
OK, I believe this is possible, but only if you switch the whole update command over to using expressions instead of the "Expected" and "AttributeUpdates" fields. That does make it harder to use, but it lets you build compound conditions. Note that DynamoDB expressions cannot embed literal values, so the threshold has to be passed through ExpressionAttributeValues:
params.ConditionExpression = 'attribute_not_exists(#A) OR #A < :max';
params.ExpressionAttributeValues = { ':max': maxValue };
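A fuller sketch of the same idea in Python with boto3 (the table name, key, and attribute names here are hypothetical): ADD creates the counter at zero when it does not exist yet, and the condition rejects the update once the threshold is reached.

import boto3
from botocore.exceptions import ClientError

MAX_ATTEMPTS = 5  # hypothetical threshold
table = boto3.resource("dynamodb").Table("Counters")  # hypothetical table

try:
    table.update_item(
        Key={"pk": "user-123"},  # hypothetical key
        # ADD treats a missing numeric attribute as 0, so the first call
        # creates it with value 1; later calls increment it.
        UpdateExpression="ADD attempts :one",
        # Passes when the attribute is absent OR still below the threshold.
        ConditionExpression="attribute_not_exists(attempts) OR attempts < :max",
        ExpressionAttributeValues={":one": 1, ":max": MAX_ATTEMPTS},
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("Threshold reached; update rejected.")
    else:
        raise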

How to correctly use the Table.Repeat function?

I'm having some trouble getting the Table.Repeat function to work properly... I'm very new to Power Query/Power BI, so I'm just about getting my head around all the coding.
I'm following the syntax, and everything would appear to be correct, given that the additional-columns argument is optional.
What I am aiming to achieve is to have the entire table repeated a specific number of times; the Table.Repeat function described here sounds like it fits the bill. But when I attempt to implement it, it results in an error.
I was previously using the Append function; however, as I'm trying to append the query several thousand times, this crashes Excel, and the query became uneditable after the initial setup.
I've tried implementing the Repeat code halfway through the query, where it's needed, and on a new sheet. Halfway through gave me an error about a mismatch between a Value and a Table.
When I tried it on a new sheet, I did not get an error, but the applied steps disappeared and the data wasn't repeated. I also tried repeating the table far fewer times to test it out, but this still resulted in an error.
#"Repeat" = Table.Repeat(#"8-1", 2)
Essentially, I want the entire table repeated X number of times.

Weka Resample to balance instances in binary dataset

I've only been using Weka for a couple of weeks but I am absolutely blown away by how great it is!
But I have a question: I have a dataset with a target column which is either True or False.
6709 instances in my dataset are True
25318 instances are False.
I want to randomly add duplicates of my True instances to produce a new dataset with 25318 True and 25318 False.
The only filter I can find which does this is the supervised Resample filter however I am having trouble understanding what parameters I should use.
(there might be a better filter to do what I want)
I've had some success with these parameters:
biasToUniformClass = 1.0
invertSelection = False
noReplacement = False
randomSeed = 1
sampleSizePercent = 157.5 (a magic number I've arrived at by trial and error)
This produces 25277 True and 25165 False. Not exactly what I want, but quite close.
The problem is that I can't figure out how to arrive at the magic number. I'm also not getting exactly the number of instances that I really want.
Is there a better filter for this purpose?
If not, is there a way to calculate the sampleSizePercent magic number?
Any help is greatly appreciated :)
Supplemental question, am I best to run NominalToBinary on my boolean columns to ensure they are Binary? I'm using a NaiveBayes classifier (at the moment) and I don't have any missing instances.
Jason
I think the tricky part of this question is getting a perfect balance using the Resample filter. This is because, as stated in its description, it 'Produces a random sub-sample of a dataset using either sampling with replacement or without replacement'. If the cases are drawn randomly, there is no guarantee that you will get an equal balance between the two classes.
As for the magic number, it corresponds to the total number of cases you would like to have after the filter is applied. In your case, that would be 50636 instead of 32027, so the magic number would be 50636 / 32027 ≈ 1.581, i.e. a sampleSizePercent of about 158.1. However, as stated above, you may not get an exact match of true and false cases.
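As a quick sanity check, the arithmetic (nothing Weka-specific here):

# Target: both classes at the majority-class count (25318 each).
true_count, false_count = 6709, 25318
current_total = true_count + false_count   # 32027
target_total = 2 * false_count             # 50636

sample_size_percent = 100 * target_total / current_total
print(round(sample_size_percent, 1))       # 158.1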
If you really need an exact figure, you could use your favourite spreadsheet to preprocess the data. One possible method is to randomise the order of the true cases (using a separate column), sort, and copy cases until their number matches the false ones. It's not an automated solution, and it happens outside of Weka, but I have used this method before and it does the job reasonably quickly.
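The same preprocessing idea can also be scripted instead of done in a spreadsheet, for example with pandas (a sketch; the file and column names are hypothetical, and it assumes the target column holds the strings 'True'/'False'):

import pandas as pd

df = pd.read_csv("dataset.csv")  # hypothetical file name
true_rows = df[df["target"] == "True"]    # 6709 rows
false_rows = df[df["target"] == "False"]  # 25318 rows

# Randomly duplicate True rows until both classes have 25318 instances,
# then shuffle the combined frame.
extra = true_rows.sample(n=len(false_rows) - len(true_rows),
                         replace=True, random_state=1)
balanced = pd.concat([df, extra]).sample(frac=1, random_state=1)
balanced.to_csv("balanced.csv", index=False)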
Hope this helps!

Simple query working for years, then suddenly very slow

I've had a query that has been running fine for about 2 years. The database table has about 50 million rows, and is growing slowly. This last week one of my queries went from returning almost instantly to taking hours to run.
Rank.objects.filter(site=Site.objects.get(profile__client=client, profile__is_active=False)).latest('id')
I have narrowed the slow query down to the Rank model. It seems to have something to do with using the latest() method. If I just ask for a queryset, it returns an empty queryset right away.
#count returns 0 and is fast
Rank.objects.filter(site=Site.objects.get(profile__client=client, profile__is_active=False)).count() == 0
Rank.objects.filter(site=Site.objects.get(profile__client=client, profile__is_active=False)) == [] #also very fast
Here are the results of running EXPLAIN. http://explain.depesz.com/s/wPh
And EXPLAIN ANALYZE: http://explain.depesz.com/s/ggi
I tried vacuuming the table, no change. There is already an index on the "site" field (ForeignKey).
Strangely, if I run this same query for another client that already has Rank objects associated with her account, the query once again returns very quickly. So it seems this is only a problem when there are no Rank objects for that client.
Any ideas?
Versions:
Postgres 9.1,
Django 1.4 svn trunk rev 17047
Well, you've not shown the actual SQL, so that makes it difficult to be sure. But the EXPLAIN output suggests the planner thinks the quickest way to find a match is to scan an index on "id" backwards until it finds the client in question.
Since you said it has been fast until recently, this is probably not a silly choice. However, there is always the chance that a particular client's record will be right at the far end of this search.
So, try two things first:
Run ANALYZE on the table in question and see if that gives the planner enough information (a sketch of running it from Django follows the link below).
If not, increase the statistics target (ALTER TABLE ... SET STATISTICS) on the columns in question and re-analyze. See if that does it.
http://www.postgresql.org/docs/9.1/static/planner-stats.html
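Since the application is Django, the ANALYZE step can also be issued through the database cursor (a sketch; 'myapp_rank' is a placeholder for whatever table the Rank model actually maps to):

from django.db import connection

# Refresh the planner's statistics for the Rank table.
cursor = connection.cursor()
cursor.execute("ANALYZE myapp_rank;")  # placeholder table name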
If that's still not helping, consider an index on (client, id) and drop the index on id (if it's not needed elsewhere). That should give you lightning-fast answers.
latest() is normally used for date comparisons; maybe you should try ordering by id descending and then limiting to one result.
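A sketch of that rewrite, assuming the same models as in the question (slicing keeps it compatible with Django 1.4, which has no first()):

# Order by id descending and take at most one row instead of latest('id').
qs = (Rank.objects
      .filter(site=Site.objects.get(profile__client=client,
                                    profile__is_active=False))
      .order_by('-id')[:1])
rank = qs[0] if qs else None  # avoids DoesNotExist when nothing matches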