My question is more mathematical. There is a post on the site that users can like and dislike. Below the post it shows, for example, -5 dislikes and +23 likes. Based on these values I want to compute a rating in the range 0 to 10 (or -10 to 0 and 0 to 10). How do I do this correctly?
This may not answer your question exactly, since you want a rating in [-10, 10], but this blog post describes the best way to score items that have both positive and negative ratings (in your case, likes and dislikes).
A simple method like
(Positive ratings) - (Negative ratings), or
(Positive ratings) / (Total ratings)
will not give optimal results.
Instead, the author uses a binomial proportion confidence interval, specifically the lower bound of the Wilson score interval.
The relevant part of the blog post is copied below:
CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter
Say what: We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the "real" fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:
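$$\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p}) + z_{\alpha/2}^2/(4n)}{n}}}{1 + z_{\alpha/2}^2/n}$$
(source: evanmiller.org)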
(Use minus where it says plus/minus to calculate the lower bound.) Here $\hat{p}$ is the observed fraction of positive ratings, $z_{\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $n$ is the total number of ratings.
Here it is, implemented in Ruby, again from the blog post.
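require 'statistics2'

# Lower bound of the Wilson score interval.
# pos: number of positive ratings, n: total number of ratings,
# confidence: e.g. 0.95 for a 95% interval
def ci_lower_bound(pos, n, confidence)
  return 0 if n == 0
  # z is the (1 - alpha/2) quantile of the standard normal distribution
  z = Statistics2.pnormaldist(1 - (1 - confidence) / 2)
  phat = 1.0 * pos / n
  (phat + z * z / (2 * n) - z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
end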
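The lower bound always lies in [0, 1], so multiplying it by 10 gives the 0 to 10 range the question asks for. Below is a minimal Python port of the Ruby function above. It assumes Python 3.8+ so that `statistics.NormalDist` can stand in for the `statistics2` gem. The name `rating_0_to_10` is a hypothetical helper, not something from the blog post.

```python
import math
from statistics import NormalDist


def wilson_lower_bound(pos: int, n: int, confidence: float = 0.95) -> float:
    """Lower bound of the Wilson score interval for the fraction of positive ratings."""
    if n == 0:
        return 0.0
    # z is the (1 - alpha/2) quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = pos / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def rating_0_to_10(likes: int, dislikes: int, confidence: float = 0.95) -> float:
    """Hypothetical helper: scale the Wilson lower bound (0..1) to a 0..10 rating."""
    return 10 * wilson_lower_bound(likes, likes + dislikes, confidence)


print(round(rating_0_to_10(23, 5), 2))  # 23 likes, 5 dislikes
print(round(rating_0_to_10(1, 0), 2))   # a single like still scores low
```

A single like scores far below 10 because one vote says little about the "real" fraction. Many likes with no dislikes approach 10 without quite reaching it.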
This is an extension of Shepherd's answer.
total_votes = num_likes + num_dislikes;
rating = round(10*num_likes/total_votes);
It depends on the number of visitors to your app. Say you expect about 100 users to rate your app. When the first user clicks dislike, the approach above rates it 0. That is not logically right, since a single vote is far too small a sample to justify a zero. The same goes for a single like, which gives a rating of 10.
A better approach is to add a constant number of pseudo-votes to both the likes and the dislikes. If our app has about 100 visitors, it is safe to assume that until we get around 10 ups/downs we should not go to the extremes (neither a 0 nor a 10 rating). So just add 5 to both the likes and the dislikes.
num_likes = num_likes + 5;
num_dislikes = num_dislikes + 5;
total_votes = num_likes + num_dislikes;
rating = round(10*(num_likes)/(total_votes));
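Here is the pseudo-vote idea above as a small Python function. The name `smoothed_rating` and the `prior` parameter are hypothetical, and `prior=5` matches the "add 5 to each" suggestion. With `prior=0` it reduces to the plain likes/total ratio. The zero-vote guard only matters in that case.

```python
def smoothed_rating(likes: int, dislikes: int, prior: int = 5):
    """0..10 rating with `prior` pseudo-votes added to both likes and dislikes.

    Returns None when there are no votes at all (only possible with prior=0).
    """
    likes += prior
    dislikes += prior
    total = likes + dislikes
    if total == 0:
        return None
    return round(10 * likes / total)


print(smoothed_rating(0, 1))   # first dislike: 5 instead of 0
print(smoothed_rating(1, 0))   # first like: 5 instead of 10
print(smoothed_rating(23, 5))  # 7
```

With no votes the rating starts at 5. It only drifts toward 0 or 10 once real votes outweigh the 10 pseudo-votes.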
It sounds like what you want is basically the percentage of likes. I would use 0 to 10 rather than -10 to 10, because a negative scale could be confusing. On a 0 to 10 scale, 0 would be "all dislikes" and 10 would be "all likes".
total_votes = num_likes + num_dislikes;
rating = round(10*num_likes/total_votes);
And that's basically it.
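The two-line formula above needs two small guards in practice. First, it divides by zero before the first vote. Second, in languages with integer division (C, Java, etc.) `10*num_likes/total_votes` truncates before `round` ever sees it. Below is a minimal Python sketch with both guards. If you do want the -10 to 10 range from the question, the same ratio maps linearly onto it. Both function names are hypothetical.

```python
def like_ratio_rating(likes: int, dislikes: int):
    """0 = all dislikes, 10 = all likes; None when there are no votes yet."""
    total = likes + dislikes
    if total == 0:
        return None  # avoid dividing by zero before the first vote
    # Python's / is true division, so rounding sees the fractional part.
    return round(10 * likes / total)


def signed_rating(likes: int, dislikes: int):
    """-10 = all dislikes, 0 = balanced, 10 = all likes; None when there are no votes."""
    total = likes + dislikes
    if total == 0:
        return None
    return round(10 * (likes - dislikes) / total)


print(like_ratio_rating(23, 5))  # 8
print(signed_rating(23, 5))      # 6
```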
I have entries like these:
id user number
1 Peter 1
2 Jack 3
3 Kate 2
4 Carla 3
My table is called User. I would like to get only the users with the highest number, but in some cases I don't know what that number is.
I thought of doing something like this:
max_users = User.objects.filter(number=3)
The problem is that this assumes I already know the highest number is 3, which is not always the case. Could you help me, please?
Thank you very much !
Try the following snippet:
from django.db.models import Max
max_number = User.objects.aggregate(Max('number'))['number__max'] # Returns the highest number.
max_users = User.objects.filter(number=max_number) # Filter all users by this number.
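The snippet above runs two queries: one for the maximum and one for the filter. For comparison, here is the same result in a single SQL query with a subquery. It is a minimal sketch using Python's built-in `sqlite3` module and an in-memory table that mirrors the example rows. The table name `myapp_user` is hypothetical (Django normally names tables `<app_label>_user`), and the `user` column is renamed `name` to keep the example simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myapp_user (id INTEGER PRIMARY KEY, name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO myapp_user (id, name, number) VALUES (?, ?, ?)",
    [(1, "Peter", 1), (2, "Jack", 3), (3, "Kate", 2), (4, "Carla", 3)],
)

# One query: keep only the rows whose number equals the table-wide maximum.
rows = conn.execute(
    "SELECT id, name, number FROM myapp_user "
    "WHERE number = (SELECT MAX(number) FROM myapp_user) ORDER BY id"
).fetchall()
print(rows)  # [(2, 'Jack', 3), (4, 'Carla', 3)]
```

Both users sharing the top number are returned, which matches what the two-step ORM version gives.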
I’m trying to solve a linear programming model and need some help. I’m not a programming expert, but I can draw up the problem conceptually and am hoping for some help implementing it.
I’m looking into an asset allocation problem for an investment portfolio from a theoretical perspective, but for simplicity of this post I’m going to use generic terms.
I have a list of 500+ choices that all have an assigned cost and value add. My goal is to maximize the sum of the value add, given a constraint on how much I can spend. These 500 choices are divided into 5 categories and there are restrictions on how many choices I can have from each category.
Category 1 = 1
Category 2 = 1
Category 3 = 2 or 3
Category 4 = 1 or 2
Category 5 = 2
Category 3 + Category 4 = 4
I figure I’ll need a binary variable X attached to each choice, where 1 means I pick that choice and 0 means I don’t. In the end, 8 variables should equal 1 and the rest 0, giving the maximum value add subject to the cost constraint.
Ultimately I hope to be able to run it and ask, for example, “what is the nth highest value?”, so that instead of only the maximum value add I can also get the second highest, and so on.
Is this possible and what software/language would be best to do it? Thanks for your help!
Just to simplify writing everything down, let's assume you had 15 assets, with value added v_1, v_2, ..., v_15 and costs c_1, c_2, ..., c_15. Let's assume assets 1, 2, and 3 are in category 1, assets 4, 5, and 6 are in category 2, assets 7, 8, and 9 are in category 3, assets 10, 11, and 12 are in category 4, and assets 13, 14, and 15 are in category 5. Finally, let's assume a budget B.
We would create binary variables x_1, x_2, ..., x_15 to indicate whether we bought each asset. Now, the objective function of our integer program is:
max v_1*x_1 + v_2*x_2 + ... + v_15*x_15
Our budget constraint is:
c_1*x_1 + c_2*x_2 + ... + c_15*x_15 <= B
Exactly one choice from category 1:
x_1 + x_2 + x_3 = 1
Exactly one choice from category 2:
x_4 + x_5 + x_6 = 1
Either 2 or 3 choices from category 3:
x_7 + x_8 + x_9 >= 2
x_7 + x_8 + x_9 <= 3
Either 1 or 2 choices from category 4:
x_10 + x_11 + x_12 >= 1
x_10 + x_11 + x_12 <= 2
Exactly 2 choices from category 5:
x_13 + x_14 + x_15 = 2
Exactly 4 choices from categories 3 and 4 combined:
x_7 + x_8 + x_9 + x_10 + x_11 + x_12 = 4
Finally, you would specify all variables to be binary.
The only adjustment needed for your actual problem is to replace the variables in each of these constraints with the variables belonging to each of your five categories.
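For the 15-asset toy model only, you can check the formulation and answer the "nth best" question by brute force. The sketch below enumerates all 2^15 binary vectors, keeps those satisfying the constraints above, and ranks them by objective value. The values, costs, and budget are made-up numbers, not from the question. This does not scale to 500+ choices; there you would use an integer-programming solver. With a solver, one common way to get the next-best solution is to re-solve after adding a constraint that excludes the previous optimum (for binary x*, the sum of the variables that were 1 in x* must be at most their count minus 1).

```python
import itertools

# Hypothetical toy data: 15 assets, 3 per category (0-based indices 0-14).
VALUES = [5, 7, 6, 4, 9, 3, 8, 2, 6, 7, 5, 4, 3, 6, 8]
COSTS = [4, 6, 5, 3, 8, 2, 7, 2, 5, 6, 4, 3, 2, 5, 7]
BUDGET = 35
CATEGORIES = {1: range(0, 3), 2: range(3, 6), 3: range(6, 9), 4: range(9, 12), 5: range(12, 15)}


def feasible(x):
    """Check the budget and category constraints of the integer program for a 0/1 vector x."""
    def count(cat):
        return sum(x[i] for i in CATEGORIES[cat])

    return (
        sum(c * xi for c, xi in zip(COSTS, x)) <= BUDGET
        and count(1) == 1
        and count(2) == 1
        and 2 <= count(3) <= 3
        and 1 <= count(4) <= 2
        and count(5) == 2
        and count(3) + count(4) == 4
    )


def ranked_solutions():
    """Enumerate every binary vector and return the feasible ones, best objective first."""
    solutions = [
        (sum(v * xi for v, xi in zip(VALUES, x)), x)
        for x in itertools.product((0, 1), repeat=len(VALUES))
        if feasible(x)
    ]
    solutions.sort(reverse=True)
    return solutions


for rank, (value, x) in enumerate(ranked_solutions()[:3], start=1):
    print(rank, value, [i + 1 for i, xi in enumerate(x) if xi])  # 1-based asset numbers
```

Every feasible solution picks exactly 1 + 1 + 4 + 2 = 8 assets, matching the 8 ones expected in the question.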
All that remains would be to implement the model. There are a myriad of linear programming packages in all major languages; check out this survey for details. Since Stack Overflow is not a software recommendation site and you haven't really given any details about your situation (e.g. free vs. non-free solvers or the programming language you're using), I will refrain from suggesting a particular package.
I have the following snippet:
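from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

print('\nfitting')
rfr = RandomForestRegressor(
    n_estimators=10,
    max_features=1.0,           # older scikit-learn versions spelled this 'auto'
    criterion='squared_error',  # older scikit-learn versions spelled this 'mse'
    max_depth=None,
)
rfr.fit(X_train, y_train)

# scores
scores = cross_val_score(
    estimator=rfr,
    X=X_test,
    y=y_test,
    verbose=1,
    cv=10,
    n_jobs=4,
)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))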
1) Does running cross_val_score do more training on the regressor?
2) Do I need to pass in a trained regressor or just a new one, e.g. estimator=RandomForestRegressor()? If a new one, how do I test the accuracy of a regressor; must I use another function in scikit-learn?
3) My accuracy is about 2%. Is that the MSE score, where lower is better, or is it the actual accuracy? If it is the actual accuracy, can you explain it? It doesn't make sense to me how a regressor can accurately predict values on a continuous range.
1) It re-trains the estimator, k times in fact (once per fold, so 10 times with cv=10).
2) Pass an untrained one. You can pass a trained one, but cross_val_score clones it, so the existing fit is thrown away and you're just wasting time.
3) It's the R² score, so the printed value is not 2% but 0.02. R² is at most 1 but can be negative. Accuracy is not well-defined for regression. (You can define it as for classification, but that makes no sense.)
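The plain-Python version of the R² formula below shows why the score can be negative. It is a sketch for illustration only; scikit-learn's own implementation is `sklearn.metrics.r2_score`. A score of about 0.02 means the model does only slightly better than always predicting the mean of the test targets. If you would rather see MSE, `cross_val_score` accepts `scoring='neg_mean_squared_error'`, which is negated so that higher is still better.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y = [1, 2, 3, 4]
print(r2_score(y, [1, 2, 3, 4]))          # perfect predictions: 1.0
print(r2_score(y, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean: 0.0
print(r2_score(y, [4, 3, 2, 1]))          # worse than the mean: -3.0
```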
Problem: I have a large number of scanned documents that are linked to the wrong records in a database. Each image has the correct ID on it somewhere that says where it belongs in the db.
For example, a DB row could be:
| user_id | img_id | img_loc |
| 1 | 1 | /img.jpg|
img.jpg would have the user_id (1) on the image somewhere.
Method/Solution: Loop through the database. Pull the image text into a variable with OCR and check whether user_id appears anywhere in it. If not, flag the record/image in a log; if so, do nothing and move on.
My example is simple. In the real world, I have a guarantee that user_id won't accidentally show up on the wrong form (it has a specific format with its own significance).
Right now it works. However, it is incredibly strict. If you've worked with OCR, you know how fickle it can be: sometimes a 7 and a 1 get confused, or a 9 and a 7, etc. The result is a large number of false positives, especially among low-quality scans.
I've addressed some of the image-quality issues with preprocessing on my side (increasing the image size, adjusting the black/white threshold) and had satisfying results. I'd like to add the ability for the program to recognize, for example, that "81723103" is not very far from "81923103" (only the third digit differs).
The only way I know to do that is to check substrings at least as long as the one I'm looking for, calculate the distance between each pair of corresponding characters, average those distances, and set a limit on what counts as a good average.
Some examples:
Ex 1
81723103 - Looking for this
81923103 - Found this
--------
00200000 - distances between characters
0 + 0 + 2 + 0 + 0 + 0 + 0 + 0 = 2
2/8 = .25 (pretty good match. 0 = perfect)
Ex 2
81723103 - Looking
81158988 - Found
--------
00635885 - distances
0 + 0 + 6 + 3 + 5 + 8 + 8 + 5 = 35
35/8 = 4.375 (Not a very good match. 9 = worst)
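Here is the average-digit-distance metric described above as a short Python function. It is a sketch that assumes both strings are all digits and of equal length, and the function name is hypothetical. It reproduces the two worked examples. One caveat: the numeric gap between digits does not reflect which digits OCR actually confuses. A 7 misread as a 1 scores 6 here, even though it is one of the confusions mentioned above.

```python
def avg_digit_distance(target: str, found: str) -> float:
    """Average absolute difference between corresponding digits (0 = identical, 9 = worst)."""
    if len(target) != len(found):
        raise ValueError("target and found must have the same length")
    return sum(abs(int(a) - int(b)) for a, b in zip(target, found)) / len(target)


print(avg_digit_distance("81723103", "81923103"))  # 0.25
print(avg_digit_distance("81723103", "81158988"))  # 4.375
```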
This way I can tell it "Flag the bottom 30% only" and dump anything with an average distance > 6.
I suspect I'm reinventing the wheel and wanted to share this for feedback. I also expect a large increase in run time from doing all these string operations, compared with what I'm doing now.
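The standard off-the-shelf measures for this kind of near-miss are Hamming distance (count of mismatched positions) and Levenshtein edit distance. This is a minimal sketch assuming the OCR output is a single string and the ID has a fixed length. `best_window` slides a window of the ID's length over the text and keeps the window with the fewest mismatches; unlike the digit-gap metric, every wrong character counts as 1. `levenshtein` is the classic edit distance, which also tolerates dropped or extra characters. Both names are hypothetical helpers, and the flagging threshold is up to you.

```python
def best_window(ocr_text: str, target: str):
    """Return (window, mismatches) for the len(target)-long substring of ocr_text
    with the fewest mismatched characters, or None if the text is too short."""
    k = len(target)
    best = None
    for start in range(len(ocr_text) - k + 1):
        window = ocr_text[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, target))  # Hamming distance
        if best is None or mismatches < best[1]:
            best = (window, mismatches)
    return best


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


print(best_window("Form 81923103 rev", "81723103"))  # ('81923103', 1)
print(levenshtein("81723103", "8172310"))            # 1 (one dropped digit)
```

Each window costs O(k) character comparisons, so scanning a page of OCR text for an 8-character ID stays cheap compared with the OCR step itself.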
I hope my question makes sense; I am really out of ideas.
I will explain by example: http://www.xtdeco.ro/fototapet/texturat/Bloom-R12241-6
There is a standard product, with some attributes.
What I need to do is make this product configurable; you may notice the two text inputs. The plan is to calculate Lățime (width) * Înălțime (height), multiply by the price per sqm, and check the result against the actual product price (no problem so far). Then I would either add or subtract a value from the product price, or add an option to the product in the current cart that does the same.
Does anyone have an idea how this could be done without hacking too much of the source code?
Thank you.
The easiest way is not to let users enter arbitrary dimensions, but to let them choose from predefined ones.
If this is a wallpaper and you know the roll is always 1 m wide (just for simplicity), then sell it per 1 m2. Let the user enter the number of pieces, which determines the length cut from the roll (so ordering 8 pieces (m2) yields an 8 m long piece of a 1 m wide roll). In this case you can relabel "pieces" or "quantity" as "m2".
If this is a wall print with fixed dimensions (or a fixed aspect ratio), let the user choose from some predefined sizes, e.g.
XS (120 x 170 cm) + $0
S (150 x 212.5 cm) + $10
M (200 x 283 cm) + $20
L (250 x 354 cm) + $35
XL (300 x 425 cm) + $50
This can be handled by product options, which is again easier than what you are asking for... don't you think?
EDIT based on comment:
Then there is only one possibility that comes to my mind:
hide the quantity field (don't remove it, make it hidden)
create a JS event handler that listens to onChange, onBlur, onKeyUp (whichever fits) events on both dimension text fields and calculates the resulting area in m2; show this area to the customer along with the price for that area, since the price per 1 m2 is known and also displayed
have this function also write the calculated float value into the hidden quantity field, so that after adding to the cart, the cart should contain something like
4.73m2 WallPrint1 $18.92
(because I was using a price of $4 per 1 m2, thus 4.73 m2 x $4/m2 = $18.92)
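The value the handler writes into the hidden quantity field is simply the area, and the cart price follows from it. The sketch below shows that arithmetic in Python; on the real product page it would be JavaScript. The function name and the assumption that both text fields are in centimetres are hypothetical. 215 x 220 cm is just one pair of dimensions that gives the 4.73 m2 of the example.

```python
def area_and_price(width_cm: float, height_cm: float, price_per_m2: float):
    """Return (area in m2 rounded to 2 decimals, price for that area rounded to cents)."""
    area_m2 = round(width_cm * height_cm / 10_000, 2)  # 1 m2 = 10,000 cm2
    return area_m2, round(area_m2 * price_per_m2, 2)


area, price = area_and_price(215, 220, 4.0)
print(f"{area}m2 WallPrint1 ${price}")  # 4.73m2 WallPrint1 $18.92
```

Rounding the area before multiplying keeps the displayed quantity and the charged price consistent with each other.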
I'm not sure, but you may also have to edit some other pieces of code to allow float quantity values to be added to the cart and ordered...