How to normalize ratings to a scale of 1 to 5? - rating

In the Yahoo! Movie dataset the rating scale runs from 1 to 13. Here, 1 represents a good rating and 13 represents the lowest rating for the movie.
A 0 means that the user didn't rate that movie.
rating { 13 12 11 10 9 8 7 6 5 4 3 2 1 0} OR
rating { A+ A A- B+ B B- C+ C C- D+ D D- F 0}
e.g.
user m1 m2 m3
1 2 3 13
2 0 1 7
But I don't know how to convert a rating on the 1 to 13 scale into a rating on a 1 to 5 scale.
One simple thing I can do is group the grades:
{A+,A,A-} = 5
{B+,B,B-} = 4
{C+,C,C-} = 3
{D+,D,D-} = 2
{F} = 1
Is there any other method, for example a formula?

If floating-point values are allowed, simply multiply by 5/13. Round to whole numbers if necessary.
If 5 is the best, subtract the result from 6 (handle 0 with an if clause).
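Note that multiplying by 5/13 sends a rating of 1 to about 0.38, so the result spans roughly 0.38 to 5 rather than 1 to 5. Below is a minimal Python sketch of a linear rescale that maps 1 to 1 and 13 to 5 exactly and leaves 0 as "not rated". The function name `rescale_rating` and the `invert` flag are hypothetical; `invert` is a parameter because which end of the 13-point scale counts as "best" depends on how the data is read.

```python
def rescale_rating(r, invert=False):
    """Linearly map a rating in 1..13 onto 1..5; 0 means 'not rated'."""
    if r == 0:
        return 0  # keep the 'not rated' marker unchanged
    if not 1 <= r <= 13:
        raise ValueError("rating must be 0 or in 1..13")
    scaled = 1 + (r - 1) * 4 / 12  # 1 -> 1.0, 7 -> 3.0, 13 -> 5.0
    # invert=True flips the scale (1 -> 5.0, 13 -> 1.0), i.e. "subtract from 6"
    return 6 - scaled if invert else scaled


print([rescale_rating(r) for r in (1, 4, 7, 10, 13)])
```

Apply `round()` to the result if only whole-number ratings are wanted.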

Subtracting values based on a index column and using a condition in the same column in DAX

I've found a lot of material on Stack about this, but I'm still not able to reproduce it.
Sample data set:
Asset Value Index
A 10 1
B 15 1
C 20 1
A 11 2
B 17 2
C 24 2
A 18 3
B 25 3
C 30 3
What I want to do is, for each asset individually, subtract the value at the previous index from the value at the current index.
Ex:
Asset A [1] -> 10
Asset A [2] -> 11
11 - 10 = 1
So the table would look like this:
Asset Value Index Diff
A 10 1 0
B 15 1 0
C 20 1 0
A 11 2 1
B 17 2 2
C 24 2 4
A 18 3 7
B 25 3 8
C 30 3 6
This needs to be done using DAX.
Can you help me?
Best Regards!
I just did this and it worked.
Diff =
VAR Assets = 'Table'[Asset]
VAR Ind = 'Table'[Index] - 1
RETURN
// Index starts at 1 in the sample data, so the first row of each asset has Ind = 0
IF(Ind = 0, 0, 'Table'[Value] - CALCULATE(SUM('Table'[Value]), FILTER('Table', 'Table'[Asset] = Assets && 'Table'[Index] = Ind)))
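For readers who want to check the expected Diff column outside of DAX, here is a minimal Python sketch of the same idea: look up the value of the same asset at the previous index and subtract it, using 0 when there is no previous row. The function name `add_diff` and the tuple layout are assumptions made for this illustration only.

```python
def add_diff(rows):
    """rows: list of (asset, value, index) tuples; returns rows with a diff appended."""
    # Lookup table from (asset, index) to value
    value_at = {(asset, index): value for asset, value, index in rows}
    result = []
    for asset, value, index in rows:
        prev = value_at.get((asset, index - 1))  # None when there is no previous index
        result.append((asset, value, index, 0 if prev is None else value - prev))
    return result


rows = [("A", 10, 1), ("B", 15, 1), ("C", 20, 1),
        ("A", 11, 2), ("B", 17, 2), ("C", 24, 2),
        ("A", 18, 3), ("B", 25, 3), ("C", 30, 3)]
for row in add_diff(rows):
    print(row)
```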

Sorted list of random repeated numbers to sorted list of repeated and continuous numbers in Google Sheets

I think the best way to show the problem is with an example. Column A is what I have now, and column B is what I want.
A B
1 1
1 1
2 2
2 2
5 3
5 3
5 3
8 4
8 4
9 5
9 5
14 6
14 6
17 7
17 7
17 7
Update: Based on your comment, use this formula
=ArrayFormula(IF(ISNUMBER(A1:A), VLOOKUP(A1:A, {UNIQUE(A1:A), ArrayFormula(RANK(UNIQUE(A1:A), UNIQUE(A1:A), 1))}, 2, 0), ""))
Previous answer: Have you already used the SORT formula?
Try =SORT(A1:A, 1, 1) in cell B1
Assuming your data is in column A, rows 2 through 10, enter this in B2:
=arrayformula(1/COUNTIF($A$2:$A$10,$A$2:$A$10))
and this in C2:
=sumproduct(($B$1:$B1)*($A$1:$A1<A2))+1
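What column B asks for is often called a dense rank: each distinct value gets the next consecutive integer, and repeats share it. As a cross-check of the expected output, here is a minimal Python sketch; the name `dense_rank` is hypothetical and this is not a translation of either formula above.

```python
def dense_rank(values):
    """Map each value to its 1-based position among the sorted distinct values."""
    rank_of = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [rank_of[v] for v in values]


column_a = [1, 1, 2, 2, 5, 5, 5, 8, 8, 9, 9, 14, 14, 17, 17, 17]
print(dense_rank(column_a))
```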

How to retain calculated values between rows when calculating running totals?

I have a tricky question about a conditional sum in SAS. It is quite complicated for me to explain in words, so I will show an example:
A B
5 3
7 2
8 6
6 4
9 5
8 2
3 1
4 3
As you can see, I have a dataset with two columns. First, I calculated the conditional cumulative sum of column A (I can do this myself, so I need no help with this step):
A B CA
5 3 5
7 2 12
8 6 18
6 4 8 ((12+8)-18)+6
9 5 17
8 2 18
3 1 10 ((17+8)-18)+3
4 3 14
So my condition value is 18. If the cumulative sum exceeds 18, it is set to 18, and the next row starts from the amount that exceeded 18 plus the next value of A. (As I said, I can do this myself.)
The tricky part is that I have to calculate the cumulative sum of column B according to column A:
A B CA CB
5 3 5 3
7 2 12 5
8 6 18 9.5 (5+(6*((18-12)/8)))
6 4 8 5.5 ((5+6)-9.5)+4
9 5 17 10.5 (5.5+5)
8 2 18 10.75 (10.5+(2*((18-17)/8)))
3 1 10 2.75 ((10.5+2)-10.75)+1
4 3 14 5.75 (2.75+3)
As you can see from the example, the cumulative sum of column B is very specific. When column CA equals our condition value (18), we calculate the proportion of the last value of A that was needed to reach the condition value, and then apply this proportion to B when computing the cumulative sum of column B.
Looks like when the sum of A reaches 18 or more you want to split the values of A and B between the current and the next record. One way is to remember the left over values for A and B and carry them forward in your new cumulative variables. Just make sure to output the observation before resetting those variables.
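data want;
  set have;
  /* Sum statements: ca and cb are retained across observations */
  ca + a;
  cb + b;
  if ca >= 18 then do;
    extra_a = ca - 18;                      /* part of a beyond the cap */
    extra_b = b - b * ((a - extra_a) / a);  /* matching share of b */
    ca = 18;
    cb = cb - extra_b;
  end;
  output;
  /* After writing the capped row, carry the leftovers forward */
  if ca = 18 then do;
    ca = extra_a;
    cb = extra_b;
  end;
  drop extra_a extra_b;
run;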
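To check the carry-forward logic against the CA and CB columns in the question, here is a minimal Python sketch of the same steps. The function name `capped_cumsums` and the `cap` parameter are hypothetical, and the sketch assumes every value of A is positive and smaller than the cap.

```python
def capped_cumsums(rows, cap=18):
    """rows: list of (a, b) pairs; returns a list of (ca, cb) per row."""
    ca = cb = 0.0
    out = []
    for a, b in rows:
        ca += a
        cb += b
        hit_cap = ca >= cap
        extra_a = extra_b = 0.0
        if hit_cap:
            extra_a = ca - cap        # part of a beyond the cap
            extra_b = b * extra_a / a  # matching share of b
            ca = cap
            cb -= extra_b
        out.append((ca, cb))          # record the row before resetting
        if hit_cap:
            ca, cb = extra_a, extra_b  # carry the leftovers forward
    return out


rows = [(5, 3), (7, 2), (8, 6), (6, 4), (9, 5), (8, 2), (3, 1), (4, 3)]
for ca, cb in capped_cumsums(rows):
    print(ca, cb)
```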

How to find maximum distance apart of values within a variable

I created a working example dataset:
input ///
group value
1 3
1 2
1 3
2 4
2 6
2 7
3 4
3 4
3 4
3 4
4 17
4 2
5 3
5 5
5 12
end
My goal is to figure out the maximum distance between consecutive sorted values within each group. For group 2, this would be 2, because the next highest value after 4 is 6. Note that the only value relevant to 4 is 6, not 7, because 7 is not the next highest value after 4. The result for group 3 is 0 because there is only one distinct value in group 3. There will only be one result per group.
What I want to get:
input ///
group value result
1 3 1
1 2 1
1 3 1
2 4 2
2 6 2
2 7 2
3 4 0
3 4 0
3 4 0
3 4 0
4 17 15
4 2 15
5 3 7
5 5 7
5 12 7
end
The order is not important, so the order just above can change with no problem.
Any tips?
I may have figured it out:
bys group (value): gen d = value[_n+1] - value[_n]
bys group: egen result = max(d)
drop d
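The same sort-then-difference idea can be checked against the expected results with a minimal Python sketch. The name `max_gap_by_group` is hypothetical, and returning 0 for a group with a single observation is a choice made in this sketch, not something stated in the question.

```python
from collections import defaultdict


def max_gap_by_group(pairs):
    """pairs: iterable of (group, value); returns {group: largest gap between sorted neighbours}."""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    result = {}
    for group, values in groups.items():
        ordered = sorted(values)
        gaps = [hi - lo for lo, hi in zip(ordered, ordered[1:])]
        result[group] = max(gaps, default=0)  # 0 when there is only one observation
    return result


data = [(1, 3), (1, 2), (1, 3), (2, 4), (2, 6), (2, 7),
        (3, 4), (3, 4), (3, 4), (3, 4), (4, 17), (4, 2),
        (5, 3), (5, 5), (5, 12)]
print(max_gap_by_group(data))
```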

Stata moving products

Using Stata I want a formula (line of code) that takes all of the previous entries for a given group G at a given cell and returns the product for all of the values at that cell and above. For example:
G X Y
1 1 1
1 2 2
1 6 12
1 3 36
2 2 2
2 4 8
3 2 2
4 2 2
4 11 22
4 7 154
G = Group ID, X = Value, Y = Moving Product
The way I have been doing this is pretty long and involves creating a good number of variables. There must be a way in Stata to just have it do a moving product by group ID (G).
Any insight is helpful
Here is the solution:
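sort G, stable
by G: gen moving_product = exp(sum(ln(X)))
This should make moving_product equal to Y. The stable option keeps the original order of the rows within each group, which matters because the running product depends on that order.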
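One caveat with the exp(sum(ln(X))) approach is that it only works for positive X and goes through floating-point logarithms, so results may need rounding. As a cross-check of the Y column, here is a minimal Python sketch that multiplies directly; the name `moving_product` and the list-of-tuples input are assumptions for this illustration.

```python
def moving_product(rows):
    """rows: list of (g, x) in the desired order; returns the running product of x within each g."""
    running = {}
    out = []
    for g, x in rows:
        running[g] = running.get(g, 1) * x  # start each group at 1
        out.append(running[g])
    return out


rows = [(1, 1), (1, 2), (1, 6), (1, 3), (2, 2), (2, 4),
        (3, 2), (4, 2), (4, 11), (4, 7)]
print(moving_product(rows))
```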