Generate ranks across multiple variables in Stata

I want to generate ranks of values, from lowest to highest, across multiple variables in Stata. In the table below, columns 2–4 show the observed values of variables x, y, and z, and columns 5–7 show the ranks computed across all three variables (tied values share the average of their ranks).
Note that "across all three variables" means that, for example, the lowest rank, 1, is given only to the smallest value among all three variables (here, only to the value 0.2 of variable x).
id  x    y    z    rank(x)  rank(y)  rank(z)
1   1.2  2.6  2.0  5        12       10.5
2   0.2  2.0  0.9  1        10.5     3.5
3   0.6  1.5  1.7  2        6        7
4   1.8  0.9  1.9  8        3.5      9
I was hoping egen would provide a one-line solution, but I think it only creates a single rank variable.
Is there a function or one-liner à la an imagined rankvars x y z that would accomplish this, or would it require writing a program?

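Correct: egen creates one outcome variable at a time, so you need other code to do this. That need not be a program; it can be a few lines in a do-file.
A better way would be to push the data into Mata and pull out the results. The code below takes a different route: reshape to long form so that a single egen call ranks all values together, then reshape back to wide form.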
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id float(x y z) byte rankx float(ranky rankz)
1 1.2 2.6 2 5 12 10.5
2 .2 2 .9 1 10.5 3.5
3 .6 1.5 1.7 2 6 7
4 1.8 .9 1.9 8 3.5 9
end
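* stack x, y, z into one variable v (with which = "x", "y", or "z")
rename (x y z) (v=)
reshape long v, i(id) j(which) string
* one egen call now ranks all values jointly; ties get averaged ranks
egen Rank = rank(v)
* back to one observation per id, giving Rankx, Ranky, Rankz
reshape wide Rank v, i(id) j(which) string
rename v* *
order id x y z rankx Rankx ranky Ranky rankz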
list
+----------------------------------------------------------------------+
| id x y z rankx Rankx ranky Ranky rankz Rankz |
|----------------------------------------------------------------------|
1. | 1 1.2 2.6 2 5 5 12 12 10.5 10.5 |
2. | 2 .2 2 .9 1 1 10.5 10.5 3.5 3.5 |
3. | 3 .6 1.5 1.7 2 2 6 6 7 7 |
4. | 4 1.8 .9 1.9 8 8 3.5 3.5 9 9 |
+----------------------------------------------------------------------+
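The reshape trick works because ranking is done once on the pooled values. As a cross-check outside Stata, here is a minimal Python sketch of the same idea (not Stata code, and not part of the original answer): stack all values, rank them jointly with averaged ranks for ties (matching the default behaviour of egen rank() seen in the output above), and split the ranks back by variable. The function name pooled_ranks is hypothetical.

```python
def pooled_ranks(columns):
    """Rank all values in `columns` (dict: name -> list of values) jointly,
    lowest = 1, with tied values sharing the average of their ranks."""
    # Pool every (value, variable name, row index), like reshaping to long form.
    pool = [(v, name, i) for name, vals in columns.items() for i, v in enumerate(vals)]
    pool.sort(key=lambda t: t[0])
    ranks = {name: [None] * len(vals) for name, vals in columns.items()}
    start = 0
    while start < len(pool):
        # Find the run of tied values that begins at position `start`.
        end = start
        while end + 1 < len(pool) and pool[end + 1][0] == pool[start][0]:
            end += 1
        # Positions start..end correspond to ranks start+1..end+1; ties share the mean.
        avg = (start + 1 + end + 1) / 2
        for _, name, i in pool[start:end + 1]:
            ranks[name][i] = avg
        start = end + 1
    return ranks


data = {
    "x": [1.2, 0.2, 0.6, 1.8],
    "y": [2.6, 2.0, 1.5, 0.9],
    "z": [2.0, 0.9, 1.7, 1.9],
}
r = pooled_ranks(data)
for name in data:
    print(name, r[name])
```

With the question's data this reproduces the rankx, ranky, and rankz columns, including the shared ranks 3.5 and 10.5.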

Related

Scatter plot in SAS does not show

I am new to SAS and I'm trying to make a scatter plot of X vs. the residual, but when I run the code this error appears:
ERROR: Procedure SQPLOT not found.
This is my code:
data EC
input x e;
datalines;
2 3.2
3 2.9
4 -1.7
5 -2.0
6 -2.3
7 -1.2
8 -0.9
9 0.8
10 0.7
11 0.5
;
run;
proc sqplot data = EC;
scatter x = x y=residual;
run;
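Could you help me find what is wrong?
There is no procedure named SQPLOT. You probably want to use SGPLOT. Note also that the DATA statement needs a terminating semicolon (data EC;), and that the variable in Y= must exist in the data set: your residuals are stored in e, not in a variable named residual, so use y=e.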
data EC;
input x e;
datalines;
2 3.2
3 2.9
4 -1.7
5 -2.0
6 -2.3
7 -1.2
8 -0.9
9 0.8
10 0.7
11 0.5
;
run;
proc sgplot data=EC;
scatter x = x y=e;
run;
Whenever your code tries to use a procedure that is not licensed (or not installed), the log shows a similar ERROR: message.

How to retain calculated values between rows when calculating running totals?

I have a tricky question about a conditional sum in SAS. It is hard for me to explain in words, so here is an example:
A B
5 3
7 2
8 6
6 4
9 5
8 2
3 1
4 3
As you can see, I have a data sheet with two columns. First, I calculated the conditional cumulative sum of column A (I can do this myself, so I don't need help with this step):
A B CA
5 3 5
7 2 12
8 6 18
6 4 8 ((12+8)-18)+6
9 5 17
8 2 18
3 1 10 ((17+8)-18)+3
4 3 14
So my threshold value is 18. If the cumulative sum exceeds 18, it is set to 18, and the next value is the next A value plus the amount by which the sum exceeded 18. (As I said, I can do this part myself.)
The tricky part is that I have to calculate the cumulative sum of column B according to column A:
A B CA CB
5 3 5 3
7 2 12 5
8 6 18 9.5 (5+(6*((18-12)/8)))
6 4 8 5.5 ((5+6)-9.5)+4
9 5 17 10.5 (5.5+5)
8 2 18 10.75 (10.5+(2*((18-17)/8)))
3 1 10 2.75 ((10.5+2)-10.75)+1
4 3 14 5.75 (2.75+3)
As the example shows, the cumulative sum of column B follows a specific rule. When CA reaches the threshold (18), we calculate what fraction of the current A value was needed to reach 18, and use that same fraction of the current B value when computing CB; the rest of B carries over to the next row.
It looks like when the running sum of A reaches 18 or more, you want to split the values of A and B between the current and the next record. One way is to remember the leftover values of A and B and carry them forward in your new cumulative variables. Just make sure to output the observation before resetting those variables.
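data want ;
  set have ;
  /* sum statements: ca and cb are retained across rows */
  ca+a;
  cb+b;
  if ca >= 18 then do;
    /* the part of A beyond 18, and the same share of B, */
    /* belong to the next block                          */
    extra_a=ca - 18;
    extra_b=b - b*((a - extra_a)/a) ;
    ca=18;
    cb=cb-extra_b ;
  end;
  /* write the row before resetting the running sums */
  output;
  if ca=18 then do;
    /* start the next block with the carried-over amounts */
    ca=extra_a;
    cb=extra_b;
  end;
  drop extra_a extra_b ;
run;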
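To check the expected CA and CB columns without SAS, here is a Python sketch of the same carry-over logic (a translation for illustration, not the answer's code; capped_cumsums is a hypothetical name). It writes extra_b as b * extra_a / a, which is algebraically equal to the SAS expression b - b*((a - extra_a)/a). Like the data step, it assumes a > 0 on any row where the cap is reached and that a carry-over never reaches 18 by itself.

```python
def capped_cumsums(a_vals, b_vals, cap=18):
    """Running sums CA and CB that stop at `cap`; the overshoot of A,
    and the same proportion of B, carry over to the next row."""
    ca = cb = 0.0
    out = []
    for a, b in zip(a_vals, b_vals):
        ca += a
        cb += b
        extra_a = extra_b = 0.0
        if ca >= cap:
            extra_a = ca - cap        # part of A beyond the cap
            extra_b = b * extra_a / a  # same share of B
            ca = cap
            cb -= extra_b
        out.append((ca, cb))          # record the row before resetting
        if ca == cap:
            # start the next block with the carried-over amounts
            ca, cb = extra_a, extra_b
    return out


col_a = [5, 7, 8, 6, 9, 8, 3, 4]
col_b = [3, 2, 6, 4, 5, 2, 1, 3]
rows = capped_cumsums(col_a, col_b)
for a, b, (ca, cb) in zip(col_a, col_b, rows):
    print(a, b, ca, cb)
```

The printed CA and CB values match the question's table (CB = 9.5 on row 3 and 10.75 on row 6).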

OpenOffice Calc: Finding max value

Let's say that I have an array:
|A B C D
---------------
1 |2 8 6 3
2 |1 2 5 2
where the first row stands for "Goals Scored" and the second row for "Goals Lost". Columns stand for games (matches).
I want to find the maximum total number of goals (scored plus lost) in a single match. In the case above, it would be 11 (C1 + C2).
I don't want to use
I spent a few days trying functions like MAX, ADDRESS, CELL, SUBTOTAL, SUM, MMULT, and TRANSPOSE, even in combination, but I didn't get a satisfying result.
The MAX function should do the job:
Returns the maximum of a list of arguments, ignoring text entries.
Example:
MAX(B1:B3)
where cells B1, B2, B3 contain 1.1, 2.2, and apple returns 2.2.
Select cell A3 and set the formula to =A1+A2. Then click the square in the lower-right corner of the selected cell and drag to D3. This gives:
|A B C D
---------------
1 |2 8 6 3
2 |1 2 5 2
3 |3 10 11 5
Now =MAX(A3:D3) produces 11.
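The helper-row approach amounts to summing each column and then taking the maximum of those sums. A tiny Python sketch of that logic (illustration only, not Calc syntax), using the numbers from the question:

```python
# Goals scored (row 1) and goals lost (row 2) for matches A..D
scored = [2, 8, 6, 3]
lost = [1, 2, 5, 2]

totals = [s + l for s, l in zip(scored, lost)]  # the helper row 3
best = max(totals)                               # =MAX(A3:D3)
match = "ABCD"[totals.index(best)]               # which column holds it
print(totals, best, match)  # [3, 10, 11, 5] 11 C
```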

Counting values to get a matrix in Stata

I have a variable age, 13 variables x1 to x13, and 802 observations in a Stata dataset. age takes values from 1 to 9, and x1 to x13 take values from 1 to 13.
I want to count how many times each value 1, ..., 13 occurs across x1 to x13, separately for each value of age. For example, for age 1, count the number of 1s, 2s, 3s, ..., 13s in x1 to x13.
I first converted x1 to x13 into a matrix using
mkmat x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13, matrix (a)
Then I tried to count using the following loop:
gen count = 0
quietly forval i = 1/802 {
quietly forval j = 1/13 {
replace count = count + inrange(a[r'i', x'j'], 0, 1), if age==1
}
}
I failed.
I am still somewhat uncertain as to what you would like to achieve, but if I understand you correctly, here is one way to do it.
First, a simple dataset in which age ranges from 1 to 3 and four variables, x1-x4, each take integer values between 5 and 7.
clear
input age x1 x2 x3 x4
1 5 6 6 6
1 7 5 6 5
2 5 7 6 6
3 5 6 7 7
3 7 6 6 6
end
Then we create three count variables (n5, n6, and n7) that count the number of 5s, 6s, and 7s in each observation across x1-x4.
forval i=5/7 {
egen n`i'=anycount(x1 x2 x3 x4),v(`i')
}
Below is how the data look now. For example, the first "1" under n5 indicates that there is only one "5" across x1-x4 for that subject.
+----------------------------------------+
| age x1 x2 x3 x4 n5 n6 n7 |
|----------------------------------------|
1. | 1 5 6 6 6 1 3 0 |
2. | 1 7 5 6 5 2 1 1 |
3. | 2 5 7 6 6 1 2 1 |
4. | 3 5 6 7 7 1 1 2 |
5. | 3 7 6 6 6 0 3 1 |
+----------------------------------------+
It sounds to me like your ultimate goal is to have sums calculated separately for each value of age. Assuming this is true, let's create a 3x3 matrix to store the results.
mat A=J(3,3,.) // age (1-3) and values (5-7)
mat rown A=age1 age2 age3
mat coln A=value5 value6 value7
forval i=5/7 {
forval j=1/3 {
qui su n`i' if age==`j'
loca k=`i'-4 // the first column for value5
mat A[`j',`k']=r(sum)
}
}
The matrix looks like this. For example, the first "3" under value5 indicates that, for all children aged 1, the value 5 appears a total of three times across x1-x4.
A[3,3]
value5 value6 value7
age1 3 4 1
age2 1 2 1
age3 1 4 3
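The matrix is a cross-tabulation of age against every value found in x1-x4, pooled across the four variables. As a cross-check outside Stata, here is a Python sketch (illustration only) that counts each (age, value) pair with collections.Counter on the same toy data; the nested list A corresponds to the Stata matrix A above.

```python
from collections import Counter

# Toy data from the answer: (age, x1, x2, x3, x4)
rows = [
    (1, 5, 6, 6, 6),
    (1, 7, 5, 6, 5),
    (2, 5, 7, 6, 6),
    (3, 5, 6, 7, 7),
    (3, 7, 6, 6, 6),
]

# Count every (age, value) pair across all x variables at once.
counts = Counter((age, v) for age, *xs in rows for v in xs)

ages, values = [1, 2, 3], [5, 6, 7]
# Counter returns 0 for pairs that never occur.
A = [[counts[(age, v)] for v in values] for age in ages]
for age, row in zip(ages, A):
    print(f"age{age}", row)
```

Pooling all x variables before counting is the same idea as reshaping to long form and running a single two-way table.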
With Aspen's example, you could do this:
gen id = _n
reshape long x, i(id)
tab age x
Note that your sample code doesn't loop over different ages, and the comma before if in the replace command is incorrect. I won't try to fix the code, as there are more direct methods, one of which is above. tabulate has an option, matcell(), to save the table as a matrix.
Here is another solution closer to the original idea. Warning: code not tested.
matrix count = J(9, 13, 0)
forval i = 1/9 {
forval j = 1/13 {
forval J = 1/13 {
qui count if age == `i' & x`J' == `j'
matrix count[`i', `j'] = count[`i', `j'] + r(N)
}
}
}

Stata moving products

Using Stata, I want a formula (a line of code) that, for each observation in group G, returns the product of X over that observation and all previous observations in the same group. For example:
G X Y
1 1 1
1 2 2
1 6 12
1 3 36
2 2 2
2 4 8
3 2 2
4 2 2
4 11 22
4 7 154
G = Group ID, X = Value, Y = Moving Product
The way I have been doing this is long and involves creating many variables. There must be a way in Stata to compute a moving (cumulative) product by group ID (G).
Any insight would be helpful.
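Here is a solution:
* record the current order, since sort does not preserve the order of rows within a group
gen long obsno = _n
bysort G (obsno): gen moving_product = exp(sum(ln(X)))
This should make moving_product equal to Y, up to floating-point rounding. Note that ln(X) is missing for X <= 0 and sum() treats missing values as 0, so this trick is valid only when all X are positive.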
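The same cumulative product by group can be sketched in Python (illustration only; the function names are hypothetical). itertools.groupby processes runs of consecutive rows, so, like Stata's by, it assumes the rows are already sorted by G in the intended order. The first function multiplies directly, which is exact for integers and also handles zeros and negatives; the second mimics the exp(sum(ln(X))) trick, which needs X > 0 and is subject to rounding.

```python
import math
from itertools import groupby

G = [1, 1, 1, 1, 2, 2, 3, 4, 4, 4]
X = [1, 2, 6, 3, 2, 4, 2, 2, 11, 7]


def running_products(groups, values):
    """Cumulative product of `values`, restarting whenever the group changes."""
    out = []
    for _, run in groupby(zip(groups, values), key=lambda t: t[0]):
        prod = 1
        for _, x in run:
            prod *= x  # direct multiplication
            out.append(prod)
    return out


def running_products_log(groups, values):
    """Same result via exp of a running sum of logs; requires every value > 0."""
    out = []
    for _, run in groupby(zip(groups, values), key=lambda t: t[0]):
        s = 0.0
        for _, x in run:
            s += math.log(x)
            out.append(math.exp(s))
    return out


Y = running_products(G, X)
print(Y)  # [1, 2, 12, 36, 2, 8, 2, 2, 22, 154]
```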