Counting values to get a matrix in Stata

I have a variable age, 13 variables x1 to x13, and 802 observations in a Stata dataset. age takes values from 1 to 9; x1 to x13 take values from 1 to 13.
I want to count how many times each of the values 1 to 13 occurs across x1 to x13, separately for each value of age. For example, for age 1, count the number of 1s, 2s, 3s, ..., 13s in x1 to x13.
I first convert x1 to x13 to a matrix using
mkmat x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13, matrix (a)
Then, I want to count using the following loop:
gen count = 0
quietly forval i = 1/802 {
quietly forval j = 1/13 {
replace count = count + inrange(a[r'i', x'j'], 0, 1), if age==1
}
}
I failed.

I am still somewhat uncertain as to what you would like to achieve, but if I understand you correctly, here is one way to do it.
First, a simple dataset in which age ranges from 1 to 3 and four variables x1-x4 each take integer values between 5 and 7.
clear
input age x1 x2 x3 x4
1 5 6 6 6
1 7 5 6 5
2 5 7 6 6
3 5 6 7 7
3 7 6 6 6
end
Then we create three count variables (n5, n6 and n7) that count the number of 5s, 6s, and 7s for each subject across x1-x4.
forval i=5/7 {
    egen n`i'=anycount(x1 x2 x3 x4),v(`i')
}
Below is what the data look like now. To explain, the first "1" under n5 indicates that there is only one "5" for the first subject across x1-x4.
     +----------------------------------------+
     | age   x1   x2   x3   x4   n5   n6   n7 |
     |----------------------------------------|
  1. |   1    5    6    6    6    1    3    0 |
  2. |   1    7    5    6    5    2    1    1 |
  3. |   2    5    7    6    6    1    2    1 |
  4. |   3    5    6    7    7    1    1    2 |
  5. |   3    7    6    6    6    0    3    1 |
     +----------------------------------------+
It sounds to me like your ultimate goal is to have sums calculated separately for each value in age. Assuming this is true, let's create a 3x3 matrix to store such results.
mat A=J(3,3,.) // age (1-3) and values (5-7)
mat rown A=age1 age2 age3
mat coln A=value5 value6 value7
forval i=5/7 {
    forval j=1/3 {
        qui su n`i' if age==`j'
        loca k=`i'-4 // the first column for value5
        mat A[`j',`k']=r(sum)
    }
}
The matrix looks like this. To explain, the first "3" under value5 indicates that, across all children of age 1, the value 5 appears a total of three times in x1-x4.
A[3,3]
      value5  value6  value7
age1       3       4       1
age2       1       2       1
age3       1       4       3
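For readers who want to check the logic outside Stata, the same per-age tally can be sketched in Python. This is only an illustration using the toy data from this answer; the function name count_values_by_age and the tuple layout are assumptions, not part of the original answer.

```python
from collections import Counter, defaultdict

# Toy data from the answer above: one tuple per observation (age, x1, x2, x3, x4).
rows = [
    (1, 5, 6, 6, 6),
    (1, 7, 5, 6, 5),
    (2, 5, 7, 6, 6),
    (3, 5, 6, 7, 7),
    (3, 7, 6, 6, 6),
]

def count_values_by_age(rows):
    """Return {age: Counter(value -> occurrences across all x variables)}."""
    counts = defaultdict(Counter)
    for age, *xs in rows:
        counts[age].update(xs)  # tally every x value of this observation
    return counts

counts = count_values_by_age(rows)
for age in sorted(counts):
    # One line per age, in the same column order as matrix A (values 5, 6, 7).
    print(f"age{age}", [counts[age][v] for v in (5, 6, 7)])
```

Each printed row should agree with the corresponding row of matrix A.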

With Aspen's example, you could do this:
gen id = _n
reshape long x, i(id)
tab age x
Note that your sample code doesn't loop over different ages and there is an incorrect comma before if in the replace command. I won't try to fix the code, as there are many more direct methods, one of which is above. tabulate has an option to save the table as a matrix.
Here is another solution closer to the original idea. Warning: code not tested.
matrix count = J(9, 13, 0)
forval i = 1/9 {
    forval j = 1/13 {
        forval J = 1/13 {
            qui count if age == `i' & x`J' == `j'
            matrix count[`i', `j'] = count[`i', `j'] + r(N)
        }
    }
}

Comparison among input numbers

I want to compare input numbers. The general shape of the input is:
n
x1 x2
x3 x4
x5 x6
x7 x8
n indicates how many input lines follow, and each line has exactly two numbers separated by a space.
The first number on a line is the price and the second is the quality.
For example:
4
1 5
7 9
5 6
20 30
In line two, '1' is the price and '5' is the quality.
If some item has higher quality at a lower price than another item, I print 'yes'; if not, I print 'no'.
I wrote this, but I cannot turn the list into a dictionary:
x = int(input())
count = 0
y = []
while count != x:
    y.append(input())
    count += 1
z = []
for i in y:
    z.append(i.split())
def dictionary(x):
    d = {x[0]: x[1]}
    return d
for h in z:
    a = dictionary(h)
For example, I want
2
1 10
7 3
'yes'
Or another example:
4
1 5
7 9
5 6
20 30
'no'
I hope someone can answer this.
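n = int(input())
q, p = map(int, input().split())  # first line: q is the first number, p the second
for _ in range(n-1):
    q1, p1 = map(int, input().split())
    if q1 > q and p1 < p:
        print("yes")
        break
else:  # runs only if the loop finished without break
    print("no")
This works for both examples above, and the code is really simple. Note that each later line is compared only against the first line.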
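The answer's loop compares every later line only with the first line. As a hedged sketch (not the answerer's code), a check over all pairs of items could look like the following, assuming "yes" means that some item is strictly cheaper and strictly higher in quality than some other item. The names has_better_deal and answer are hypothetical, and the input is passed as a list of (price, quality) pairs rather than read from standard input.

```python
def has_better_deal(items):
    """Return True if some item is strictly cheaper AND strictly better than another.

    items is a list of (price, quality) pairs, in the question's input order.
    """
    return any(
        p1 < p2 and q1 > q2
        for p1, q1 in items
        for p2, q2 in items
    )

def answer(items):
    # Map the boolean onto the strings the question asks for.
    return "yes" if has_better_deal(items) else "no"

print(answer([(1, 10), (7, 3)]))                    # first example from the question
print(answer([(1, 5), (7, 9), (5, 6), (20, 30)]))   # second example from the question
print(answer([(5, 5), (6, 6), (4, 7)]))             # hypothetical case: the pair is not on line one
```

The double loop is quadratic in the number of lines, which is fine for small inputs; sorting by price and scanning would be an alternative for large ones.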

SAS Function to calculate percentage for row for two stratifications

I have a dataset that looks like this
data test;
input id1$ id2$ score1 score2 score3 total;
datalines;
A D 9 36 6 51
A D 9 8 6 23
A E 5 3 2 10
B D 5 3 3 11
B E 7 4 7 18
B E 5 3 3 11
C D 8 7 9 24
C E 8 52 6 66
C D 4 5 3 12
;
run;
I want to add a column that gives each row's total as a fraction of the sum of total within its id1 and id2 combination.
What I mean is this: id1 has a value of A. Within A, there are two id2 values, D and E. There are two rows with D and one with E. The two total values for D are 51 and 23, and they sum to 74. The one total value for E is 10, and it sums to 10. The column I'd like to create would hold the values .69 (51/74), .31 (23/74), and 1 (10/10) in rows 1, 2, and 3 respectively.
I need to perform this calculation for the rest of the id1 values and their corresponding id2 values. When complete, I want a table that looks like this:
id1 id2 score1 score2 score3 total percent_of_total
A D 9 36 6 51 0.689189189
A D 9 8 6 23 0.310810811
A E 5 3 2 10 1
B D 5 3 3 11 1
B E 7 4 7 18 0.620689655
B E 5 3 3 11 0.379310345
C D 8 7 9 24 0.666666667
C E 8 52 6 66 1
C D 4 5 3 12 0.333333333
I realize a loop might be able to solve the problem I've given, but I'm dealing with EIGHT levels of stratification, with as many as 98 sublevels within those levels. A loop is not practical. I'm thinking something along the lines of PROC SUMMARY but I'm not too familiar with the function.
Thank you.
It is easy to do with a data step. Make sure the records are sorted.
You can find the grand total for the ID1*ID2 combination and then use it to calculate the percentage.
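proc sort data=test;
    by id1 id2;
run;
data want;
    /* first pass over the BY group: accumulate the group total */
    do until (last.id2);
        set test;
        by id1 id2;
        grand = sum(grand, total);
    end;
    /* second pass over the same BY group: compute each row's share */
    do until (last.id2);
        set test;
        by id1 id2;
        percent_of_total = total/grand;
        output;
    end;
run;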
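The two-pass idea (sum per group first, then divide each row by its group sum) can be sketched in Python for comparison. This is an illustration only, using the id1, id2 and total columns from the question; the function name percent_of_group_total is an assumption. Unlike the sorted data step, this sketch keeps the original row order.

```python
from collections import defaultdict

# (id1, id2, total) taken from the test dataset in the question.
rows = [
    ("A", "D", 51), ("A", "D", 23), ("A", "E", 10),
    ("B", "D", 11), ("B", "E", 18), ("B", "E", 11),
    ("C", "D", 24), ("C", "E", 66), ("C", "D", 12),
]

def percent_of_group_total(rows):
    # First pass: accumulate the total for each (id1, id2) combination.
    grand = defaultdict(float)
    for id1, id2, total in rows:
        grand[(id1, id2)] += total
    # Second pass: divide each row's total by its group total.
    return [
        (id1, id2, total, total / grand[(id1, id2)])
        for id1, id2, total in rows
    ]

result = percent_of_group_total(rows)
for row in result:
    print(row)
```

With more stratification variables, the grouping key simply becomes a longer tuple.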

How to select the 5 minimum values with SAS Proc IML?

I would like to know whether it is possible to select the 5 minimum or maximum values by row with IML.
This is my code:
Proc iml ;
use table;
read all var {&varlist} into matrix ;
n=nrow(matrix) ; /* n=369 here*/
p=ncol(matrix); /* p=38 here*/
test=J(n,5,.) ;
Do i=1 to n ;
test[i,1]=MIN(taux[i,]);
End;
Quit ;
So I would like to obtain a matrix test that contains the minimum value of each row in the 1st column, then in the 2nd column the minimum of the row excluding that first value, and so on.
Any ideas are welcome! :)
Even if it's not with IML (but with SAS: Base, SQL, ...).
For example:
Data test; input x1-x8 ; cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;run;
And I would like to obtain the results sorted by row:
1 2 3 4 6 7 8 9
-2 1 2 3 4 7 9 12
in order to select my 5 minimum values in another table :
y1 y2 y3 y4 y5
1 2 3 4 6
-2 1 2 3 4
Read the article "Compute the kth smallest data value in SAS"
Define the modules as in the article. Then use the following:
have = {1 9 8 7 3 4 2 6,
        9 3 2 1 4 7 12 -2};
x = have`; /* transpose */
ord = j(5,ncol(x));
do j = 1 to ncol(x);
    ord[,j] = ordinal(1:5, x[,j]);
end;
print ord;
If you have missing values in your data and want to exclude them, use the SMALLEST module instead of the ORDINAL module.
You can use call sort() in PROC IML to sort a column. Because you want to separate the columns and not sort the whole matrix, extract the column, sort it, and then update the original.
You want to sort rows, so transpose your matrix, do the sorting, and then transpose back.
proc iml;
have = {1 9 8 7 3 4 2 6,
        9 3 2 1 4 7 12 -2};
print have;
n = nrow(have);
have = have`; /*Transpose because sort works on columns*/
do i=1 to n;
    tmp = have[,i];
    call sort(tmp,1);
    have[,i]=tmp;
end;
have = have`;
want = have[,1:5];
print want;
quit;
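As a cross-check on the expected result, the "k smallest values per row" operation can be sketched in Python with the standard library's heapq.nsmallest, which returns the k smallest items in ascending order. This is not a SAS solution, only an illustration on the two example rows; the function name k_smallest_per_row is an assumption.

```python
import heapq

# The two example rows from the question.
have = [
    [1, 9, 8, 7, 3, 4, 2, 6],
    [9, 3, 2, 1, 4, 7, 12, -2],
]

def k_smallest_per_row(matrix, k=5):
    """Return, for every row, its k smallest values in ascending order."""
    return [heapq.nsmallest(k, row) for row in matrix]

want = k_smallest_per_row(have)
for row in want:
    print(row)
```

For the k largest values, heapq.nlargest works the same way and returns them in descending order.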

Formula that uses previous value

In Stata I want to have a variable calculated by a formula, which includes multiplying by the previous value, within blocks defined by a variable ID. I tried using a lag but that did not work for me.
In the formula below the Y-1 is intended to signify the value above (the lag).
gen Y = 0
replace Y = 1 if count == 1
sort ID
by ID: replace Y = (1+X)*Y-1 if count != 1
X Y count ID
. 1 1 1
2 3 2 1
1 6 3 1
3 24 4 1
2 72 5 1
. 1 1 2
1 2 2 2
7 16 3 2
Your code can be made a little more concise. Here's how:
input X count ID
. 1 1
2 2 1
1 3 1
3 4 1
2 5 1
. 1 2
1 2 2
7 3 2
end
gen Y = count == 1
bysort ID (count) : replace Y = (1 + X) * Y[_n-1] if count > 1
The creation of a dummy (indicator) variable can exploit the fact that true or false expressions are evaluated as 1 or 0.
Sorting before by and the subsequent by command can be condensed into one. Note that I spelled out that within blocks of ID, count should remain sorted.
This is really a comment, not another answer, but it would be less clear if presented as such.
Y-1, the lag in the formula, translates to Y[_n-1], as shown below.
gen Y = 0
replace Y = 1 if count == 1
sort ID
by ID: replace Y = (1+X)*Y[_n-1] if count != 1
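The recurrence behind these answers is Y = 1 on the first row of each ID block and Y = (1 + X) * (previous Y) afterwards. A small Python sketch of that recurrence, using the data from the question, may help to verify the expected Y column. The function name running_y is hypothetical, and None stands in for Stata's missing value.

```python
from itertools import groupby

# (X, count, ID) rows from the question; None stands in for a missing X.
rows = [
    (None, 1, 1), (2, 2, 1), (1, 3, 1), (3, 4, 1), (2, 5, 1),
    (None, 1, 2), (1, 2, 2), (7, 3, 2),
]

def running_y(rows):
    """Compute Y within each ID block, ordered by count."""
    out = []
    # Sort by ID and then count, mirroring bysort ID (count).
    ordered = sorted(rows, key=lambda r: (r[2], r[1]))
    for _, block in groupby(ordered, key=lambda r: r[2]):
        y = None
        for x, count, _id in block:
            # First row of the block starts at 1; later rows use the previous Y.
            y = 1 if count == 1 else (1 + x) * y
            out.append(y)
    return out

print(running_y(rows))
```

The printed list should match the Y column shown in the question, block by block.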

SAS- Combining multiple variables into one by choosing the maximum value

I want to combine multiple variables into one by choosing the maximum value, that is, to turn
id v1 v2 v3 v4 v5 v6
1 1 2 5 3 1 1
2 4 2 3 5 1
3 3 2 2 1 3
4 2 1 2 5 7
5 6 7 1 2 1 7
into n1=max(v1,v2), n2=v3, n3=max(v4,v5,v6)
id n1 n2 n3
1 2 5 3
2 4 3 5
3 3 2 3
4 2 2 7
5 7 1 7
How do I do this in SAS? (It's so easy in excel.. It's relatively intuitive in R.. But I can't figure it out in SAS! Please help!)
Thank you for your time!
The MAX function is your friend.
data want;
    set have;
    n1 = max(of v1 v2);
    n2 = v3;
    n3 = max(of v4 v5 v6);
run;
Arrays and variable lists also work (for example, n3 = max(of v4-v6);).
I agree that the MAX function is what you want, but I would code it differently.
data want;
    set have;
    n1 = max(v1, v2);
    n2 = v3;
    n3 = max(v4, v5, v6);
run;
Alternatively:
data want;
    set have;
    n1 = max(v1, v2);
    n2 = v3;
    n3 = max(of v4-v6);
run;
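The SAS MAX function ignores missing arguments, which matters here because some rows in the question have fewer than six values. A rough Python equivalent, given only for comparison, has to skip missing values explicitly. In this sketch None marks a missing value, the helper max_ignoring_missing is a hypothetical name, and the row with id 99 is made-up data, because the question does not show which column is missing in its incomplete rows.

```python
def max_ignoring_missing(*values):
    """Largest non-missing value, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return max(present) if present else None

# Rows with id 1 and 5 come from the question; id 99 is a hypothetical row
# with a missing v6, added only to show how missing values are skipped.
have = [
    {"id": 1, "v1": 1, "v2": 2, "v3": 5, "v4": 3, "v5": 1, "v6": 1},
    {"id": 5, "v1": 6, "v2": 7, "v3": 1, "v4": 2, "v5": 1, "v6": 7},
    {"id": 99, "v1": 4, "v2": 2, "v3": 3, "v4": 5, "v5": 1, "v6": None},
]

want = [
    {
        "id": r["id"],
        "n1": max_ignoring_missing(r["v1"], r["v2"]),
        "n2": r["v3"],
        "n3": max_ignoring_missing(r["v4"], r["v5"], r["v6"]),
    }
    for r in have
]

for row in want:
    print(row)
```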