I have a data set whose essence is the following:
data have;
input Name $ ab gh vz iz jh pq ch km eo lk;
datalines;
adam 7 8 7 0 0 0 0 0 0 0
bob 0 1 0 3 4 6 0 1 6 0
clint 0 0 0 5 4 3 1 0 0 2
;
run;
Now I would like to count, for each row, how many of the variables iz, jh, ch and km hold a number greater than zero. The result should look like this:
/* want
Name ab gh vz iz jh pq ch km eo lk count_of_iz_jh_ch_km
adam 7 8 7 0 2 3 0 0 0 0 1
bob 0 1 0 3 0 6 0 1 6 0 2
clint 5 0 0 5 4 3 1 2 0 2 4
*/
I would greatly appreciate any help since I wasn't successful searching the internet for a solution.
Gerit
The code below groups the required variables from have into an array called vars, then, for each row, increments a counter every time one of these variables is > 0.
data want;
set have;
array vars[*] iz jh ch km;
count_of_iz_jh_ch_km = 0;
do i = 1 to dim(vars);
if vars[i] > 0 then count_of_iz_jh_ch_km + 1;
end;
drop i;
run;
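For a short, fixed list of variables, the same count can probably be written without a loop. The sketch below relies on SAS evaluating a comparison such as iz > 0 to 1 when true and 0 when false; the data set name want_alt is made up for illustration.

```sas
data want_alt;
  set have;
  /* each comparison yields 1 (true) or 0 (false), so their sum is the count */
  count_of_iz_jh_ch_km = sum(iz > 0, jh > 0, ch > 0, km > 0);
run;
```

The array version is likely the better choice once the variable list grows, since only the ARRAY statement has to change.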
In SAS, I would like to create an indicator that checks the previous sell indicator: if the sell indicator changed between the previous time period and the current one (from 1 to 0 or from 0 to 1), then I assign the value 1 to the Ind variable.
The dataset looks like:
Customer Time Sell_Ind
1 2 1
1 3 0
1 4 0
2 23 0
2 24 0
2 30 0
5 12 1
5 11 0
And so on.
My expected output would be
Customer Time Sell_Ind Ind
1 2 1 0
1 3 0 1
1 4 0 0
2 23 0 0
2 24 0 0
2 30 0 0
5 12 1 0
5 11 0 1
The previous/current comparison should be made within each customer.
I have tried as follows
data mydata;
set original;
By customer;
Lag_sell_ind=lag(sell_ind);
If first.customer then Lag_sell_ind=.;
Run;
But it does not return the expected output.
In SQL I would probably use a window function partitioned by customer and ordered by time, but I do not know how to do the same in SAS.
You were halfway there; you only need to add one IF statement to get the desired output.
data want;
set have;
by customer;
lag=lag(sell_ind);
if first.customer then lag=.;
if sell_ind ne lag and lag ne . then ind = 1;
else ind = 0;
drop lag;
run;
You can shorten this with the IFN function, as shown below.
data have;
input Customer Time Sell_Ind;
datalines;
1 2 1
1 3 0
1 4 0
2 23 0
2 24 0
2 30 0
5 12 1
5 11 0
;
data want;
set have;
by customer;
* IFN evaluates all of its arguments, so LAG still runs on every row;
Ind = ifn(first.customer, 0, Sell_Ind ne lag(Sell_Ind));
Run;
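Both answers call LAG on every row, and that detail matters: LAG returns the value stored by its own previous execution, not the value from the previous observation. A minimal sketch of the difference, assuming have is sorted by customer; the names lag_pitfall, prev_skipped and prev are made up for illustration.

```sas
data lag_pitfall;
  set have;
  by customer;
  /* Risky: this LAG runs only on non-first rows, so its queue never sees
     the first row of each customer. prev_skipped holds the value from the
     previous LAG call, which is not always the previous row. */
  if not first.customer then prev_skipped = lag(sell_ind);
  /* Safer: call LAG on every row, then blank it at group boundaries. */
  prev = lag(sell_ind);
  if first.customer then prev = .;
run;
```

Comparing prev_skipped with prev in the output should show where the conditional call drifts away from the true previous row.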
I have the following data, which has been prepared with stset. The resulting variables give cohort entry and exit times along with event status. In addition, a numerical variable, prob, has been calculated based on the risk-set size.
For those subjects that are not cases (where _d == 0), I need to sum all values of the prob variable where _t falls within that subject's follow-up time.
For example, subject 8 enters the cohort at _t0 == 0 and exits at _t == 8. Between these times, there are three prob values 0.9, 0.875 and 0.875 - giving the desired answer for subject 8 as 2.65.
* Example generated by -dataex-. To install: ssc install dataex
clear
input long id byte(_t0 _t _d) float prob
1 0 1 0 .
2 0 2 0 .
3 1 3 1 .9
4 0 4 0 .
5 0 5 1 .875
6 0 6 1 .875
7 5 7 0 .
8 0 8 0 .
9 0 9 1 .8333333
10 0 10 1 .8
11 0 11 0 .
12 8 12 1 .6666667
13 0 13 0 .
14 0 14 0 .
15 0 15 0 .
end
The desired output would return all of the data with an additional variable signifying the summed values of prob.
Thanks so much in advance.
I have a dataset consisting of variables ObservationNumber, MeasurementNumber, SubjectID, and many dummy variables.
I would like to consolidate all non-zero values into one row per SubjectID and MeasurementNumber.
Have:
ObsNum MeasurementNum SubjectID Dummy0 Dummy1 ... Dummy999
----------------------------------------------------...---------------
01 1 1 0 1 ... 0
02 2 1 0 1 ... 0
03 3 1 0 1 ... 0
04 4 1 0 0 ... 0
05 5 1 - - ... -
06 6 1 0 0 ... 0
07 1 2 1 0 ... 0
08 2 2 0 0 ... 0
09 3 2 0 1 ... 0
10 4 2 1 0 ... 0
11 4 2 0 1 ... 0
12 5 2 0 0 ... 1
13 6 2 0 0 ... 0
14 6 2 0 0 ... 1
15 6 2 0 0 ... 0
16 6 2 0 0 ... 0
17 6 2 0 1 ... 0
18 6 2 0 0 ... 0
19 6 2 0 0 ... 0
20 6 2 0 0 ... 0
21 6 2 1 0 ... 0
22 1 3 1 0 ... 0
23 2 3 0 1 ... 0
24 3 3 0 0 ... 1
25 4 3 - - ... -
26 5 3 0 0 ... 0
27 6 3 0 0 ... 0
28 1 4 - - ... -
29 2 4 0 0 ... 0
30 3 4 0 1 ... 0
31 4 4 1 0 ... 0
32 4 4 0 1 ... 0
33 4 4 0 0 ... 1
34 5 4 0 0 ... 1
35 6 4 0 1 ... 0
36 6 4 0 0 ... 1
Want:
MeasurementNum SubjectID Dummy0 Dummy1 ... Dummy999
----------------------------------------------------...---------------
1 1 0 1 ... 0
2 1 0 1 ... 0
3 1 0 1 ... 0
4 1 0 0 ... 0
5 1 - - ... -
6 1 0 0 ... 0
1 2 1 0 ... 0
2 2 0 0 ... 0
3 2 0 1 ... 0
4 2 1 1 ... 0
5 2 0 0 ... 1
6 2 1 1 ... 1
1 3 1 0 ... 0
2 3 0 1 ... 0
3 3 0 0 ... 1
4 3 - - ... -
5 3 0 0 ... 0
6 3 0 0 ... 0
1 4 - - ... -
2 4 0 0 ... 0
3 4 0 1 ... 0
4 4 1 1 ... 1
5 4 0 0 ... 1
6 4 0 1 ... 1
Each SubjectID has six measurements, in each of which a set of dummy variables is recorded with outcome 0, 1 or missing. If a missing value occurs, all dummy variables for the respective observation are missing, and only one observation will be present in the dataset for that MeasurementNumber.
I have tried to use the UPDATE statement, but it does not seem able to deal with '0' and '-'.
Is there a direct way of condensing all dummy variables in this dataset for each SubjectID, grouped by MeasurementNumber?
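Use PROC MEANS with BY and OUTPUT statements. Taking the MAX of each flag within a group gives 1 if any row has a 1, 0 if all nonmissing rows are 0, and missing if every row is missing.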
data have;
rownum = 0;
do rowid = 1 to 1000;
subjectid + 1;
do measurenum = 1 to 6;
do repeat = 1 to ceil(4 * ranuni(123));
array flags flag1-flag999;
do _n_ = 1 to dim(flags);
flags(_n_) = ranuni(123) < 0.10;
if subjectid < 7 and measurenum = subjectid then flags(_n_) = .;
end;
rownum + 1;
output;
end;
end;
end;
keep rownum measurenum subjectid flag:;
run;
proc means noprint data=have;
by subjectid measurenum;
var flag:;
output out=want max=;
run;
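Applied to the layout in the question, the same idea might look like the sketch below. It assumes the variables are literally named SubjectID, MeasurementNum and Dummy0 through Dummy999 as in the table header, and that '-' stands for a numeric missing value; the output name want is arbitrary.

```sas
/* BY processing needs the data ordered by the grouping variables */
proc sort data=have;
  by SubjectID MeasurementNum;
run;

proc means data=have noprint;
  by SubjectID MeasurementNum;
  var Dummy0-Dummy999;
  /* MAX= with no names keeps the original variable names;
     _TYPE_ and _FREQ_ are bookkeeping columns added by PROC MEANS */
  output out=want(drop=_type_ _freq_) max=;
run;
```

ObsNum is deliberately left out of the VAR statement, so it does not appear in the condensed table.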
The following code produces the picture below.
As you can see, the group statement results in different colours for the data points.
Question: How can I also have different symbols for the two groups?
proc sgplot data=test;
scatter x=time y=Y / group=group;
run;
group time Y
0 0 10085.472039
0 0 10085.472039
0 0 10085.472039
0 1 9950.3642122
0 2 9817.0663279
0 4 9555.8037259
0 6 9301.4941325
0 8 9053.9525066
0 8 9053.9525066
0 8 9053.9525066
1 0 2954.7558871
1 0 2954.7558871
1 0 2954.7558871
1 1 2987.6191302
1 2 3020.8478832
1 4 3088.4182255
1 6 3157.4999815
1 8 3228.1269586
1 8 3228.1269586
1 8 3228.1269586
0 0 3929.2678194
0 0 3929.2678194
0 0 3929.2678194
0 1 3903.7639936
0 2 3878.4257063
0 4 3828.2414563
0 6 3778.7065572
0 8 3729.8126068
0 8 3729.8126068
0 8 3729.8126068
1 0 2694.5952697
1 0 2694.5952697
1 0 2694.5952697
1 1 2580.159876
1 2 2470.5843807
1 4 2265.1962804
1 6 2076.8827929
1 8 1904.2244475
1 8 1904.2244475
1 8 1904.2244475
Following http://www.ats.ucla.edu/stat/sas/faq/gr2grps_new.htm, define one SYMBOL statement per group and plot with PROC GPLOT:
symbol1 v=star c=red h=1;
symbol2 v=triangle c=blue h=1;
proc gplot data=test;
plot y*time=group;
run;
quit;
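If you would rather stay with PROC SGPLOT, something along these lines may work. It is a sketch that assumes SAS 9.4, where the STYLEATTRS statement and the ATTRPRIORITY= option are available; the two marker names are just one possible choice.

```sas
/* let marker symbols cycle together with colours instead of colours first */
ods graphics / attrpriority=none;

proc sgplot data=test;
  /* one marker symbol per group value, in order of appearance */
  styleattrs datasymbols=(circlefilled triangle);
  scatter x=time y=Y / group=group;
run;
```

With the default colour-priority styles, the symbol tends to stay the same until the colour list is exhausted, which is why ATTRPRIORITY=NONE is set here.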
I have two tables A and B that look like below.
Table A
rowno flag1 flag2 flag3
1 1 0 0
2 0 1 1
3 0 0 0
4 0 1 1
5 0 0 1
6 0 0 0
7 0 0 0
8 0 1 0
9 0 0 0
10 1 0 0
Table B
rowno flag1 flag2 flag3
Table A and B have the same column names but B is an empty table initially.
What I want to do is insert the rows from A into B one at a time with a macro, iterating over rowno. Each time I insert a row from A into B, I want to calculate the sum of each flag column.
If, after inserting a row, sum(flag1) > 1 or sum(flag2) > 1 or sum(flag3) > 1, I need to delete that row from table B. The iteration then continues until the last observation in Table A. The final Table B should contain 5 observations from Table A.
The code I have so far is below:
%macro iteration;
%do rowno=1 %to 10;
proc sql;
insert into table.B
select *
from table.A
where rowno = &rowno;
quit;
set table.B;
if
sum(flag1) > 1
or
sum(flag2) > 1
or
sum(flag3) > 1
then delete;
run;
%end;
%mend iteration;
%iteration
I received a lot of error messages.
Looking forward to your help and suggestions. Thanks.
The ideal output data would look like this
rowno flag1 flag2 flag3
1 1 0 0
2 0 1 1
3 0 0 0
6 0 0 0
7 0 0 0
Instead of a macro, use a data step that keeps a running sum of each flag column. If you delete a row, remember to reverse its increment to the running sums. Based on your data, I think row 9 should also be kept.
data TableA;
input rowno flag1 flag2 flag3;
cards;
1 1 0 0
2 0 1 1
3 0 0 0
4 0 1 1
5 0 0 1
6 0 0 0
7 0 0 0
8 0 1 0
9 0 0 0
10 1 0 0
;
run;
data TableB;
set TableA;
*The sum statements below retain sum_flag1-sum_flag3 automatically, starting at 0;
*Increment running sum for flag;
sum_flag1+flag1;
sum_flag2+flag2;
sum_flag3+flag3;
*Check flag amounts;
if sum_flag1>1 or sum_flag2>1 or sum_flag3>1 then do;
*If a limit is exceeded, undo this row's increments and remove the record;
sum_flag1 + (-flag1);
sum_flag2 + (-flag2);
sum_flag3 + (-flag3);
delete;
end;
*Keep only the original columns in the output;
drop sum_:;
run;
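With more than a handful of flag columns, the same logic can be expressed with arrays. The sketch below checks the limit before committing a row, so nothing has to be reversed; TableB_alt, totals, reject and i are names made up for illustration, and the array sizes assume exactly flag1 to flag3.

```sas
data TableB_alt;
  set TableA;
  array flags[*] flag1-flag3;
  /* temporary arrays keep their values across rows; every total starts at 0 */
  array totals[3] _temporary_ (0 0 0);

  /* reject the row if adding it would push any total above 1 */
  reject = 0;
  do i = 1 to dim(flags);
    if totals[i] + flags[i] > 1 then reject = 1;
  end;
  if reject then delete;

  /* the row is kept, so commit its flags to the running totals */
  do i = 1 to dim(flags);
    totals[i] = totals[i] + flags[i];
  end;
  drop i reject;
run;
```

On the sample data this should keep rows 1, 2, 3, 6, 7 and 9, the same rows the running-sum step retains.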