I want to transpose a simple dataset as left, to become a dataset at the right. They are all numeric variables. Please also make the variable names as I put there (I have a lot of variables I want to follow this pattern), would prefer not to rename them by hand one by one if possible. Thank you!
Here is a simple approach. I added another id for demonstration. You can re-arrange the columns if you like.
data have;
input id Vistime v1 v2;
datalines;
1 1 2 5
1 2 3 6
1 3 4 7
2 1 2 5
2 2 3 6
2 3 4 7
;
proc transpose data=have out=temp;
by id Vistime;
var v1 v2;
run;
proc transpose data=temp delim=_ out=want(drop=_:);
by id;
var col1;
id _name_ Vistime;
run;
Result
id v1_1 v2_1 v1_2 v2_2 v1_3 v2_3
1 2 5 3 6 4 7
2 2 5 3 6 4 7
Related
I need to produce a few reports and I always struggle with the report procedure to obtain the required result without messing too much with the initial dataset what takes a lot of time.
The dataset is of the following form:
ID VAR1 VAR2 VAR3
1 1 2 3
1 4 5 6
2 7 8 9
2 10 11 12
and the output should be:
ID VAR1
VAR2
VAR3
1 1
2
3
4
5
6
2 7
8
9
10
11
12
Is there a good way (i.e. efficient) of producing the output using proc report? Becuase the only way I see is to manually create blank characters in ID variable and manually create the column with variables and using put function heavily to create spaces. The only thing proc report is used for is to create blank spaces between subjects by using sorting variables. The amount of time it takes is insane. I tried to find some good resources on that but with no success.
I will appreciate any suggestions. Thanks.
Sounds like you just want to use data step to produce your "report".
Here is an outline:
data _null_;
set have;
by id;
array cols var1-var3;
if first.id then put #1 id #;
do index=1 to dim(cols);
put #5+index cols[index] ;
end;
put;
run;
Results:
1 1
2
3
4
5
6
2 7
8
9
10
11
12
Add a FILE statement to direct the output somewhere else. For example use FILE PRINT; to send the report to the listing output instead of the LOG.
I have data structured as follows in Table 1:
ID Variable1 Variable2
1 2 5
2 10 2
3 14 3
4 4 3
I need to add the following data to the above table for each row in Table 2:
Coef Value
Variable1C 4.2
Variable2C 5.6
The final result should be:
ID Variable1 Variable2 Variable1C Variable2C
1 2 5 4.2 5.6
2 10 2 4.2 5.6
3 14 3 4.2 5.6
4 4 3 4.2 5.6
How might I pursue this? So far, I've only be able to get one of data by transforming table 2 and then adding it, but this is not what I want.
A simple data step should do that.
data want ;
set have ;
Variable1C=4.2 ;
Variable2=5.6;
run;
If you have the data in a table then transpose it and combine them.
proc transpose data=table2 out=wide ;
id coef ;
var value ;
run;
data want ;
set table1;
if _n_=1 then set wide ;
run;
I have data that's tracking a certain eye phenomena. Some patients have it in both eyes, and some patients have it in a single eye. This is what some of the data looks like:
EyeID PatientID STATUS Gender
1 1 1 M
2 1 0 M
3 2 1 M
4 3 0 M
5 3 1 M
6 4 1 M
7 4 0 M
8 5 1 F
9 6 1 F
10 6 0 F
11 7 1 F
12 8 1 F
13 8 0 F
14 9 1 F
As you can see from the data above, there are 9 patients total and all of them have the particular phenomena in one eye.
I need the count the number of patients with this eye phenomena.
To get the number of total patients in the dataset, I used:
PROC FREQ data=new nlevels;
tables PatientID;
run;
To count the number of patients with this eye phenomena, I used:
PROC SORT data=new out=new1 nodupkey;
by Patientid Status;
run;
proc freq data=new1 nlevels;
tables Status;
run;
However, it gave the correct number of patients with the phenomena (9), but not the correct number without (0).
I now need to calculate the gender distribution of this phenomena. I used:
proc freq data=new1;
tables gender*Status/chisq;
run;
However, in the cross table, it has the correct number of patients who have the phenomena (9), but not the correct number without (0). Does anyone have any thoughts on how to do this chi-square, where if the has this phenomena in at least 1 eye, then they are positive for this phenomena?
Thanks!
PROC FREQ is doing what you told it to: counting the status=0 cases.
In general here you are using sort of blunt tools to accomplish what you're trying to accomplish, when you probably should use a more precise tool. PROC SORT NODUPKEY is sort of overkill for example, and it doesn't really do what you want anyway.
To set up a dataset of has/doesn't have, for example, let's do a few things. First I add one more row - someone who actually doesn't have - so we see that working.
data have;
input eyeID patientID status gender $;
datalines;
1 1 1 M
2 1 0 M
3 2 1 M
4 3 0 M
5 3 1 M
6 4 1 M
7 4 0 M
8 5 1 F
9 6 1 F
10 6 0 F
11 7 1 F
12 8 1 F
13 8 0 F
14 9 1 F
15 10 0 M
;;;;
run;
Now we use the data step. We want a patient-level dataset at the end, where we have eye-level now. So we create a new patient-level status.
data patient_level;
set have;
by patientID;
retain patient_status;
if first.patientID then patient_status =0;
patient_status = (patient_Status or status);
if last.patientID then output;
keep patientID patient_Status gender;
run;
Now, we can run your second proc freq. Also note you have a nice dataset of patients.
title "Patients with/without condition in any eye";
proc freq data=patient_level;
tables patient_status;
run;
title;
You also may be able to do your chi-square analysis, though I'm not a statistician and won't dip my toe into whether this is an appropriate analysis. It's likely better than your first, anyway - as it correctly identifies has/doesn't have status in at least one eye. You may need a different indicator, if you need to know number of eyes.
title "Crosstab of gender by patient having/not having condition";
proc freq data=patient_level;
tables gender*patient_Status/chisq;
run;
title;
If your actual data has every single patient having the condition, of course, it's unlikely a chi-square analysis is appropriate.
I have a dataset like this (but with several hundred vars):
id q1 g7 q3 b2 zz gl az tre
1 1 2 1 1 1 2 1 1
2 2 3 3 2 2 2 1 1
3 1 2 3 3 2 1 3 3
4 3 1 2 2 3 2 1 1
5 2 1 2 2 1 2 3 3
6 3 1 1 2 2 1 3 3
I'd like to keep id, b2, and tre, but set everything else to missing. In a dataset this small, I can easily use call missing (q1, g7, q3, zz, gl, az) - but in a set with many more variables, I would effectively like to say call missing (of _ALL_ *except ID, b2, tre*).
Obviously, SAS can't read my mind. I've considered workarounds that involve another data step or proc sql where I copy the original variables to a new ds and merge them back on post, but I'm trying to find a more elegant solution.
This technique uses an un-executed set statement (compile time function only) to define all variables in the original data set. Keeps the order and all variable attributes type, labels, format etc. Basically setting all the variables to missing. The next SET statement which will execute brings in only the variables the are NOT to be set to missing. It doesn't explicitly set variables to missing but achieves the same result.
data nomiss;
input id q1 g7 q3 b2 zz gl az tre;
cards;
1 1 2 1 1 1 2 1 1
2 2 3 3 2 2 2 1 1
3 1 2 3 3 2 1 3 3
4 3 1 2 2 3 2 1 1
5 2 1 2 2 1 2 3 3
6 3 1 1 2 2 1 3 3
;;;;
run;
proc print;
run;
data manymiss;
if 0 then set nomiss;
set nomiss(keep=id b2 tre:);
run;
proc print;
run;
Another fairly simple option is to set them missing using a macro, and basic code writing techniques.
For example, let's say we have a macro:
%call_missing(var=);
call missing(&var.);
%mend call_missing;
Now we can write a query that uses dictionary.columns to identify the variables we want set to missing:
proc sql;
select name
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not (name in ('ID','B2','TRE')); *note UPCASE for all these;
quit;
Now, we can combine these two things to get a macro variable containing code we want, and use that:
proc sql;
select cats('%call_missing(var=',name ,')')
into :misslist separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not (name in ('ID','B2','TRE')); *note UPCASE for all these;
quit;
data want;
set have;
&misslist.;
run;
This has the advantage that it doesn't care about the variable types, nor the order. It has the disadvantage that it's somewhat more code, but it shouldn't be particularly long.
If the variables are all of the same type (numeric or character) then you could use an array.
data want ;
set have;
array _all_ _numeric_ ;
do over _all_;
if upcase(vname(_all_)) not in ('ID','B2') then _all_=.;
end;
run;
If you don't care about the order then just drop the variables and add them back on with 0 observations.
data want;
set have (keep=ID B2 TRE:) have (obs=0 drop=ID B2 TRE:);
run;
I am wanting to count the number of time a certain value appears in a particular column in sas. For example in the following dataset the value 1 appears 3 times
value 2 appears twice, value 3 appears once, value 4 appears 4 times and value 5 appears four times.
Game_ball
1
1
1
2
2
3
4
4
4
5
5
5
5
5
I want the dataset to represented like the following:
Game_ball Count
1 3
2 2
3 1
4 4
5 4
. .
. .
. .
Thanks in advance
As per #Dwal, proc freq is the easiest solution.
Using your sample data,
proc freq data=sample;
table game_ball/out=output;
run;
Or do it in one-pass data step
proc sort data = sample;by game_ball;run;
data output;
set sample;
retain count;
if first.game_ball then count = 0;
count + 1;
if last.game_ball then output;
by game_ball;
run;
Or in SQL
proc sql;
create table output as
select game_ball, count(*) as count
from sample
group by game_ball;
quit;