I'm asking for help with a problem involving PROC TRANSPOSE.
I have a dataset structured like this (I'm showing only 3 variable groups, but I have many more):
PR ID VAR1a VAR1b VAR1c VAR2a VAR2b VAR2c VAR3a VAR3b VAR3c
1 1 x x x x x x x x x
1 2 x x x x x x x x x
1 3 x x x x x x x x x
2 1 x x x x x x x x x
2 2 x x x x x x x x x
2 3 x x x x x x x x x
I need an output dataset like this:
PREID ID VAR(name) A B C
1 1 VAR1(name) x x x
1 1 VAR2(name) x x x
1 1 VAR3(name) x x x
1 2 VAR1(name) x x x
1 2 VAR2(name) x x x
1 2 VAR3(name) x x x
1 3 VAR1(name) x x x
1 3 VAR2(name) x x x
1 3 VAR3(name) x x x
and so on for PREID 2 with ID 1, 2, 3, and for any further PREID values.
So I need to transpose using the variable name (distinguishing the a, b, c suffixes), but I really have no idea where to start.
Can you help me, please?
If I understand the desired output correctly, each observation of your input data first has to be broken into several observations: one input observation becomes 9 (VAR1a to VAR3c). You can achieve that with PROC TRANSPOSE, using PR and ID as BY variables and transposing VAR1a through VAR3c. Then, in a DATA step, split the _NAME_ variable into the variable stem (VAR1/VAR2/VAR3) and the suffix (a/b/c). After that, a second transpose gives you the result.
I tried to write down the code based on your input data. Let me know if it helps.
data input;
infile datalines dsd dlm=',' missover;
input PR :$8.
ID :$8.
VAR1a :$8.
VAR1b :$8.
VAR1c :$8.
VAR2a :$8.
VAR2b :$8.
VAR2c :$8.
VAR3a :$8.
VAR3b :$8.
VAR3c :$8.;
datalines4;
1,1,x,x,x,x,x,x,x,x,x
1,2,x,x,x,x,x,x,x,x,x
1,3,x,x,x,x,x,x,x,x,x
2,1,x,x,x,x,x,x,x,x,x
2,2,x,x,x,x,x,x,x,x,x
2,3,x,x,x,x,x,x,x,x,x
;;;;
run;
proc transpose data=input out=staging ;
by pr id ;
var VAR1a--VAR3c;
run;
data staging;
set staging;
var=substrn(strip(_name_),1,length(strip(_name_))-1);
dummy=substrn(strip(_name_),length(strip(_name_)),1);
drop _name_;
run;
proc transpose data=staging out=final(drop=_name_);
by pr id var;
id dummy;
var col1;
run;
proc print data=final;run;
Similar to @sushil's solution above, but with one less step. Since you have to go into a DATA step anyway, you may as well transpose the data in that step too, so this solution combines the first PROC TRANSPOSE and the DATA step. If you had few enough variables I'd remove the last transpose as well, but keeping it is more flexible when you have quite a few variables.
data input;
infile datalines dsd dlm=',' missover;
input PR :$8.
ID :$8.
VAR1a :$8.
VAR1b :$8.
VAR1c :$8.
VAR2a :$8.
VAR2b :$8.
VAR2c :$8.
VAR3a :$8.
VAR3b :$8.
VAR3c :$8.;
datalines4;
1,1,x,x,x,x,x,x,x,x,x
1,2,x,x,x,x,x,x,x,x,x
1,3,x,x,x,x,x,x,x,x,x
2,1,x,x,x,x,x,x,x,x,x
2,2,x,x,x,x,x,x,x,x,x
2,3,x,x,x,x,x,x,x,x,x
;;;;
run;
data out1;
set input;
array vars(*) var1a--var3c;
do i=1 to dim(vars);
name=vname(vars(i));
varname=substr(name,1,length(name)-1);
group=substr(name,length(name));
value=vars(i);
output;
end;
drop var1a--var3c;
run;
proc transpose data=out1 out=out2;
by pr id varname;
id group;
var value;
run;
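For a small, fixed set of variables, the remark about dropping the last transpose could look like the sketch below. It assumes exactly the nine variables from the sample data and hard-codes the three suffix groups. The output name `want` and the array names `va`, `vb`, `vc` are illustrative and do not come from the answers above.

```sas
/* Sketch: reshape in a single DATA step, no PROC TRANSPOSE.          */
/* Assumes the variables are exactly VAR1a..VAR3c as in the sample.   */
data want;
  set input;
  length varname $32 a b c $8;
  /* One array per suffix; position i corresponds to VARi */
  array va(*) var1a var2a var3a;
  array vb(*) var1b var2b var3b;
  array vc(*) var1c var2c var3c;
  do i = 1 to dim(va);
    varname = cats('VAR', i);   /* VAR1, VAR2, VAR3 */
    a = va(i);
    b = vb(i);
    c = vc(i);
    output;                     /* one row per PR, ID, VARi */
  end;
  keep pr id varname a b c;
run;
```

The trade-off is that every new variable group has to be added to all three array statements by hand, which is why the transpose-based version scales better.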
I have a series of string variables with missing observations. I would like to use flat substitution. For instance, if variable x has 3 available values, a missing value should have a 33.333% chance of being assigned each of those values under this substitution method. How would I do this?
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
Run;
You could use temporary arrays to store the possible values. Then generate a random index into the array.
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
data want ;
set have ;
array possible_b (2) $8 _temporary_ ('Male','Female') ;
if missing(b) then b=possible_b(1+int(rand('uniform')*dim(possible_b)));
run;
I did this with generating random numbers and hard coding the limits. There should be an easier way to do this, but for the purposes of the question this should work.
option missing='';
data begin;
input a $;
cards;
a
.
b
c
.
e
.
f
g
h
.
.
j
.
;
run;
data intermediate;
set begin;
if a EQ '' then help= rand("uniform");
else help=.;
run;
data wanted;
set intermediate;
if a EQ '' then do;
if help<1/3 then a='V1';
else if help<2/3 then a='V2';
else a='V3';
end;
drop help;
run;
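One way to avoid hard-coding the limits is to index directly into a list of fill values, combining the two ideas above. This is a minimal sketch that assumes the same three fill values ('V1', 'V2', 'V3') and the `begin` dataset from this answer. The output name `wanted2`, the array name `fill`, and the seed are arbitrary choices.

```sas
/* Sketch: pick a fill value uniformly at random, no thresholds needed */
data wanted2;
  set begin;
  /* Temporary array: holds the candidate values, adds no variables */
  array fill(3) $8 _temporary_ ('V1', 'V2', 'V3');
  /* Fix the seed once so the result is reproducible (seed is arbitrary) */
  if _n_ = 1 then call streaminit(2015);
  /* ceil(k * u) with u in (0,1) gives an index from 1 to k */
  if missing(a) then a = fill(ceil(dim(fill) * rand('uniform')));
run;
```

Adding a fourth value only requires changing the array definition; each value is then drawn with equal probability.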
I'm trying to pull only the last 4 working days of data in SAS. I tried the following code, but I'm not getting what I intended:
data input;
Input id $ id1 $ id2 $ num date date9.;
Format Date Date9.;
datalines;
x y z 3 19JUL2015
x y z 2 18JUL2015
x y z 3 17JUL2015
x y z 2 16JUL2015
x y z 3 15JUL2015
x y z 2 14JUL2015
x y z 3 13JUL2015
a b c 1 12JUL2015
a b c 1 11JUL2015
a b c 1 10JUL2015
a b c 1 09JUL2015
a b c 1 08JUL2015
a b c 2 07JUL2015
x y z 1 06JUL2015
;
Run;
Data test;
Set input;
Weekday=Weekday(Date);
intck=intck('weekday',Date,today());
*if intck('weekday',Date,today()) >4;
if 1<Weekday(Date)<7 and Date>=today()-4;
Run;
I think you need to reverse the > in your code, and add a qualification that you only want weekdays:
Data test;
Set input;
Weekday=Weekday(Date);
intck=intck('weekday',Date,today());
if intck('weekday',Date,'20JUL2015'd) le 4 and 1<weekday(Date)<7;
*if 1<Weekday(Date)<7 and Date>='20JUL2015'd-5;
Run;
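An alternative sketch computes the cutoff date once with INTNX instead of counting intervals per row with INTCK. It assumes the same fixed reference date used above ('20JUL2015'd) and the `input` dataset from the question. The names `test2`, `ref`, and `cutoff` are illustrative only.

```sas
/* Sketch: derive the cutoff by stepping back 4 weekday intervals */
data test2;
  set input;
  /* Fixed reference date taken from the answer; use today() for live data */
  ref = '20JUL2015'd;
  /* Move back 4 WEEKDAY intervals from the reference date */
  cutoff = intnx('weekday', ref, -4);
  /* Keep Monday-Friday rows on or after the cutoff */
  if date >= cutoff and 1 < weekday(date) < 7;
  format ref cutoff date9.;
run;
```

Keeping `ref` and `cutoff` in the output makes it easy to check which window was actually applied; drop them once the filter behaves as expected.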
I have two datasets in SAS. They both contain the same variable x. In the first data set, I want to remove those observations whose x value is also in the x values in the second data set.
Example,
data set1;
input x y z;
datalines;
1 1.5 2.2
1 2.1 9.0
2 4.2 4.4
3 4.5 2.4
;
run;
data set2;
input x y;
datalines;
1 15
2 44
;
run;
In set 1, I want to remove those observations if x=1 or x=2 where 1 and 2 come from the x values from second data set. I only want to keep the last row in set 1.
So your final answer should only include the 3? There are a few ways, but I find this the clearest method for understanding.
proc sql;
create table want as
select *
from set1
where x not in (select x from set2);
quit;
Data step version:
data want;
merge set1(in = _1)
set2(in = _2 keep = x);
by x;
if _1 and not(_2);
run;
This assumes that set1 and set2 have both either been sorted by x or have an index on x.
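If sorting or indexing is not convenient, a hash lookup is one possible alternative. The sketch below loads the x values of set2 into a hash object and keeps only the rows of set1 whose x is not found. The names `want_hash` and `h` are illustrative.

```sas
/* Sketch: anti-join with a hash object, no sorting required */
data want_hash;
  if _n_ = 1 then do;
    /* Load the keys from set2 once, on the first iteration */
    declare hash h(dataset: 'set2(keep=x)');
    h.defineKey('x');
    h.defineDone();
  end;
  set set1;
  /* check() returns 0 when x exists in set2; keep only the non-matches */
  if h.check() ne 0;
run;
```

This keeps the rows of set1 in their original order, at the cost of holding the keys of set2 in memory.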
I have the following data set:
data data_one;
length X 3
Y $ 20;
input x y ;
datalines;
1 test
2 test
3 test1
4 test1
5 test
6 test
7 test1
;
run;
data data_two;
length Z 3
A $ 20;
input Z A;
datalines;
1 test
2 test1
3 test2
;
run;
What I would like to have is a data set which tells me how often column Y in data_one contains the same string of column A in data_two. The result should look like this one:
Obs test test1 test2
1 4 3 0
Thanks in advance!
1. First we need the counts for those values of Y present in data_one.
2. Then we create a sorted (for the next merge) list of the values present in data_two.
3. The data_one Y counts from 1. are merged with the list from 2.
4. The Y values present in data_two but not in data_one (b and not a) are assigned count=0; the Y values not present in data_two are discarded (if b).
5. The last step transposes the vertical list of counts into a horizontal set of variables.
proc freq data=data_one noprint;
table y / out=count_one (keep=y count);
run;
proc sort data=data_two out=list_two (keep=a rename=(a=y)) nodupkey;
by a;
run;
data count_all;
merge count_one (in=a) list_two (in=b);
by y;
if (b and not a) then count=0;
if b;
run;
proc transpose data=count_all out=final (drop=_name_ _label_);
id y;
run;
The first 3 steps can be replaced with one proc SQL:
proc sql;
create table count_all as
select distinct
coalesce(t1.y,t2.a) as y,
case
when missing(t1.y) then 0
else count(t1.y)
end as N
from data_one as t1
right join data_two as t2
on t1.y=t2.a
group by 1
order by 1;
quit;
proc transpose data=count_all out=final (drop=_name_);
id y;
run;
How do I add a new observation to an already created dataset in SAS? For example, if I have dataset 'dataX' with variables 'x' and 'y', and I want to add a new observation that is a multiple of observation number n, how can I do it?
dataX :
x y
1 1
1 21
2 3
I want to create :
dataX :
x y
1 1
1 21
2 3
10 210
where observation number four is multiplication by ten of observation number two.
data X;
input x y;
datalines;
1 1
1 21
2 3
;
run;
data X ;
set X end=eof;
if eof then do;
output;
x=10 ;y=210;
end;
output;
run;
Here is one way to do this:
data dataX;
input x y;
datalines;
1 1
1 21
2 3
;
run;
/* Create a new observation into temp data set */
data _addRec;
set dataX(firstobs=2); /* Get observation 2 */
x = x * 10; /* Multiply each by 10 */
y = y * 10;
output; /* Output new observation */
stop;
run;
/* Add new obs to original data set */
proc append base=dataX data=_addRec;
run;
/* Delete the temp data set (to be safe) */
proc delete data=_addRec;
run;
data a ;
do kk=1 to 5 ;
output ;
end ;
run;
data a2 ;
kk=999 ;
output ;
run;
data a; set a a2 ;run ;
proc print data=a ;run ;
Result:
The SAS System 1
OBS kk
1 1
2 2
3 3
4 4
5 5
6 999
You can use a macro to get the desired result:
Write a macro that reads the first dataset and, when _n_=2, multiplies x and y by 10.
Then create another dataset that holds only the multiplied values, say x'=10x and y'=10y.
Pass both datasets to another macro that SETs the original dataset and the newly created dataset together.
The logic is that you create another dataset with the values 10x and 10y and then SET it after the previous dataset.
I hope this helps!
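The logic described here could be sketched as a single macro. The macro name `add_scaled_obs`, its parameters, and the work dataset `_scaled` are hypothetical names chosen for illustration. The sketch also assumes the dataset has numeric variables x and y, as in the question.

```sas
/* Sketch: append a scaled copy of one observation to a dataset. */
/* Macro and parameter names are hypothetical.                   */
%macro add_scaled_obs(ds=, obs=, factor=);
  /* Read only observation &obs and scale x and y */
  data _scaled;
    set &ds (firstobs=&obs obs=&obs);
    x = x * &factor;
    y = y * &factor;
  run;

  /* Stack the original rows and the new row */
  data &ds;
    set &ds _scaled;
  run;
%mend add_scaled_obs;

/* Usage: append 10 times observation 2 of dataX */
%add_scaled_obs(ds=dataX, obs=2, factor=10)
```

Reading with `firstobs=` and `obs=` avoids scanning the whole dataset with an `_n_` test, and the second step is the same SET-based concatenation the answer describes.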