I have dataset M
number id_no date
1 123 3/3/2012
2 123 3/3/2012
3 . .
4 . .
How do I copy 123 and 3/3/2012 into the obs 4 & 5.
This should get you there.
data one;
input
number id_no date mmddyy10.;
format date mmddyy10.;
datalines;
1 123 3/3/2012
2 123 3/3/2012
3 . .
4 . .
5 456 .
;
run;
proc sort data = one;
by number;
run;
data two;
set one;
retain _id_no _date;
if missing(_id_no) then _id_no = id_no;
if missing(id_no) then id_no = _id_no;
if missing(_date) then _date = date;
if missing(date) then date = _date;
drop _id_no _date;
run;
Related
I'd like to assign to an empty field a value based on many values of other entries. Here my dataset:
input ID date $10. type typea $10. ;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 . 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 . 1 cb
5 . 2 b
;
run;
My goal is the following: for all empty entries of the variable "date", assign to it the same date of the record which has the same ID, the same type, but a different typea. If there aren't other records with the criteria described, leave the date field empty. So the output should be:
data temp;
input ID date $10. type typea $10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 10/12/2006 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 11/09/2008 1 cb
5 . 2 b
;
run;
I tried with something like that based on another answer on SO (SAS: get the first value where a condition is verified by group), but it doesn't work:
by ID type typea ;
run;
data temp;
set temp;
by ID type typea ;
if cat(first.ID, first.type, first.typea) then date_store=date;
if cat(ID eq ID and type ne type and typea eq typea) then do;
date_change_type1to2=date_store;
end;
run;
Do you have any hints? Thanks a lot!
You could use UPDATE statement to help you carry-forward the DATE values for a group.
data have;
input ID type typea :$10. date :yymmdd. ;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;
data want;
update have(obs=0) have;
by id type ;
output;
run;
If there are also missing values of TYPEA then those will also be carried forward. If you don't want that to happen you could re-read just those variables after the update.
data want;
update have(obs=0) have;
by id type ;
set have(keep=typea);
output;
run;
I tried searching but couldn't exactly find what I was looking for. I have a dataset with multiple rows per ID. I'd like to add a variable called maxdec and show a 1 for each row that has the max dec for each ID.
Sample Dataset:
ID DEC
123 1
123 2
123 2
123 2
456 2
456 3
456 3
Desired Output:
ID DEC MAXDEC
123 1 .
123 2 1
123 2 1
123 2 1
456 2 .
456 2 .
456 3 1
It is easier to define it with 1 or 0 instead of 1 or missing.
proc sql;
create table want as
select id,dec, dec=max(dec) as maxdec
from have
group by id
;
quit;
proc sort data=have;
by id;
proc summary data=have;
class id;
var dec;
output out=max_info max=max_value;
run;
data want;
merge have
max_info (keep=id max_value)
;
by id;
if dec=max_value then maxdec=1;
run;
The proc summary calculates the maximum value of DEC for each ID, and outputs as variable MAX_VALUE in dataset MAX_INFO. The subsequent data step assigns MAXDEC=1 if the current value of DEC is equal to MAX_VALUE for that ID.
Here is a DoW loop approach
data have;
input ID DEC;
datalines;
123 1
123 2
123 2
123 2
456 2
456 3
456 3
;
data want(drop = m);
do _N_ = 1 by 1 until (last.id);
set have;
by id;
m = max(maxdex, dec);
end;
do _N_ = 1 to _N_;
set have;
maxdex = ifn(dec = m, 1, .);
output;
end;
run;
I have a series of string values with missing observations. I would like to use flat substitution. For instance variable x has 3 available values. There should be a 33.333% chance that a missing value will be assigned to the available values for x under this substitution method. How would I do this?
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
Run;
You could use temporary arrays to store the possible values. Then generate a random index into the array.
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
data want ;
set have ;
array possible_b (2) $8 ('Male','Female') ;
if missing(b) then b=possible_b(1+int(rand('uniform')*dim(possible_b)));
run;
I did this with generating random numbers and hard coding the limits. There should be an easier way to do this, but for the purposes of the question this should work.
option missing='';
data begin;
input a $;
cards;
a
.
b
c
.
e
.
f
g
h
.
.
j
.
;
run;
data intermediate;
set begin;
if a EQ '' then help= rand("uniform");
else help=.;
run;
data wanted;
set intermediate;
format help populated.;
if a EQ '' then do;
if 0<=help<0.33 then a='V1';
else if 0.33<=help<0.66 then a='V2';
else if 0.66<=help then a='V3';
end;
drop help;
run;
I have data like below
p_id E_id
---- ----
1 1
1 2
1 3
1 4
2 1
3 1
3 2
3 3
4 1
For each primary_id I have to create a table of the corresponding E_id.
How do I do it in SAS;
I am using:
proc freq data = abc;
where p_id = 1;
tables p_id * E_id;
run;
How do I generalize the where statement for all the primary keys??
The by statement is how you get a separate table for each ID. It requires data to be sorted by the variable.
proc freq data = abc;
by p_id;
tables p_id * E_id;
run;
Here is a solution allowing you to select p_id to generate frequency tables.
data have;
input p_id e_id;
datalines;
1 1
1 2
1 3
1 4
2 1
3 1
3 2
3 3
4 1
;
run;
proc sort data = have;
by p_id;
run;
%let pid_list = (1,2); ** only generate two tables;
data _null_;
set have;
by p_id;
if first.p_id and p_id in &pid_list then do;
call execute('
proc freq data = have(where = (p_id = '||p_id||'));
tables p_id * e_id;
run;
');
end;
run;
Given the following dataset:.
obs var1 var2 var3
1 123 456 .
2 123 . 789
3 . 456 789
How does one go about to append all the variables into a single variable whilst ignoring the empty observations (denoted by ".")?
Desired output:.
obs var4
1 123
2 123
3 456
4 456
5 789
6 789
Data step:.
data have;
input
var1 var2 var3; cards;
123 456 .
123 . 789
. 456 789
;run;
Not sure why you read the numbers in as char, but if I change to num, it could be done like this:
data have;
input var1 var2 var3;
cards;
123 456 .
123 . 789
. 456 789
;run;
data want (keep=var4);
set have;
var4=var1;if var4 ne . then output;
var4=var2;if var4 ne . then output;
var4=var3;if var4 ne . then output;
run;
OK, let's assume you have a file vith the values in it, and you do not know how many variables are in each row. First I need to create a sample textfile:
filename x temp;
data _nulL_;
file x;
put "123 456 . ";
put "123 . 789 ";
put ". 456 789 ";
run;
Then I need to read the first line and count the number of variables:
data _null_;
infile x;
input;
call symputx("number_of_variables",put(countw(_infile_," ","c"),best.));
stop;
run;
%put &number_of_variables;
Now I can dynamically read the variables:
%macro doit();
data have;
infile x;
input
%do i=1 %to &number_of_variables;
var&i
%end;
;
run;
data want (keep=var%eval(&number_of_variables + 1));
set have;
%do i=1 %to &number_of_variables;
var%eval(&number_of_variables + 1)=var&i;
if var%eval(&number_of_variables + 1) ne . then output;
%end;
run;
%mend;
%doit;
You can use proc transpose to do this but there is a trick to doing so. You will need to append a unique identifier to each row, prior to doing the transpose.
I've taken #Stig's sample data and added the observation number to use as a unique identifier:
data have;
input var1 var2 var3;
x = _n_; * ADDING A UNIQUE IDENTIFIER TO EVERY ROW;
cards;
123 456 .
123 . 789
. 456 789
;run;
Then it's simply a case of running proc transpose:
proc transpose data=have out=xx;
by x;
run;
And finally, remove any results where col1 is missing, and add in the observation number:
data want;
obs = _n_;
set xx (keep=col1);
where col1 ne .;
run;
As the order is not important then you can do this in one step, using arrays. As the data step moves through each row, the array enables the variable values to be stored in memory, so you can loop through them. I've set it up so that each time a non-missing value is found, then output it to the new variable.
In creating the array, I've set it to var1--var3, the double dash means all variables between var1 and var3 inclusive. If your real variables are numbered the same way then you can use var1-var3, which means all sequential numbers between the two variables.
data have;
input var1 var2 var3;
datalines;
123 456 .
123 . 789
. 456 789
;
run;
data want;
set have;
array allnums var1--var3;
do i = 1 to dim(allnums);
if not missing(allnums{i}) then do;
var4 = allnums{i};
output;
end;
end;
drop var1--var3 i;
run;