How to complete data in a SAS table? - sas

I have a table in SAS and it looks like this:
The primary key is Name-Surname.
Row Name Surname Country Sec Salary
1 Foo Bar SP 1 1500
2 Foo Bar SP 2
3 Foo Bar 3 1500
4 Foo1 Bar1 1 2000
5 Foo1 Bar1 IT 2 2000
6 Foo1 Bar1 IT 3 2000
7 Foo1 Bar1 IT 4
8 Foo2 Bar2 PO 1
8 Foo2 Bar2 2 850
9 Foo2 Bar2 3
10 Foo2 Bar2 PO 4
It has empty fields, how can I fill it so that they are as in the table below?
Row Name Surname Country Sec Salary
1 Foo Bar SP 1 1500
2 Foo Bar SP 2 1500
3 Foo Bar SP 3 1500
4 Foo1 Bar1 IT 1 2000
5 Foo1 Bar1 IT 2 2000
6 Foo1 Bar1 IT 3 2000
7 Foo1 Bar1 IT 4 2000
8 Foo2 Bar2 PO 1 850
8 Foo2 Bar2 PO 2 850
9 Foo2 Bar2 PO 3 850
10 Foo2 Bar2 PO 4 850
Thank you.

A DOW loop can process the BY groups to identify the first non-missing value, which is then used as the imputation value.
data have; input
Row Name $ Surname $ Country $ Sec Salary; datalines;
1 Foo Bar SP 1 1500
2 Foo Bar SP 2 .
3 Foo Bar . 3 1500
4 Foo1 Bar1 . 1 2000
5 Foo1 Bar1 IT 2 2000
6 Foo1 Bar1 IT 3 2000
7 Foo1 Bar1 IT 4 .
8 Foo2 Bar2 PO 1 .
8 Foo2 Bar2 . 2 850
9 Foo2 Bar2 . 3 .
10 Foo2 Bar2 PO 4 .
;
data want;
  * First pass over the BY group: remember the first non-missing values;
  do _n_ = 1 by 1 until (last.surname);
    set
      have (obs=0 rename=(country=_1st_country salary=_1st_salary))
      have
    ;
    by name surname;
    if missing(_1st_country) then if not missing(country) then _1st_country = country;
    if missing(_1st_salary ) then if not missing(salary ) then _1st_salary = salary;
  end;
  * Second pass over the same rows: fill the gaps and write the rows out;
  do _n_ = 1 to _n_;
    set have;
    if missing(country) then country = _1st_country;
    if missing(salary ) then salary = _1st_salary;
    OUTPUT;
  end;
  drop _1st:;
run;

Assuming your data is sorted by Name and Surname and you want to carry over values only from rows with the same name and surname, read in all the data twice for each combination of name and surname.
data want;
set have (in=first_visit) have (in=second_visit);
by Name Surname;
On the first visit, remember the Country and Salary from the rows where they are filled in.
If different non-missing values exist, put a warning in the log.
if first_visit then do;
if first.Surname then do;
_Country = Country;
_Salary = Salary;
end;
else do;
if missing(_Country) then _Country = Country;
else if _Country ne Country and not missing(Country) then put
'WARNING: different values:' Country= ' and ' _Country
' for ' Name= Surname=;
if missing(_Salary) then _Salary = Salary;
else if _Salary ne Salary and not missing(Salary) then put
'WARNING: different values:' Salary= ' and ' _Salary
' for ' Name= Surname=;
end;
end;
On the second visit, fill in the blanks with the values retained from the first visit. (Note that we don't need the variable second_visit, but the code is easier to understand with it defined.)
else do; * this is the _second_visit ;
if missing(Country) then Country = _Country;
if missing(Salary) then Salary = _Salary;
end;
To make this work, we must explicitly retain the temporary values, because SAS initialises all variables for each observation by default. (I started all their names with _, because I can then refer to them with a wildcard, but that only works if you put the retain statement after the creation of the variables.)
retain _:;
As the retained values have no further use, drop them from the result. _(Note that first_visit and second_visit are also dropped, because of the way we defined them.)_
drop _:;
run;
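The effect of RETAIN described above can be seen in isolation with a minimal sketch. The dataset demo and the variables kept and not_kept are made up for illustration and are not part of the answer.

```sas
data demo;
  input x;
  retain kept;                 /* kept survives from one observation to the next */
  if not missing(x) then do;
    kept = x;
    not_kept = x;              /* not retained: reset to missing for each new observation */
  end;
  datalines;
1
.
.
;
run;

proc print data=demo;
run;
```

In the printed result, kept should hold 1 on all three rows, while not_kept is filled only on the first row.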

Related

SAS: assign a value to a variable based on characteristics of other variables

I'd like to assign to an empty field a value based on many values of other entries. Here my dataset:
data temp;
input ID date $10. type typea $10. ;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 . 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 . 1 cb
5 . 2 b
;
run;
My goal is the following: for all empty entries of the variable "date", assign to it the same date of the record which has the same ID, the same type, but a different typea. If there aren't other records with the criteria described, leave the date field empty. So the output should be:
data temp;
input ID date $10. type typea $10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 10/12/2006 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 11/09/2008 1 cb
5 . 2 b
;
run;
I tried with something like that based on another answer on SO (SAS: get the first value where a condition is verified by group), but it doesn't work:
proc sort data=temp;
by ID type typea ;
run;
data temp;
set temp;
by ID type typea ;
if cat(first.ID, first.type, first.typea) then date_store=date;
if cat(ID eq ID and type ne type and typea eq typea) then do;
date_change_type1to2=date_store;
end;
run;
Do you have any hints? Thanks a lot!
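You could use the UPDATE statement to carry the DATE values forward within a group. By default, UPDATE does not overwrite an existing value with a missing one, so the last non-missing DATE in each BY group is kept.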
data have;
input ID type typea :$10. date :yymmdd. ;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;
data want;
update have(obs=0) have;
by id type ;
output;
run;
If there are also missing values of TYPEA then those will also be carried forward. If you don't want that to happen you could re-read just those variables after the update.
data want;
update have(obs=0) have;
by id type ;
set have(keep=typea);
output;
run;

SAS flag each row that contains the max value

I tried searching but couldn't exactly find what I was looking for. I have a dataset with multiple rows per ID. I'd like to add a variable called maxdec and show a 1 for each row that has the max dec for each ID.
Sample Dataset:
ID DEC
123 1
123 2
123 2
123 2
456 2
456 3
456 3
Desired Output:
ID DEC MAXDEC
123 1 .
123 2 1
123 2 1
123 2 1
456 2 .
456 2 .
456 3 1
It is easier to define it with 1 or 0 instead of 1 or missing.
proc sql;
create table want as
select id,dec, dec=max(dec) as maxdec
from have
group by id
;
quit;
proc sort data=have;
by id;
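/* NWAY drops the overall summary row (missing ID), which the merge would otherwise add to WANT */
proc summary data=have nway;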
class id;
var dec;
output out=max_info max=max_value;
run;
data want;
merge have
max_info (keep=id max_value)
;
by id;
if dec=max_value then maxdec=1;
run;
The proc summary calculates the maximum value of DEC for each ID, and outputs as variable MAX_VALUE in dataset MAX_INFO. The subsequent data step assigns MAXDEC=1 if the current value of DEC is equal to MAX_VALUE for that ID.
Here is a DoW loop approach
data have;
input ID DEC;
datalines;
123 1
123 2
123 2
123 2
456 2
456 3
456 3
;
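data want(drop = m);
  /* first pass: find the maximum DEC within the ID group */
  do _N_ = 1 by 1 until (last.id);
    set have;
    by id;
    m = max(m, dec);
  end;
  /* second pass: re-read the same rows and flag those equal to the maximum */
  do _N_ = 1 to _N_;
    set have;
    maxdec = ifn(dec = m, 1, .);
    output;
  end;
run;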

PROC SQL MERGE MISMATCH

The required condition is:
"SUBJECT in A = SUBJECT in B
and
VISIT in A NE(not equal to) VISIT in B"
I would like to find the exact mismatch and missing VISIT from the below Tables A and B by using Proc SQL procedure, Can anyone help me please?
Table A
SUBJECT Test VISIT
1001 ABCD 1
1001 ABCD 2
1001 ABCD 3
1001 ABCD 5
Table B
SUBJECT Test VISIT1
1001 ABCD 2
1001 ABCD 1
1001 ABCD 4
Expected output:
SUBJECT Test VISIT VISIT1
1001 ABCD 3 .
1001 ABCD 5 .
1001 ABCD . 4
Visits 3 and 5 are present in dataset A but not in B, and visit 4 is present in dataset B but not in A, and so on.
Code for the datasets:
DATA A;
LENGTH SUBJECT 8 Test $10 visit 8;
INPUT SUBJECT Test $ visit ;
DATALINES;
1001 ABCD 1
1001 ABCD 2
1001 ABCD 3
1001 ABCD 5
;
RUN;
DATA B;
LENGTH SUBJECT 8 Test $10 visit1 8;
INPUT SUBJECT Test $ visit1 ;
DATALINES;
1001 ABCD 2
1001 ABCD 1
1001 ABCD 4
;
RUN;
Thanks in advance!
The code I tried is below (but it does not work as expected):
****************(VISIT ) in A and not in B****;
proc sql;
create table SS1 as
select distinct a.* FROM
A a where a.visit not in(select s.visit1 from B s WHERE A.SUBJECT = S.SUBJECT );
create table INRAVE as
select * from SS1 A
left join
B B
on a.subject=b.SUBJECT and a.VISIT NE b.VISIT1
where b.SUBJECT is not null
;
quit;
****************VISIT in B and not in A****;
proc sql;
create table SS2 as
select distinct a.* from
B a where a.VISIT1 not in(select S.VISIT from A s WHERE A.SUBJECT = S.SUBJECT );
create table INVENDOR as
select * from SS2 A
left join
A B
on a.subject=b.SUBJECT and a.VISIT1 NE b.VISIT
where b.SUBJECT is not null
;
quit;
data ALL;;
set inrave invendor;
where subject=subject ;
RUN;
Seems you know SQL very well, why not try union all, just like this:
proc sql noprint;
create table C as
select *, 'A' as Source from A
where catx('#',SUBJECT,Test,visit) not in (
select distinct catx('#',SUBJECT,Test,visit1) from B
)
union all corr
select *, 'B' as Source from B(rename=VISIT1=VISIT)
where catx('#',SUBJECT,Test,visit) not in (
select distinct catx('#',SUBJECT,Test,visit) from A
)
;
create table D(drop=TmpVISIT Source) as
select *,
case when Source = 'B' then . else TmpVISIT end as VISIT,
case when Source = 'B' then TmpVISIT else . end as VISIT1
from C(rename=VISIT=TmpVISIT);
quit;
This takes all observations from dataset A that do not appear in dataset B, and does the opposite for dataset B.
There is also a shorter solution:
proc sql noprint;
select catx('#',SUBJECT,Test,visit) into :Ununique separated by '" "' from (
select * from A union all select * from B(rename=visit1=visit)
)
group by SUBJECT, Test, visit
having count(*) > 1;
quit;
data D;
set A B;
if catx('#',SUBJECT,Test,coalesce(visit1,visit)) in ("&Ununique") then delete;
run;
However, this method is limited by the maximum length of a macro variable.
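A way around the macro variable limit is to let PROC SQL do the matching itself. The following is only a sketch of that idea, not part of the answer above: it uses a full join on all three key columns and keeps the rows that found no partner. The table name mismatch is made up, and the sketch assumes that VISIT and VISIT1 are never missing in the source data.

```sas
proc sql;
  create table mismatch as
  select coalesce(a.SUBJECT, b.SUBJECT) as SUBJECT,
         coalesce(a.Test, b.Test)       as Test,
         a.visit,                       /* missing when the row exists only in B */
         b.visit1                       /* missing when the row exists only in A */
  from A as a
       full join
       B as b
    on  a.SUBJECT = b.SUBJECT
    and a.Test    = b.Test
    and a.visit   = b.visit1
  /* keep only the rows without a partner on the other side */
  where a.visit is null or b.visit1 is null
  ;
quit;
```

Because the comparison happens inside the join, nothing has to be passed through a macro variable, so the number of mismatches is not capped by its length.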

Automatically replace outlying values with missing values

Suppose the data set have contains various outliers which have been identified in an outliers data set. These outliers need to be replaced with missing values, as demonstrated below.
Have
Obs group replicate height weight bp cholesterol
1 1 A 0.406 0.887 0.262 0.683
2 1 B 0.656 0.700 0.083 0.836
3 1 C 0.645 0.711 0.349 0.383
4 1 D 0.115 0.266 666.000 0.015
5 2 A 0.607 0.247 0.644 0.915
6 2 B 0.172 333.000 555.000 0.924
7 2 C 0.680 0.417 0.269 0.499
8 2 D 0.787 0.260 0.610 0.142
9 3 A 0.406 0.099 0.263 111.000
10 3 B 0.981 444.000 0.971 0.894
11 3 C 0.436 0.502 0.563 0.580
12 3 D 0.814 0.959 0.829 0.245
13 4 A 0.488 0.273 0.463 0.784
14 4 B 0.141 0.117 0.674 0.103
15 4 C 0.152 0.935 0.250 0.800
16 4 D 222.000 0.247 0.778 0.941
Want
Obs group replicate height weight bp cholesterol
1 1 A 0.4056 0.8870 0.2615 0.6827
2 1 B 0.6556 0.6995 0.0829 0.8356
3 1 C 0.6445 0.7110 0.3492 0.3826
4 1 D 0.1146 0.2655 . 0.0152
5 2 A 0.6072 0.2474 0.6444 0.9154
6 2 B 0.1720 . . 0.9241
7 2 C 0.6800 0.4166 0.2686 0.4992
8 2 D 0.7874 0.2595 0.6099 0.1418
9 3 A 0.4057 0.0988 0.2632 .
10 3 B 0.9805 . 0.9712 0.8937
11 3 C 0.4358 0.5023 0.5626 0.5799
12 3 D 0.8138 0.9588 0.8293 0.2448
13 4 A 0.4881 0.2731 0.4633 0.7839
14 4 B 0.1413 0.1166 0.6743 0.1032
15 4 C 0.1522 0.9351 0.2504 0.8003
16 4 D . 0.2465 0.7782 0.9412
The "get it done" approach is to manually enter each variable/value combination in a conditional which replaces with missing when true.
data have;
input group replicate $ height weight bp cholesterol;
datalines;
1 A 0.4056 0.8870 0.2615 0.6827
1 B 0.6556 0.6995 0.0829 0.8356
1 C 0.6445 0.7110 0.3492 0.3826
1 D 0.1146 0.2655 666 0.0152
2 A 0.6072 0.2474 0.6444 0.9154
2 B 0.1720 333 555 0.9241
2 C 0.6800 0.4166 0.2686 0.4992
2 D 0.7874 0.2595 0.6099 0.1418
3 A 0.4057 0.0988 0.2632 111
3 B 0.9805 444 0.9712 0.8937
3 C 0.4358 0.5023 0.5626 0.5799
3 D 0.8138 0.9588 0.8293 0.2448
4 A 0.4881 0.2731 0.4633 0.7839
4 B 0.1413 0.1166 0.6743 0.1032
4 C 0.1522 0.9351 0.2504 0.8003
4 D 222 0.2465 0.7782 0.9412
;
run;
data outliers;
input parameter :$11. group replicate $ measurement;
datalines;
cholesterol 3 A 111
height 4 D 222
weight 2 B 333
weight 3 B 444
bp 2 B 555
bp 1 D 666
;
run;
EDIT: Updated outliers so that parameter avoids truncation and changed measurement to be numeric type so as to match the corresponding height, weight, bp, cholesterol. This shouldn't change the responses.
data want;
set have;
if group = 3 and replicate = 'A' and cholesterol = 111 then cholesterol = .;
if group = 4 and replicate = 'D' and height = 222 then height = .;
if group = 2 and replicate = 'B' and weight = 333 then weight = .;
if group = 3 and replicate = 'B' and weight = 444 then weight = .;
if group = 2 and replicate = 'B' and bp = 555 then bp = .;
if group = 1 and replicate = 'D' and bp = 666 then bp = .;
run;
This, however, doesn't utilize the outliers data set. How can the replacement process be made automatic?
I immediately think of the IN= operator, but that won't work. It's not the entire row which needs to be matched. Perhaps an SQL key matching approach would work? But to match the key, don't I need to use a where statement? I'd then effectively be writing everything out manually again. I could probably create macro variables which contain the various if or where statements, but that seems excessive.
I don't think generating statements is excessive in this case. The complexity arises here because your outlier dataset cannot be merged easily since the parameter values represent variable names in the have dataset. If it is possible to reorient the outliers dataset so you have a 1 to 1 merge, this logic would be simpler.
Let's assume you cannot. There are a few ways to use a variable in a dataset that corresponds to a variable in another.
You could use an array like array params{*} height -- cholesterol; and then use the vname function as you loop through the array to compare to the value in the parameter variable, but this gets complicated in your case because you have a one to many merge, so you would have to retain the replacements and only output the last record for each by group... so it gets complicated.
You could transpose the outliers data using proc transpose, but that will get lengthy because you will need a transpose for each parameter, and then you'd need to merge all the transposed datasets back to the have dataset. My main issue with this method is that code with a bunch of transposes like that gets unwieldy.
You could create the macro variable logic that you think might be excessive. But compared to the other ways of getting the values of the parameter variable to match up with the variable names in the have dataset, I don't think something like this is excessive:
data _null_;
set outliers;
/* CATS builds a valid name such as outlierstatement1; "||" would pad _n_ with blanks */
call symputx(cats("outlierstatement",_n_),"if group = "||group||" and replicate = '"||replicate||"' and "||parameter||" = "||measurement||" then "|| parameter ||" = .;");
call symputx("outliercount",_n_);
run;
%macro makewant();
data want;
set have;
%do i = 1 %to &outliercount;
&&outlierstatement&i;
%end;
run;
%mend;
/* run the macro to create WANT */
%makewant()
Transposition is the key to a fully automatic programmatic approach. The transposition that will occur is of the filter data, not the original data. The transposed filter data will have fewer rows than the original. As John indicated, transposition of the want data can create a very tall table and has to be transposed back after applying the filters.
As to the filter data, the presence of a filter row for a specific group, replicate and parameter should be enough to mark a cell for filtering. This is on the presumption that you have a system for automatic outlier detection and the filter values will always be in concordance with the original values.
So, what has to be done to automate the filter application process without code generating a wall of test and assign statements ?
Transpose filter data into same form as want data, call it Filter^
Merge Want and Filter^ by record key (which is the by group of Group and Replicate)
Array process the data elements, looking for filtering conditions.
For your consideration, try the following SAS code. There is an erroneous filter record added to the mix.
data have;
input group replicate $ height weight bp cholesterol;
datalines;
1 A 0.4056 0.8870 0.2615 0.6827
1 B 0.6556 0.6995 0.0829 0.8356
1 C 0.6445 0.7110 0.3492 0.3826
1 D 0.1146 0.2655 666 0.0152
2 A 0.6072 0.2474 0.6444 0.9154
2 B 0.1720 333 555 0.9241
2 C 0.6800 0.4166 0.2686 0.4992
2 D 0.7874 0.2595 0.6099 0.1418
3 A 0.4057 0.0988 0.2632 111
3 B 0.9805 444 0.9712 0.8937
3 C 0.4358 0.5023 0.5626 0.5799
3 D 0.8138 0.9588 0.8293 0.2448
4 A 0.4881 0.2731 0.4633 0.7839
4 B 0.1413 0.1166 0.6743 0.1032
4 C 0.1522 0.9351 0.2504 0.8003
4 D 222 0.2465 0.7782 0.9412
5 E 222 0.2465 0.7782 0.9412 /* test record for filter value misalignment test */
;
run;
data outliers;
length parameter $32; %* <--- widened parameter so it can be transposed into column via id;
input parameter $ group replicate $ measurement ; %* <--- changed measurement to numeric variable;
datalines;
cholesterol 3 A 111
height 4 D 222
height 5 E 223 /* test record for filter value misalignment test */
weight 2 B 333
weight 3 B 444
bp 2 B 555
bp 1 D 666
;
run;
data want;
set have;
if group = 3 and replicate = 'A' and cholesterol = 111 then cholesterol = .;
if group = 4 and replicate = 'D' and height = 222 then height = .;
if group = 2 and replicate = 'B' and weight = 333 then weight = .;
if group = 3 and replicate = 'B' and weight = 444 then weight = .;
if group = 2 and replicate = 'B' and bp = 555 then bp = .;
if group = 1 and replicate = 'D' and bp = 666 then bp = .;
run;
/* Create a view with 1st row having all the filtered parameters
* This is necessary so that the first transposed filter row
* will have the parameters as columns in alphabetic order;
*/
proc sql noprint;
create view outliers_transpose_ready as
select distinct parameter from outliers
union
select * from outliers
order by group, replicate, parameter
;
/* Generate an alphabetically ordered list of parameters for use
* as a variable (aka column) list in the filter application step */
select distinct parameter
into :parameters separated by ' '
from outliers
order by parameter
;
quit;
%put NOTE: &=parameters;
/* Transpose the filter data.
* The ID statement pivots row data into column names.
* The prefix=_filter_ ensures the new column names
* will not collide with the original data, and can be
* listed with the shortcut _filter_: in an array statement.
*/
proc transpose data=outliers_transpose_ready out=outliers_apply_ready prefix=_filter_;
by group replicate notsorted;
id parameter;
var measurement;
run;
/* Robust production code should contain a bin for
* data that does not conform to the filter application conditions
*/
data
want2(label="Outlier filtering applied" drop=_i_ _filter_:)
want2_warnings(label="Outlier filtering: misaligned values")
;
merge have outliers_apply_ready(keep=group replicate _filter_:);
by group replicate;
/* The arrays are for like named columns
* due to the alphabetic ordering enforced in data and codegen preparation
*/
array value_filter_check _filter_:;
array value &parameters;
if group ne .;
do _i_ = 1 to dim(value);
if value(_i_) EQ value_filter_check(_i_) then
value(_i_) = .;
else
if not missing(value_filter_check(_i_)) AND
value(_i_) NE value_filter_check(_i_)
then do;
put 'WARNING: Filtering expected but values do not match. ' group= replicate= value(_i_)= value_filter_check(_i_)=;
output want2_warnings;
end;
end;
output want2;
run;
Confirm your want and automated want2 agree.
proc compare noprint data=want compare=want2 outnoequal out=diffs;
by group replicate;
run;
Enjoy your SAS
You could use a hash table. Load a hash table with the outlier dataset, with parameter-group-replicate defined as the key. Then read in the data, and as you read each record, check each of the variables to see if that combination of parameter-group-replicate can be found in the hash table. I think below works (I'm no hash expert):
data want;
if 0 then set outliers (keep=parameter group replicate);
if _N_ = 1 then
do;
declare hash h(dataset:'outliers') ;
h.defineKey('parameter', 'group', 'replicate') ;
h.defineDone() ;
end;
set have ;
array vars {*} height weight bp cholesterol ;
do i=1 to dim(vars);
parameter=vname(vars{i});
if h.check()=0 then call missing(vars{i});
end;
drop i parameter;
run;
I like @John's suggestion:
You could use an array like array params{*} height -- cholesterol; and
then use the vname function as you loop through the array to compare
to the value in the parameter variable, but this gets complicated in
your case because you have a one to many merge, so you would have to
retain the replacements and only output the last record for each by
group... so it gets complicated.
Generally in a one to many merge I would avoid recoding variables from the dataset that is unique, because variables are retained within BY groups. But in this case, it works out well.
proc sort data=outliers;
by group replicate;
run;
data want (keep=group replicate height weight bp cholesterol);
merge have (in=a)
outliers (keep=group replicate parameter in=b)
;
by group replicate;
array vars {*} height weight bp cholesterol ;
do i=1 to dim(vars);
if vname(vars{i})=parameter then call missing(vars{i});
end;
if last.replicate;
run;
Thank you @John for providing a proof of concept. My implementation is a little different and I think worth making a separate entry for posterity. I went with a macro variable approach because I feel it is the most intuitive, being a simple text replacement. However, since a macro variable can contain only 65534 characters, it is conceivable that there could be sufficient outliers to exceed this limit. In such a case, any of the other solutions would make fine alternatives. Note that it is important that the put function use something like best32. Too short a width will truncate the value.
If you want a dataset containing the if statements (perhaps for verification), simply remove the INTO clause and place a CREATE TABLE ... AS line at the beginning of the PROC SQL step.
data have;
input group replicate $ height weight bp cholesterol;
datalines;
1 A 0.4056 0.8870 0.2615 0.6827
1 B 0.6556 0.6995 0.0829 0.8356
1 C 0.6445 0.7110 0.3492 0.3826
1 D 0.1146 0.2655 666 0.0152
2 A 0.6072 0.2474 0.6444 0.9154
2 B 0.1720 333 555 0.9241
2 C 0.6800 0.4166 0.2686 0.4992
2 D 0.7874 0.2595 0.6099 0.1418
3 A 0.4057 0.0988 0.2632 111
3 B 0.9805 444 0.9712 0.8937
3 C 0.4358 0.5023 0.5626 0.5799
3 D 0.8138 0.9588 0.8293 0.2448
4 A 0.4881 0.2731 0.4633 0.7839
4 B 0.1413 0.1166 0.6743 0.1032
4 C 0.1522 0.9351 0.2504 0.8003
4 D 222 0.2465 0.7782 0.9412
;
run;
data outliers;
input parameter :$11. group replicate $ measurement;
datalines;
cholesterol 3 A 111
height 4 D 222
weight 2 B 333
weight 3 B 444
bp 2 B 555
bp 1 D 666
;
run;
proc sql noprint;
select
cat('if group = '
, strip(put(group, best32.))
, " and replicate = '"
, strip(replicate)
, "' and "
, strip(parameter)
, ' = '
, strip(put(measurement, best32.))
, ' then '
, strip(parameter)
, ' = . ;')
into : listIfs separated by ' '
from outliers
;
quit;
%put %quote(&listIfs);
data want;
set have;
&listIfs;
run;
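The dataset variant mentioned above, which stores the generated statements for inspection instead of putting them in a macro variable, might look like the following sketch. The table name if_statements, the column name statement and its length of 200 are assumptions chosen for illustration.

```sas
proc sql;
  /* same expression as above, but written to a table instead of INTO a macro variable */
  create table if_statements as
  select
    cat('if group = '
      , strip(put(group, best32.))
      , " and replicate = '"
      , strip(replicate)
      , "' and "
      , strip(parameter)
      , ' = '
      , strip(put(measurement, best32.))
      , ' then '
      , strip(parameter)
      , ' = . ;') as statement length=200
  from outliers
  ;
quit;

proc print data=if_statements;
run;
```

Each row of if_statements then holds one generated if statement, which makes it easy to check the text before it is used in a data step.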

SAS: Copy one observation onto another?

I have dataset M
number id_no date
1 123 3/3/2012
2 123 3/3/2012
3 . .
4 . .
How do I copy 123 and 3/3/2012 into the obs 4 & 5.
This should get you there.
data one;
input
number id_no date mmddyy10.;
format date mmddyy10.;
datalines;
1 123 3/3/2012
2 123 3/3/2012
3 . .
4 . .
5 456 .
;
run;
proc sort data = one;
by number;
run;
data two;
set one;
retain _id_no _date;
if missing(_id_no) then _id_no = id_no;
if missing(id_no) then id_no = _id_no;
if missing(_date) then _date = date;
if missing(date) then date = _date;
drop _id_no _date;
run;
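In the step above, _id_no and _date are assigned only while they are still missing, so they keep the first non-missing value for the rest of the data. If the intent is instead to carry forward the most recent non-missing value, a variant along these lines could be used. This is a sketch under that assumption; the dataset name two_locf is made up.

```sas
data two_locf;
  set one;
  retain _id_no _date;
  /* remember the latest non-missing value, otherwise fill from it */
  if not missing(id_no) then _id_no = id_no;
  else id_no = _id_no;
  if not missing(date) then _date = date;
  else date = _date;
  drop _id_no _date;
run;
```

The two versions differ only when a later row introduces a new non-missing value that is followed by further missing rows.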