Replace the missing values in SAS

I want to replace missing values with the values of the following variables, shifting the non-missing values left toward H1. The example input is shown first, followed by the desired output.
SN OP_NAME H1 H2 H3 H4 H5
115060 NORS . 2331
115060 WIDE .
115061 .
115061 AIR . 7680
115061 ALLI .
115061 SKYW 1594
115062 NORS . .
115062 WIDE 3130 .
115063 NORS . 5414
115063 WIDE .
115064 ATLA 5231 . 11259 .
115066 ATLA 9637 . 5191 .
115067 LUXA .
115069 ATLA . 5963 .
115070 AMER 7457
115070 ATLA 10181
115070 WEST .
115072 JETS 10517
115073 SKYW . . 5515 . .
115074 MIDW .
115075 SKYW . . 4291 3499 11549
115076 DLTN 3918
Output looks like:
SN OP_NAME H1 H2 H3
115060 NORS 2331
115060 WIDE .
115061 .
115061 AIR 7680
115061 ALLI .
115061 SKYW 1594
115062 NORS . .
115062 WIDE 3130 .
115063 NORS 5414
115063 WIDE .
115064 ATLA 5231 11259
115066 ATLA 9637 5191
115067 LUXA .
115069 ATLA 5963 .
115070 AMER 7457
115070 ATLA 10181
115070 WEST .
115072 JETS 10517
115073 SKYW 5515 .
115074 MIDW .
115075 SKYW 4291 3499 11549
115076 DLTN 3918

A cheeky double proc transpose ought to do the trick (the first datastep is some test code):
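data test_code;
  serial=1; h1=3; h2=.; h3=55; output;
  serial=2; h1=.; h2=.; h3=32; h4=.; output;
  serial=3; h1=45; h2=23; h3=.; h4=99; output;
  serial=4; h1=.; h2=.; h3=5; output;
run;
proc sort data=test_code;
  by serial;
run;
/* First transpose: one row per serial and h variable (values land in COL1) */
proc transpose data=test_code out=test_code_tran(drop=_:);
  by serial;
  var h:;
run;
/* Second transpose: rebuild h1, h2, ... from the non-missing values only */
proc transpose data=test_code_tran prefix=h out=final_output(drop=_:);
  by serial;
  var col1;
  where col1 ne .;  /* a bare "where col1" would also drop zeros */
run;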
As programmed above, though, it only works with numeric values in the h* variables.

The simplest way is probably a double-counter loop.
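data want;
  set have;
  array hs h:;
  _counter = 2;
  do _t = 1 to dim(hs)-1 while (_counter le dim(hs));
    if missing(hs[_t]) then do;
      /* keep the look-ahead counter to the right of the slot being filled */
      if _counter le _t then _counter = _t + 1;
      /* advance to the next non-missing value, if there is one */
      do while (missing(hs[_counter]));
        _counter + 1;
        if _counter > dim(hs) then leave;
      end;
      if _counter le dim(hs) then do;
        hs[_t] = hs[_counter];
        call missing(hs[_counter]);
        _counter + 1;
      end;
    end;
  end;
  drop _t _counter;
run;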
The PROC TRANSPOSE option is less code and more flexible; the data step loop above may be faster if you have a ton of rows.

In a similar method to Joe's, I'd use arrays for this kind of processing...
%LET NVARS = 5 ;
data want ;
set have ;
array _t{&NVARS} _TEMPORARY_ ;
array _n H1-H&NVARS ;
t = 0 ;
/* Load non-missing values into temporary array */
do i = 1 to dim(_n) ;
if not missing(_n{i}) then do ;
t + 1 ;
_t{t} = _n{i} ;
end ;
end ;
/* Load temporary array back into source array */
call missing(of _n{*}) ;
do i = 1 to t ;
_n{i} = _t{i} ;
end ;
drop i t ;
run ;
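To try the data step answers above, a small have data set can be built from a few rows of the question's sample. This is only a sketch: it assumes H1-H5 are numeric, and it relies on MISSOVER so that rows with fewer than five values leave the trailing variables missing.

```sas
/* Sketch: three rows copied from the question; H1-H5 are assumed numeric */
data have;
  infile datalines missover;   /* short rows leave trailing H variables missing */
  input SN OP_NAME $ H1-H5;
  datalines;
115064 ATLA 5231 . 11259 .
115073 SKYW . . 5515 . .
115075 SKYW . . 4291 3499 11549
;
run;

/* After running one of the data steps above, inspect the shifted values */
proc print data=want noobs;
run;
```

If the shifting works as intended, the 115064 row should show 5231 and 11259 in H1 and H2, matching the desired output in the question.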

Related

SAS for loop questions

Players numbered 1 to 50 stand in a row in order. The coach says: "Odd-numbered athletes out!" The remaining athletes re-queue and are renumbered. The coach gives the same order again, and this repeats until only one person is left. What was that athlete's original number? What if the coach instead keeps ordering "Even-numbered athletes out!": who is left at the end?
I know I need to use a loop in SAS to answer this, but I could only write the code below:
data a;
do i=1 to 50;
output;
end;
run;
proc sql;
select i
from a
where mod(i,2**5)=0;
quit;
But it won't work for finding the last athlete when the odd-numbered ones are kept. Could you figure out a way to simulate this process using a loop? Thanks so much.
@Doris welcome :-)
Try this. The Final_Player data set contains the number of the final player in the simulation.
Simply change mod(_N_, 2) = 1 to mod(_N_, 2) = 0 for the even problem. Feel free to ask.
data _null_;
dcl hash h(ordered : 'y');
h.definekey('p');
h.definedone();
dcl hiter ih('h');
dcl hash i(ordered : 'Y');
i.definekey('id');
i.definedone();
dcl hiter ii('i');
do p = 1 to 50;
h.add();
end;
id = .;
do while (h.num_items > 1);
do _N_ = 1 by 1 while (ih.next() = 0);
if mod(_N_, 2) = 1 then do;
i.add(key : p, data : p);
end;
end;
do while (ii.next() = 0);
rc = h.remove(key : id);
end;
i.clear();
end;
h.output(dataset : 'Final_Player');
run;
Just use algebra.
want = 2 ** floor( log2(n) );
So if you are starting with an arbitrary dataset you can find the one observation you need directly.
data want;
  point = 2**floor(log2(nobs));
  set a point=point nobs=nobs;
  put i= ;   /* must come before STOP, otherwise it never executes */
  output;
  stop;
run;
Here is an example using an array that shows how it works (the log output follows the code).
data test;
  array x [15];
  do index=1 to dim(x); x[index]=index; end;
  do iteration=1 by 1 while(n(of x[*])>1);
    do index= 2**(iteration-1) to dim(x) by 2**iteration ;
      x[index]=.;
    end;
    put iteration= (x[*]) (3.);
  end;
  do index=1 to dim(x) until(x[index] ne .);
  end;
  put index= x[index]= ;
run;
iteration=1 . 2 . 4 . 6 . 8 . 10 . 12 . 14 .
iteration=2 . . . 4 . . . 8 . . . 12 . . .
iteration=3 . . . . . . . 8 . . . . . . .
index=8 x8=8
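For the loop-based simulation the question asks for, a minimal sketch with a temporary array might look like the following. It is not taken from any of the answers above; the names alive, mode, pos and remaining are made up for illustration. Setting mode to 1 removes the odd positions in each round and mode 0 removes the even positions.

```sas
/* Sketch: simulate the elimination rounds for 50 athletes */
data _null_;
  array alive[50] _temporary_;
  do mode = 1, 0;                       /* 1 = odd positions out, 0 = even positions out */
    do k = 1 to dim(alive);
      alive[k] = 1;                     /* everyone starts in the row */
    end;
    remaining = dim(alive);
    do while (remaining > 1);
      pos = 0;                          /* position within the current queue */
      do k = 1 to dim(alive);
        if alive[k] then do;
          pos + 1;
          if mod(pos, 2) = mode then do;
            alive[k] = 0;               /* this athlete leaves the row */
            remaining = remaining - 1;
          end;
        end;
      end;
    end;
    do k = 1 to dim(alive);
      if alive[k] then put mode= k=;    /* original number of the last athlete */
    end;
  end;
run;
```

For 50 athletes this should report k=32 for the odd rule, which agrees with the 2 ** floor(log2(n)) formula, and k=1 for the even rule, since the first athlete is never in an even position.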

sas sum based on row by row and flag

I have data like this:
laonno debit childno credit
1234 4162.98 . .
1234 0.02 . .
. . 1234 1387.66
. . 1234 1387.66
. . 1234 1387.66
When the debit sum equals the credit sum, I need a flag generated for those observations, as shown below:
laonno debit childno credit flag
1234 4162.98 . . matched
1234 0.02 . . N
. . 1234 1387.66 matched
. . 1234 1387.66 matched
. . 1234 1387.66 matched
The number of rows will vary, but whenever the sum of the debits matches the sum of the credits, the flag should be "MATCHED".
If your data is representative, here is one way:
data want (drop=s);
if _N_ = 1 then do;
dcl hash h ();
h.definekey ('childno');
h.definedata ('s');
h.definedone ();
dcl hash hh ();
hh.definekey ('laonno');
hh.definedone ();
do until (lr);
set yy(where=(childno)) end=lr;
if h.find() ne 0 then s = credit;
else s = sum(s, credit);
h.replace();
end;
end;
set yy;
s = .;
if h.find(key : laonno) = 0 & round(s, .001) = debit then do;
flag = 'Matched';
hh.ref();
end;
else flag = 'N';
if hh.check(key : childno) = 0 then flag = 'Matched';
run;

SAS retain statement not working as I hoped

I have the following dataset
data have;
input SUBJID VISIT$ PARAMN ABLF$ AVAL;
cards;
1 screen 1 . 151
1 random 1 YES .
1 visit1 1 . .
1 screen 2 . 65.5
1 random 2 YES 65
1 visit1 2 . .
1 screen 3 . .
1 random 3 YES 400
1 visit1 3 . 420
;
run;
I want to create another variable called BASE that captures the value of AVAL (when there is an actual value in place) on the row where ABLF=YES, and then carries it across all rows until a new PARAMN is encountered.
Basically, I want the output to look like this:
SUBJID VISIT$ PARAMN ABLF$ AVAL BASE;
1 screen 1 . 151 .
1 random 1 YES . .
1 visit1 1 . . .
1 screen 2 . 65.5 65
1 random 2 YES 65 65
1 visit1 2 . . 65
1 screen 3 . . 400
1 random 3 YES 400 400
1 visit1 3 . 420 400
I used the following code:
data want;
set have;
by SUBJID PARAMN;
if first.PARAMN and ABLF=' ' then BASE=.;
if ABLF='YES' then BASE=AVAL;
retain BASE;
run;
However, when I run this, the data does not look exactly like the output I want above.
RETAIN does not look like the right tool for this. RETAIN can only move data forward in the file. It cannot move it backwards.
Looks like there is just one observation with the "BASE" value. So just merge it back onto the data.
data want;
merge have
have(keep=subjid paramn aval ablf rename=(aval=BASE ablf=xx)
where=(xx='YES'))
;
by SUBJID PARAMN;
drop xx;
run;
PROC SQL:
proc sql;
select a.*,b.aval as BASE from have a left join have(drop=visit where=(ablf='YES')) b
on a.subjid=b.subjid and a.paramn=b.paramn;
quit;
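Double DO loop:
data want;
  /* first pass over the BY group: pick up the baseline value */
  do until(last.paramn);
    set have;
    by subjid paramn notsorted;
    if ablf='YES' then temp=aval;
  end;
  /* second pass over the same BY group: write it to every row */
  do until(last.paramn);
    set have;
    by subjid paramn notsorted;
    base=temp;
    output;
  end;
  drop temp;
run;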

Create graph using PROC GPLOT with forced sorting for time

I am trying to create a graph in SAS Enterprise Guide. The data that I am plotting is as follows:
col1 col2 col3
1 12:00 20
2 13:00 30
3 14:00 15
. . .
. . .
25 24:00 90
26 01:00 25
27 02:00 45
. . .
. . .
36 11:00 35
I need col2 on the horizontal axis and col3 on the vertical axis. col1 is a reference for col2 (the time values).
The problem is sorting col2. If there is a way to force the sort order for col2, I think it will work.
This is what I have:
SYMBOL1
INTERPOL=JOIN
HEIGHT=10pt
VALUE=NONE
LINE=1
WIDTH=2
CV = _STYLE_
;
SYMBOL2
INTERPOL=JOIN
HEIGHT=10pt
VALUE=NONE
LINE=1
WIDTH=2
CV = _STYLE_
;
Legend1
FRAME;
Axis1
STYLE=1
WIDTH=1
MINOR=NONE
ORDER=0 TO 200 BY 10;
Axis2
STYLE=1
WIDTH=1
ORDER=0 TO 36 BY 1
MINOR=
(NUMBER=1
);
TITLE;
TITLE1 "test_graph";
FOOTNOTE;
PROC GPLOT DATA = input_data;
PLOT col2 * col3 /
VAXIS=AXIS1
HAXIS=AXIS2
FRAME LHREF=34
CHREF=BLACK
HREF=0 TO 36 BY 1
LEGEND=LEGEND1;
RUN; QUIT;
Other than that, I tried adding the statement below to force the sort order, but it doesn't work.
order=(12:00,13:00,14:00,......23:00,0:00,1:00,2:00,....11:00)
Please advise.
Thanks
I used the code below and it works fine.
proc sgplot data=input_data ;
xaxis values=("12:00" "13:00" "14:00"......"23:00" "0:00" "1:00" "2:00"...."11:00")
label="time";
yaxis integer values=(0 TO 200 BY 10) label="numb";
series x=col2 y=col3 ;
run;
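As an alternative to listing every tick value, the axis can be told to follow the order of the rows. The sketch below is not from the answer above: it assumes col2 is a character variable, that sorting by col1 gives the intended order, and plot_data is a made-up data set name.

```sas
/* Sketch: put the rows in the intended order, using col1 as the sort key */
proc sort data=input_data out=plot_data;
  by col1;
run;

proc sgplot data=plot_data;
  series x=col2 y=col3;
  /* DISCRETEORDER=DATA keeps the categories in the order they appear in the data */
  xaxis type=discrete discreteorder=data label="time";
  yaxis values=(0 to 200 by 10) label="numb";
run;
```

This only helps when each time label appears once; repeated labels share a single category on a discrete axis.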

PROC Transpose/query builder in SAS Enterprise Guide

After I run Transpose in the query builder, I get '.' for empty fields. Is there a way to avoid these '.' values? I could remove them in the next step by adding a CASE statement, but doing this for more than 100 columns is not a good idea.
123019 1 . . .
166584 . 1 . .
171198 . . 1 .
285703 . . . 1
309185 . . . 2
324756 . . . 1
335743 . . . .
348340 . . . .
Please help.
Thanks
Dot (missing) is identical to "blank" in SAS. If you're actually printing the data out, you can use the statement:
options missing=' '; *or 0 or any other character;
That will be shown for missing (null/blank) values. In some contexts that may not be preserved, in which case you either use a data step to convert to zero, or use PROC STDIZE:
proc stdize data=mydataset missing=0 reponly;
run;
which may be faster/easier to code, if you have SAS/STAT licensed.
You can use this code:
data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i;
run;
Or you can just run all the steps and add this at the bottom of your datastep:
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i;
BTW, this replaces the "." values with zeros. The "." represents a missing value in SAS; you can replace the 0 in the code I provided with any other value you want to show instead of ".".
EDIT: given your inputs, the code should be like this:
PROC SORT DATA=ABC
OUT=ABC1 ;
BY EMP;
RUN;
PROC TRANSPOSE DATA=ABC1 OUT=ABC2 NAME=Source LABEL=Label;
BY EMP;
ID VC;
VAR FQ;
/* ------------------------------------------------------------------- End of task code. ------------------------------------------------------------------- */
RUN; QUIT;
/* Start of custom user code. */
data ABC2;
set ABC2;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i;
run;