PROC TRANSPOSE/query builder in SAS Enterprise Guide

After I run Transpose in the query builder, I get '.' for empty fields. Is there a way to avoid these '.'? I could remove them in the next step by adding a CASE statement, but doing that for more than 100 columns isn't a good idea.
123019 1 . . .
166584 . 1 . .
171198 . . 1 .
285703 . . . 1
309185 . . . 2
324756 . . . 1
335743 . . . .
348340 . . . .
Please help.
Thanks

The dot is how SAS displays a missing numeric value, so it is the same thing as a "blank" (null) in SAS. If you only need the values to look different when you print the data, you can use the statement:
options missing=' '; *or 0 or any other character;
That character will then be shown for missing (null/blank) values. In some contexts that display setting may not be preserved, in which case you either use a data step to convert the values to zero, or use PROC STDIZE:
proc stdize data=mydataset missing=0 reponly;
run;
which may be faster/easier to code, if you have SAS/STAT licensed.
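To make the display-versus-data distinction concrete, here is a minimal sketch (the DEMO data set and its variables are made up for illustration): OPTIONS MISSING= only changes how missing numerics are printed, while the stored values stay missing until a data step or PROC STDIZE replaces them.

```sas
/* Hypothetical sample data: two numeric columns with some missing values */
data demo;
  input id v1 v2;
  datalines;
1 1 .
2 . 3
;
run;

/* Print missing numerics as blanks in this report only */
options missing=' ';
proc print data=demo noobs;
run;

/* Restore the default display character */
options missing='.';

/* The values themselves are still missing: count them */
data counts;
  set demo end=last;
  n_missing + nmiss(v1, v2);   /* sum statement, retained across rows */
  if last then output;
  keep n_missing;
run;
```

The COUNTS data set should report two missing values, showing that the option did not change the data.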

You can use this code:
data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i;
run;
Or, if you already have a DATA step that builds the data, just add this at the bottom of it:
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i;
BTW, this replaces the dots with zeros. The "." represents a missing value in SAS; you can replace the 0 in the code I provided with any other value you want to show instead of the dot.
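A variation on the same idea, as a sketch: the ABC2 layout and the V1-V3 names below are hypothetical stand-ins for your transposed columns. Listing only those columns in the ARRAY (instead of _numeric_) guarantees that a numeric BY variable such as EMP is never touched, and COALESCE does the replacement in one call.

```sas
/* Hypothetical transposed data: EMP is the BY variable, V1-V3 are the new columns */
data abc2;
  input emp v1 v2 v3;
  datalines;
123019 1 . .
166584 . 1 .
285703 . . 2
;
run;

data abc2_filled;
  set abc2;
  array a(*) v1-v3;          /* only the transposed columns */
  do i = 1 to dim(a);
    /* COALESCE returns its first non-missing argument, so missing becomes 0 */
    a(i) = coalesce(a(i), 0);
  end;
  drop i;
run;
```

To show a different value, change the 0 in the COALESCE call.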
EDIT: given your inputs, the code should look like this:
PROC SORT DATA=ABC
OUT=ABC1 ;
BY EMP;
RUN;
PROC TRANSPOSE DATA=ABC1 OUT=ABC2 NAME=Source LABEL=Label;
BY EMP;
ID VC;
VAR FQ;
/* ------------------------------------------------------------------- End of task code. ------------------------------------------------------------------- */
RUN; QUIT;
/* Start of custom user code. */
data ABC2;
set ABC2;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i;
run;

Related

SAS for loop questions

Players numbered 1 to 50 stand in a row in order. The coach says: "Odd-numbered athletes out!" The remaining athletes re-queue and are renumbered. The coach orders again: "Odd-numbered athletes out!" This continues until only one person is left. What is his number? What if the coach keeps ordering "Even-numbered athletes out!" instead? Who is left at the end?
I know I need a loop in SAS to answer the question, but I can only write the code below:
data a;
do i=1 to 50;
output;
end;
run;
proc sql;
select i
from a
where mod(i,2**5)=0;
quit;
But it won't work for keeping the last odd-numbered athlete. Could you figure out a way to simulate this process with a loop? Thanks so much
@Doris welcome :-)
Try this. The Final_Player data set contains the number of the final player in the simulation.
For the even problem, simply change mod(_N_, 2) = 1 to = 0. Feel free to ask.
data _null_;
dcl hash h(ordered : 'y');
h.definekey('p');
h.definedone();
dcl hiter ih('h');
dcl hash i(ordered : 'Y');
i.definekey('id');
i.definedone();
dcl hiter ii('i');
do p = 1 to 50;
h.add();
end;
id = .;
do while (h.num_items > 1);
do _N_ = 1 by 1 while (ih.next() = 0);
if mod(_N_, 2) = 1 then do;
i.add(key : p, data : p);
end;
end;
do while (ii.next() = 0);
rc = h.remove(key : id);
end;
i.clear();
end;
h.output(dataset : 'Final_Player');
run;
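For comparison, here is a plain loop-and-array sketch that follows the re-queueing literally, which may be easier to adapt than the hash version. The macro variables N and ODD_OUT and the LAST_PLAYER data set are names chosen for this example only.

```sas
%let n = 50;         /* number of athletes, from the question */
%let odd_out = 1;    /* 1 = "odd numbers out", 0 = "even numbers out" */

data last_player;
  array q[&n] _temporary_;           /* current queue, front at q[1] */
  do k = 1 to &n;
    q[k] = k;
  end;
  size = &n;
  do while (size > 1);
    kept = 0;
    do k = 1 to size;
      /* keep the athlete unless his current position is called out */
      if mod(k, 2) ne &odd_out then do;
        kept = kept + 1;
        q[kept] = q[k];              /* compact survivors to the front */
      end;
    end;
    size = kept;                     /* survivors are renumbered 1..size */
  end;
  last = q[1];
  put 'Last athlete: ' last;
  keep last;
run;
```

With ODD_OUT=1 the step should report 32; with ODD_OUT=0 athlete 1 is never called out, so 1 is the one left.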
Or just use algebra: the survivor is the largest power of 2 that does not exceed n.
want = 2 ** floor( log2(n) );
So if you are starting with an arbitrary dataset, you can read the one observation you need directly:
data want;
point = 2**floor(log2(nobs));
set a point=point nobs=nobs;
put i= ;
output;
stop;
run;
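Why the formula works, sketched for the "odd numbers out" case: by induction, after round k the survivors are exactly the original numbers divisible by 2^k, in increasing order, because removing the odd positions from the multiples of 2^k leaves the multiples of 2^(k+1). The process stops at the first round that leaves a single survivor:

$$
\left\lfloor \frac{n}{2^{k}} \right\rfloor = 1
\iff 2^{k} \le n < 2^{k+1}
\iff k = \lfloor \log_2 n \rfloor,
\qquad \text{last} = 2^{\lfloor \log_2 n \rfloor},
\qquad n = 50 \Rightarrow 2^{5} = 32.
$$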
Here is an example using an array that shows how it works (the log output follows the code):
data test;
array x [15];
do index=1 to dim(x); x[index]=index; end;
do iteration=1 by 1 while(n(of x[*])>1);
do index= 2**(iteration-1) to dim(x) by 2**iteration ;
x[index]=.;
end;
put iteration= (x[*]) (3.);
end;
do index=1 to dim(x) until(x[index] ne .);
end;
put index= x[index]= ;
run;
iteration=1 . 2 . 4 . 6 . 8 . 10 . 12 . 14 .
iteration=2 . . . 4 . . . 8 . . . 12 . . .
iteration=3 . . . . . . . 8 . . . . . . .
index=8 x8=8

arrays in SAS with more columns

I have a dataset
Data have;
input A B C;
cards;
1 . .
. . 1
1 1 .
run;
And I am looking for an output which is like this.
A B C OUT
1 . . A
. . 1 C
1 1 . A,B
I wrote the program this way:
Data want;
set have;
array U(3)A B C;
do i=1 to 3;
if U(i)^=. then OUT=cat(vname(u(i)),',');
end;
run;
This gives only the last VNAME and not the concatenation.
When concatenating with a separator, CATX is the function to use, or even better CALL CATX, which removes the need to write out = catx(',', out, ...). Both remove any leading and trailing blanks from the items they join.
The other problem with your code is that because out is derived from numeric variables, SAS will default its type to numeric as well. You need to define it as character beforehand (I've done this with a LENGTH statement).
The following code achieves your goal.
Data have;
input A B C;
cards;
1 . .
. . 1
1 1 .
run;
data want;
set have;
length out $20;
array U{3} A B C;
do i = 1 to 3;
if not missing(U{i}) then call catx(',',out,vname(U{i}));
end;
drop i;
run;
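Since the question is about arrays with more columns, here is the same loop as a sketch using the CATX function form and DIM(), so only the ARRAY statement changes when columns are added. The $200 length is an assumption; make it wide enough for all the names joined together.

```sas
data have;
  input A B C;
  datalines;
1 . .
. . 1
1 1 .
;
run;

data want;
  set have;
  length out $200;              /* assumed wide enough for all the names */
  array U{*} A B C;             /* add more columns here as needed */
  do i = 1 to dim(U);
    /* CATX skips blank arguments, so no leading comma on the first name */
    if not missing(U{i}) then out = catx(',', out, vname(U{i}));
  end;
  drop i;
run;
```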

arrays in sas with different dimensions

Beginning with a table,
A B C D E
1 . 1 . 1
. . 1 . .
. 1 . 1 .
I am trying to get an output like this:
A B C D E X Y Z
1 . 1 . 1 1 1 1
. . 1 . . 1
. 1 . 1 . 1 1
Here is my code:
data want;
set have;
array GG(5) A-E;
array BB(3) X Y Z;
do i=1 to 5;
do j=1 to 3;
if gg(i)=1 then BB(j)=1;
end;
end;
run;
I understand that the result I get is wrong, because the two arrays have different dimensions. Is there another way to do this?
data want;
set have;
array v1{*} a--e;
array v2{3} x y z;
j=1;
do i=1 to dim(v1);
if not missing(v1{i}) then do;
v2{j}=v1{i};
j+1;
end;
end;
drop i j;
run;
Why not something like this, using a counter to track the position of each non-missing value?
data try;
infile datalines delimiter=',';
input var1 var2 var3 var4 var5;
datalines;
1,.,.,1,1,
.,1,.,.,.,
.,.,.,.,.,
.,1,.,1,.,
;
data want;
set try;
array vars[5] var1 var2 var3 var4 var5;
array newvars[5] nvar1 nvar2 nvar3 nvar4 nvar5;
do i=1 to 5;
if i=1 then count=0;
if vars[i] ne . then do;
count=count+1;
newvars[count]=vars[i];
end;
end;
drop i count var:;
run;
My method is to copy the existing values to new variables X Y Z temp1 temp2, then sort the values using CALL SORTN, which puts the 1's and the missing values together. Because CALL SORTN only sorts in ascending order, with missing values first, I've reversed the variables in the ARRAY statement (creating them first in the correct order with a RETAIN statement).
The unwanted variables temp1 and temp2 can then be dropped.
data have;
input A B C D E;
datalines;
1 . 1 . 1
. . 1 . .
. 1 . 1 .
;
run;
data want;
set have;
retain X Y Z .; /* create new variables in required order */
array GG{5} A--E;
array BB{5} temp1 temp2 Z Y X; /* array in reverse order due to ascending sort later on */
do i = 1 to dim(GG);
BB{i} = GG{i};
end;
call sortn(of BB{*}); /* sort array (missing values come first, hence the reverse array order) */
drop temp: i; /* drop unwanted variables */
run;
Alternatively, here's a simpler solution, as your criteria are pretty basic. As you're only dealing with 1's and missing values, you can loop from 1 to the number of non-missing values in A-E and assign 1 to each element of the new array.
data want;
set have;
array GG{5} A--E;
array BB{3} X Y Z;
do i = 1 to n(of GG{*});
BB{i}=1;
end;
drop i; /* drop unwanted variable*/
run;
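One caveat with the loop above: if a row ever had more non-missing values than there are slots in X Y Z, BB{4} would be out of range and the step would stop with an error. A minimal guard, assuming any extra 1's can simply be ignored, is to cap the loop at DIM(BB); the second row of this made-up HAVE data tests exactly that case.

```sas
data have;
  input A B C D E;
  datalines;
1 . 1 . 1
1 1 1 1 .
;
run;

data want;
  set have;
  array GG{5} A--E;
  array BB{3} X Y Z;
  /* never index past the last element of BB */
  do i = 1 to min(n(of GG{*}), dim(BB));
    BB{i} = 1;
  end;
  drop i;
run;
```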

modify multiple observations in a by variable

data a1
col1 col2 flag
a 2 .
b 3 .
a 4 .
c 1 .
For data a1, flag is always missing. I want to update multiple rows using a2.
data a2
col1 flag
a 1
Ideal output:
col1 col2 flag
a 2 1
b 3 .
a 4 1
c 1 .
But this doesn't update all the records in the BY group:
data a1;
modify a1 a2;
by col1;
run;
Question edited
Actually, a1 is a very large data set on a server, so I'd prefer to modify it in place (if possible) instead of creating a new one. Otherwise I have to drop the old a1 first and copy a new a1 from local to the server, which takes much more time.
If you want to do this with MODIFY, you have to loop over the modified dataset in some fashion, or it will only replace the first matching row. Normally this behaves like a merge: once it finds a match, it advances to the next record of the other dataset, which then runs out of records. Here's one option; there are others.
data a1(index=(col1));
input col1 $ col2 flag;
datalines;
a 2 .
b 3 .
a 4 .
c 1 .
;;;;
run;
data a2(index=(col1));
col1='a';
flag=1;
run;
data a1;
set a2(rename=flag=flag2);
do _n_ = 1 to nobs_a1;
modify a1 key=col1 nobs=nobs_a1;
if _iorc_=0 then do;
flag=flag2;
replace;
end;
end;
if _iorc_=%sysrc(_DSENOM) then _error_=0;
run;
If you're avoiding the MERGE statement because of the sorting it requires, you can simply change your merging approach.
If flag in A1 is always missing, you can drop it; otherwise you should temporarily rename it so you don't lose that information.
Here I will merge A1 and A2 using hash objects, this approach doesn't require any prior sorting on datasets.
data final_merged(drop = finder);
length flag 8; /*please change the length to the real one; use $ if character*/
if _N_ = 1 then do;
declare hash merger(dataset:'A2');
merger.definekey('col1');
merger.DefineData ('flag');
merger.definedone();
end;
set A1(drop=flag);
finder = merger.find();
if finder ne 0 then flag = .;
/*then flag='' or then flag='unknown' as you want if flag is a character var*/
run;
Please, let me know if this will help.
You could do the following, but SQL may not keep the observations in their original order, so I'm not sure how useful this would be for you (you could always preprocess with ordvar=_n_; and then sort the SQL result on it if that helps):
Data:
data a1 ;
input col1 $ col2 flag ;
cards ;
a 2 .
b 3 .
a 4 .
c 1 .
;run ;
data a2 ;
input col1 $ flag ;
cards ;
a 1
;run ;
Merge:
proc sql ;
create table output as
select a.col1, a.col2, b.flag
from a1 a
left join
a2 b
on a.col1=b.col1
;quit ;
To try to do it in one pass, how about creating two macro variables containing the mapping from a2?
proc sql ;
select distinct col1, flag
into :colvals separated by '', :flagvals separated by ''
from a2
;quit ;
Then set flag from the character at the corresponding position in the two macro variables:
data a1 ;
set a1 ;
if findc("&colvals",col1) then
flag=input(substr("&flagvals", findc("&colvals",col1),1),8.) ;
run ;
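Since the question's edit asks to change A1 itself rather than build a new table, PROC SQL's UPDATE statement with a correlated subquery is another option worth sketching. It assumes A2 has at most one row per COL1; if a subquery returns more than one row, PROC SQL reports an error.

```sas
/* Sample data from the question */
data a1;
  input col1 $ col2 flag;
  datalines;
a 2 .
b 3 .
a 4 .
c 1 .
;
run;

data a2;
  input col1 $ flag;
  datalines;
a 1
;
run;

proc sql;
  /* Update only rows whose COL1 appears in A2; other rows keep their FLAG */
  update a1 as a
    set flag = (select b.flag from a2 as b where b.col1 = a.col1)
    where a.col1 in (select col1 from a2);
quit;
```

The WHERE clause matters: without it, rows with no match in A2 would have FLAG set to missing by the subquery.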

How to delete blank observations in a data set in SAS

I want to delete ALL blank observations from a data set.
I only know how to get rid of blanks from one variable:
data a;
set data(where=(var1 ne .)) ;
run;
Here I create a new data set without the blanks in var1.
But how do I do it when I want to get rid of ALL the blanks in the whole data set?
Thanks in advance for your answers.
If you are attempting to get rid of rows where ALL variables are missing, it's quite easy:
/* Create an example with some or all columns missing */
data have;
set sashelp.class;
if _N_ in (2,5,8,13) then do;
call missing(of _numeric_);
end;
if _N_ in (5,6,8,12) then do;
call missing(of _character_);
end;
run;
/* This is the answer */
data want;
set have;
if compress(cats(of _all_),'.')=' ' then delete;
run;
Instead of the compress you could also use OPTIONS MISSING=' '; beforehand.
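The OPTIONS MISSING= variant mentioned above might look like this; a sketch with made-up data, assuming (as that remark implies) that CATS renders a missing numeric using the current MISSING= character.

```sas
/* Hypothetical test data: the second row is completely empty */
data have;
  input name $ x y;
  datalines;
Ann 1 .
. . .
Bob . 2
;
run;

options missing=' ';   /* missing numerics now convert to a blank */
data want;
  set have;
  /* CATS of an all-missing row is now blank, so MISSING() is true */
  if missing(cats(of _all_)) then delete;
run;
options missing='.';   /* restore the default */
```

Resetting the option afterwards keeps later reports showing the usual dot.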
If you want to remove ALL rows with ANY missing values, you can use the NMISS/CMISS functions.
data want;
set have;
if nmiss(of _numeric_) > 0 then delete;
run;
or
data want;
set have;
if nmiss(of _numeric_) + cmiss(of _character_) > 0 then delete;
run;
to check all character and numeric variables.
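Because CMISS accepts numeric as well as character arguments, a single call can also cover every variable, so the data set does not need to contain variables of both types. A sketch on a made-up, character-only data set:

```sas
/* Hypothetical data with only character variables */
data chars_only;
  input c1 $ c2 $;
  datalines;
x y
x .
;
run;

data want;
  set chars_only;
  /* CMISS counts missing values of either type */
  if cmiss(of _all_) > 0 then delete;
run;
```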
You can do something like this:
data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then delete;
end;
drop i;
run;
This will scan through all the numeric variables and delete any observation in which it finds a missing value.
Here you go. This works regardless of whether the variables are character or numeric.
data withBlanks;
input a$ x y z;
datalines;
a 1 2 3
b 1 . 3
c . . 3
. . . .
d . 2 3
e 1 . 3
f 1 2 3
;
run;
%macro removeRowsWithMissingVals(inDsn, outDsn, Exclusion);
/*Inputs:
inDsn: Input dataset with some or all columns missing for some or all rows
outDsn: Output dataset with some or all columns NOT missing for some or all rows
Exclusion: Should be one of {AND, OR}. AND excludes rows in which any column has a missing value; OR excludes only rows in which all columns have missing values
*/
/*get a list of variables in the input dataset along with their types (i.e., whether they are numeric or character type)*/
PROC CONTENTS DATA = &inDsn OUT = CONTENTS(keep = name type varnum);
RUN;
/*put each variable with its own comparison string in a separate macro variable*/
data _null_;
set CONTENTS nobs = num_of_vars end = lastObs;
/*use NE . for numeric cols (type=1) and NE '' for char types*/
if type = 1 then call symputx(compress("var"!!varnum), compbl(name!!" NE . "));
else call symputx(compress("var"!!varnum), compbl(name!!" NE '' "));
/*make a note of no. of variables to check in the dataset*/
if lastObs then call symputx("no_of_obs", _n_);
run;
DATA &outDsn;
set &inDsn;
where
%do i =1 %to &no_of_obs.;
&&var&i.
%if &i < &no_of_obs. %then &Exclusion;
%end;
;
run;
%mend removeRowsWithMissingVals;
%removeRowsWithMissingVals(withBlanks, withOutBlanksAND, AND);
%removeRowsWithMissingVals(withBlanks, withOutBlanksOR, OR);
Output of withOutBlanksAND:
a x y z
a 1 2 3
f 1 2 3
Output of withOutBlanksOR:
a x y z
a 1 2 3
b 1 . 3
c . . 3
d . 2 3
e 1 . 3
f 1 2 3
Really weird that nobody provided this elegant answer:
if missing(cats(of _all_)) then delete;
Edit: indeed, I hadn't realized that cats(of _all_) returns a dot '.' for a missing numeric value.
As a fix, I suggest this, which seems to be more reliable:
*-- Building a sample dataset with test cases --*;
data test;
attrib a format=8.;
attrib b format=$8.;
a=.; b='a'; output;
a=1; b=''; output;
a=.; b=''; output; * should be deleted;
a=.a; b=''; output; * should be deleted;
a=.a; b='.'; output;
a=1; b='b'; output;
run;
*-- Apply the logic to delete blank records --*;
data test2;
set test;
*-- Build arrays of numeric and characters --*;
*-- Note: an array can only contain variables of the same type, so we must create 2 different arrays --*;
array nvars(*) _numeric_;
array cvars(*) _character_;
*-- Delete blank records --*;
*-- Blank record: # of missing num variables + # of missing char variables = # of numeric variables + # of char variables --*;
if nmiss(of _numeric_) + cmiss(of _character_) = dim(nvars) + dim(cvars) then delete;
run;
The main issue is that if there are no numeric variables at all (or no character variables at all), creating the empty array generates a WARNING and the call to NMISS/CMISS generates an ERROR.
So, as far as I can tell, there is no other option than building a SAS statement outside the data step to identify empty records:
*-- Building a sample dataset with test cases --*;
data test;
attrib a format=8.;
attrib b format=$8.;
a=.; b='a'; output;
a=1; b=''; output;
a=.; b=''; output; * should be deleted;
a=.a; b=''; output; * should be deleted;
a=.a; b='.'; output;
a=1; b='b'; output;
run;
*-- Create a SAS statement which test any missing variable, regardless of its type --*;
proc sql noprint;
select distinct 'missing(' || strip(name) || ')'
into :miss_stmt separated by ' and '
from dictionary.columns
where libname = 'WORK'
and memname = 'TEST'
;
quit;
/*
miss_stmt looks like missing(a) and missing(b)
*/
*-- Delete blank records --*;
data test2;
set test;
if &miss_stmt. then delete;
run;