I have a dataset
Data have;
input A B C;
cards;
1 . .
. . 1
1 1 .
run;
And I am looking for an output which is like this.
A B C OUT
1 . . A
. . 1 C
1 1 . A,B
I wrote the program this way:
Data want;
set have;
array U(3)A B C;
do i=1 to 3;
if U(i)^=. then OUT=cat(vname(u(i),',');
end;
run;
This gives only the last VNAME and not the concatenation.
When using a separator with concatenation, then catx is the function to use, or even better call catx which negates the need to put out = and using out in the concatenation as well. Both these functions will trim any leading or trailing blanks.
The other problem with your code is that because out is derived from numeric variables, SAS will default the type to numeric as well. You need to define the type to character beforehand (I've done this with a length statement.
The following code achieves your goal.
Data have;
input A B C;
cards;
1 . .
. . 1
1 1 .
run;
data want;
set have;
length out $20;
array U{3} A B C;
do i = 1 to 3;
if not missing(U{i}) then call catx(',',out,vname(U{i}));
end;
drop i;
run;
Related
Beginning with a table,
A B C D E
1 . 1 . 1
. . 1 . .
. 1 . 1 .
I am trying to get an output like this:
A B C D E X Y Z
1 . 1 . 1 1 1 1
. . 1 . . 1
. 1 . 1 . 1 1
Here is my code:
data want;
set have;
array GG(5) A-E;
array BB(3) X Y Z;
do i=1 to 5;
do j=1 to 3;
if gg(i)=1 then BB(j)=1;
end;
end;
run;
I understand that the result that I get is wrong, as the dimensions of both the arrays are not co-operating. Is there another way to do this?
data want;
set have;
array v1 a--e;
array v2 x y z;
i=1;
do over v1;
if not missing(of v1) then do;
v2(i)=v1;
i+1;
end;
end;
drop i;
run;
why not something like this using a counter to identify which position is the actual non missing value?
data try;
infile datalines delimiter=',';
input var1 var2 var3 var4 var5;
datalines;
1,.,.,1,1,
.,1,.,.,.,
.,.,.,.,.,
.,1,.,1,.,
;
data want;
set try;
array vars[5] var1 var2 var3 var4 var5;
array newvars[5] nvar1 nvar2 nvar3 nvar4 nvar5;
do i=1 to 5;
if i=1 then count=0;
if vars[i] ne . then do;
count=count+1;
newvars[count]=vars[i];
end;
end;
drop i count var:;
run;
My method is to copy the existing values to new variables X Y X temp1 temp2, then sort the values using call sortn, which will put the 1's and missing values together. Because call sortn only sorts in ascending order, with missing values coming first, I've reversed the variables in the array statement (creating them first in the correct order with a retain statement.
The unwanted variables temp1 and temp2 can then be dropped.
data have;
input A B C D E;
datalines;
1 . 1 . 1
. . 1 . .
. 1 . 1 .
;
run;
data want;
set have;
retain X Y Z .; /* create new variables in required order */
array GG{5} A--E;
array BB{5} temp1 temp2 Z Y X; /* array in reverse order due to ascending sort later on */
do i = 1 to dim(GG);
BB{i} = GG{i};
end;
call sortn(of BB{*}); /* sort array (missing values come first, hence the reverse array order) */
drop temp: i; /* drop unwanted variables */
run;
Alternatively, here's a simpler solution as your criteria is pretty basic. As you're just dealing with 1's and missings, you can loop through the number of non-missing values in A-E and assign 1 to the new array.
data want;
set have;
array GG{5} A--E;
array BB{3} X Y Z;
do i = 1 to n(of GG{*});
BB{i}=1;
end;
drop i; /* drop unwanted variable*/
run;
data a1
col1 col2 flag
a 2 .
b 3 .
a 4 .
c 1 .
For data a1, flag is always missing. I want to update multiple rows using a2.
data a2
col1 flag
a 1
Ideal output:
col1 col2 flag
a 2 1
b 3 .
a 4 1
c 1 .
But this doesn't update all the records in by statement.
data a1;
modify a1 a2;
by col1;
run;
Question edited
Actually a1 is a very large data set on server. Hence I prefer to modify it (if possible) instead of creating a new one. Otherwise I have to drop previous a1 first and copy a new a1 from local to server, which will take much more time.
If you want to do this with MODIFY, you have to loop over the modify dataset in some fashion or it will only replace the first row (because the other dataset will then run out of records - normally this behaves like merge, where once it finds a match it advances to next record). Here's one option - there are others.
data a1(index=(col1));
input col1 $ col2 flag;
datalines;
a 2 .
b 3 .
a 4 .
c 1 .
;;;;
run;
data a2(index=(col1));
col1='a';
flag=1;
run;
data a1;
set a2(rename=flag=flag2);
do _n_ = 1 to nobs_a1;
modify a1 key=col1 nobs=nobs_a1;
if _iorc_=0 then do;
flag=flag2;
replace;
end;
end;
if _iorc_=%sysrc(_DSENOM) then _error_=0;
run;
If you're not using Merge statement for the sorting problem, you can simply change your merging approach.
If flag in A1 is always missing, you can drop it, otherwise you should temporary rename it for not losing those informations.
Here I will merge A1 and A2 using hash objects, this approach doesn't require any prior sorting on datasets.
data final_merged(drop = finder);
length flag 8.; /*please change length with the real one, use $ if char*/
if _N_ = 1 then do;
declare hash merger(dataset:'A2');
merger.definekey('col1');
merger.DefineData ('flag');
merger.definedone();
end;
set A1(drop=flag);
finder = merger.find();
if finder ne 0 then flag = .;
/*then flag='' or then flag='unknown' as you want if flag is a character var*/
run;
Please, let me know if this will help.
You could do the following but SQL sorts the observations so not sure how useful this would be for you? (you could always preprocess with ordvar=_n_; and then sort the SQL statement on it if that helps):
Data:
data a1 ;
input col1 $ col2 flag ;
cards ;
a 2 .
b 3 .
a 4 .
c 1 .
;run ;
data a2 ;
input col1 $ flag ;
cards ;
a 1
;run ;
Merge:
proc sql ;
create table output as
select a.col1, a.col2, b.flag
from a1 a
left join
a2 b
on a.col1=b.col1
;quit ;
To try and do it in one pass, how about creating two macros variables containing the mapping from a2?
proc sql ;
select distinct col1, flag
into :colvals separated by '', :flagvals separated by ''
from a2
;quit ;
Set flag to the corresponding character position between the two macro variables:
data a1 ;
set a1 ;
if findc("&colvals",col1) then
flag=input(substr("&flagvals", findc("&colvals",col1),1),8.) ;
run ;
I have a dataset similar to the one below
ID A B C D E
1 1
1 1
1 1
2 1
2 1
3 1
3 1
4 1
5 1
I want to condense the data into one row for each ID. So the dataset would look like the one below.
ID A B C D E
1 1 1 1
2 1 1
3 1 1
4 1
5 1
Well I created another table and removed the duplicate ID's. So I have two tables--A and B. I then tried merging the two datasets together. I was playing around with following SAS code.
data C;
merge A B;
by ID;
run;
Here's a neat trick I picked up from another forum. There's no need to split up the original dataset, the first update statement creates the structure and the second updates the values. The BY statement ensures you only get 1 record per ID.
data have;
infile datalines dsd;
input ID A B C D E;
datalines;
1,1,,,,,
1,,,1,,,
1,,1,,,,
2,,1,,,,
2,,,,1,,
3,,,,,1,
3,1,,,,,
4,,,1,,,
5,,1,,,
;
run;
data want;
update have (obs=0) have;
by id;
run;
This could be solved using the retain statement.
data B(rename=(A2=A B2=B C2=C D2=D));
set A;
by id;
retain A2 B2 C2 D2;
if first.id then do;
A2 = .;
B2 = .;
C2 = .;
D2 = .;
end;
if A ne . then A2=A;
if B ne . then B2=B;
if C ne . then C2=C;
if D ne . then D2=D;
if last.id then output;
drop A B C D;
run;
There are other ways to solve this, but hopefully this is helpful.
PROC MEANS is a great tool for something like this. PROC SQL would also give you a reasonable solution, but MEANS is faster.
proc means data=yourdata;
var a b c d e;
class id;
types id; *to avoid the 'overall' row;
output out=yourdata max=; *output the maximum of each var for each ID - use SUM instead if you want more than 1;
run;
After I run Transpose in query builder, i get '.' for empty fields. Is there a way to avoid these '.' ? I can remove those in the next step by adding a case statement but doing this for more than 100 columns won't be a good idea.
123019 1 . . .
166584 . 1 . .
171198 . . 1 .
285703 . . . 1
309185 . . . 2
324756 . . . 1
335743 . . . .
348340 . . . .
Please help.
Thanks
Dot (missing) is identical to "blank" in SAS. If you're actually printing the data out, you can use the statement:
options missing=' '; *or 0 or any other character;
That will be shown for missing (null/blank) values. In some contexts that may not be preserved, in which case you either use a data step to convert to zero, or use PROC STDIZE:
proc stdize data=mydataset missing=0 reponly;
run;
which may be faster/easier to code, if you have SAS/STAT licensed.
You can use this code:
data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i;
run;
Or you can just run all the steps and add this at the bottom of your datastep:
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i; .
BTW this will replace the . to zeros, the "." represents a missing value in SAS, you can replace the 0 on the code that I provided for any other value you want to show instead of .
EDIT:given your inputs the code should be like this:
PROC SORT DATA=ABC
OUT=ABC1 ;
BY EMP;
RUN;
PROC TRANSPOSE DATA=ABC1 OUT=ABC2 NAME=Source LABEL=Label;
BY EM;
ID VC;
VAR FQ;
/* ------------------------------------------------------------------- End of task code. ------------------------------------------------------------------- /
RUN; QUIT;
/* Start of custom user code. */
data ABC2;
set ABC2;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then a(i) = 0;
end;
drop i;
run;
I want to delete ALL blank observations from a data set.
I only know how to get rid of blanks from one variable:
data a;
set data(where=(var1 ne .)) ;
run;
Here I set a new data set without the blanks from var1.
But how to do it, when I want to get rid of ALL the blanks in the whole data set?
Thanks in advance for your answers.
If you are attempting to get rid of rows where ALL variables are missing, it's quite easy:
/* Create an example with some or all columns missing */
data have;
set sashelp.class;
if _N_ in (2,5,8,13) then do;
call missing(of _numeric_);
end;
if _N_ in (5,6,8,12) then do;
call missing(of _character_);
end;
run;
/* This is the answer */
data want;
set have;
if compress(cats(of _all_),'.')=' ' then delete;
run;
Instead of the compress you could also use OPTIONS MISSING=' '; beforehand.
If you want to remove ALL Rows with ANY missing values, then you can use NMISS/CMISS functions.
data want;
set have;
if nmiss(of _numeric_) > 0 then delete;
run;
or
data want;
set have;
if nmiss(of _numeric_) + cmiss(of _character_) > 0 then delete;
run;
for all char+numeric variables.
You can do something like this:
data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then delete;
end;
drop i;
This will scan trough all the numeric variables and will delete the observation where it finds a missing value
Here you go. This will work irrespective of the variable being character or numeric.
data withBlanks;
input a$ x y z;
datalines;
a 1 2 3
b 1 . 3
c . . 3
. . .
d . 2 3
e 1 . 3
f 1 2 3
;
run;
%macro removeRowsWithMissingVals(inDsn, outDsn, Exclusion);
/*Inputs:
inDsn: Input dataset with some or all columns missing for some or all rows
outDsn: Output dataset with some or all columns NOT missing for some or all rows
Exclusion: Should be one of {AND, OR}. AND will only exclude rows if any columns have missing values, OR will exclude only rows where all columns have missing values
*/
/*get a list of variables in the input dataset along with their types (i.e., whether they are numericor character type)*/
PROC CONTENTS DATA = &inDsn OUT = CONTENTS(keep = name type varnum);
RUN;
/*put each variable with its own comparison string in a seperate macro variable*/
data _null_;
set CONTENTS nobs = num_of_vars end = lastObs;
/*use NE. for numeric cols (type=1) and NE '' for char types*/
if type = 1 then call symputx(compress("var"!!varnum), compbl(name!!" NE . "));
else call symputx(compress("var"!!varnum), compbl(name!!" NE '' "));
/*make a note of no. of variables to check in the dataset*/
if lastObs then call symputx("no_of_obs", _n_);
run;
DATA &outDsn;
set &inDsn;
where
%do i =1 %to &no_of_obs.;
&&var&i.
%if &i < &no_of_obs. %then &Exclusion;
%end;
;
run;
%mend removeRowsWithMissingVals;
%removeRowsWithMissingVals(withBlanks, withOutBlanksAND, AND);
%removeRowsWithMissingVals(withBlanks, withOutBlanksOR, OR);
Outout of withOutBlanksAND:
a x y z
a 1 2 3
f 1 2 3
Output of withOutBlanksOR:
a x y z
a 1 2 3
b 1 . 3
c . . 3
e 1 . 3
f 1 2 3
Really weird nobody provided this elegant answer:
if missing(cats(of _all_)) then delete;
Edit: indeed, I didn't realized the cats(of _all_) returns a dot '.' for missing numeric value.
As a fix, I suggest this, which seems to be more reliable:
*-- Building a sample dataset with test cases --*;
data test;
attrib a format=8.;
attrib b format=$8.;
a=.; b='a'; output;
a=1; b=''; output;
a=.; b=''; output; * should be deleted;
a=.a; b=''; output; * should be deleted;
a=.a; b='.'; output;
a=1; b='b'; output;
run;
*-- Apply the logic to delete blank records --*;
data test2;
set test;
*-- Build arrays of numeric and characters --*;
*-- Note: array can only contains variables of the same type, thus we must create 2 different arrays --*;
array nvars(*) _numeric_;
array cvars(*) _character_;
*-- Delete blank records --*;
*-- Blank record: # of missing num variables + # of missing char variables = # of numeric variables + # of char variables --*;
if nmiss(of _numeric_) + cmiss(of _character_) = dim(nvars) + dim(cvars) then delete;
run;
The main issue being if there is no numeric at all (or not char at all), the creation of an empty array will generate a WARNING and the call to nmiss/cmiss an ERROR.
So, I think so far there is not other option than building a SAS statement outside the data step to identify empty records:
*-- Building a sample dataset with test cases --*;
data test;
attrib a format=8.;
attrib b format=$8.;
a=.; b='a'; output;
a=1; b=''; output;
a=.; b=''; output; * should be deleted;
a=.a; b=''; output; * should be deleted;
a=.a; b='.'; output;
a=1; b='b'; output;
run;
*-- Create a SAS statement which test any missing variable, regardless of its type --*;
proc sql noprint;
select distinct 'missing(' || strip(name) || ')'
into :miss_stmt separated by ' and '
from dictionary.columns
where libname = 'WORK'
and memname = 'TEST'
;
quit;
/*
miss_stmt looks like missing(a) and missing(b)
*/
*-- Delete blank records --*;
data test2;
set test;
if &miss_stmt. then delete;
run;