arrays in sas with different dimensions - sas

Beginning with a table,
A B C D E
1 . 1 . 1
. . 1 . .
. 1 . 1 .
I am trying to get an output like this:
A B C D E X Y Z
1 . 1 . 1 1 1 1
. . 1 . . 1
. 1 . 1 . 1 1
Here is my code:
data want;
set have;
array GG(5) A-E;
array BB(3) X Y Z;
do i=1 to 5;
do j=1 to 3;
if gg(i)=1 then BB(j)=1;
end;
end;
run;
I understand that the result that I get is wrong, as the dimensions of both the arrays are not co-operating. Is there another way to do this?

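data want;
set have;
array v1{*} a--e;
array v2{*} x y z;
i=0; /* number of non-missing values copied so far */
do j=1 to dim(v1);
if not missing(v1{j}) then do;
i+1;
if i<=dim(v2) then v2{i}=v1{j}; /* ignore any hits beyond the size of v2 */
end;
end;
drop i j;
run;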

Why not use a counter to track the position where the next non-missing value should go, like this?
data try;
infile datalines delimiter=',';
input var1 var2 var3 var4 var5;
datalines;
1,.,.,1,1,
.,1,.,.,.,
.,.,.,.,.,
.,1,.,1,.,
;
data want;
set try;
array vars[5] var1 var2 var3 var4 var5;
array newvars[5] nvar1 nvar2 nvar3 nvar4 nvar5;
do i=1 to 5;
if i=1 then count=0;
if vars[i] ne . then do;
count=count+1;
newvars[count]=vars[i];
end;
end;
drop i count var:;
run;

My method is to copy the existing values to new variables X Y Z temp1 temp2, then sort the values using call sortn, which groups the 1's together and the missing values together. Because call sortn only sorts in ascending order, with missing values coming first, I've reversed the variables in the array statement (after first creating them in the correct order with a retain statement).
The unwanted variables temp1 and temp2 can then be dropped.
data have;
input A B C D E;
datalines;
1 . 1 . 1
. . 1 . .
. 1 . 1 .
;
run;
data want;
set have;
retain X Y Z .; /* create new variables in required order */
array GG{5} A--E;
array BB{5} temp1 temp2 Z Y X; /* array in reverse order due to ascending sort later on */
do i = 1 to dim(GG);
BB{i} = GG{i};
end;
call sortn(of BB{*}); /* sort array (missing values come first, hence the reverse array order) */
drop temp: i; /* drop unwanted variables */
run;
Alternatively, here's a simpler solution, since your criteria are pretty basic. As you're just dealing with 1's and missings, you can count the non-missing values in A-E and assign 1 to that many elements of the new array.
data want;
set have;
array GG{5} A--E;
array BB{3} X Y Z;
do i = 1 to n(of GG{*});
BB{i}=1;
end;
drop i; /* drop unwanted variable*/
run;
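The loop above relies on no row having more than three non-missing values among A-E, which holds for the sample data. As a minimal sketch, assuming the same have dataset, the upper bound can be capped with min() so that a row with four or five non-missing values does not trigger an array subscript out of range error:

```sas
data want;
  set have;
  array GG{5} A--E;
  array BB{3} X Y Z;
  /* n() counts the non-missing values in GG; min() caps the loop at dim(BB) */
  do i = 1 to min(n(of GG{*}), dim(BB));
    BB{i} = 1;
  end;
  drop i; /* drop the loop counter */
run;
```

For the three sample rows this gives the same result as the uncapped loop.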

Related

The drop sentence doesn't work with variables created with macro arrays SAS

I am trying to run the following code, that calculates the slope of each variables' timeseries.
I need to drop the variables created with an array, because I use the same logic for other functions.
Nevertheless the output data keeps the variables ys_new&i._: and I get the warning: The variable 'ys_new3_:'n in the DROP, KEEP, or RENAME list has never been referenced.
I think the iterator is evaluated to 3 in the %do %while block.
If someone can help me, I would really appreciate it.
DATA HAVE;
INPUT ID N_TRX_M0-N_TRX_M12 TRANSACTION_AMT_M0-TRANSACTION_AMT_M12;
DATALINES;
1 3 6 3 3 7 8 6 10 5 5 8 7 7 379866 856839 307909 239980 767545 511806 603781 948936 566114 402214 844657 2197164 817390
2 51 56 55 73 48 57 54 53 55 52 49 72 53 6439314 7367157 4614827 9465017 3776064 3661525 7870605 3971889 4919128 10024385 4660264 7748467 7339863
3 5 . . . . . . . . . . . . 232165 . . . . . . . . . . . .
;
RUN;
%Macro slope(variables)/parmbuff;
%let i = 1;
/* Get the first Parameter */
%let parm_&i = %scan(&syspbuff,&i,%str( %(, %)));
%do %while (%str(&&parm_&i.) ne %str());
array ys&i(12) &&parm_&i.._M12 - &&parm_&i.._M1;
array ys_new&i._[12];
/* Shift the non-missing values to the front */
k = 1;
do j = 1 to 12;
if not(missing(ys&i(j))) then do;
ys_new&i._[k] = ys&i[j];
k + 1;
end;
end;
nonmissing = n(of ys_new&i._{*});
xbar = (nonmissing + 1)/2;
if nonmissing ge 2 then do;
ybar = mean(of ys&i(*));
cov = 0;
varx = 0;
do m=1 to nonmissing;
cov=sum(cov, (m-xbar)*(ys_new&i._(m)-ybar));
varx=sum(varx, (m-xbar)**2);
end;
slope_&&parm_&i. = cov/varx;
end;
%let i = %eval(&i+1);
/* Get next parm */
%let parm_&i = %scan(&syspbuff ,&i, %str( %(, %)));
%end;
drop ys_new&i._: k j m nonmissing ybar xbar cov varx;
%mend;
%let var_slope =
N_TRX,
TRANSACTION_AMT
;
DATA FEATURES;
SET HAVE;
%slope(&var_slope)
RUN;
The simplest solution is to generate the DROP statement inside the %do %while loop, before the macro has a chance to change the value of the macro variable I.
array ys&i(12) &&parm_&i.._M12 - &&parm_&i.._M1;
array ys_new&i._[12];
drop ys_new&i._: k j m nonmissing ybar xbar cov varx;
You could use a _TEMPORARY_ array instead, but then you need to remember to clear the values on each iteration of the data step.
array ys_new&i._[12] _temporary_;
call missing(of ys_new&i._[*]);
Then you can leave the DROP statement at the end if you want.
drop k j m nonmissing ybar xbar cov varx;
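You are correct: &i is 3 by the time the %do %while loop exits, so the DROP statement refers to ys_new3_:, which was never created, hence the warning. Instead, consider using a temporary array to avoid the DROP statement altogether (temporary array values are retained across observations, so clear them with call missing on each iteration of the data step):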
array ys_new&i._[12] _TEMPORARY_;
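To see why &i ends up as 3, here is a small sketch using a hypothetical macro named demo (not part of the original code) that mirrors the loop structure with two parameters and only writes to the log:

```sas
%macro demo;
  %let i = 1;
  %do %while (&i le 2);
    /* code generated here sees i=1, then i=2 */
    %put inside loop: i=&i;
    %let i = %eval(&i + 1);
  %end;
  /* code generated here sees the value that ended the loop */
  %put after loop: i=&i;
%mend demo;
%demo
```

The log shows i=1 and i=2 inside the loop and i=3 after it, which is why a DROP statement placed after %end refers to ys_new3_:.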

Missing String Value Interpolation SAS

I have a series of string variables with missing observations. I would like to use flat substitution. For instance, if variable x has 3 available values, a missing value should have a 33.333% chance of being assigned each of them under this substitution method. How would I do this?
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
Run;
You could use temporary arrays to store the possible values. Then generate a random index into the array.
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
data want ;
set have ;
array possible_b (2) $8 _temporary_ ('Male','Female') ;
if missing(b) then b=possible_b(1+int(rand('uniform')*dim(possible_b)));
run;
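If the imputed values need to be reproducible between runs, the random stream can be seeded. A minimal sketch, assuming the same have dataset; the seed value 12345 is arbitrary and chosen only for illustration:

```sas
data want;
  set have;
  array possible_b{2} $8 _temporary_ ('Male','Female');
  /* seed the random number stream once so reruns give the same draws */
  if _n_ = 1 then call streaminit(12345);
  /* rand('uniform') lies strictly between 0 and 1, so ceil() yields 1..dim */
  if missing(b) then b = possible_b{ceil(rand('uniform') * dim(possible_b))};
run;
```

The same pattern should extend to a and c with their own arrays of possible values.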
I did this by generating random numbers and hard-coding the limits. There should be an easier way to do this, but for the purposes of the question this should work.
option missing='';
data begin;
input a $;
cards;
a
.
b
c
.
e
.
f
g
h
.
.
j
.
;
run;
data intermediate;
set begin;
if a EQ '' then help= rand("uniform");
else help=.;
run;
data wanted;
set intermediate;
if a EQ '' then do;
if help<1/3 then a='V1';
else if help<2/3 then a='V2';
else a='V3';
end;
drop help;
run;

arrays in SAS with more columns

I have a dataset
Data have;
input A B C;
cards;
1 . .
. . 1
1 1 .
run;
And I am looking for an output which is like this.
A B C OUT
1 . . A
. . 1 C
1 1 . A,B
I wrote the program this way:
Data want;
set have;
array U(3)A B C;
do i=1 to 3;
if U(i)^=. then OUT=cat(vname(u(i)),',');
end;
run;
This gives only the last VNAME and not the concatenation.
When concatenating with a separator, catx is the function to use, or even better call catx, which removes the need to write out = and to repeat out inside the concatenation. Both trim leading and trailing blanks.
The other problem is the type of out: unless it is declared first, SAS can default it to numeric, and call catx needs a character result variable. You need to define it as character beforehand (I've done this with a length statement).
The following code achieves your goal.
Data have;
input A B C;
cards;
1 . .
. . 1
1 1 .
run;
data want;
set have;
length out $20;
array U{3} A B C;
do i = 1 to 3;
if not missing(U{i}) then call catx(',',out,vname(U{i}));
end;
drop i;
run;
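For comparison, here is a sketch of the same step using the catx function instead of call catx, assuming the same have dataset. Here out has to appear on both sides of the assignment:

```sas
data want;
  set have;
  length out $20; /* declare out as character before it is used */
  array U{3} A B C;
  do i = 1 to dim(U);
    /* catx skips blank arguments, so no leading comma appears on the first hit */
    if not missing(U{i}) then out = catx(',', out, vname(U{i}));
  end;
  drop i;
run;
```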

Find the next non blank value in SAS

I am trying to linearly interpolate values within a panel data set. So I am trying to find the next non-missing value of a variable whenever its current value is ".".
For example, if X = {1, 2, ., ., ., 7}, I want to store 7 as a variable "Y" and subtract the lag value of X from it to get the numerator of the slope. Can anyone help with this step?
If you cannot transpose your data, here is a way that will work for your given example:
data test;
input id $3. x best12.;
datalines;
AAA 1
BBB 2
CCC .
DDD .
EEE .
FFF 7
;
run;
data test2;
set test;
n = _n_;
if x ne .;
run;
data test3;
set test2;
lagx = lag(x);
lagn = lag(n);
if _n_ > 1 and n ne lagn + 1 then do;
positiondiff = n - lagn;
valuediff = x - lagx;
do i = (lagx + ((x-lagx)/(n-lagn))) to x by ((x-lagx)/(n-lagn));
x = i;
output;
end;
end;
else output;
keep x;
run;
data test4;
merge test test3 (rename = (x=newx));
run;
So we are basically rebuilding the variable with the interpolated values, then remerging it into the original dataset without a by variable which will line up all the new interpolated data with the missing points.
Is there a way you could transpose all your data? Interpolating like that is much easier when all the data you need is in a single observation. Like this:
data test;
input x best12.;
datalines;
1
2
.
.
.
7
;
run;
proc transpose data = test
out = test2;
run;
data test3;
set test2;
array xvalues {*} COL1-COL6;
array interpol {4,10} begin1-begin10 end1-end10 begposition1-begposition10 endposition1-endposition10;
rangenum = 1;
* Find the endpoints of the missing ranges;
do i = 1 to dim(xvalues);
if xvalues{i} ne . then lastknownx = xvalues{i};
else do;
interpol{1,rangenum} = lastknownx;
if interpol{3,rangenum} = . then interpol{3,rangenum} = i - 1;
end;
if i > 1 and xvalues{i} ne . then do;
if xvalues{i-1} = . then do;
interpol{2,rangenum} = xvalues{i};
interpol{4,rangenum} = i;
rangenum = rangenum + 1;
end;
end;
end;
* Interpolate;
rangenum = 1;
do j = 1 to dim(xvalues);
if xvalues{j} = . then do;
xvalues{j} = interpol{1,rangenum} + (j-interpol{3,rangenum})*((interpol{2,rangenum}-interpol{1,rangenum})/(interpol{4,rangenum}-interpol{3,rangenum}));
end;
else if j > 1 and xvalues{j} ne . then do;
if xvalues{j-1} = . then rangenum = rangenum + 1;
end;
end;
keep col1-col6;
run;
That can handle up to 10 different missing ranges per observation, though you could tweak the code to handle many more than that by creating bigger arrays.
The SAS data step reads a dataset one record at a time from top to bottom. So at record i, it can't access i+1 because it hasn't read it yet; it can only access i-1. Assume you have a dataset with a variable x.
data intrpl;
retain _x;
set yourdata;
by x notsorted;
if not missing(x) then do;
_x = x;
if last.x then do;
slope = _x - lag(_x);
output;
end;
end;
run;
Transposing can get kind of messy if x takes on a lot of values, so I recommend this method. I hope it helps!

How to delete blank observations in a data set in SAS

I want to delete ALL blank observations from a data set.
I only know how to get rid of blanks from one variable:
data a;
set data(where=(var1 ne .)) ;
run;
Here I set a new data set without the blanks from var1.
But how to do it, when I want to get rid of ALL the blanks in the whole data set?
Thanks in advance for your answers.
If you are attempting to get rid of rows where ALL variables are missing, it's quite easy:
/* Create an example with some or all columns missing */
data have;
set sashelp.class;
if _N_ in (2,5,8,13) then do;
call missing(of _numeric_);
end;
if _N_ in (5,6,8,12) then do;
call missing(of _character_);
end;
run;
/* This is the answer */
data want;
set have;
if compress(cats(of _all_),'.')=' ' then delete;
run;
Instead of the compress you could also use OPTIONS MISSING=' '; beforehand.
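A minimal sketch of that variant, assuming the same have dataset. With MISSING=' ' a missing numeric value is rendered as a blank rather than a dot, so cats() of an all-missing row comes out empty; the option is reset afterwards:

```sas
options missing=' ';   /* render missing numerics as blanks instead of dots */
data want;
  set have;
  /* the concatenation is empty only when every variable is missing */
  if missing(cats(of _all_)) then delete;
run;
options missing='.';   /* restore the default */
```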
If you want to remove ALL Rows with ANY missing values, then you can use NMISS/CMISS functions.
data want;
set have;
if nmiss(of _numeric_) > 0 then delete;
run;
or
data want;
set have;
if nmiss(of _numeric_) + cmiss(of _character_) > 0 then delete;
run;
for all char+numeric variables.
You can do something like this:
data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then delete;
end;
drop i;
run;
This will scan through all the numeric variables and delete any observation in which it finds a missing value.
Here you go. This will work irrespective of the variable being character or numeric.
data withBlanks;
input a$ x y z;
datalines;
a 1 2 3
b 1 . 3
c . . 3
. . .
d . 2 3
e 1 . 3
f 1 2 3
;
run;
%macro removeRowsWithMissingVals(inDsn, outDsn, Exclusion);
/*Inputs:
inDsn: Input dataset with some or all columns missing for some or all rows
outDsn: Output dataset with some or all columns NOT missing for some or all rows
Exclusion: Should be one of {AND, OR}. AND will only exclude rows if any columns have missing values, OR will exclude only rows where all columns have missing values
*/
/*get a list of variables in the input dataset along with their types (i.e., whether they are numeric or character type)*/
PROC CONTENTS DATA = &inDsn OUT = CONTENTS(keep = name type varnum);
RUN;
/*put each variable with its own comparison string in a separate macro variable*/
data _null_;
set CONTENTS nobs = num_of_vars end = lastObs;
/*use NE. for numeric cols (type=1) and NE '' for char types*/
if type = 1 then call symputx(compress("var"!!varnum), compbl(name!!" NE . "));
else call symputx(compress("var"!!varnum), compbl(name!!" NE '' "));
/*make a note of no. of variables to check in the dataset*/
if lastObs then call symputx("no_of_obs", _n_);
run;
DATA &outDsn;
set &inDsn;
where
%do i =1 %to &no_of_obs.;
&&var&i.
%if &i < &no_of_obs. %then &Exclusion;
%end;
;
run;
%mend removeRowsWithMissingVals;
%removeRowsWithMissingVals(withBlanks, withOutBlanksAND, AND);
%removeRowsWithMissingVals(withBlanks, withOutBlanksOR, OR);
Output of withOutBlanksAND:
a x y z
a 1 2 3
f 1 2 3
Output of withOutBlanksOR:
a x y z
a 1 2 3
b 1 . 3
c . . 3
e 1 . 3
f 1 2 3
Really weird nobody provided this elegant answer:
if missing(cats(of _all_)) then delete;
Edit: indeed, I didn't realize that cats(of _all_) returns a dot '.' for a missing numeric value.
As a fix, I suggest this, which seems to be more reliable:
*-- Building a sample dataset with test cases --*;
data test;
attrib a format=8.;
attrib b format=$8.;
a=.; b='a'; output;
a=1; b=''; output;
a=.; b=''; output; * should be deleted;
a=.a; b=''; output; * should be deleted;
a=.a; b='.'; output;
a=1; b='b'; output;
run;
*-- Apply the logic to delete blank records --*;
data test2;
set test;
*-- Build arrays of numeric and characters --*;
*-- Note: an array can only contain variables of the same type, thus we must create 2 different arrays --*;
array nvars(*) _numeric_;
array cvars(*) _character_;
*-- Delete blank records --*;
*-- Blank record: # of missing num variables + # of missing char variables = # of numeric variables + # of char variables --*;
if nmiss(of _numeric_) + cmiss(of _character_) = dim(nvars) + dim(cvars) then delete;
run;
The main issue is that if the dataset has no numeric variables at all (or no character variables at all), creating the empty array generates a WARNING and the call to nmiss/cmiss an ERROR.
So, as far as I can tell, there is no other option than building a SAS statement outside the data step to identify empty records:
*-- Building a sample dataset with test cases --*;
data test;
attrib a format=8.;
attrib b format=$8.;
a=.; b='a'; output;
a=1; b=''; output;
a=.; b=''; output; * should be deleted;
a=.a; b=''; output; * should be deleted;
a=.a; b='.'; output;
a=1; b='b'; output;
run;
*-- Create a SAS statement which test any missing variable, regardless of its type --*;
proc sql noprint;
select distinct 'missing(' || strip(name) || ')'
into :miss_stmt separated by ' and '
from dictionary.columns
where libname = 'WORK'
and memname = 'TEST'
;
quit;
/*
miss_stmt looks like missing(a) and missing(b)
*/
*-- Delete blank records --*;
data test2;
set test;
if &miss_stmt. then delete;
run;