Test if a variable exists - sas

I want to test if a variable exists and if it doesn't, create it.

The open() and varnum() functions can be used. A non-zero result from varnum() means the variable exists; zero means it does not.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
set try;
if _n_ = 1 then do;
dsid=open('try');
if varnum(dsid,'var4') = 0 then var4 = .;
rc=close(dsid);
end;
drop rc dsid;
run;
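The same open()/varnum() check can also be done at the macro level, so that the assignment is only generated when the variable is absent. A minimal sketch, assuming the try data set from above; the macro names var_exists and add_if_missing are made up for illustration:

```sas
/* Hypothetical helper: resolves to the position of &var in &ds, or 0 if absent */
%macro var_exists(ds, var);
  %local dsid pos rc;
  %let pos = 0;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid > 0 %then %do;
    %let pos = %sysfunc(varnum(&dsid, &var));
    %let rc = %sysfunc(close(&dsid));
  %end;
  &pos
%mend var_exists;

/* Hypothetical wrapper: emits an assignment only when the variable is missing */
%macro add_if_missing(ds, var);
  %if %var_exists(&ds, &var) = 0 %then %do;
    &var = .;  /* creates a numeric variable */
  %end;
%mend add_if_missing;

data try2;
  set try;
  %add_if_missing(try, var4)
run;
```

Because the check runs while the step is being compiled, no dsid or rc variables end up in the data step and nothing needs to be dropped.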

data try2;
set try;
var4 = coalesce(var4,.);
run;
(assuming var4 is numeric)

Assign var4 to itself. The assignment will create the variable if it doesn't exist and leave the contents in place if it does.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
set try;
var4 = var4;
run;
Just remember that creating var4 this way when it doesn't exist will use the default variable attributes, so you may need to use an explicit attrib statement if you require specific formatting/length etc.
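A sketch of what such an attrib statement might look like. The length, format and label here are assumptions chosen only for illustration:

```sas
data try2;
  /* Assumed attributes, for illustration only */
  attrib var4 length=8 format=8.2 label='Fourth variable';
  set try;
  var4 = var4;  /* still leaves existing values untouched */
run;
```

Declaring var4 before the set statement also makes it the first variable in the output data set, which may or may not matter to you.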

This is a very late answer/comment, but this method works for me and is pretty simple (SAS 9.4). In the example below, I declare one character and one numeric variable that are not on the source data set, and assign a default value to the character variable when it is blank.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
length var4 $20;
length var5 8;
set try;
var4 = var4;
if var4 = ' ' then var4 = 'Not on Source File';
run;

Related

SAS Concatenation based on values

Below is the sample data.
NAME  VAR2  VAR3  VAR4  VAR5
ABC   X     Y           2
DEF   P     Q     R     3
GHI   L                 1
The count of populated variables (from VAR2-VAR4) is stored in VAR5 for each record. I want the following output, with NewVar as the concatenation of the variables that contain a value:
NAME  VAR2  VAR3  VAR4  VAR5  NewVar
ABC   X     Y           2     X,Y
DEF   P     Q     R     3     P,Q,R
GHI   L                 1     L
I have no clue how to do it in SAS. Any help is appreciated.
Use the CATX() function to concatenate the variables. Its first argument is the delimiter to place between the values, and it skips arguments that are blank, so no stray commas appear. For example: CATX(',',VAR2,VAR3,VAR4)
Input Data:
data have;
input NAME $ VAR2 $ VAR3 $ VAR4 $ VAR5;
datalines;
ABC X Y . 2
DEF P Q R 3
GHI L . . 1
;
run;
Solution:
data want;
set have;
NewVar= catx(',',VAR2,VAR3,VAR4);
run;
or
%let list=VAR2,VAR3,VAR4;
data want2;
set have;
NewVar= catx(',',&list.);
run;
or (Tom's Recommendation)
data want3;
set have;
NewVar= catx(',',of var2-var4);
run;
Output:
NAME=ABC VAR2=X VAR3=Y VAR4= VAR5=2 NewVar=X,Y
NAME=DEF VAR2=P VAR3=Q VAR4=R VAR5=3 NewVar=P,Q,R
NAME=GHI VAR2=L VAR3= VAR4= VAR5=1 NewVar=L
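CATX skips blank arguments and trims leading and trailing blanks, which is why no empty slots show up in NewVar. A small sketch with made-up variable names (a, b, c, joined) that writes the result to the log:

```sas
data _null_;
  length a b c $1 joined $10;
  a = 'X';
  b = ' ';   /* blank argument: CATX leaves it out entirely */
  c = 'Z';
  joined = catx(',', a, b, c);
  put joined=;   /* writes joined=X,Z to the log */
run;
```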

Why does proc arima with NoEst throw 'There is not enough data to fit the model' error?

I am using proc arima in SAS 9.4 to produce a forecast from a previously calibrated model, but it throws an error as if it were trying to calibrate the model itself:
ERROR: There is not enough data to fit the model
sample data:
data inputs;
input x var1 var2 var3 var4 var5;
datalines;
20 5 2 4 5 4
25 12 56 13 44 4
20 5 2 4 5 4
25 12 56 13 44 4
20 5 2 4 5 4
25 12 56 13 44 4
. 2 5 6 5 4
;
failing version:
proc arima;
identify
data = inputs
var = x
crossCorr = ( var1 var2 var3 var4 var5 )
noPrint;
estimate
p = 1 input = ( var1 var2 var3 var4 var5 )
ar = 0.9
initVal = ( 0.1$var1 0.2$var2 0.3$var3 0.4$var4 0.4$var5 )
noint
noEst /* Using noEst so should not need to do any estimation and short data-set should not be a problem */
method=ml
noprint
;
forecast lead=1 out=outputs noOutAll noprint;
quit;
If I remove the final variable from the model, it works fine:
proc arima;
identify
data = inputs
var = x
crossCorr = ( var1 var2 var3 var4 )
noPrint;
estimate
p = 1 input = ( var1 var2 var3 var4 )
ar = 0.9
initVal = ( 0.1$var1 0.2$var2 0.3$var3 0.4$var4 )
noint
noEst /* Using noEst so should not need to do any estimation and short data-set should not be a problem */
method=ml
noprint
;
forecast lead=1 out=outputs noOutAll noprint;
quit;
I can also get it to 'work' by adding one more value to the data. However, this shouldn't be necessary when the model is already calibrated (using much more data).
I've checked the SAS documentation to see if there are any flags to prevent the unnecessary check that causes this error but none of them helped.
The answer has been provided on the SAS communities forum. It is known behaviour and so my uncommon use case is not supported. The only workaround would be to add some dummy data, but in my case with MA terms that would change the results.
Response on SAS Communities

Proc report with numeric variables

I recently came across an issue when using Proc report whereby the below code outputs only the first observation:
data have ;
input var1-var3 ;
datalines ;
1 10 100
2 20 200
3 30 300
4 40 400
;run ;
proc report data=have ;
columns var1 var2 var3 ;
define var1 / 'Variable 1' width=10;
define var2 / 'Variable 2' width=10;
define var3 / 'Variable 3' width=10;
run ;
It reports all 4 observations correctly after either of these changes:
Changing var1 to a character variable (input var1 $ var2-var3)
Explicitly declaring var1 as a display variable (define var1 / display)
I'm trying to work out why this happens. It can't be that a numeric first column defaults to a group variable rather than a display variable: all var1 values are unique, so each would form its own group, yet only the first observation is reported. Can someone explain the logic?
I was able to find the answer of what's happening behind the scenes by adding the list option to the proc report statement...
input var1-var3 (3x numeric) puts the following to the log:
PROC REPORT DATA=WORK.HAVE LS=120 PS=44 SPLIT="/" CENTER ;
COLUMN ( var1 var2 var3 );
DEFINE var1 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 1" ;
DEFINE var2 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 2" ;
DEFINE var3 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 3" ;
RUN;
input var1 $ (var2 var3) (:) (setting first to character) puts the following to the log:
PROC REPORT DATA=WORK.HAVE LS=120 PS=44 SPLIT="/" CENTER ;
COLUMN ( var1 var2 var3 );
DEFINE var1 / DISPLAY FORMAT= $8. WIDTH=10 SPACING=2 LEFT "Variable 1" ;
DEFINE var2 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 2" ;
DEFINE var3 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 3" ;
RUN;
So numeric variables default to ANALYSIS usage with the SUM statistic, and a report made up only of analysis variables collapses into a single summary row; that explains what was causing it. Although this is a problem for a simple report like this one, the default does at least produce correct sums when var1 is used as a BY group:
data have ;
input var1 var2 var3 ;
datalines ;
1 10 100
1 15 150
2 20 200
3 30 300
4 40 400
;run ;
proc report data=have list ;
columns var2 var3 ;
by var1 ;
define var2 / 'Variable 2' width=10;
define var3 / 'Variable 3' width=10;
run ;
You should just add options that describe how each variable is used, such as group or analysis, as shown below:
proc report nowd data=have ;
columns var1 var2 var3 ;
define var1 / group width=10 'Variable 1';
define var2 / analysis width=10 'Variable 2';
define var3 / analysis width=10 'Variable 3';
run ;
Here is the result:
Variable 1  Variable 2  Variable 3
         1          10         100
         2          20         200
         3          30         300
         4          40         400
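The other fix mentioned in the question, declaring var1 as a display variable, might look like the sketch below. It assumes the four-row have data set from the question; with a display variable in the report, each observation gets its own row while var2 and var3 keep their default analysis usage:

```sas
proc report data=have nowd;
  columns var1 var2 var3;
  /* DISPLAY turns the report into a detail listing: one row per observation */
  define var1 / display 'Variable 1' width=10;
  define var2 / 'Variable 2' width=10;
  define var3 / 'Variable 3' width=10;
run;
```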

SAS Conditional Statement

This:
IF VAR1 ne VAR2 ne VAR3 ne VAR4;
I want this condition to check if:
VAR1 is not equal to VAR2, VAR3, VAR4
VAR2 is not equal to VAR1, VAR3, VAR4
VAR3 is not equal to VAR1, VAR2, VAR4
VAR4 is not equal to VAR2, VAR3, VAR1
Is this possible?
I think for the four-variable case, six comparisons joined with AND is probably best. However, if you want this to scale to any number of variables, an array solution is the obvious choice; it's more work than needed here, but far less than writing 45 comparisons for 10 variables.
data want;
set have;
match=0;
array vars var:;
do _t = 1 to dim(vars)-1;
do _u = _t+1 to dim(vars);
if vars[_t] = vars[_u] then match=1;
end;
if match=1 then leave;
end;
run;
This does the same thing as the 6 if's (tests 1 vs 2,3,4, tests 2 vs 3,4, tests 3 vs 4), but in array/loop form.
A couple of options. Do it in long-form with and between:
VAR1 ne VAR2 and VAR1 ne VAR3 and VAR1 ne VAR4 and
VAR2 ne VAR3 and VAR2 ne VAR4 and VAR3 ne VAR4
Or rely on the fact that a true comparison evaluates to 1 and a false one to 0, and check that the sum is zero:
sum(VAR1 = VAR2,
VAR1 = VAR3,
VAR1 = VAR4,
VAR2 = VAR3,
VAR2 = VAR4,
VAR3 = VAR4) = 0
You can use:
if var1=var2=var3=var4 then ...
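There are some limitations to this. As far as I know, a chained comparison only compares neighbouring operands: a=b=c is evaluated as a=b and b=c. For equality that is enough, but the same chain written with ne would never compare var1 with var3 or var4. Note also that this condition tests whether all four values are equal, which is not the same as testing that they are all different.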
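Alternatively, test membership with WHICHN (WHICHC for character variables), which returns 0 when the first argument matches none of the others. The IN operator is not an option here because in a DATA step it only accepts a list of constants, not variables. This assumes numeric variables with no missing values:
if whichn(var1, var2, var3, var4) = 0 and
whichn(var2, var3, var4) = 0 and
var3 ne var4;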
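To see why the chained form from the question is not enough, the sketch below runs it next to the six explicit comparisons. The sample rows are made up, and the comments assume that a chained comparison only tests neighbouring operands:

```sas
/* Made-up sample rows for illustration */
data have;
  input var1-var4;
  datalines;
1 2 3 4
1 2 1 4
5 5 5 5
;

/* Chained form: only neighbouring pairs (1-2, 2-3, 3-4) are compared */
data chained;
  set have;
  if var1 ne var2 ne var3 ne var4;
run;

/* Six explicit comparisons: every pair is compared */
data pairwise;
  set have;
  if var1 ne var2 and var1 ne var3 and var1 ne var4
     and var2 ne var3 and var2 ne var4 and var3 ne var4;
run;
```

Under that assumption, the second row (1 2 1 4) should pass the chained test even though var1 equals var3, while the pairwise test should keep only the first row.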

Read wide file with repeated variables in SAS

I have input data shaped like this:
var1 var2 var3 var2 var3 ...
where each row has one value of var1 followed by a varying number of var2-var3 pairs. After reading this input, I want the data set to have multiple records for each var1: one record for each pair of var2/var3.
So if the first two lines of the input file are
A 1 2 7 3 4 5
B 2 3
this would generate 4 records:
A 1 2
A 7 3
A 4 5
B 2 3
Is there a simple/elegant way to do this? I've tried reading each row as one long variable and splitting it with scan, but it's getting messy and I'm betting there's a really easy way to do this.
I'm sure there are many ways to do this, but here is the first that comes to my mind:
data want(keep=var1 var2 var3);
infile 'path-to-your-file';
input;
var1 = input(scan(_infile_,1),$8.);
i = 1;
do while(i ne 0);
i + 1;
var2 = input(scan(_infile_,i),8.);
i + 1;
var3 = input(scan(_infile_,i),8.);
if var3 = . then i = 0;
else output;
end;
run;
_infile_ is an automatic SAS variable that contains the currently read record. Use an appropriate informat for each variable you read.
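Like this (conditional input that steps back). The trailing @ holds the current line for the next input statement, and +(-2) moves the pointer back so that the value just read into temp is reread as var2. The backstep relies on single-character values, as in the sample data:
data test;
infile datalines missover;
input var1 $ var2 $ var3 $ temp $ @;
output;
do while(not missing(temp));
input +(-2) var2 $ var3 $ temp $ @;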
output;
end;
drop temp;
datalines;
A 1 2 7 3 4 5
B 2 3
;
run;
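Another way to sketch this, without a look-ahead variable, is to hold the line with a trailing @ and keep reading pairs until MISSOVER returns missing values at the end of the record. This assumes var2 and var3 are numeric and never legitimately missing in the input:

```sas
data want;
  infile datalines missover;
  input var1 $ @;             /* read the key and hold the line */
  do until (missing(var2));
    input var2 var3 @;        /* read the next pair from the same line */
    if not missing(var2) then output;
  end;
  datalines;
A 1 2 7 3 4 5
B 2 3
;
```

The held line is released automatically at the end of each data step iteration, so the next iteration starts on the following record. With the two sample lines this should write the four records listed in the question.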