This:
IF VAR1 ne VAR2 ne VAR3 ne VAR4;
I want this condition to check if:
VAR1 is not equal to VAR2, VAR3, VAR4
VAR2 is not equal to VAR1, VAR3, VAR4
VAR3 is not equal to VAR1, VAR2, VAR4
VAR4 is not equal to VAR2, VAR3, VAR1
Is this possible?
I think for the four-variable case, six IF conditions joined with AND are probably best. However, if you want this to work for any number of variables, an array solution is the obvious choice; it is more work than needed here, but far less work than writing 45 comparisons for 10 variables.
data want;
set have;
match=0;
array vars var:;
do _t = 1 to dim(vars)-1;
do _u = _t+1 to dim(vars);
if vars[_t] = vars[_u] then match=1;
end;
if match=1 then leave;
end;
run;
This does the same thing as the 6 if's (tests 1 vs 2,3,4, tests 2 vs 3,4, tests 3 vs 4), but in array/loop form.
A couple of options. Do it in long-form with and between:
VAR1 ne VAR2 and VAR1 ne VAR3 and VAR1 ne VAR4 and
VAR2 ne VAR3 and VAR2 ne VAR4 and VAR3 ne VAR4
Or use the numerical equivalent of a TRUE value as 1 to test it:
sum(VAR1 = VAR2,
VAR1 = VAR3,
VAR1 = VAR4,
VAR2 = VAR3,
VAR2 = VAR4,
VAR3 = VAR4) = 0
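As a minimal sketch of the sum-of-comparisons test in a full DATA step: the sample rows and the flag name all_distinct below are made up for illustration and are not from the original question. Each comparison evaluates to 1 when true and 0 when false, so a sum of 0 means no pair matched.

```sas
/* Hypothetical sample data, not from the original question */
data have;
  input var1 var2 var3 var4;
  datalines;
1 2 3 4
1 2 3 1
5 5 5 5
;
run;

data want;
  set have;
  /* all_distinct (illustrative name) is 1 only when no pair is equal */
  all_distinct = (sum(var1 = var2, var1 = var3, var1 = var4,
                      var2 = var3, var2 = var4, var3 = var4) = 0);
run;
```

With these rows, only the first observation should get all_distinct = 1; the second has var1 = var4, and in the third every pair matches.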
You can use:
if var1=var2=var3=var4 then ...
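Note that in a DATA step this chained comparison is evaluated as var1=var2 and var2=var3 and var3=var4, so it tests that all four values are equal rather than all different; likewise, the chained ne in the question compares only adjacent pairs. There are some limitations to this that I can't recall at the moment, but in a straight IF condition I think it's okay.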
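Alternatively, use WHICHN, since the IN operator in a DATA step accepts only constants, not variables:
if whichn(var1, var2, var3, var4) = 0 and
whichn(var2, var3, var4) = 0 and
var3 ne var4;
Use WHICHC instead for character variables. Rows with missing values may need separate handling.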
Below is the sample data.
NAME  VAR2  VAR3  VAR4  VAR5
ABC   X     Y           2
DEF   P     Q     R     3
GHI   L                 1
VAR5 holds the count of populated variables (from VAR2-VAR4) for each record. I want the following output, where NewVar is the concatenation of the variables that contain a value.
NAME  VAR2  VAR3  VAR4  VAR5  NewVar
ABC   X     Y           2     X,Y
DEF   P     Q     R     3     P,Q,R
GHI   L                 1     L
I have no clue how to do it in SAS. Any help is appreciated.
Use the CATX() function to concatenate the variables; it lets you specify the delimiter to place between the values and skips blank values. For example: CATX(',',VAR2,VAR3,VAR4)
Input Data:
data have;
input NAME $ VAR2 $ VAR3 $ VAR4 $ VAR5;
datalines;
ABC X Y . 2
DEF P Q R 3
GHI L . . 1
;
run;
Solution:
data want;
set have;
NewVar= catx(',',VAR2,VAR3,VAR4);
run;
or
%let list=VAR2,VAR3,VAR4;
data want2;
set have;
NewVar= catx(',',&list.);
run;
or (Tom's Recommendation)
data want3;
set have;
NewVar= catx(',',of var2-var4);
run;
Output:
NAME=ABC VAR2=X VAR3=Y VAR4= VAR5=2 NewVar=X,Y
NAME=DEF VAR2=P VAR3=Q VAR4=R VAR5=3 NewVar=P,Q,R
NAME=GHI VAR2=L VAR3= VAR4= VAR5=1 NewVar=L
I recently came across an issue when using Proc report whereby the below code outputs only the first observation:
data have ;
input var1-var3 ;
datalines ;
1 10 100
2 20 200
3 30 300
4 40 400
;run ;
proc report data=have ;
columns var1 var2 var3 ;
define var1 / 'Variable 1' width=10;
define var2 / 'Variable 2' width=10;
define var3 / 'Variable 3' width=10;
run ;
It will report all 4 observations correctly by either:
Changing var1 to be a character variable (input var1 $ var2-var3)
Explicitly defining define var1 to be define var1 / display
I'm trying to work out why this happens. It can't be that a numeric first variable defaults to a group variable rather than a display variable: all var1 values are unique, so each would form its own group, yet only the first observation is reported. Can someone explain the logic?
I was able to find the answer of what's happening behind the scenes by adding the list option to the proc report statement...
input var1-var3 (3x numeric) puts the following to the log:
PROC REPORT DATA=WORK.HAVE LS=120 PS=44 SPLIT="/" CENTER ;
COLUMN ( var1 var2 var3 );
DEFINE var1 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 1" ;
DEFINE var2 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 2" ;
DEFINE var3 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 3" ;
RUN;
input var1 $ (var2 var3) (:) (setting first to character) puts the following to the log:
PROC REPORT DATA=WORK.HAVE LS=120 PS=44 SPLIT="/" CENTER ;
COLUMN ( var1 var2 var3 );
DEFINE var1 / DISPLAY FORMAT= $8. WIDTH=10 SPACING=2 LEFT "Variable 1" ;
DEFINE var2 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 2" ;
DEFINE var3 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 3" ;
RUN;
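So numeric variables default to ANALYSIS usage with the SUM statistic, which explains what was causing it: with no display, order, or group variable in the report, PROC REPORT collapses all observations into a single summary row. Although this causes a problem in a simple report like this one, it does at least report a sum correctly if var1 is used as a BY group: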
data have ;
input var1 var2 var3 ;
datalines ;
1 10 100
1 15 150
2 20 200
3 30 300
4 40 400
;run ;
proc report data=have list ;
columns var2 var3 ;
by var1 ;
define var2 / 'Variable 2' width=10;
define var3 / 'Variable 3' width=10;
run ;
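The other fix mentioned in the question, declaring var1 as a display variable, can be sketched as follows. It assumes the have data set created above; with a display variable in the report, PROC REPORT should print one row per observation instead of collapsing them into sums.

```sas
/* Sketch: var1 as DISPLAY so each observation is listed */
proc report data=have nowd;
  columns var1 var2 var3;
  define var1 / display 'Variable 1' width=10;
  define var2 / 'Variable 2' width=10;   /* still ANALYSIS SUM by default */
  define var3 / 'Variable 3' width=10;   /* still ANALYSIS SUM by default */
run;
```

Because no grouping takes place, the two observations with var1 = 1 stay on separate rows.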
You should add a usage option to each DEFINE statement that says what kind of variable it is, such as group or analysis:
proc report nowd data=have ;
columns var1 var2 var3 ;
define var1 / group width=10 'Variable 1';
define var2 / analysis width=10 'Variable 2';
define var3 / analysis width=10 'Variable 3';
run ;
Here is the result:
Variable 1 Variable 2 Variable 3
1 10 100
2 20 200
3 30 300
4 40 400
This is my data, let me call it 'time'.
VAR1 VAR2 VAR3 VAR4
02NOV14:10:23:00 02NOV14:10:38:00 02NOV14:10:38:00
02NOV14:12:52:00 02NOV14:13:05:00
02NOV14:18:57:00 02NOV14:19:14:00 02NOV14:19:14:00 02NOV14:19:14:00
03NOV14:10:13:00 03NOV14:10:13:00
03NOV14:16:33:00 03NOV14:17:29:00 03NOV14:17:29:00
03NOV14:12:35:00 03NOV14:12:40:00 03NOV14:12:40:00
03NOV14:13:26:00 03NOV14:13:59:00 03NOV14:13:59:00
03NOV14:14:34:00 03NOV14:14:41:00 03NOV14:14:41:00
03NOV14:15:12:00 03NOV14:15:14:00 03NOV14:15:14:00
03NOV14:15:48:00 03NOV14:16:18:00 03NOV14:16:18:00
03NOV14:15:51:00 06NOV14:14:46:00 06NOV14:14:46:00
07NOV14:11:35:00 07NOV14:12:15:00 07NOV14:12:15:00
07NOV14:12:32:00 07NOV14:14:34:00 07NOV14:14:34:00 07NOV14:14:34:00
07NOV14:12:18:00 07NOV14:12:19:00 07NOV14:12:19:00 07NOV14:12:19:00
08NOV14:20:57:00 08NOV14:21:03:00 08NOV14:21:03:00
and I want to create new variable
VAR5 = VAR2 - VAR1;
VAR6 = VAR3 - VAR1;
VAR7 = VAR4 - VAR1;
I use this code
data time;
set time;
VAR5 = VAR2 - VAR1;
VAR6 = VAR3 - VAR1;
VAR7 = VAR4 - VAR1;
run;
When I print, VAR6 and VAR7 are empty. I guess that SAS doesn't calculate VAR6 and VAR7 because of the missing values in VAR3 and VAR4. How can I get values for them where I have data?
Can someone help me?
Sometimes I have values for VAR3 and VAR4 and sometimes I don't, so I want values for VAR6 and VAR7 whenever possible. For example, for the first observation I can have VAR5 and VAR6 but not VAR7, because VAR4 has no value; for the second observation I can only have VAR5; for the third I should get VAR5, VAR6 and VAR7.
Updated the code accordingly.
The NOT MISSING() checks make sure each calculation happens only when its arguments are not missing.
data time;
set time;
if not missing(var1) then do;
if not missing(Var2) then VAR5 = VAR2 - VAR1;
if not missing(Var3) then VAR6 = VAR3 - VAR1;
if not missing(Var4) then VAR7 = VAR4 - VAR1;
end;
run;
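For an unspecified number of later timestamps, the same guard can be written with arrays. The sketch below is self-contained and rests on some assumptions: the two sample rows are copied from the question with missing values typed as periods, the variables are numeric SAS datetimes, and the data set name time2 is made up. SAS datetimes count seconds, so the differences are in seconds; the TIME8. format is only one possible way to display them.

```sas
/* Two rows from the question, with '.' marking missing datetimes */
data time;
  input (var1-var4) (:datetime18.);
  format var1-var4 datetime16.;
  datalines;
02NOV14:10:23:00 02NOV14:10:38:00 02NOV14:10:38:00 .
02NOV14:12:52:00 02NOV14:13:05:00 . .
;
run;

data time2;   /* time2 is an illustrative name */
  set time;
  array later{3} var2-var4;   /* the later timestamps */
  array diff{3}  var5-var7;   /* differences from var1, in seconds */
  do i = 1 to dim(later);
    if not missing(var1) and not missing(later{i}) then
      diff{i} = later{i} - var1;
  end;
  format var5-var7 time8.;    /* show seconds as h:mm:ss */
  drop i;
run;
```

For the first row, VAR5 and VAR6 are both 15 minutes and VAR7 stays missing; for the second row only VAR5 is populated.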
I'd like to create dynamic entries in a data set (in SAS) formed using the names of variables (e.g. VarA, VarB, VarC) each having lags up to 4.
The input data set HAVE has this information (the column names are Variables and Values):
Variables Values
VarA 0
VarB 0
VarC 0
Lags 4
and the output data set WANT should be something like below (Var1, Var2, and Var3 are dynamic column names i.e. appending 1,2,3 to any string Var)
Var1 Var2 Var3
VarA VarB VarC
VarA1 VarB1 VarC1
..
VarA4 VarB4 VarC4
The intention is to have this work for any number of variables in HAVE data set.
Thanks
The following code returns what you want. Please modify according to your needs.
/*sample input dataset*/
data have;
input Variables $ Values;
datalines;
VarA 0
VarB 0
VarC 0
Lags 4
;
run;
/*get the no. of lags from the input dataset*/
proc sql noprint;
select Values into :num_of_lags from have where upcase(variables)='LAGS';
quit;
/*transpose the input dataset such that the VarA, VarB, VarC are put in columns Var1, Var2, & Var3 respectively*/
/*have_t, the transposed dataset only has 1 row.*/
proc transpose data = have out = have_t(drop = _name_) prefix = var;
where upcase(variables) ne 'LAGS';
var variables;
run;
/*replicate the 1 row in have_t num_of_lags times*/
data pre_want;
set have_t;
array myVars{*} _character_;
do j= 1 to &num_of_lags+1;
do i = 1 to dim(myVars);
myVars[i]=myVars[i];
end;
output;
end;
run;
/*final dataset*/
data want;
set pre_want;
array myVars{*} _character_;
if _N_>1 then do;
do i = 1 to dim(myVars);
myVars[i]=compress(myVars[i]!!_n_-1);
end;
end;
drop i j;
run;
proc print data = want; run;
Output:
var1 var2 var3
VarA VarB VarC
VarA1 VarB1 VarC1
VarA2 VarB2 VarC2
VarA3 VarB3 VarC3
VarA4 VarB4 VarC4
I want to test if a variable exists and if it doesn't, create it.
The open() and varnum() functions can be used. A non-zero return value from varnum() indicates that the variable exists.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
set try;
if _n_ = 1 then do;
dsid=open('try');
if varnum(dsid,'var4') = 0 then var4 = .;
rc=close(dsid);
end;
drop rc dsid;
run;
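One caveat with the step above: referencing var4 anywhere in a DATA step adds it to the output data set when the step is compiled, so the run-time check does not decide whether the variable gets created. A hedged alternative is to do the check in macro code before the step is compiled. The macro name add_num_var and its parameters are hypothetical, and the sketch assumes the new variable should be numeric.

```sas
/* Hypothetical helper macro; name and parameters are illustrative */
%macro add_num_var(ds=, out=, var=);
  %local dsid exists rc;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid = 0 %then %do;
    %put WARNING: could not open &ds..;
    %return;
  %end;
  %let exists = %sysfunc(varnum(&dsid, &var));  /* 0 means not found */
  %let rc = %sysfunc(close(&dsid));

  data &out;
    set &ds;
    %if &exists = 0 %then %do;
      &var = .;   /* generated only when the variable is absent */
    %end;
  run;
%mend add_num_var;

%add_num_var(ds=try, out=try2, var=var4)
```

Here the assignment statement is only generated when varnum() returns 0, so an existing var4 in the input is left untouched.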
data try2;
set try;
var4 = coalesce(var4,.);
run;
(assuming var4 is numeric)
Assign var4 to itself. The assignment will create the variable if it doesn't exist and leave the contents in place if it does.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
set try;
var4 = var4;
run;
Just remember that creating var4 this way when it doesn't exist will use the default variable attributes, so you may need to use an explicit attrib statement if you require specific formatting/length etc.
This is a very late answer/comment, but this method works for me and is pretty simple (SAS 9.4). In the example below, I created the missing numeric and character variables and assigned a default value to the character variable when it is blank.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
length var4 $20;
length var5 8;
set try;
var4 = var4;
if var4 = ' ' then var4 = 'Not on Source File';
run;