I recently came across an issue when using PROC REPORT whereby the code below outputs only the first observation:
data have ;
input var1-var3 ;
datalines ;
1 10 100
2 20 200
3 30 300
4 40 400
;run ;
proc report data=have ;
columns var1 var2 var3 ;
define var1 / 'Variable 1' width=10;
define var2 / 'Variable 2' width=10;
define var3 / 'Variable 3' width=10;
run ;
It reports all 4 observations correctly with either of these changes:
Changing var1 to be a character variable (input var1 $ var2-var3)
Explicitly defining var1 as a display variable (define var1 / display)
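For reference, the second fix can be sketched as follows. This is only a minimal illustration that reuses the have data set and the labels from the code above; the nowd option is an assumption added here to suppress the interactive report window.

```sas
/* Sketch: declare var1 as a DISPLAY variable so each observation gets its own row */
proc report data=have nowd ;
columns var1 var2 var3 ;
define var1 / display 'Variable 1' width=10;
define var2 / 'Variable 2' width=10;
define var3 / 'Variable 3' width=10;
run ;
```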
I'm trying to work out why this happens. It can't be that a numeric first column defaults to a group variable rather than display: all var1 values are unique, so each would form its own group, yet only the first observation is reported. Can someone explain the logic?
I was able to find the answer of what's happening behind the scenes by adding the list option to the proc report statement...
input var1-var3 (3x numeric) puts the following to the log:
PROC REPORT DATA=WORK.HAVE LS=120 PS=44 SPLIT="/" CENTER ;
COLUMN ( var1 var2 var3 );
DEFINE var1 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 1" ;
DEFINE var2 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 2" ;
DEFINE var3 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 3" ;
RUN;
input var1 $ (var2 var3) (:) (setting first to character) puts the following to the log:
PROC REPORT DATA=WORK.HAVE LS=120 PS=44 SPLIT="/" CENTER ;
COLUMN ( var1 var2 var3 );
DEFINE var1 / DISPLAY FORMAT= $8. WIDTH=10 SPACING=2 LEFT "Variable 1" ;
DEFINE var2 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 2" ;
DEFINE var3 / SUM FORMAT= BEST9. WIDTH=10 SPACING=2 RIGHT "Variable 3" ;
RUN;
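So numeric variables default to analysis usage with the SUM statistic, which at least explains what was causing it: with no display, order or group variable in the report, the analysis columns are summarised into a single row. Although that causes a problem on a simple report like this, it does at least report a sum correctly if var1 is used as a BY group: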
data have ;
input var1 var2 var3 ;
datalines ;
1 10 100
1 15 150
2 20 200
3 30 300
4 40 400
;run ;
proc report data=have list ;
columns var2 var3 ;
by var1 ;
define var2 / 'Variable 2' width=10;
define var3 / 'Variable 3' width=10;
run ;
Just add a usage option to each DEFINE statement stating what kind of variable it is, such as group or analysis:
proc report nowd data=have ;
columns var1 var2 var3 ;
define var1 / group width=10 'Variable 1';
define var2 / analysis width=10 'Variable 2';
define var3 / analysis width=10 'Variable 3';
run ;
Here is the result:
Variable 1 Variable 2 Variable 3
1 10 100
2 20 200
3 30 300
4 40 400
Below is the sample data.
NAME VAR2 VAR3 VAR4 VAR5
ABC X Y 2
DEF P Q R 3
GHI L 1
VAR5 holds the count of populated variables (among VAR2-VAR4) for each record. I want the following output, with NewVar as the concatenation of the variables that contain a value.
NAME VAR2 VAR3 VAR4 VAR5 NewVar
ABC X Y 2 X,Y
DEF P Q R 3 P,Q,R
GHI L 1 L
I have no clue how to do it in SAS. Any help is appreciated.
Use the CATX() function to concatenate the variables; it lets you specify the delimiter to place between the values, and it skips blank values. Example: CATX(',',VAR2,VAR3,VAR4)
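CATX suits this problem because it ignores blank arguments, so no delimiter is written for an empty variable. A minimal sketch of that behaviour, using throwaway variables a, b, c and joined that are not part of the question's data:

```sas
/* Sketch: CATX strips each argument and skips the blank ones */
data _null_;
length a b c $1 joined $10;
a = 'X';
b = ' ';   /* blank, so it is left out of the result */
c = 'Z';
joined = catx(',', a, b, c);
put joined=;   /* writes joined=X,Z to the log */
run;
```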
Input Data:
data have;
input NAME $ VAR2 $ VAR3 $ VAR4 $ VAR5;
datalines;
ABC X Y . 2
DEF P Q R 3
GHI L . . 1
;
run;
Solution:
data want;
set have;
NewVar= catx(',',VAR2,VAR3,VAR4);
run;
or
%let list=VAR2,VAR3,VAR4;
data want2;
set have;
NewVar= catx(',',&list.);
run;
or (Tom's Recommendation)
data want3;
set have;
NewVar= catx(',',of var2-var4);
run;
Output:
NAME=ABC VAR2=X VAR3=Y VAR4= VAR5=2 NewVar=X,Y
NAME=DEF VAR2=P VAR3=Q VAR4=R VAR5=3 NewVar=P,Q,R
NAME=GHI VAR2=L VAR3= VAR4= VAR5=1 NewVar=L
I would like to categorize variables from one table which looks like this:
Var1 Var2
19 0.2
30 0.1
45 0.2
With a table that stores the conditions for the categorization:
variable condition category
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
And the result of that would be a new table created containing categories of variables based on first table:
Var1 Var2
1 2
2 1
3 2
This is just a duplicate of this previous question: Categorize variables basing on conditions from other data set
Code generation from data is much easier to create and debug if you just use SAS code to do it and do not add the complications of macro code.
Here is the answer again in more detail. First let's make your example data printouts into actual SAS datasets.
data rawdata ;
input Var1 Var2;
cards;
19 0.2
30 0.1
45 0.2
;
data metadata ;
input variable :$32. condition :$200. category ;
cards;
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
;
Now let's generate an SQL select statement with a CASE expression to generate each output variable from the metadata.
filename code temp;
data _null_;
set metadata end=eof;
by variable ;
file code ;
retain sep ' ';
if _n_=1 then put "create table want as select";
if first.variable then put sep $1. 'case ';
put ' when (' condition ') then ' category ;
if last.variable then put ' else . end as ' variable ;
if eof then put 'from rawdata' / ';' ;
sep=',' ;
run;
And run it.
proc sql;
%include code / source2 ;
quit;
Example SAS LOG:
1639 proc sql;
1640 %include code / source2 ;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\xxx\AppData\Local\Temp\1\SAS Temporary Files\_TD13724_AMRL20B7F00CGPP_\#LN00654.
1641 +create table want as select
1642 + case
1643 + when (Var1<20 ) then 1
1644 + when (40>Var1>=20 ) then 2
1645 + when (Var1>=40 ) then 3
1646 + else . end as Var1
1647 +,case
1648 + when (Var2<0.2 ) then 1
1649 + when (Var2>=0.2 ) then 2
1650 + else . end as Var2
1651 +from rawdata
1652 +;
NOTE: Table WORK.WANT created, with 3 rows and 2 columns.
Results:
Obs Var1 Var2
1 1 2
2 2 1
3 3 2
If you want to convert it to macro then just replace the hard coded input dataset names and output dataset names with macro variable references.
%macro gencat(indata=,outdata=,metadata=metadata);
filename code temp;
data _null_;
set &metadata end=eof;
by variable ;
file code ;
retain sep ' ';
if _n_=1 then put "create table &outdata as select";
if first.variable then put sep $1. 'case ';
put ' when (' condition ') then ' category ;
if last.variable then put ' else . end as ' variable ;
if eof then put "from &indata" / ';' ;
sep=',' ;
run;
proc sql;
%include code / nosource2 ;
quit;
%mend gencat;
So now the same result is obtained by calling the macro with these values:
%gencat(indata=rawdata,outdata=want)
So the log now looks like this:
1783 %gencat(indata=rawdata,outdata=want)
MPRINT(GENCAT): filename code temp;
NOTE: PROCEDURE SQL used (Total process time):
real time 10.35 seconds
cpu time 0.20 seconds
MPRINT(GENCAT): data _null_;
MPRINT(GENCAT): set metadata end=eof;
MPRINT(GENCAT): by variable ;
MPRINT(GENCAT): file code ;
MPRINT(GENCAT): retain sep ' ';
MPRINT(GENCAT): if _n_=1 then put "create table want as select";
MPRINT(GENCAT): if first.variable then put sep $1. 'case ';
MPRINT(GENCAT): put ' when (' condition ') then ' category ;
MPRINT(GENCAT): if last.variable then put ' else . end as ' variable ;
MPRINT(GENCAT): if eof then put "from rawdata" / ';' ;
MPRINT(GENCAT): sep=',' ;
MPRINT(GENCAT): run;
NOTE: The file CODE is:
Filename=C:\Users\AppData\Local\Temp\1\SAS Temporary Files\_TD13724_AMRL20B7F00CGPP_\#LN00659,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=02Feb2018:12:36:39,
Create Time=02Feb2018:12:36:39
NOTE: 12 records were written to the file CODE.
The minimum record length was 1.
The maximum record length was 28.
NOTE: There were 5 observations read from the data set WORK.METADATA.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
MPRINT(GENCAT): proc sql;
MPRINT(GENCAT): create table want as select case when (Var1<20 ) then 1 when (40>Var1>=20 ) then 2 when (Var1>=40 ) then 3 else .
end as Var1 ,case when (Var2<0.2 ) then 1 when (Var2>=0.2 ) then 2 else . end as Var2 from rawdata ;
NOTE: Table WORK.WANT created, with 3 rows and 2 columns.
MPRINT(GENCAT): quit;
Here is a macro way to accomplish this. It assumes that the conditions in the table are in the order you want them applied and grouped by variable. If not, then sort the table appropriately.
First test data:
data have;
input Var1 Var2;
datalines;
19 0.2
30 0.1
45 0.2
;
data conditions;
informat variable condition $32.;
input variable $ condition $ category;
datalines;
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
;
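If the conditions table is not already grouped and ordered as described, a sort along these lines would arrange it. This sketch assumes that ascending category values within each variable is the order in which the conditions should be applied; that holds for the test data above but is not stated in the question.

```sas
/* Sketch: group the rules by variable and order them by category (assumed order) */
proc sort data=conditions;
by variable category;
run;
```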
Now make a macro. We will read the table into macro variables and then write a datastep to apply them. We use IF/THEN/ELSE blocks for each variable.
%macro apply_conditions();
%local i j n;
proc sql noprint;
select count(*) into :n trimmed from conditions;
%do i=1 %to &n;
%local var&i;
%local condition&i;
%local category&i;
%end;
select variable, condition, category
into :var1 - :var&n,
:condition1 - :condition&n,
:category1 - :category&n
from conditions;
quit;
data want;
set have;
%do i=1 %to &n;
/*If the variable changes, then don't add the ELSE */
%if &i>1 %then %do;
%let j=%eval(&i-1);
%if &&var&i = &&var&j %then %do;
else
%end;
%end;
/*apply the condition*/
if &&condition&i then
&&var&i = &&category&i;
%end;
run;
%mend;
Finally, run the macro, using MPRINT to see the code that is generated.
options mprint;
%apply_conditions;
I have a dataset with a lot of lines and I'm studying a group of variables.
For each line and each variable, I want to know if the value is equal to the max for this variable or more than or equal to 10.
Expected output (with input as all variables without _B) :
(you can replace T/F by TRUE/FALSE or 1/0 as you wish)
+----+------+--------+------+--------+------+--------+
| ID | Var1 | Var1_B | Var2 | Var2_B | Var3 | Var3_B |
+----+------+--------+------+--------+------+--------+
| A | 1 | F | 5 | F | 15 | T |
| B | 1 | F | 5 | F | 7 | F |
| C | 2 | T | 5 | F | 15 | T |
| D | 2 | T | 6 | T | 10 | T |
+----+------+--------+------+--------+------+--------+
Note that for Var3, the max is 15 but since 15>=10, any value >=10 will be counted as TRUE.
Here is what I've made up so far (I doubt it will be any help, but still):
%macro pleaseWorkLittleMacro(table, var, suffix);
proc means NOPRINT data=&table;
var &var;
output out=Varmax(drop=_TYPE_ _FREQ_) max=;
run;
proc transpose data=Varmax out=Varmax(rename=(COL1=varmax));
run;
data Varmax;
set Varmax;
varmax = ifn(varmax<10, varmax, 10);
run; /* this outputs the max for every column, but how to use it afterward ? */
%mend;
%pleaseWorkLittleMacro(MY_TABLE, VAR1 VAR2 VAR3 VAR4, _B);
I have the code in R, works like a charm but I really have to translate it to SAS :
#in a for loop over variable names, db is my data.frame, x is the
#current variable name and x2 is the new variable name
x.max = max(db[[x]], na.rm=T)
x.max = ifelse(x.max<10, x.max, 10)
db[[x2]] = (db[[x]] >= x.max) %>% mean(na.rm=T) %>% percent(2)
An old school solution would be to read the data twice in one data step:
data expect ;
input ID $ Var1 Var1_B $ Var2 Var2_B $ Var3 Var3_B $ ;
cards;
A 1 F 5 F 15 T
B 1 F 5 F 7 F
C 2 T 5 F 15 T
D 2 T 6 T 10 T
;
run;
data my_input;
set expect;
keep ID Var1 Var2 Var3 ;
proc print;
run;
* It is a good habit to declare the most volatile things in your code as macro variables;
%let varList = Var1 Var2 Var3;
%let markList = Var1_B Var2_B Var3_B;
%let varCount = 3;
* Read the data twice;
data my_result;
set my_input (in=maximizing)
my_input (in=marking);
* Declare and initialize arrays;
format &markList $1.;
array _vars [&varCount] &varList;
array _maxs [&varCount] _temporary_;
array _B [&varCount] &markList;
if _N_ eq 1 then do _varNr = 1 to &varCount;
_maxs(_varNr) = -1E15;
end;
* While reading the first time, calculate the maxima;
if maximizing then do _varNr = 1 to &varCount;
if _vars(_varNr) gt _maxs(_varNr) then _maxs(_varNr) = _vars(_varNr);
end;
* While reading the second time, mark the maxima and any value of 10 or more;
if marking then do _varNr = 1 to &varCount;
if _vars(_varNr) eq _maxs(_varNr) or _vars(_varNr) ge 10
then _B(_varNr) = 'T';
else _B(_varNr) = 'F';
end;
* Drop all variables starting with an underscore, i.e. all my working variables;
drop _:;
* Only keep results when reading for the second time;
if marking;
run;
* Check results;
proc print;
var ID Var1 Var1_B Var2 Var2_B Var3 Var3_B;
proc compare base=expect compare=my_result;
run;
This is quite simple to solve in SQL:
proc sql;
create table my_result as
select *
, (Var1 eq max_Var1 or Var1 ge 10) as Var1_B
, (Var2 eq max_Var2 or Var2 ge 10) as Var2_B
, (Var3 eq max_Var3 or Var3 ge 10) as Var3_B
from my_input
, (select max(Var1) as max_Var1
, max(Var2) as max_Var2
, max(Var3) as max_Var3
from my_input)
;
quit;
(Not tested, as our SAS server is currently down, which is the reason I pass my time on Stack Overflow)
If you need that for a lot of variables, consult the system view VCOLUMN of SAS to build both column lists as macro variables:
proc sql noprint;
select '('|| strip(name) ||' eq max_'|| strip(name) ||' or '|| strip(name) ||' ge 10) as '|| strip(name) ||'_B'
, 'max('|| strip(name) ||') as max_'|| strip(name)
into : create_B separated by ', '
, : select_max separated by ', '
from sasHelp.vcolumn
where libName eq 'WORK'
and memName eq 'MY_INPUT'
and type eq 'num'
and upcase(name) like 'VAR%'
;
create table my_result as
select *, &create_B
from my_input
, (select &select_max from my_input)
;
quit;
(Again not tested)
After Proc MEANS computes the maximum value for each column you can run a data step that combines the original data with the maximums.
data want;
length
ID $1 Var1 8 Var1_B $1. Var2 8 Var2_B $1. Var3 8 Var3_B $1. var4 8 var4_B $1;
input
ID Var1 Var1_B Var2 Var2_B Var3 Var3_B ; datalines;
A 1 F 5 F 15 T
B 1 F 5 F 7 F
C 2 T 5 F 15 T
D 2 T 6 T 10 T
run;
data have;
set want;
drop var1_b var2_b var3_b var4_b;
run;
proc means NOPRINT data=have;
var var1-var4;
output out=Varmax(drop=_TYPE_ _FREQ_) max= / autoname;
run;
The neat thing about the VAR statement is that you can easily list numerically suffixed variable names. The autoname option automatically appends _<statistic> (here _Max) to the names of the variables in the output.
Now combine the maxes with the original (have). The set varmax automatically retains the *_max variables, and they will not get overwritten by values from the original data because the varmax variable names are different.
Arrays are used to iterate over the values and apply the business logic of flagging a row as at max or above 10.
data want;
if _n_ = 1 then set varmax; * read maxes once from MEANS output;
set have;
array values var1-var4;
array maxes var1_max var2_max var3_max var4_max;
array flags $1 var1_b var2_b var3_b var4_b;
do i = 1 to dim(values); drop i;
flags(i) = ifc(min(10,maxes(i)) <= values(i),'T','F');
end;
run;
The difficult part above is that the MEANS output creates variables that can not be listed using the var1 - varN syntax.
When you adjust the naming convention to have all your conceptually grouped variable names end in numeric suffixes the code is simpler.
* number suffixed variable names;
* no autoname, group rename on output;
proc means NOPRINT data=have;
var var1-var4;
output out=Varmax(drop=_TYPE_ _FREQ_ rename=(var1-var4=max_var1-max_var4)) max= ;
run;
* all arrays simpler and use var1-varN;
data want;
if _n_ = 1 then set varmax;
set have;
array values var1-var4;
array maxes max_var1-max_var4;
array flags $1 flag_var1-flag_var4;
do i = 1 to dim(values); drop i;
flags(i) = ifc(min(10,maxes(i)) <= values(i),'T','F');
end;
run;
You can use macro code or arrays, but it might just be easier to transform your data into a tall variable/value structure.
So let's input your test data as an actual SAS dataset.
data expect ;
input ID $ Var1 Var1_B $ Var2 Var2_B $ Var3 Var3_B $ ;
cards;
A 1 F 5 F 15 T
B 1 F 5 F 7 F
C 2 T 5 F 15 T
D 2 T 6 T 10 T
;
First you can use PROC TRANSPOSE to make the tall structure.
proc transpose data=expect out=tall ;
by id ;
var var1-var3 ;
run;
Now your rules are easy to apply in PROC SQL step. You can derive a new name for the flag variable by appending a suffix to the original variable's name.
proc sql ;
create table want_tall as
select id
, cats(_name_,'_Flag') as new_name
, case when col1 >= min(max(col1),10) then 'T' else 'F' end as value
from tall
group by 2
order by 1,2
;
quit;
Then just flip it back to horizontal and merge with the original data.
proc transpose data=want_tall out=flags (drop=_name_);
by id;
id new_name ;
var value ;
run;
data want ;
merge expect flags;
by id;
run;
I am using proc arima in SAS 9.4 to produce a forecast using a previously calibrated model, but it is throwing an error as if it is trying to calibrate the model itself :
ERROR: There is not enough data to fit the model
sample data:
data inputs;
input x var1 var2 var3 var4 var5;
datalines;
20 5 2 4 5 4
25 12 56 13 44 4
20 5 2 4 5 4
25 12 56 13 44 4
20 5 2 4 5 4
25 12 56 13 44 4
. 2 5 6 5 4
;
failing version:
proc arima;
identify
data = inputs
var = x
crossCorr = ( var1 var2 var3 var4 var5 )
noPrint;
estimate
p = 1 input = ( var1 var2 var3 var4 var5 )
ar = 0.9
initVal = ( 0.1$var1 0.2$var2 0.3$var3 0.4$var4 0.4$var5 )
noint
noEst /* Using noEst so should not need to do any estimation and short data-set should not be a problem */
method=ml
noprint
;
forecast lead=1 out=outputs noOutAll noprint;
quit;
If I remove the final variable from the model, it works fine:
proc arima;
identify
data = inputs
var = x
crossCorr = ( var1 var2 var3 var4 )
noPrint;
estimate
p = 1 input = ( var1 var2 var3 var4 )
ar = 0.9
initVal = ( 0.1$var1 0.2$var2 0.3$var3 0.4$var4 )
noint
noEst /* Using noEst so should not need to do any estimation and short data-set should not be a problem */
method=ml
noprint
;
forecast lead=1 out=outputs noOutAll noprint;
quit;
I can also get it to 'work' by adding one more value to the data. However, this shouldn't be necessary when the model is already calibrated (using much more data).
I've checked the SAS documentation to see if there are any flags to prevent the unnecessary check that causes this error but none of them helped.
The answer has been provided on the SAS communities forum. It is known behaviour and so my uncommon use case is not supported. The only workaround would be to add some dummy data, but in my case with MA terms that would change the results.
Response on SAS Communities
I want to test if a variable exists and if it doesn't, create it.
The open() and varnum() functions can be used. A non-zero result from varnum() indicates that the variable exists.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
set try;
if _n_ = 1 then do;
dsid=open('try');
if varnum(dsid,'var4') = 0 then var4 = .;
rc=close(dsid);
end;
drop rc dsid;
run;
data try2;
set try;
var4 = coalesce(var4,.);
run;
(assuming var4 is numeric)
Assign var4 to itself. The assignment will create the variable if it doesn't exist and leave the contents in place if it does.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
set try;
var4 = var4;
run;
Just remember that creating var4 this way when it doesn't exist will use the default variable attributes, so you may need to use an explicit attrib statement if you require specific formatting/length etc.
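As a rough sketch of that last point, the attributes can be declared before the SET statement so they apply when var4 has to be created. This assumes var4 is numeric; the format and label shown are hypothetical placeholders, not taken from the question.

```sas
/* Sketch: fix var4's attributes up front, then create it only if it is absent */
data try2;
attrib var4 length=8 format=8.2 label='Variable 4';   /* hypothetical attributes */
set try;
var4 = var4;
run;
```

If var4 already exists in try as a character variable, the numeric declaration above would conflict with it, so the ATTRIB statement has to match the type you expect.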
This is a very late answer/comment, but this method works for me and is pretty simple (SAS 9.4). In the example below, I declare a character and a numeric variable that are missing from the source data, and assign a default value to the character variable when it is blank.
data try;
input var1 var2 var3;
datalines;
7 2 2
5 5 3
7 2 7
;
data try2;
length var4 $20;
length var5 8;
set try;
var4 = var4;
if var4 = ' ' then var4 = 'Not on Source File';
run;