Does SAS have an equivalent of the += or *= operators, as in C++?
For instance, I have a list of variables X1-X10 and I would like to define a variable Final in the following form:
%let name= X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
%Macro Total;
%Do i=1 %to %sysfunc(countw(&name., " "));
%Let K= %sysfunc(scan(&name., %i., " "));
Final *= K;
%End;
%Mend;
Not sure why you are trying to do it with macro code. If you want to manipulate data use a data step.
There is the sum statement.
var + (expression) ;
For example if I wanted to calculate a cumulative sum of a variable I could do this:
data want;
set have ;
cum_X + X;
run;
But there is no similar statement for aggregate products. To do the same thing with multiplication you need to spell out the RETAIN statement and use a normal assignment statement.
data want;
set have ;
retain prod_x 1 ;
prod_x = prod_x * x;
run;
If you actually just have a macro variable with a list of variable names and you want to operate strictly on a single observation at a time, then you can just use text manipulation to convert the list into the calculation you want.
%let namelist = vara varb vard varx vary varz ;
data want ;
set have ;
sum = %sysfunc(tranwrd(&namelist,%str( ),+)) ;
product = %sysfunc(tranwrd(&namelist,%str( ),*)) ;
run;
If you want more control then define an array and loop over it. Here is an example using the DO OVER syntax. (SAS is trying to get rid of DO OVER so you can also do the same thing using an index variable and typing more characters in your array references.)
data want;
set have ;
array x &namelist ;
sum=0;
product=1;
do over x ;
if not missing(x) then do ;
sum= sum+x ;
product = product*x ;
end;
end;
run;
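The index-variable form mentioned above can be sketched as follows. This is a minimal sketch, assuming the same &namelist macro variable and an input dataset named have; the loop variable i is an illustrative name and is dropped from the output.

```sas
%let namelist = vara varb vard varx vary varz;

data want;
    set have;
    /* explicit array: elements are referenced as x[i] instead of DO OVER */
    array x[*] &namelist;
    sum = 0;
    product = 1;
    do i = 1 to dim(x);
        /* skip missing values so they do not propagate into the results */
        if not missing(x[i]) then do;
            sum = sum + x[i];
            product = product * x[i];
        end;
    end;
    drop i;
run;
```

The logic is the same as the DO OVER version; only the array declaration and the element references change.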
I have a table which contains one key id and 100 variables (x1, x2, x3, ..., x100), and I need to check every variable for values stored as -9999, -8888, -7777, or -6666.
For one variable I use
proc sql;
select keyid, x1
from mytable
where x1 in(-9999,-8888,-7777,-6666);
quit;
This is the data I am trying to get, but it is just for one variable.
I do not have time to copy and paste all 100 variables into this basic query.
I have searched the forum, but the answers I have found are a bit far from what I actually need,
and since I am new to SAS I cannot write a macro.
Can you help me, please?
Thanks.
Try this. Just made up some sample data that resembles what you describe :-)
data have;
do key = 1 to 1e5;
array x x1 - x100;
do over x;
x = rand('integer', -10000, -5000);
end;
output;
end;
run;
data want;
set have;
array x x1 - x100;
do over x;
if x in (-9999, -8888, -7777, -6666) then do;
output;
leave;
end;
end;
run;
Don't use SQL. Instead use normal SAS code so you can take advantage of SAS syntax like ARRAYs and variable lists.
So make an array containing the variables you want to look at. Then loop over the array. There is no need to keep looking once you find one.
data want;
set mytable;
array list var1 varb another_var x1-x10 Z: ;
found=0;
do index=1 to dim(list) until (found);
found = ( list[index] in (-9999 -8888 -7777 -6666) );
end;
if found;
run;
And if you want to search all of the numeric variables you can even use the special variable list _NUMERIC_ when defining the array:
array list _numeric_;
Thank you for your help. I have found a solution and wanted to share it with you.
It has some points that need to be evaluated, but it is fine for me now (it gets the job done).
%LET LIB = 'LIBRARY';
%LET MEM = 'GIVENTABLE';
%PUT &LIB &MEM;
PROC SQL;
SELECT
NAME INTO :VARLIST SEPARATED BY ' '
FROM DICTIONARY.COLUMNS
WHERE
LIBNAME=&LIB
AND
MEMNAME=&MEM
AND
TYPE='num';
QUIT;
%PUT &VARLIST;
%MACRO COUNTS(INPUT);
%LOCAL i NEXT_VAR;
%DO i=1 %TO %SYSFUNC(COUNTW(&VARLIST));
%LET NEXT_VAR = %SCAN(&VARLIST, &i);
PROC SQL;
CREATE TABLE &NEXT_VAR AS
SELECT
COUNT(ID) AS NUMBEROFDESIREDVALUES
FROM &INPUT
WHERE
&NEXT_VAR IN (6666, 7777, 8888, 9999)
GROUP BY
&NEXT_VAR;
QUIT;
%END;
%MEND;
%COUNTS(GIVENTABLE);
The answer you provided to your own question gives more insight into what you really wanted. However, the solution you offered, while it works, is not very efficient. The SQL statement runs once for each of the 100 variables in the source data, so the source table is read 100 times. Another problem is that it creates 100 output tables. Why?
A better solution is to create 1 table that contains the counts for each of the 100 variables. Even better is to do it in 1 pass of the source data instead of 100.
data sum;
set have end=eof;
array x(*) x:;
array csum(100) _temporary_;
do i = 1 to dim(x);
x(i) = (x(i) in (-9999, -8888, -7777, -6666)); * flag (0 or 1) those meeting criteria;
csum(i) + x(i); * cumulative count;
if eof then do;
x(i) = csum(i); * move the final total to the orig variable;
end;
end;
if eof then output; * only output the final obs which has the totals;
drop key i;
run;
Partial result:
x1 x2 x3 x4 x5 x6 x7 x8 ...
90 84 88 85 81 83 59 71 ...
You can keep it in that form or you can transpose it.
proc transpose data=sum out=want (rename=(col1=counts))
name=variable;
run;
Partial result:
variable counts
x1 90
x2 84
x3 88
x4 85
x5 81
... ...
I'm a SAS user.
I want to create year indicator columns from date values.
For example, here is my code below.
I want to make Y_2010, Y_2011, Y_2012, Y_2013, Y_2014 in the work.total data set,
but there is only Y_2014 in the result.
How can I change the code to get the result I intended?
options mcompilenote = all;
%let a = Y_ ;
%macro B(YMIN, YMAX) ;
%do i = &YMIN %to &YMAX ;
DATA TOTAL ;
SET SASUSER.EMPDATA ;
IF YEAR(HIREDATE) = &i THEN &a&i = 1 ;
ELSE &a&i = 0 ;
RUN;
%end;
%mend;
%B (2010, 2014) ;
Because you are repeatedly re-creating the output dataset only the final version is available. To fix the macro move the %DO loop inside the DATA step so that you are generating all of the variables in a single data step.
%macro B(YMIN, YMAX) ;
DATA TOTAL ;
SET SASUSER.EMPDATA ;
%do i = &YMIN %to &YMAX ;
IF YEAR(HIREDATE) = &i THEN &a&i = 1 ;
ELSE &a&i = 0 ;
%end;
RUN;
%mend;
But there is no need for a macro here. Just use normal SAS statements. For example, you could use an ARRAY statement to define the variables and then loop over the array and set the values. Note that the result of a boolean expression in SAS is 0 when false and 1 when true, so you can eliminate the IF/THEN/ELSE statement and just use an assignment statement.
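/* YMIN and YMAX were macro parameters above; outside the macro they must be defined */
%let ymin = 2010 ;
%let ymax = 2014 ;
DATA TOTAL ;
SET SASUSER.EMPDATA ;
array &a &a&ymin - &a&ymax;
do i=&ymin to &ymax ;
&a[i-&ymin+1] = (year(hiredate)=i);
end;
drop i;
RUN;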
I have many datasets for many years from 2001 to 2014 which look like the following. Each year is stored in one file, yXXXX.sas7bdat,
ID Weight X1 X2 X3
1 100 1 2 4
2 300 4 3 4
and I need to create a dataset where for each year we have the (weighted) sums of each of the X columns.
X1 X2 X3 Year
10 20 30 2014
40 15 20 2013
I would be happy to implement this into a macro but I am unsure of a way to isolate column sums, and also an efficient way to attach results together (proc append?)
Edit: Including an attempt.
%macro final_dataset;
%do i = 2001 %to 2014;
/*Code here which enables me to get the column sums I am interested in.*/
proc means data = y&i;
weight = weight;
X1 = SUM X1;
X2 = SUM X2;
X3 = SUM X3;
OUTPUT OUT = sums&i;
run;
data final;
set final sums&i;
run;
%end;
%mend;
Edit: Another attempt.
%macro final_dataset;
%do i = 2001 %to 2014;
/*Code here which enables me to get the column sums I am interested in.*/
proc means data = y&i SUM;
weight = weight;
var X1 X2 X3;
OUTPUT OUT = sums&i;
run;
data final;
set final sums&i;
run;
%end;
%mend;
Edit: Final.
%macro final_dataset;
%do i = 2001 %to 2014;
/*Code here which enables me to get the column sums I am interested in.*/
proc means data = y&i SUM NOPRINT;
weight weight;
var X1 X2 X3;
OUTPUT OUT = sums&i sum(X1 X2 X3) = X1 X2 X3;
run;
data final;
set final sums&i;
run;
%end;
%mend;
This is probably what I'd do, append all the data sets together and run one proc means. You didn't mention how big the data sets are, but I'm assuming smaller data.
data combined;
length source year $50.;
set y2001-y2014 indsname=source;
*you can tweak this variable so it looks how you want it to;
year=source;
run;
proc means data=combined noprint nway;
class year;
var x1 x2 x3;
output out=want sum= ;
run;
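The question asked for weighted sums, which the code above does not apply. A minimal sketch of one way to add that follows, assuming the weight variable is named Weight as in the question and that the yearly datasets are named y2001 through y2014. With a WEIGHT statement, the SUM statistic in PROC MEANS is the sum of weight times value. The year extraction assumes INDSNAME returns a two-level name such as WORK.Y2001.

```sas
data combined;
    set y2001-y2014 indsname=source;
    /* source looks like WORK.Y2001: take the member name, drop the leading Y */
    year = input(substr(scan(source, 2, '.'), 2), 4.);
run;

proc means data=combined noprint nway;
    class year;
    weight weight;      /* weighted sums: sum of weight * x */
    var x1 x2 x3;
    output out=want(drop=_type_ _freq_) sum=;
run;
```

This keeps the single-pass approach and produces one row per year with X1, X2, X3 and a numeric year column.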
I'm trying to find the max of four variables, Value_A Value_B Value_C Value_D, within a macro. I thought I could do %sysfunc(max(value_&i.)) but that isn't working. My full code is:
%let i = (A B C D);
%macro maxvalue;
data want;
set have;
%do j = 1 %to %sysfunc(countw(&list.));
%let i = %scan(&list.,&j.);
value_&i.= Sale_&i. - int_&i.
Max_Value = %sysfunc(max(value_&i.));
%end;
run;
%mend maxvalue;
%maxvalue;
I should specify that I only want the max of the four variables for each observation. Thanks for your help!
Aside from the typo - %let i=(A B C D); should be %let list=(A B C D) - you're a) overcomplicating it, and b) confusing macro syntax with datastep syntax. Whilst you could do this using a macro, there is no need.
Given the variables in question are all prefixed in a similar manner (although it would be even better if they were numerically-suffixed, e.g. Value1, Value2), it's far easier to use arrays and the appropriate functions :
data want ;
set have ;
array sale{*} Sale_A Sale_B Sale_C Sale_D ;
array int{*} Int_A Int_B Int_C Int_D ;
array value{*} Value_A Value_B Value_C Value_D ;
/* Iterate over array */
do i = 1 to dim(sale) ;
value{i} = sum(sale{i},-int{i}) ;
end ;
max_value = max(of value{*}) ;
run ;
As mentioned above, you're over-complicating this, but you can achieve what you're trying to do with macro logic by including a second %do loop inside the max_value assignment. This takes the max of your four variables and a missing value (the MAX function ignores missing values), which produces the desired result:
%let list = A B C D;
%macro maxvalue;
data want;
set have;
%do j = 1 %to %sysfunc(countw(&list.));
%let i = %scan(&list.,&j.);
value_&i.= Sale_&i. - int_&i.;
%end;
max_value = max(
%do x = 1 %to %sysfunc(countw(&list.));
%let y = %scan(&list.,&x.);
value_&y.,
%end; .
);
run;
%mend maxvalue;
%maxvalue;
Why not just rename your variables to SALE_1 to SALE_4? Then you can reference them with a simple variable list SALE_1-SALE_4.
If you are going to use non-numeric suffixes on lists of similarly named variables then perhaps what you really need is a simple function style macro to generate the lists of variable names based on a base name and list of suffix values.
%macro generate_names(base,list);
&base%sysfunc(tranwrd(%sysfunc(compbl(&list)),%str( ),%str( &base)))
%mend generate_names;
Then it is easier to generate variable lists to use for ARRAY statements
%let suffixes=A B C D;
array sale %generate_names(Sale_,&suffixes);
array int %generate_names(Int_,&suffixes);
array value %generate_names(Value_,&suffixes);
and other statements.
max_value = max(of %generate_names(Value_,&suffixes)) ;
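To see what the macro generates before using it in a data step, the expansion can be written to the log with %PUT. This is a small sketch that repeats the macro definition from above so it runs on its own; the suffix list is the one from the example.

```sas
%macro generate_names(base,list);
&base%sysfunc(tranwrd(%sysfunc(compbl(&list)),%str( ),%str( &base)))
%mend generate_names;

%let suffixes = A B C D;

/* writes the generated variable list to the log */
%put %generate_names(Value_,&suffixes);
```

For this input the log should show Value_A Value_B Value_C Value_D, which is the list that the ARRAY statements and the MAX(OF ...) call receive.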
I want to compute a mean based only on the non-zero values of given variables, using proc means only.
I know we can calculate this using proc sql, but I want to get it done through proc means or proc summary.
In my study I have 8 variables, so how can I calculate the mean based on non-zero values when I am using all of them in the var statement, as below:
proc means data = xyz;
var var1 var2 var3 var4 var5 var6 var7 var8;
run;
If we take one variable at a time in the var statement and use a where condition for non zero variables , it works but can we have something which would work for all the variables of interest mentioned in the var statement?
Your suggestions would be highly appreciated.
Thank you !
One method is to change all of your zero values to missing, and then use PROC MEANS.
data zeromiss /view=zeromiss ;
set xyz ;
array n{*} var1-var8 ;
do i = 1 to dim(n) ;
if n{i} = 0 then call missing(n{i}) ;
end ;
drop i ;
run ;
proc means data=zeromiss ;
var var1-var8 ;
run ;
Create a view of your input dataset. In the view, define a weight variable for each variable you want to summarise. Set the weight to 0 if the corresponding variable is 0 and 1 otherwise. Then do a weighted summary via proc means / proc summary. E.g.
data xyz_v /view = xyz_v;
set xyz;
array weights {*} weight_var1-weight_var8;
array vars {*} var1-var8;
do i = 1 to dim(vars);
weights[i] = (vars[i] ne 0);
end;
run;
%macro weighted_var(n);
%do i = 1 %to &n;
var var&i /weight = weight_var&i;
%end;
%mend weighted_var;
proc means data = xyz_v;
%weighted_var(8);
run;
This is less elegant than Chris J's solution for this specific problem, but it generalises slightly better to other situations where you want to apply different weightings to different variables in the same summary.
Can't you use a data step?
data lala;
set xyz;
drop qty;
mean = 0;
qty = 0;
if(not missing(var1) and var1 ^= 0) then do;
mean + var1;
qty + 1;
end;
if(not missing(var2) and var2 ^= 0) then do;
mean + var2;
qty + 1;
end;
/* ... repeat to all variables ... */
if(not missing(var8) and var8 ^= 0) then do;
mean + var8;
qty + 1;
end;
mean = mean/qty;
run;
If you want to keep the mean in the same xyz dataset, just replace lala with xyz.