Single and Double Hash in Proc sql - sas

I have 12 columns and I want to add them through sql. I have tried:
proc sql;
select*,sum(a1-a12) as total
from tablename;
quit;
However this isn't working. Is there an alternative or can we use single and double hash only in Data steps.

If you want to add values in the same observation then you need to use SAS function sum(,...) and not the SQL aggregate function sum(). You current code looks like the later since it only has one value listed, the difference between variables A1 and A12. This is because PROC SQL does not recognize variable lists. You will need to list all of your variables.
select *,sum(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12) as total
from have
;

If you want this in SQL because you're making use of other SQL functionality in addition to this, make a view.
data have_v/view=have_v;
set have;
total = sum(of a1-a12);
run;
proc sql;
select * from have_v; *presumably you do other things here;
quit;

In some cases you do not know how many variables there are or you don't want to hard code it. The syntax is in this case: sum(of < variable>:);
data test;
a1=1;
a2=2;
/*number 3 is missing*/
a4=4;
a5=5;
run;
data test2;
set test;
sum_of_all_As= sum(of a:);
run;
For more tips and tricks see: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245953.htm

Related

2 ids with 3 distinct groups in SAS

hi I just wanted to know what would be the code in a data step for this outcome
I have the following:
and need the below:
please direct me to another page if this was answered previously...in addition how would I asked something like this in the future.
Thanks
Do you really need to do this in a data step? You might as well use proc sql for a Cartesian product.
proc sql;
create table want as
select code, groupds
from (select distinct code from have) a,
(select distinct groupds from have) b;
quit;
Here is how you can do it in a data step.
data want;
set have(keep=code where=(not missing(code)));
do i=1 to n;
set have(keep=groupds where=(not missing(groupds))) point=i nobs=n;
output;
end;
run;
The issue with this method is if you have a duplicate code or groupds a record will be created for the duplicate entry.

Normalize a variable (divide by its total)

I have a variable of weight, wprm, that takes integer values. I would like to have one that is the weight "normalized", that is to say wprm/sum(wprm)
I can do that by outputing a proc summary ant then a merge to put it back with the original data, and then dividing my wprm variable, but it seems a bit heavy, is there a simpler way ?
Use PROC STDIZE or PROC STANDARD - they both allow various normalization methods.
proc stdize data=have method=sum out=want;
var wprm;
run;
You can grab the macro %simple_normalize from here.
data test;
do i=1 to 10;
output;
end;
run;
%simple_normalize(test,i);
The other common option is SQL, but it will post a warning/note to the log that many people don't like.
proc sql;
create table want as
select a.*, a.wprm/sum(a.wprm) as weight
from have;
quit;

Running All Variables Through a Function in SAS

I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=var;
title 'var';
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need to deal with something like this. You can also use BY group processing or in the case of graphing Paneling in some cases to approach this issue.
Create a dataset from a source convenient to you that contains the list of variables. This could be an excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to produce this:
proc sql;
select cats('%graph(',var,')')
into: graphlist separated by ' '
from yourdata;
quit;
&graphlist
For example.
In your case, you could also generate a vertical dataset with one row per variable, which might be easier to determine which variables are correct:
data citiwk;
set sashelp.citiwk;
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
val=WSPGLT;
var='GOV';
output;
keep val var date;
run;
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.

SAS dynamically declaring macro variable

My company just switched from R to SAS and I am converting a lot of my R code to SAS. I am having a huge issue dynamically declaring variables (macro variables) in SAS.
For example one of my processes needs to take in the mean of a column and then apply it throughout the code in many steps.
%let numm =0;
I have tried the following with my numm variable but both methods do not work and I cannot seem to find anything online.
PROC MEANS DATA = ASSGN3.COMPLETE mean;
#does not work
&numm = VAR MNGPAY;
run;
Proc SQL;
#does not work
&numm =(Select avg(Payment) from CORP.INV);
quit;
I would highly recommend buying a book on SAS or taking a class from SAS Training. SAS Programming II is a good place to start (Programming I if you have not programmed anything else, but that doesn't sound like the case). The code you have shows you need it. It is a complete paradigm shift from R.
That said, try this:
proc sql noprint;
select mean(payment) into :numm from corp.inv;
quit;
%put The mean is: &numm;
Here's the proc summary / data step equivalent:
proc summary data = corp.inv;
var payment;
output out = inv_summary mean=;
run;
data _null_;
set inv_summary;
call symput('numm',payment);
run;
%put The mean is: &numm;
Proc sql is a more compact approach if you just want a simple arithmetic mean, but if you require more complex statistics it makes sense to use proc summary.

Sas macro with proc sql

I want to perform some regression and i would like to count the number of nonmissing observation for each variable. But i don't know yet which variable i will use. I've come up with the following solution which does not work. Any help?
Here basically I put each one of my explanatory variable in variable. For example
var1 var 2 -> w1 = var1, w2= var2. Notice that i don't know how many variable i have in advance so i leave room for ten variables.
Then store the potential variable using symput.
data _null_;
cntw=countw(&parameters);
i = 1;
array w{10} $15.;
do while(i <= cntw);
w[i]= scan((&parameters"),i, ' ');
i = i +1;
end;
/* store a variable globally*/
do j=1 to 10;
call symput("explanVar"||left(put(j,3.)), w(j));
end;
run;
My next step is to perform a proc sql using the variable i've stored. It does not work as
if I have less than 10 variables.
proc sql;
select count(&explanVar1), count(&explanVar2),
count(&explanVar3), count(&explanVar4),
count(&explanVar5), count(&explanVar6),
count(&explanVar7), count(&explanVar8),
count(&explanVar9), count(&explanVar10)
from estimation
;quit;
Can this code work with less than 10 variables?
You haven't provided the full context for this project, so it's unclear if this will work for you - but I think this is what I'd do.
First off, you're in SAS, use SAS where it's best - counting things. Instead of the PROC SQL and the data step, use PROC MEANS:
proc means data=estimation n;
var &parameters.;
run;
That, without any extra work, gets you the number of nonmissing values for all of your variables in one nice table.
Secondly, if there is a reason to do the PROC SQL, it's probably a bit more logical to structure it this way.
proc sql;
select
%do i = 1 %to %sysfunc(countw(&parameters.));
count(%scan(&parameters.,&i.) ) as Parameter_&i., /* or could reuse the %scan result to name this better*/
%end; count(1) as Total_Obs
from estimation;
quit;
The final Total Obs column is useful to simplify the code (dealing with the extra comma is mildly annoying). You could also put it at the start and prepend the commas.
You finally could also drive this from a dataset rather than a macro variable. I like that better, in general, as it's easier to deal with in a lot of ways. If your parameter list is in a data set somewhere (one parameter per row, in the dataset "Parameters", with "var" as the name of the column containing the parameter), you could do
proc sql;
select cats('%countme(var=',var,')') into :countlist separated by ','
from parameters;
quit;
%macro countme(var=);
count(&var.) as &var._count
%mend countme;
proc sql;
select &countlist from estimation;
quit;
This I like the best, as it is the simplest code and is very easy to modify. You could even drive it from a contents of estimation, if it's easy to determine what your potential parameters might be from that (or from dictionary.columns).
I'm not sure about your SAS macro, but the SQL query will work with these two notes:
1) If you don't follow your COUNT() functions with an identifier such as "COUNT() AS VAR1", your results will not have field headings. If that's ok with you, then you may not need to worry about it. But if you export the data, it will be helpful for you if you name them by adding "...AS "MY_NAME".
2) For observations with fewer than 10 variables, the query will return NULL values. So don't worry about not getting all of the results with what you have, because as long as the table you're querying has space for 10 variables (10 separate fields), you will get data back.