Normalize a variable (divide by its total) - sas

I have a weight variable, wprm, that takes integer values. I would like a "normalized" version of it, that is, wprm/sum(wprm).
I can do that by outputting a PROC SUMMARY result, merging it back onto the original data, and then dividing wprm by the total, but that seems a bit heavy. Is there a simpler way?

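Use PROC STDIZE, which supports a range of standardization methods; METHOD=SUM divides each value by the variable's sum. (PROC STANDARD only rescales to a given mean and standard deviation, so it does not cover this case.)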
proc stdize data=have method=sum out=want;
var wprm;
run;

You can grab the macro %simple_normalize from here.
data test;
do i=1 to 10;
output;
end;
run;
%simple_normalize(test,i);

The other common option is SQL, but it writes a note to the log (that the query requires remerging summary statistics back with the original data), which many people don't like.
proc sql;
create table want as
select a.*, a.wprm/sum(a.wprm) as weight
from have as a;
quit;
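If you would rather stay in a DATA step and avoid the remerge note, here is a minimal sketch of a two-pass approach. It is not from the answers above; the dataset names have and want and the helper variable total are assumptions made for illustration. The first SET statement runs once to accumulate the total, and the second reads the rows again to do the division.

```sas
/* Hypothetical sample data: integer weights 1..5 */
data have;
  do wprm = 1 to 5;
    output;
  end;
run;

data want;
  /* Pass 1 (first iteration only): accumulate the total of wprm */
  if _n_ = 1 then do until (eof);
    set have(keep=wprm) end=eof;
    total + wprm;            /* sum statement, so total is retained */
  end;
  /* Pass 2: read each row and divide by the retained total */
  set have;
  weight = wprm / total;
  drop total;
run;
```

This reads the data twice, which is the same work the SQL remerge does behind the scenes.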

Related

SAS: Using Weight statement in a Proc Freq command error

In SAS (through WPS Workbench), I am trying to get some frequency counts on my data using the popn field (populations as integers) as a weight.
proc freq data= working.PC_pops noprint;
by District;
weight popn / zeros;
tables AreaType / out= _AreaType;
run;
However, when I run the code above, I am getting the following error pointing to my Weight statement:
ERROR: Found "/" when expecting ;
ERROR: Statement "/" is not valid
I have checked the syntax online, and to include zero counts within my weighting it definitely says to use the "/ zeros" option on the Weight statement, but SAS (WPS) is raising an error. What am I doing wrong?
UPDATE: I have now discovered that the zeros option is not supported through WPS Workbench. Is there a workaround to this?
Given you're not using any of the advanced elements of PROC FREQ (the statistical tests), you may be better off using PROC TABULATE. That will allow you to define exactly what levels you want in your output, even if they have zero elements, using a few different methods. Here's a bit of a hacky solution, but it works (at least in SAS 9.4):
data class;
set sashelp.class;
weight=1;
if age=15 then weight=0;
run;
proc freq data=class;
weight weight/zeros;
tables age;
run;
proc tabulate data=class;
class age;
var weight;
weight weight; *note this is WEIGHT, but does not act like weight in PROC FREQ, so we have to hack it a bit by using it as an analysis variable which is annoying;
tables age,sumwgt='Count'*weight=' '*f=2.0;
run;
Both give identical results. You can also use a CLASSDATA= data set, which is a bit less hacky, but I'm not sure how well it's supported outside SAS (for example in WPS):
proc sort data=class out=class_classdata(keep=age) nodupkey;
by age;
run;
proc tabulate data=class classdata=class_classdata;
class age;
freq weight; *note this is FREQ not WEIGHT;
tables age,n*f=2.0/misstext='0';
run;

Single and Double Dash in Proc sql

I have 12 columns and I want to add them through sql. I have tried:
proc sql;
select*,sum(a1-a12) as total
from tablename;
quit;
However, this isn't working. Is there an alternative, or can single- and double-dash variable lists only be used in DATA steps?
If you want to add values within the same observation, you need the SAS function sum(,...) and not the SQL aggregate function sum(). Your current code looks like the latter since it has only one argument, the difference between variables A1 and A12. This is because PROC SQL does not recognize variable lists, so you will need to list all of your variables.
select *,sum(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12) as total
from have
;
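If typing out the list is the objection, one possible workaround is to build it from dictionary.columns and pass it to the query as a macro variable. This is only a sketch: the macro variable name alist, the sample data, and the assumption that every variable whose name starts with A belongs in the total are all mine, not from the question.

```sas
/* Hypothetical sample data with twelve columns a1-a12 */
data have;
  array a{12} a1-a12;
  do row = 1 to 3;
    do j = 1 to 12;
      a{j} = row * j;
    end;
    output;
  end;
  drop j;
run;

/* Build a comma-separated list of the matching column names.
   libname and memname are stored in uppercase in dictionary.columns. */
proc sql noprint;
  select name into :alist separated by ','
  from dictionary.columns
  where libname = 'WORK' and memname = 'HAVE'
    and upcase(name) like 'A%';
quit;

/* With several arguments, sum() is the row-wise SAS function */
proc sql;
  select *, sum(&alist) as total
  from have;
quit;
```

Be careful if only one column matches: sum() with a single argument is the SQL aggregate again, not the row-wise function.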
If you want this in SQL because you're making use of other SQL functionality in addition to this, make a view.
data have_v/view=have_v;
set have;
total = sum(of a1-a12);
run;
proc sql;
select * from have_v; *presumably you do other things here;
quit;
In some cases you do not know how many variables there are, or you don't want to hard-code them. In that case, use a name-prefix list: sum(of <prefix>:);
data test;
a1=1;
a2=2;
/*number 3 is missing*/
a4=4;
a5=5;
run;
data test2;
set test;
sum_of_all_As= sum(of a:);
run;
For more tips and tricks see: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245953.htm

SAS SD and Percentiles

I am writing a formula in SAS. I need to use the standard deviation and the percentiles in all of it. But I am not sure how to write that in SAS.
data test;
set test1;
if ((the 100th percentile of X)-(99th percentile of X))>(SD of X) then delete;
run;
I am just not sure how to write those out in SAS
The percentile and standard deviation are characteristics of the entire data, not just one observation. Your logic seems to suggest you would delete every observation. Presumably you actually want to compare each observation to some feature of the distribution.
The basic approach is to add the percentiles and standard deviation that you want as new variables to your data. You can use proc univariate with an output statement to calculate the statistics you're interested in and save them to a new data set.
You then merge this back into your original data, so you will now have the variables you need. At that point you can use essentially the same syntax you already have.
This should get you started:
data tmp;
do i=1 to 100;
x=rannor(123);
output;
end;
run;
proc univariate data=tmp noprint;
var x;
output out=pctls max=max p99=p99 std=std;
run;
data tmp;
if _n_=1 then do;
set pctls;
end;
set tmp;
/* Just making up a condition here */
if x>p99 then delete;
run;

Running All Variables Through a Function in SAS

I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=&var;
title "&var";
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need for something like this. Depending on the case, you can also use BY-group processing or, for graphs, paneling.
Create a dataset from a source convenient to you that contains the list of variables. This could be an excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to produce this:
proc sql;
select cats('%graph(',var,')')
into: graphlist separated by ' '
from yourdata;
quit;
&graphlist
For example.
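If the list of variables can come from the data itself, the same idea works against dictionary.columns. Here is a rough sketch, assuming the dataset is WORK.MONTHLY, that every numeric variable other than date should be plotted, and that a %graph macro along the lines of the one in the question exists. The sample data is made up.

```sas
/* Hypothetical sample data standing in for the real monthly dataset */
data monthly;
  do m = 1 to 12;
    date = mdy(m, 1, 2020);
    gdp = 100 + m;
    lbr = 50 - m;
    output;
  end;
  format date monyy7.;
  drop m;
run;

%macro graph(var);
  proc sgplot data=monthly;
    series x=date y=&var;
    title "&var";
  run;
%mend graph;

/* Build one %graph() call per numeric variable except date.
   Single quotes keep the macro call from resolving inside the query. */
proc sql noprint;
  select cats('%graph(', name, ')')
    into :graphlist separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname = 'MONTHLY'
    and type = 'num' and upcase(name) ne 'DATE';
quit;

/* Resolving the macro variable runs the generated calls */
&graphlist
```

When variables are added or renamed, rerunning the query picks them up without editing any %graph() lines.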
In your case, you could also build a vertical dataset with one row per variable and date, which might make it easier to check which variables are included:
data citiwk;
set sashelp.citiwk;
length var $4; *without this, var gets length 3 from 'COM' and 'INDU' is truncated;
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
var='GOV';
val=WSPGLT;
output;
keep val var date;
run;
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.
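A minimal sketch of that array approach, assuming the same four SASHELP.CITIWK series; the output dataset name citiwk_long, the array name series_vars, and the loop index k are made up for illustration:

```sas
data citiwk_long;
  set sashelp.citiwk;
  /* List the series once; add or remove names here as needed */
  array series_vars {*} WSPCA WSPUA WSPIA WSPGLT;
  length var $32;              /* long enough for any variable name */
  do k = 1 to dim(series_vars);
    var = vname(series_vars{k});   /* name of the current array element */
    val = series_vars{k};
    output;                        /* one row per date and variable */
  end;
  keep date var val;
run;

proc sort data=citiwk_long;
  by var date;
run;

proc sgplot data=citiwk_long;
  by var;
  series x=date y=val;
run;
```

Here var holds the actual variable names (WSPCA and so on) rather than the short labels used above; swap in VLABEL() if you want the labels instead.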

Convert the number of observations in a dataset into a macro variable

I am trying to determine the number of observations in a dataset, then convert this number into a macro variable that I can use as part of a loop. I've searched the web for answers and not had much luck. I would post some example code I've tried, but I have literally no idea how to approach this.
Could anybody assist?
Thanks
Chris
SAS stores dataset information, such as the number of observations, separately from the data, so the key is to access this information without reading the entire dataset.
The following code does just that: the if 0 condition is never true, so no observations are read, but the NOBS= value is still set when the step is compiled.
data _null_;
if 0 then set sashelp.class nobs=n;
call symputx('numobs',n); *symputx strips the leading blanks that symput would keep;
stop;
run;
%put n=&numobs;
You can also get it from dictionary.tables like this (libname and memname are stored in uppercase, so the literals must be uppercase too):
proc sql noprint;
select nobs into :nobs
from dictionary.tables
where libname='YOURLIBRARY' and memname='YOURDATASETNAME';
quit;
Here it is:
Create macro variable:
data _null_;
set sashelp.class;
call symputx("nbobs",_N_); *overwritten on every row so the last value is the row count, but this reads the whole dataset;
run;
See result:
%put &nbobs;
Use it:
data test;
do i = 1 to &nbobs;
put i;
end;
run;
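Another route, not mentioned in the answers above, is to read the dataset attribute directly in macro code with the OPEN, ATTRN and CLOSE functions. This is a sketch; the macro variable names dsid, numobs and rc are just placeholders.

```sas
/* Open the dataset and keep its identifier */
%let dsid = %sysfunc(open(sashelp.class));
/* NLOBS = number of logical observations (excludes rows marked for deletion) */
%let numobs = %sysfunc(attrn(&dsid, nlobs));
/* Always close the dataset again to release it */
%let rc = %sysfunc(close(&dsid));
%put numobs=&numobs;
```

No DATA step or PROC runs here, so it can be used in the middle of other macro code.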