SAS: Error when using the WEIGHT statement in PROC FREQ

In SAS (through WPS Workbench), I am trying to get some frequency counts on my data using the popn field (populations as integers) as a weight.
proc freq data= working.PC_pops noprint;
by District;
weight popn / zeros;
tables AreaType / out= _AreaType;
run;
However, when I run the code above, I am getting the following error pointing to my Weight statement:
ERROR: Found "/" when expecting ;
ERROR: Statement "/" is not valid
I have checked the syntax online, and to include zero-weight observations it definitely says to use the "/ zeros" option in the WEIGHT statement, but SAS (WPS) raises an error. What am I doing wrong?
UPDATE: I have now discovered that the zeros option is not supported through WPS Workbench. Is there a workaround to this?

Given you're not using any of the advanced elements of PROC FREQ (the statistical tests), you may be better off using PROC TABULATE. That will allow you to define exactly what levels you want in your output, even if they have zero elements, using a few different methods. Here's a bit of a hacky solution, but it works (at least in SAS 9.4):
data class;
set sashelp.class;
weight=1;
if age=15 then weight=0;
run;
proc freq data=class;
weight weight/zeros;
tables age;
run;
proc tabulate data=class;
class age;
var weight;
weight weight; *note this is WEIGHT, but does not act like weight in PROC FREQ, so we have to hack it a bit by using it as an analysis variable which is annoying;
tables age,sumwgt='Count'*weight=' '*f=2.0;
run;
Both give the identical result. You can also use a CLASSDATA set, which is a bit less hacky but I'm not sure how well it's supported in non-SAS:
proc sort data=class out=class_classdata(keep=age) nodupkey;
by age;
run;
proc tabulate data=class classdata=class_classdata;
class age;
freq weight; *note this is FREQ not WEIGHT;
tables age,n*f=2.0/misstext='0';
run;

Related

Store the value generated by PROC FREQ

PROC FREQ generates a frequency table with the frequency and percentage of each value of a variable. Is there a way to store the percentage for later use (for example, to create a dataset or graphs)?
ODS OUTPUT is your friend any time you want to take the results of a PROC. Many procs, including PROC FREQ, also have output options built-in, but the generic one is ODS OUTPUT.
First, use ODS TRACE to identify the (internal to SAS) name of the table you need.
ods trace on;
proc freq data=sashelp.class;
tables age;
run;
ods trace off;
This generates:
Output Added:
-------------
Name: OneWayFreqs
Label: One-Way Frequencies
Template: Base.Freq.OneWayFreqs
Path: Freq.Table1.OneWayFreqs
-------------
Now you can use that with ods output:
ods output OneWayFreqs=myfreqdata;
proc freq data=sashelp.class;
tables age;
run;
ods output close; *this is technically not needed, but I like to have it for clarity;
It will make a table called myfreqdata (name that something useful please!). It isn't always "pretty"; sometimes you are better off using the built in output options in a proc, because they're more usefully formatted, but usually you can get somewhere from it.
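As a rough sketch of what you can do with the captured table afterwards: the OneWayFreqs table normally carries the table variable alongside Frequency and Percent columns, so you can trim it down in a data step. The dataset names age_freqs and age_pct are made up for illustration.

```sas
/* capture the one-way frequency table as a dataset */
ods output OneWayFreqs=age_freqs;
proc freq data=sashelp.class;
  tables age;
run;

/* keep only the columns needed for later steps */
data age_pct;
  set age_freqs (keep=Age Frequency Percent);
run;

proc print data=age_pct noobs;
run;
```

Run a PROC CONTENTS on the ODS dataset first if you are unsure which columns your SAS version produces.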
Good afternoon,
It sounds like you want to output the results of the PROC FREQ to a SAS dataset. This will allow you to use them later, whether for further analysis or visualization. To do that, simply pass a dataset name to the out= option on the TABLE statement. You will notice below that I also used the noprint option, which avoids needlessly printing the table.
For example,
proc freq data=work.dataset noprint;
table var1*var2 / out=work.output_data;
run;
I hope this helps!
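The out= dataset holds one row per table cell, with COUNT and PERCENT variables, so it can feed a later step directly. Here is a minimal sketch of that, assuming SASHELP.CLASS as sample data; the dataset name work.age_counts is just a placeholder.

```sas
/* store the counts and percentages without printing the table */
proc freq data=sashelp.class noprint;
  tables age / out=work.age_counts;
run;

/* bar chart of the stored percentages */
proc sgplot data=work.age_counts;
  vbar age / response=percent;
run;
```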

Required ordering for statements and options within SAS procedures

In many cases, one can choose any order for statements and options within SAS procedures.
For instance, as far as statement order is concerned, the following two
PROC FREQ steps, in which the order of the BY and TABLES statements is swapped,
are equivalent:
PROC SORT DATA=SASHELP.CLASS OUT=class;
BY Sex;
RUN;
PROC FREQ DATA=class;
BY Sex;
TABLES Age;
RUN;
PROC FREQ DATA=class;
TABLES Age;
BY Sex;
RUN;
In a similar way, as far as option order is concerned, the following two PROC PRINT steps, in which the order of the OBS= and FIRSTOBS= options is swapped, are equivalent:
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);
RUN;
PROC PRINT DATA=SASHELP.CLASS (OBS=5 FIRSTOBS=2);
RUN;
But there are some exceptions.
For instance, as far as option order is concerned, consider the following two PROC PRINT steps, which differ in the location of the NOOBS option. The first is correct, while the second, where NOOBS precedes the parentheses, results in an error:
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5) NOOBS;
RUN;
PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);
RUN;
Similarly, as far as statement order is concerned, I have occasionally met cases where a certain statement must be placed before other statements, but unfortunately I don't remember in which procedure (probably a statistical one, for duration or multilevel models).
The ordering question within data steps might be seen as a completely different one, because there the order of statements is mostly a matter of logic. Still, the ordering of some statements looks partly conventional, as within procedures. That is for instance the case in the following merge, where the MERGE statement must precede the BY statement, although I suppose that SAS could have been designed to understand these statements in either order:
/* to get a simple example of merge I start with artificially cutting the Class dataset in two parts */
PROC SORT DATA=SASHELP.CLASS OUT=class;
BY Name;
RUN;
DATA sex_and_age;
SET class (KEEP=Name Sex Age);
RUN;
DATA height_and_weight;
SET class (KEEP=Name Height Weight);
RUN;
DATA all_variables;
MERGE sex_and_age height_and_weight;
BY Name;
RUN;
Because I have been unable to find such a guide, my question is: is there a text devoted to the required order of statements and options within SAS procedures?
Joel,
Let me address the NOOBS example to help clarify. Take the two statements:
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5) NOOBS;
PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);
The items in parentheses (FIRSTOBS= and OBS=) are dataset options, and they affect how the dataset is read. There are a number of them, including KEEP, DROP, WHERE, etc. Dataset options must immediately follow a dataset name. NOOBS is a PROC PRINT option, not a dataset, so putting the parentheses after it gives an error.
In a DATA step, the order of statements is often important because it determines how the PDV (program data vector) is built. That is why an ATTRIB statement should be at the top of a data step. In some procs the order doesn't matter, since the statements are all combined for execution.
data test;
attrib myNewVar length=$8 format=$20.
myNewVar2 format=date.
;
set sashelp.class;
myNewVar = 'Hey Joel!';
myNewVar2 = '24FEB2020'd;
run;
A parenthetical list of name=value pairs after a data set specifier is known as data set options. Thus you need to be able to anticipate what the SAS submit parser will be doing.
* (...) applies to SASHELP.CLASS;
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);
* (...) appears where an option name or name=value option is expected -- error ensues;
PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);
* (...) applies to SASHELP.CLASS, NOOBS is in a proper option location within the PROC statement;
PROC PRINT NOOBS DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);
Any special statement ordering is found in the PROC documentation. Some procs have common syntax and documentation will redirect you.
Your first point appears to come from not understanding what dataset options are. Otherwise, the order of the optional parts of a statement (like PROC PRINT) will be specified in the documentation for that statement.
To the second point it appears you are confusing the purpose of the BY statement in a PROC and the BY statement in a data step. In a PROC step the BY statement tells it to process the data in groups. In a DATA step the BY statement must be linked to a specific MERGE/SET/UPDATE statement.

SAS Proc Freq For All Variables - But Collapsing Extra Categories

Exploring a new & large SAS dataset. Many years ago I had a solution that did proc freq on all numeric or all character vars within a dataset.
But it kept just the most frequent categories (user-specified) and munged the rest of the categories (or values of response) into one large one for the sake of simplicity.
There's no default that does that I'm aware of, but it wouldn't be difficult to code one.
In general, you can do this using _numeric_ or _character_ to reference the list of variables.
proc freq data=have;
table _numeric_ ; *all numeric variables;
table _character_; *all character variables;
table _all_; *all variables;
run;
*all variables;
proc freq data=have;
run;
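For the collapsing part, here is a rough sketch for a single variable, not a finished macro. It assumes SASHELP.CARS and its MAKE variable as stand-in data; the macro variable topn, the dataset names and the 'Other' label are all placeholders I made up. You would wrap this in a macro loop to cover every variable.

```sas
/* number of categories to keep, chosen by the user */
%let topn = 5;

/* frequency of every level of one variable */
proc freq data=sashelp.cars noprint;
  tables make / out=make_counts (keep=make count);
run;

/* most frequent levels first */
proc sort data=make_counts;
  by descending count;
run;

/* keep the top levels and pool the remainder into one row */
data make_collapsed (keep=category count);
  length category $40;
  set make_counts end=last;
  if _n_ <= &topn then do;
    category = make;
    output;
  end;
  /* sum statement: accumulates the pooled count across rows */
  else other_count + count;
  if last and other_count > 0 then do;
    category = 'Other';
    count = other_count;
    output;
  end;
run;
```

Ties at the cutoff are broken arbitrarily by the sort, so decide how you want those handled before relying on it.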

Is it possible to do a paired t-test in SAS using panel data?

I am working with panel data that looks something like this:
I am going to perform a t-test in SAS 9.4 to find out if there is a significant change in var1 from 2014 to 2016, and I am assuming that I have to use a paired t-test, since I have an observation in both 2014 and 2016 for each individual (ID).
My question is, can this be done in SAS when I am using panel data like the one I have shown? Or do I need to create a wide dataset with one variable containing the data from 2014 and one variable containing the data from 2016? I know that I have to do that in Stata, but maybe I don't have to change my entire dataset to do this in SAS?
You will have to transpose your data to do a paired t-test. You can use PROC TRANSPOSE for that, though.
*sort for transpose;
proc sort data=have; by id year; run;
*reformat from long to wide;
proc transpose data=have out=want prefix=Year_;
by ID;
ID Year;
Var Var1;
run;
*Paired T-Test;
proc ttest data=want;
paired Year_2014*Year_2016;
run;
PS. Please include your data as text not an image in the future. We cannot write code off an image and I'm not typing out your data, so at present this is untested but should work.

Normalize a variable (divide by its total)

I have a weight variable, wprm, that takes integer values. I would like to have one that is the weight "normalized", that is to say wprm/sum(wprm).
I can do that by outputting a PROC SUMMARY and then merging it back with the original data and dividing my wprm variable, but that seems a bit heavy. Is there a simpler way?
Use PROC STDIZE or PROC STANDARD - they both allow various normalization methods.
proc stdize data=have method=sum out=want;
var wprm;
run;
You can grab the macro %simple_normalize from here.
data test;
do i=1 to 10;
output;
end;
run;
%simple_normalize(test,i);
The other common option is SQL, but it will write a note to the log about remerging summary statistics with the original data, which many people don't like.
proc sql;
create table want as
select a.*, a.wprm/sum(a.wprm) as weight
from have as a;
quit;
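If the log note bothers you, one more variant is to compute the total once and read it into every row with a conditional SET, which avoids an explicit merge. This is only a sketch under the question's naming (dataset have, variable wprm); wprm_total and total are names I made up.

```sas
/* one-row dataset holding the total of wprm */
proc means data=have noprint;
  var wprm;
  output out=wprm_total (keep=total) sum=total;
run;

/* total is read on the first iteration only and is retained afterwards */
data want;
  if _n_ = 1 then set wprm_total;
  set have;
  weight = wprm / total;
  drop total;
run;
```

Variables read with SET are retained automatically, which is why total stays available on every row without a RETAIN statement.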