So I have a dataset that has the sample size and the seed for each stratum.
How do I reference these in a proc surveyselect?
My Code:
proc surveyselect data=hca2 (where =(disp=1 and fin=1 and dol_str=1))
out=Disp1_Fin1_DS1
method=SRS
seed=seed
sampsize=samp_n;
run;
Does anyone know how to do this?
There is a secondary input data set reference in proc surveyselect, which may be your anwser.
SAS Help Center: Secondary Input Data Set
And the following is a simple example:
proc sort data = sashelp.class out = class;
by sex name;
run;
data config;
do sex = 'F','M';
_seed_ = 42;
_nsize_ = 6;
output;
end;
run;
proc surveyselect data = class out = result method = srs seed=config sampsize=config outseed;
strata sex;
run;
Open data set result and see if it is what you need.
Related
I can't find a way to summarize the same variable using different weights.
I try to explain it with an example (of 3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).
I've to save a huge amount of data and not so much physical space and I should construct like 120 field (a0-a6,b0-b6 etc) that are the same variables just with fixed weight (wgt0-wgt5).
I want to store a dataset with 20 columns (a,b,c..) and 6 weight (wgt0-wgt5) and, on demand, processing a "summary" without an intermediate datastep that oblige me to create 120 fields.
Due to the huge amount of data (more or less 55Gb every month) I'd like also not to use proc sql statement:
proc sql;
create table pluto
as select sum(db.a * wgt1) as a0, sum(db.a * wgt1) as a1 , etc.
quit;
There is a "Super proc summary" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');;
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the key;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
You can SCORE your data set using a customized SCORE data set that you can generate
with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;[enter image description here][1]
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;
enter image description here
Suppose I have these data read into SAS:
I would like to list each unique name and the number of months it appeared in the data above to give a data set like this:
I have looked into PROC FREQ, but I think I need to do this in a DATA step, because I would like to be able to create other variables within the new data set and otherwise be able to manipulate the new data.
Data step:
proc sort data=have;
by name month;
run;
data want;
set have;
by name month;
m=month(lag(month));
if first.id then months=1;
else if month(date)^=m then months+1;
if last.id then output;
keep name months;
run;
Pro Sql:
proc sql;
select distinct name,count(distinct(month(month))) as months from have group by name;
quit;
While it's possible to do this in a data step, you wouldn't; you'd use proc freq or similar. Almost every PROC can give you an output dataset (rather than just print to the screen).
PROC FREQ data=sashelp.class;
tables age/out=age_counts noprint;
run;
Then you can use this output dataset (age_counts) as a SET input to another data step to perform your further calculations.
You can also use proc sql to group the variable and count how many are in that group. It might be faster than proc freq depending on how large your data is.
proc sql noprint;
create table counts as
select AGE, count(*) as AGE_CT from sashelp.class
group by AGE;
quit;
If you want to do it in a data step, you can use a Hash Object to hold the counted values:
data have;
do i=1 to 100;
do V = 'a', 'b', 'c';
output;
end;
end;
run;
data _null_;
set have end=last;
if _n_ = 1 then do;
declare hash cnt();
rc = cnt.definekey('v');
rc = cnt.definedata('v','v_cnt');
rc = cnt.definedone();
call missing(v_cnt);
end;
rc = cnt.find();
if rc then do;
v_cnt = 1;
cnt.add();
end;
else do;
v_cnt = v_cnt + 1;
cnt.replace();
end;
if last then
rc = cnt.output(dataset: "want");
run;
This is very efficient as it is a single loop over the data. The WANT data set contains the key and count values.
I have the original dataset
What I want is:
My code:
proc transpose data = lib.original
out= lib.new(rename=(col1=Mean col2=Median));
var WBCmean RBCmean WBCmedian RBCmedian;
run;
But I get
Can you give some hint?
EDIT
If I add by statement,
proc transpose data = lib.original
out= lib.new;
by Gender;
var WBCmean RBCmean WBCmedian RBCmedian;
run;
then I get
One way is to convert to a tall skinny format.
proc transpose data = have out=middle ;
by gender ;
var WBCmean RBCmean WBCmedian RBCmedian;
run;
Then generate the new variables needed,
data middle;
set middle ;
testcode = substr(_name_,1,3);
_name_ = substr(_name_,4);
run;
you might need to sort the data depending on the order of variable names in your original VAR statement,
proc sort data=middle;
by gender testcode _name_;
run;
and then transpose into the new arrangement.
proc transpose data=middle out=want ;
by gender testcode ;
id _name_;
var col1 ;
run;
I have a proc report that groups and does subtotals. If I only have one observation in the group, the subtotal is useless. I'd like to either not do the subtotal for that line or not do the observation there. I don't want to go with a line statement, due to inconsistent formatting\style.
Here's some sample data. In the report the Tiki (my cat) line should only have one line, either the obs from the data or the subtotal...
data tiki1;
name='Tiki';
sex='C';
age=10;
height=6;
weight=9.5;
run;
data test;
set sashelp.class tiki1;
run;
It looks like you are trying do something that proc report cannot achieve in one pass. If however you just want the output you describe here is an approach that does not use proc report.
proc sort data = test;
by sex;
run;
data want;
length sex $10.;
set test end = eof;
by sex;
_tot + weight;
if first.sex then _stot = 0;
_stot + weight;
output;
if last.sex and not first.sex then do;
Name = "";
sex = "Subtotal " || trim(sex);
weight = _stot;
output;
end;
keep sex name weight;
if eof then do;
Name = "";
sex = "Total";
weight = _tot;
output;
end;
run;
proc print data = want noobs;
run;
This method manually creates subtotals and a total in the dataset by taking rolling sums. If you wanted do fancy formatting you could pass this data through proc report rather than proc print, Joe gives an example here.
I'm a beginner in SAS and I have the following problem.
I need to calculate counts and percents of several variables (A B C) from one dataset and save the results to another dataset.
my code is:
proc freq data=mydata;
tables A B C / out=data_out ; run;
the result of the procedure for each variable appears in the SAS output window, but data_out contains the results only for the last variable. How to save them all in data_out?
Any help is appreciated.
ODS OUTPUT is your answer. You can't output directly using the OUT=, but you can output them like so:
ods output OneWayFreqs=freqs;
proc freq data=sashelp.class;
tables age height weight;
run;
ods output close;
OneWayFreqs is the one-way tables, (n>1)-way tables are CrossTabFreqs:
ods output CrossTabFreqs=freqs;
ods trace on;
proc freq data=sashelp.class;
tables age*height*weight;
run;
ods output close;
You can find out the correct name by running ods trace on; and then running your initial proc whatever (to the screen); it will tell you the names of the output in the log. (ods trace off; when you get tired of seeing it.)
Lots of good basic sas stuff to learn here
1) Run three proc freq statements (one for each variable a b c) with a different output dataset name so the datasets are not over written.
2) use a rename option on the out = statement to change the count and percent variables for when you combine the datasets
3) sort by category and merge all datasets together
(I'm assuming there are values that appear in in multiple variables, if not you could just stack the data sets)
data mydata;
input a $ b $ c$;
datalines;
r r g
g r b
b b r
r r r
g g b
b r r
;
run;
proc freq noprint data = mydata;
tables a / out = data_a
(rename = (a = category count = count_a percent = percent_a));
run;
proc freq noprint data = mydata;
tables b / out = data_b
(rename = (b = category count = count_b percent = percent_b));
run;
proc freq noprint data = mydata;
tables c / out = data_c
(rename = (c = category count = count_c percent = percent_c));
run;
proc sort data = data_a; by category; run;
proc sort data = data_b; by category; run;
proc sort data = data_c; by category; run;
data data_out;
merge data_a data_b data_c;
by category;
run;
As ever, there are lots of different ways of doing this sort of thing in SAS. Here are a couple of other options:
1. Use proc summary rather than proc freq:
proc summary data = sashelp.class;
class age height weight;
ways 1;
output out = freqs;
run;
2. Use multiple table statements in a single proc freq
This is more efficient than running 3 separate proc freq statements, as SAS only has to read the input dataset once rather than 3 times:
proc freq data = sashelp.class noprint;
table age /out = freq_age;
table height /out = freq_height;
table weight /out = freq_weight;
run;
data freqs;
informat age height weight count percent;
set freq_age freq_height freq_weight;
run;
This is a question I've dealt with many times and I WISH SAS had a better way of doing this.
My solution has been a macro that is generalized, provide your input data, your list of variables and the name of your output dataset. I take into consideration the format/type/label of the variable which you would have to do
Hope it helps:
https://gist.github.com/statgeek/c099e294e2a8c8b5580a
/*
Description: Creates a One-Way Freq table of variables including percent/count
Parameters:
dsetin - inputdataset
varlist - list of variables to be analyzed separated by spaces
dsetout - name of dataset to be created
Author: F.Khurshed
Date: November 2011
*/
%macro one_way_summary(dsetin, varlist, dsetout);
proc datasets nodetails nolist;
delete &dsetout;
quit;
*loop through variable list;
%let i=1;
%do %while (%scan(&varlist, &i, " ") ^=%str());
%let var=%scan(&varlist, &i, " ");
%put &i &var;
*Cross tab;
proc freq data=&dsetin noprint;
table &var/ out=temp1;
run;
*Get variable label as name;
data _null_;
set &dsetin (obs=1);
call symput('var_name', vlabel(&var.));
run;
%put &var_name;
*Add in Variable name and store the levels as a text field;
data temp2;
keep variable value count percent;
Variable = "&var_name";
set temp1;
value=input(&var, $50.);
percent=percent/100; * I like to store these as decimals instead of numbers;
format percent percent8.1;
drop &var.;
run;
%put &var_name;
*Append datasets;
proc append data=temp2 base=&dsetout force;
run;
/*drop temp tables so theres no accidents*/
proc datasets nodetails nolist;
delete temp1 temp2;
quit;
*Increment counter;
%let i=%eval(&i+1);
%end;
%mend;
%one_way_summary(sashelp.class, sex age, summary1);
proc report data=summary1 nowd;
column variable value count percent;
define variable/ order 'Variable';
define value / format=$8. 'Value';
define count/'N';
define percent/'Percentage %';
run;
EDIT (2022):
Better way of doing this is to use the ODS Tables:
/*This code is an example of how to generate a table with
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required
Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
The option STACKODS(OUTPUT) added to PROC MEANS in 9.3 makes this a much simpler task.
proc means data=have n nmiss stackods;
ods output summary=want;
run;
| Variable | N | NMiss |
| ------ | ----- | ----- |
| a | 4 | 3 |
| b | 7 | 0 |
| c | 6 | 1 |