I have a SAS dataset whose column layout is like this:
Col1 Col2 Col3
A_jan2018 A_feb2018 A_mar2018
B_jan2018 B_feb2018 B_mar2018
C_jan2018 C_feb2018 C_mar2018
I need to re-order the columns that start with A or B or C in such a format --
Col1 Col2 Col3
A_Jan2018 B_Jan2018 C_Jan2018
A_Feb2018 B_Feb2018 C_Feb2018
A_Mar2018 B_Mar2018 C_Mar2018
The A,B,C prefixes need not be in any sorting order (meaning they can start with anything), but my requirement is to re-order them based on the month-year (meaning B_Jan2018 A_Feb2018 C_2018 is okay).
Is there any way of achieving this in SAS?
Change the data structure so that you have a long dat set instead of
a short data set. (DATA STEP)
Separate out the prefix from date portion (DATA STEP)
Sort into your desired order (PROC SORT)
Transpose to desired format (PROC TRANSPOSE)
%*create sample data;
data have;
informat col1 col2 col3 $10.;
input Col1 $ Col2 $ Col3 $;
cards;
A_jan2018 A_feb2018 A_mar2018
B_jan2018 B_feb2018 B_mar2018
C_jan2018 C_feb2018 C_mar2018
;
run;
%*Make it wide table;
data _long;
set have;
array _col(3) col1-col3;
do i=1 to 3;
prefix=scan(_col(i), 1, "_");
date=input(scan(_col(i), 2, "_"), anydtdte.);
value=catx('_', prefix, put(date, monyy7.));
output;
end;
format date date9.;
run;
%*Sort by desired output;
proc sort data=_long;
by date prefix;
run;
%* transpose to the desired format;
proc transpose data=_long out=want1;
by i;
var value;
run;
If your data is exactly as posted and the output is transposed exactly then this also works but its entirely reliant on the source data being as specified and sorted correctly.
proc transpose data=have out=want2;
var col1-col3;
run;
Related
Essentially I need to reorder my data set.
The data consists of 4 columns, one for each treatment group. I'm trying to run a simple 1-way anova in SAS but I don't know how to reorder the data so that there are two columns, one with the responses and one with the treatments.
Here is some example code to create example data sets.
data have;
input A B C D;
cards;
26.72 37.42 11.23 44.33
28.58 56.46 29.63 76.86
29.71 51.91 . .
;
run;
data want;
input Response Treatment $;
cards;
26.72 A
28.58 A
29.71 A
37.42 B
56.46 B
51.91 B
11.23 C
29.63 C
44.33 D
76.86 D
;
run;
I'm sure this is a very simple solution but I didn't see the same thing asked elsewhere on the site. I'm typically an R user but have to use SAS for this analysis so I could be looking for the wrong keywords.
I have used proc transpose for this, see below
/*1. create row numbers for each obs*/
data have1;
set have;
if _n_=1 then row=1;
else row+1;
run;
proc sort data=have1; by row; run;
/*Transpose the dataset by row number*/
proc transpose data=have1 out=want0;
by row;
run;
/*Final dataset by removing missing values*/
data want;
set want0;
drop row;
if COL1=. then delete;
rename _NAME_=Response
COL1=Treatment;
run;
proc sort data=want; by Response; run;
proc print data=want; run;
If that is your data for which you must use SAS just read so that you get the structure required for ANOVA.
data have;
do rep=1 to 3;
do trt='A','B','C','D';
input y #;
output;
end;
end;
cards;
26.72 37.42 11.23 44.33
28.58 56.46 29.63 76.86
29.71 51.91 . .
;;;;
run;
proc print;
run;
Data IV_SAS;
set IV;
Total_Loans=Goods+Bads;
Dist_Loans=Total_Loans/sum(Total_Loans));
Dist_Goods=Goods/Sum(Goods);
Dist_Bads=Bads/Sum(Bads);
Difference=Dist_Goods-Dist_Bads;
WOE=log10(Dist_goods/Dist_Bads);
IV=WOE*Difference;
run;
I am facing issues in calculating sum of (Total Loans),its calculating Row total instead of column total.
That's how Base SAS works - it operates on row level in the data step.
You would want to use PROC MEANS or PROC TABULATE or similar proc and find the column total there, then merge that on (or combine in another method).
For example:
proc means data=sashelp.class;
var age height weight;
output out=class_means sum(age)=age_sum sum(height)=height_sum sum(weight)=weight_Sum;
run;
data class;
if _n_=1 then set class_means;
set sashelp.class;
age_prop = age/age_sum;
height_prop = height/height_sum;
weight_prop = weight/weight_Sum;
run;
Alternately, use SAS/IML or PROC SQL, both of which will operate on the column level when asked inline (though I think the above solution is likely superior in speed to both due to lower overhead).
data a;
input goods bads;
datalines;
36945 33337
23820 21761
26990 24647
33195 30299
43755 39014
46100 41100
89765 79978
25940 23508
35940 32506
31840 28846
33430 30366
34480 31388
36640 33129
39640 35992
42490 38325
44240 40075
42840 38840
49690 44936
69190 64740
;
run;
proc sql;
create table b as
select goods,bads,
sum(goods,bads) as Total_Loans format=dollar10.,
sum(goods)as Column_goods_tot format=dollar10. ,
sum(bads) as Column_bads_tot format=dollar10. ,
sum(calculated Column_goods_tot, calculated Column_bads_tot) as Column_Total_Loans format=dollar10. ,
(calculated Total_Loans/calculated Column_Total_Loans) as Dist_Loans
/*add more code to calculate Dist_Goods, Dist_Bads, etc..*/
from a;
quit;
/*Column totals only*/
proc sql;
create table c as
select
sum(goods)as Column_goods_tot format=dollar10. ,
sum(bads) as Column_bads_tot format=dollar10. ,
sum(calculated Column_goods_tot, calculated Column_bads_tot) as Column_Total_Loans format=dollar12.
from a;
quit;
I would like to use proq freq to count the number of food types that someone consumed on a specific day(fint variable). My data is in long format with repeated idno for the different food types and different number of interview dates. However SAS hangs and does not run the code. I have more than 300,000 datalines.Is there another way to do this?
proc freq;
tables idno*fint*foodtype / out=countft;
run;
I am a little unsure of your data structure, but proc means can also count.
Assuming that you have multiple dates for each person, and multiple food types for each date, you can use:
data dataset;
set dataset;
count=1;
run;
proc means data=dataset sum;
class idno fint foodtype;
var count;
output out=countft sum=counftpday;
run;
/* Usually you only want the lines with the largest _type_, so keep going here */
proc sql noprint;
select max(_type_) into :want from countft;
quit; /*This grabs the max _type_ from output file */
data countft;
set countft;
where _type_=&want.;
run;
Try a proc sql:
proc sql;
create table want as
select distinct idno, fint, foodtype, count(*) as count
from have
order by 1, 2, 3;
quit;
Worse case scenario, sort and count in a data step.
proc sort data=have;
by idno fint foodtype;
run;
data count;
set have;
by idno fint foodtype;
if first.foodtype then count=1;
else count+1;
if last.foodtype then output;
run;
I'm not very familiar with Do Loops in SAS and was hoping to get some help. I have data that looks like this:
Product A: 1
Product A: 2
Product A: 4
I'd like to transpose (easy) and flag that Product A: 3 is missing, but I need to do this iteratively to the i-th degree since the number of products is large.
If I run the transpose part in SAS, my first column will be 1, second column will be 2, and third column will be 4 - but I'd really like the third column to be missing and the fourth column to be 4.
Any thoughts? Thanks.
Get some sample data:
proc sort data=sashelp.iris out=sorted;
by species;
run;
Determine the largest column we will need to transpose to. Depending on your situation you may just want to hardcode this value using a %let max=somevalue; statement:
proc sql noprint;
select cats(max(sepallength)) into :max from sorted;
quit;
%put &=max;
Transpose the data using a data step:
data want;
set sorted;
by species;
retain _1-_&max;
array a[1:&max] _1-_&max;
if first.species then do;
do cnt = lbound(a) to hbound(a);
a[cnt] = .;
end;
end;
a[sepallength] = sepallength;
if last.species then do;
output;
end;
keep species _1-_&max;
run;
Notice we are defining an array of columns: _1,_2,_3,..._max. This happens in our array statement.
We then use by-group processing to populate these newly created columns for a single species at a time. For each species, on the first record, we clear the array. For each record of the species, we populate the appropriate element of the array. On the final record for the species output the array contents.
You need a way to tell SAS that you have 4 products and the values are 1-4. In this example I create dummy ID with the needed information then transpose using ID statement to name new variables using the value of product.
data product;
input id product ##;
cards;
1 1 1 2 1 4
2 2 2 3
;;;;
run;
proc print;
run;
data productspace;
if 0 then set product;
do product = 1 to 4;
output;
end;
stop;
run;
data productV / view=productV;
set productspace product;
run;
proc transpose data=productV out=wide(where=(not missing(id))) prefix=P;
by id;
var product;
id product;
run;
proc print;
run;
I'm a beginner in SAS and I have the following problem.
I need to calculate counts and percents of several variables (A B C) from one dataset and save the results to another dataset.
my code is:
proc freq data=mydata;
tables A B C / out=data_out ; run;
the result of the procedure for each variable appears in the SAS output window, but data_out contains the results only for the last variable. How to save them all in data_out?
Any help is appreciated.
ODS OUTPUT is your answer. You can't output directly using the OUT=, but you can output them like so:
ods output OneWayFreqs=freqs;
proc freq data=sashelp.class;
tables age height weight;
run;
ods output close;
OneWayFreqs is the one-way tables, (n>1)-way tables are CrossTabFreqs:
ods output CrossTabFreqs=freqs;
ods trace on;
proc freq data=sashelp.class;
tables age*height*weight;
run;
ods output close;
You can find out the correct name by running ods trace on; and then running your initial proc whatever (to the screen); it will tell you the names of the output in the log. (ods trace off; when you get tired of seeing it.)
Lots of good basic sas stuff to learn here
1) Run three proc freq statements (one for each variable a b c) with a different output dataset name so the datasets are not over written.
2) use a rename option on the out = statement to change the count and percent variables for when you combine the datasets
3) sort by category and merge all datasets together
(I'm assuming there are values that appear in in multiple variables, if not you could just stack the data sets)
data mydata;
input a $ b $ c$;
datalines;
r r g
g r b
b b r
r r r
g g b
b r r
;
run;
proc freq noprint data = mydata;
tables a / out = data_a
(rename = (a = category count = count_a percent = percent_a));
run;
proc freq noprint data = mydata;
tables b / out = data_b
(rename = (b = category count = count_b percent = percent_b));
run;
proc freq noprint data = mydata;
tables c / out = data_c
(rename = (c = category count = count_c percent = percent_c));
run;
proc sort data = data_a; by category; run;
proc sort data = data_b; by category; run;
proc sort data = data_c; by category; run;
data data_out;
merge data_a data_b data_c;
by category;
run;
As ever, there are lots of different ways of doing this sort of thing in SAS. Here are a couple of other options:
1. Use proc summary rather than proc freq:
proc summary data = sashelp.class;
class age height weight;
ways 1;
output out = freqs;
run;
2. Use multiple table statements in a single proc freq
This is more efficient than running 3 separate proc freq statements, as SAS only has to read the input dataset once rather than 3 times:
proc freq data = sashelp.class noprint;
table age /out = freq_age;
table height /out = freq_height;
table weight /out = freq_weight;
run;
data freqs;
informat age height weight count percent;
set freq_age freq_height freq_weight;
run;
This is a question I've dealt with many times and I WISH SAS had a better way of doing this.
My solution has been a macro that is generalized, provide your input data, your list of variables and the name of your output dataset. I take into consideration the format/type/label of the variable which you would have to do
Hope it helps:
https://gist.github.com/statgeek/c099e294e2a8c8b5580a
/*
Description: Creates a One-Way Freq table of variables including percent/count
Parameters:
dsetin - inputdataset
varlist - list of variables to be analyzed separated by spaces
dsetout - name of dataset to be created
Author: F.Khurshed
Date: November 2011
*/
%macro one_way_summary(dsetin, varlist, dsetout);
proc datasets nodetails nolist;
delete &dsetout;
quit;
*loop through variable list;
%let i=1;
%do %while (%scan(&varlist, &i, " ") ^=%str());
%let var=%scan(&varlist, &i, " ");
%put &i &var;
*Cross tab;
proc freq data=&dsetin noprint;
table &var/ out=temp1;
run;
*Get variable label as name;
data _null_;
set &dsetin (obs=1);
call symput('var_name', vlabel(&var.));
run;
%put &var_name;
*Add in Variable name and store the levels as a text field;
data temp2;
keep variable value count percent;
Variable = "&var_name";
set temp1;
value=input(&var, $50.);
percent=percent/100; * I like to store these as decimals instead of numbers;
format percent percent8.1;
drop &var.;
run;
%put &var_name;
*Append datasets;
proc append data=temp2 base=&dsetout force;
run;
/*drop temp tables so theres no accidents*/
proc datasets nodetails nolist;
delete temp1 temp2;
quit;
*Increment counter;
%let i=%eval(&i+1);
%end;
%mend;
%one_way_summary(sashelp.class, sex age, summary1);
proc report data=summary1 nowd;
column variable value count percent;
define variable/ order 'Variable';
define value / format=$8. 'Value';
define count/'N';
define percent/'Percentage %';
run;
EDIT (2022):
Better way of doing this is to use the ODS Tables:
/*This code is an example of how to generate a table with
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required
Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
The option STACKODS(OUTPUT) added to PROC MEANS in 9.3 makes this a much simpler task.
proc means data=have n nmiss stackods;
ods output summary=want;
run;
| Variable | N | NMiss |
| ------ | ----- | ----- |
| a | 4 | 3 |
| b | 7 | 0 |
| c | 6 | 1 |