I have a categorical variable, say SALARY_GROUP, and a group variable, say COUNTRY. I would like to get the relative frequency of SALARY_GROUP within COUNTRY in SAS. Is it possible to get it by proc SUMMARY or proc means?
Perhaps explore proc tabulate and a counter variable?
Yes, You can calculate the relative frequency of a categorical variable using both Proc Means and Proc Summary. For both procs you have to:
-Specify NWAY in the proc statement,
-Specify in the Class statement your categorical fields,
-Specify in the Var statement your response or numeric field.
Example below is for proc means:
Dummy Data:
/*Dummy Data*/
data work.have;
input Country $ Salary_Group $ Value;
datalines;
USA Group1 100
USA Group1 100
GBR Group1 100
GBR Group1 100
USA Group2 20
USA Group2 20
GBR Group2 20
GBR Group1 100
;
run;
Code:
*Calculating Frequncy and saving output to table sg_means*/
proc means data=have n nway ;
class Country Salary_Group;
var Value;
output out=sg_means n=frequency;
run;
Output Table:
Country=GBR Salary_Group=Group1 _TYPE_=3 _FREQ_=3 frequency=3
Country=GBR Salary_Group=Group2 _TYPE_=3 _FREQ_=1 frequency=1
Country=USA Salary_Group=Group1 _TYPE_=3 _FREQ_=2 frequency=2
Country=USA Salary_Group=Group2 _TYPE_=3 _FREQ_=2 frequency=2
Related
I need to make a TOP 10 table with month by month history like:
The problem is I have only "melted" data:
Is there a SAS procedure to create new columns in a dataset based on a variable values like in my example?
PROC TRANSPOSE
proc sort data=have;
by type descending date;
run;
proc transpose data=have
out=want(drop=_NAME_)
;
by type;
id date;
var total;
run;
Your "melted" data is in a categorical form that is lock and load ready for a Proc TABULATE report.
Example:
data have;
input date yymmdd10. treatment: $10. count:;
format date yymmdd10.;
datalines;
2020-01-31 lolipops 5
2020-01-31 chocolate 4
2020-01-31 cakes 3
2020-01-31 cookies 2
2019-01-31 lolipops 2
2019-01-31 chocolate 3
2019-01-31 cakes 4
2019-01-31 cookies 5
2018-01-31 lolipops 3
2018-01-31 chocolate 4
2018-01-31 cakes 5
2018-01-31 cookies 6
;
ods html file='top10.html' style=plateau;
proc tabulate data=have;
class date / descending;
class treatment / order=freq;
var count;
table treatment='', date='' * count='' * sum='' * f=6.
/ box='Treat'
;
run;
ods html close;
Report
I have a sas datebase with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks by ur help, i'm on my second week using sas <3
Edit: thanks by remain me that i was not finding a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means everything beginning with date, comparae with sql 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed is by using an additional data view, or an output Proc that allows you to specify the specific order desired.
I have one table having 4 columns and i want to separate them into 2 table 2 columns in one table and 2 columns in another table.but both table should be below to each other.I want this in proc report format.code should be in report.
id name age gender
1 abc 21 m
2 pqr 23 f
3 qwe 25 f
4 ert 54 m
i want id and name in one table and age and gender in other table.but one below the other in ods excel.
I've split the main table into two tables using a data setp then appended them to each other, I added an extra columns called "source" in order to be differniate between the tables. if you use a Proc report you can group by "source"
Code:
*Create input data*/
data have;
input id name $ age gender $ ;
datalines;
1 abc 21 m
2 pqr 23 f
3 qwe 25 f
4 ert 54 m
;;;;
run;
/*Split / create first table*/
data table1;
set have;
source="table1: id & name";
keep source id name ;
run;
/*Split / create second table*/
data table2;
set have;
source="table2: age & gender";
keep source age gender;
run;
/*create Empty table*/
data want;
length Source $30. column1 8. column2 $10.;
run;
proc sql; delete * from want; quit;
/* Append both tables to each other*/
proc append base= want data=table1(rename=(id=column1 name=column2)) force ; run;
proc append base= want data=table2(rename=(age=column1 gender=column2)) force ; run;
/*Create Report*/
proc report data= want;
col source column1 column2 ;
define source / group;
run;
Output Table:
Report:
For data
data have;input
id name $ age gender $; datalines;
1 abc 21 m
2 pqr 23 f
3 qwe 25 f
4 ert 54 m
run;
Being output as Excel, the splitting into two parts can be done via two Proc REPORT steps; each step responsible for a single set of columns. Options are used in the ODS EXCEL to control how sheet processing is handled.
The first step manages the common header through DEFINE, the subsequent steps are NOHEADER and don't need DEFINE statements. Each step must define and compute the value of the new source column. There will be a one Excel row gap between each table.
ods _all_ close;
ods excel file='want.xlsx' options(sheet_interval='NONE');
proc report data=have;
column source id name;
define id / 'Column 1';
define name / 'Column 2';
define source / format=$20.;
compute source / character length=20; source='ID and NAME'; endcomp;
run;
proc report data=have noheader;
column source age gender;
define source / format=$20.;
compute source / character length=20; source='AGE and GENDER'; endcomp;
run;
ods excel close;
There is no reasonable single Proc REPORT step that would produce similar output from dataset have.
Here is my data :
data example;
input id sports_name;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
This is just a sample. The variable sports_name is categorical with 56 types.
I am trying to transpose the data to wide form where each row would have a user_id and the names of sports as the variables with values being 1/0 indicating Presence or absence.
So far, I used proc freq procedure to get the cross tabulated frequency table and put that in a different data set and then transposed that data. Now i have missing values in some cases and count of the sports in rest of the cases.
Is there any better way to do this?
Thanks!!
You need a way to create something from nothing. You could have also used the SPARSE option in PROC FREQ. SAS names cannot have length greater than 32.
data example;
input id sports_name :$16.;
retain y 1;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;;;;
run;
proc print;
run;
proc summary data=example nway completetypes;
class id sports_name;
output out=freq(drop=_type_);
run;
proc print;
run;
proc transpose data=freq out=wide(drop=_name_);
by id;
var _freq_;
id sports_name;
run;
proc print;
run;
Same theory here, generate a list of all possible combinations using SQL instead of Proc Summary and then transposing the results.
data example;
informat sports_name $20.;
input id sports_name $;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;
run;
proc sql;
create table complete as
select a.id, a_x.sports_name, case when not missing(e.sports_name) then 1 else 0 end as Present
from (select distinct ID from example) a
cross join (select distinct sports_name from example) a_x
full join example as e
on e.id=a.id
and e.sports_name=a_x.sports_name;
quit;
proc transpose data=complete out=want;
by id;
id sports_name;
var Present;
run;
I have the following sample data and 'proc means' command.
data have;
input measure country $;
datalines;
250 UK
800 Ireland
500 Finland
250 Slovakia
3888 Slovenia
34 Portugal
44 Netherlands
4666 Austria
run;
PROC PRINT data=have; RUN;
The following PROC MEANS command prints out a listing for each country above. How can I group some of those countries (i.e. UK & Ireland, Slovakia/SLovenia as Central Europe) in the PROC MEANS step, rather than adding another datastep to add a 'case when' etc?
proc means data=have sum maxdec=2 order=freq STACKODS;
var measure;
class country;
run;
Thanks for any help at all on this. I understand there are various things you can do in the PROC MEANS command itself (like limit the number of countries by doing this:
proc means data=have(WHERE=(country not in ('Finland', 'UK')
I'd like to do the grouping in the PROC MEANS command for brevity.
Thanks.
This is very easy with a format for any PROC that takes a CLASS statement.
Simply build a format, either with code or from data; then apply the format in the PROC MEANS statement.
proc format lib=work;
value $countrygroup
"UK"="British Isles"
"Ireland"="British Isles"
"Slovakia","Slovenia"="Central Europe"
;
quit;
proc means data=have;
class country;
var measure;
format country $countrygroup.;
run;
It's usually better to have numeric codes for country and then format those to be whichever set of names is needed at any one time, particularly as capitalization/etc. is pretty irritating, but this works well enough even here.
The CNTLIN= option in PROC FORMAT allows you to make a format from a dataset, with FMTNAME as the value statement, START as the value-to-label, LABEL as the label. (END=end of range if numeric.) There are other options also, the documentation goes into more detail.