I use the following code to bucket my continuous variable in SAS, but it does not work:
proc freq data = right;
table Age;
run;
proc format;
value AgeBuckets
low -< 74 = "Younger"
75 -< 84 = "Older"
85 - high = "Oldest"
run;
data right;
format Age AgeBuckets.;
run;
It removes all of the records so I have no more data in there. What am I doing wrong?
Also, would it perhaps be best to simply create a new variable (bucketed version) off of the continuous one with if/then statements?
You are just not setting the dataset - rather creating a new one.
data right;
set right;
format Age AgeBuckets.;
run;
proc print;
run;
Also you are excluding ages 74 and 84 from the buckets. You may want to include them also:
proc format;
value AgeBuckets
low -< 74 = "Younger"
74 -< 84 = "Older"
84 - high = "Oldest"
run;
If that is your exact code, you're missing a semicolon after the last element in your PROC FORMAT statement. That would likely cause the issue you've described. It should read
proc format;
value AgeBuckets
low -< 74 = "Younger"
75 -< 84 = "Older"
85 - high = "Oldest";
run;
You made a mistake in your data step, where you didn't reference the input file since you have no SET statement.
It's rarely more efficient to use IF/THEN statements
If you want a new variable use PUT to convert it, illustrated here
Programming where you use the same name for the input and output data set is a bad idea, it makes it very hard to find your mistakes.
proc format;
value AgeBuckets
low -< 75 = "Younger"
75 -< 85 = "Older"
85 - high = "Oldest"
run;
data right_formatted;
set right;
format Age AgeBuckets.;
*create new variable with formatted value, will not sort correctly;
Age_Formatted = put(age, ageBuckets.);
run;
and:
*applying a format means that it sorts correctly for display;
proc freq data=right_formatted;
table age age_formatted ;
format age ageBuckets.;
run;
#Python R SAS user is correct about your formats as well.
You overwrote the dataset with the extra data step. But why did you want the data step at all? Just add the FORMAT statement to procedure that wants to group the values by their formatted values.
proc freq data = right;
table Age;
format Age AgeBuckets.;
run;
Related
I want to use SAS and eg. proc report to produce a custom table within my workflow.
Why: Prior, I used proc export (dbms=excel) and did some very basic stats by hand and copied pasted to an excel sheet to complete the report. Recently, I've started to use ODS excel to print all the relevant data to excel sheets but since ODS excel would always overwrite the whole excel workbook (and hence also the handcrafted stats) I now want to streamline the process.
The task itself is actually very straightforward. We have some information about IDs, age, and registration, so something like this:
data test;
input ID $ AGE CENTER $;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
The goal is to produce a table report which should look like this structure-wise:
ID NO-ID Total
Count 3 2 5
Age (mean) 27 45.5 34.4
Count by Center:
A 2 1 3
B 0 1 1
A 1 0 1
It seems, proc report only takes variables as columns but not a subsetted data set (ID NE .; ID =''). Of course I could just produce three reports with three subsetted data sets and print them all separately but I hope there is a way to put this in one table.
Is proc report the right tool for this and if so how should I proceed? Or is it better to use proc tabulate or proc template or...?
I found a way to achieve an almost match to what I wanted. First if all, I had to introduce a new variable vID (valid ID, 0 not valid, 1 valid) in the data set, like so:
data test;
input ID $ AGE CENTER $;
if ID = '' then vID = 0;
else vID = 1;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
After this I was able to use proc tabulate as suggested by #Reeza in the comments to build a table which pretty much resembles what I initially aimed for:
proc tabulate data = test;
class vID Center;
var age;
keylabel N = 'Count';
table N age*mean Center*N, vID ALL;
run;
Still, I wonder if there is a way without introducing the new variable at all and just use the SAS counters for missing and non-missing observations.
UPDATE:
#Reeza pointed out to use the proc format to assign a value to missing/non-missing ID data. In combination with the missing option (prints missing values) in proc tabulate this delivers the output without introducing a new variable:
proc format;
value $ id_fmt
' ' = 'No-ID'
other = 'ID'
;
run;
proc tabulate data = test missing;
format ID $id_fmt.;
class ID Center;
var age;
keylabel N = 'Count';
table N age*(mean median) Center*N, (ID=' ') ALL;
run;
Hi I have two tables with different column orders, and the column name are not capitalized as the same. How can I compare if the contents of these two tables are the same?
For example, I have two tables of students' grades
table A:
Math English History
-------+--------+---------
Tim 98 95 90
Helen 100 92 85
table B:
history MATH english
--------+--------+---------
Tim 90 98 95
Helen 85 100 92
You may use either of the two approaches to compare, regardless of the order or column name
/*1. Proc compare*/
proc sort data=A; by name; run;
proc sort data=B; by name; run;
proc compare base=A compare=B;
id name;
run;
/*2. Proc SQL*/
proc sql;
select Math, English, History from A
<union/ intersect/ Except>
select MATH, english, history from B;
quit;
use except corr(corresponding) it will check by name. if everything is matching you will get zero records.
data have1;
input Math English History;
datalines;
1 2 3
;
run;
data have2;
input English math History;
datalines;
2 1 3
;
run;
proc sql ;
select * from have1
except corr
select * from have2;
edit1
if you want to check which particular column it differs you may have to transpose and compare as shown below example.
data have1;
input name $ Math English pyschology History;
datalines;
Tim 98 95 76 90
Helen 100 92 55 85
;
run;
data have2;
input name $ English Math pyschology History;
datalines;
Tim 95 98 76 90
Helen 92 100 99 85
;
run;
proc sort data = have1 out =hav1;
by name;
run;
proc sort data = have2 out =hav2;
by name;
run;
proc transpose data =hav1 out=newhave1 (rename = (_name_= subject
col1=marks));
by name;
run;
proc transpose data =hav2 out=newhave2 (rename = (_name_= subject
col1=marks));
by name;
run;
proc sql;
create table want(drop=mark_dif) as
select
a.name as name
,a.subject as subject
,a.marks as have1_marks
,b.marks as have2_marks
,a.marks -b.marks as mark_dif
from newhave1 a inner join newhave2 b
on upcase(a.name) = upcase(b.name)
and upcase(a.subject) =upcase(b.subject)
where calculated mark_dif ne 0;
I created a format for a variable as follows
proc format;
value now 0=M
1=F
;
run;
and now I apply this to a dataset.
Data X;
set X2;
format Var1 now.;
run;
and I want to export this format using cntlout
proc format library=work cntlout=form; run;
this gives me the list of formats in the library catalog. But doesnot give me the variable name to which it is attached.
How can I create a dataset with list of formats and the attached variables to it?
So I can see which format is linked to what variable.
If you just want to look up the variables in a specific dataset, often PROC CONTENTS is faster than using SASHELP.VCOLUMN or DICTIONARY.TABLES, particularly when there are lots of libraries/datasets defined.
57 proc contents data=x out=myvars(keep=name format) noprint;
58 run;
NOTE: The data set WORK.MYVARS has 1 observations and 2 variables.
59
60 data _null_;
61 set myvars;
62 put _all_;
63 run;
NAME=Var1 FORMAT=NOW _ERROR_=0 _N_=1
NOTE: There were 1 observations read from the data set WORK.MYVARS.
Assuming you want this for a specific library you can use the SASHELP.VCOLUMN dataset. This dataset contains the formats for all variables and you can filter it as desired.
Suppose I have some data sets in library lib, their names look like Table_YYYYMMDD (e.g. Table_20150101).
I want to get a name of a data set with maximum date (YYYYMMDD) and store it in a macro variable.
I'm using proc sql and from dictionary.tables.
First I extract a YYYYMMDD part of name. Then I should convert it to date and then find MAX. And I want to be sure that I have at least one data set in library.
proc sql;
select put(MAX(input(scan(memname, 2, '_'), yymmdd8.)), yymmddn8.)
into :mvTable_MaxDate
from dictionary.tables
where libname = 'LIB';
quit;
So,
Is it right to use sas functions like scan in proc sql?
How could I check whether the query is not empty (mvTable_MaxDate hasn't missing value)?
Thanks for your help:)
The cause of the error is that your are using the INPUTN() function, which expects the second argument to be a text literal or the name of a variable. If you change to INPUT(), it will avoid the error.
Also note you need to upcase the literal value of the library name on your where clause. Dictionary.tables stores libnames in upcase.
As written, the value of the macro variable will be a SAS date value. If you want it formatted as YYMMDDN8. you will need to add that.
Here's an example:
74 data a_20151027
75 a_20141022
76 a_20130114
77 ;
78 x=1;
79 run;
NOTE: The data set WORK.A_20151027 has 1 observations and 1 variables.
NOTE: The data set WORK.A_20141022 has 1 observations and 1 variables.
NOTE: The data set WORK.A_20130114 has 1 observations and 1 variables.
80
81 proc sql noprint;
82 select COALESCE(MAX(input(scan(memname, 2, '_'), yymmdd8.)), 0)
83 into :mvTable_MaxDate
84 from dictionary.tables
85 where libname = 'WORK';
86 quit;
87
88 %put &mvTable_MaxDate;
20388
89 %put %sysfunc(putn(&mvTable_MaxDate,yymmddn8));
20151027
As a side-comment, often life becomes much easier if you can just combine all your data into one dataset, and store the dataset name date suffix as a variable.
I'm trying to get all dataset names in a library in to a data set.
proc datasets library=LIB1 memtype=data ;
contents data=_all_ noprint out=Datasets_in_Lib1(keep=memname) ;
run;
The final data set (Datasets_in_Lib1) is having all the data set names that are in LIB1, but names are truncated to 6 characters.Is there any way to get full names of the datasets with out truncation.
Ex: If dataset name is x123456789, the Datasets_in_Lib1 will have x12345 only.
Thanks in advance,
Sam.
Agree with the comment, in 9.3 memname is $32. I don't think there was a version of SAS where data set names were limited to 6 characters. They went from 8 characters to 32 characters in v7 (I think).
Here's a log from running your code, showing it works as you want in 9.3
51 data work.x123456789;
52 x=1;
53 run;
NOTE: The data set WORK.X123456789 has 1 observations and 1 variables.
54
55 proc datasets library=work memtype=data nolist;
56 contents data=_all_ noprint out=Datasets_in_Lib1(keep=memname) ;
57 run;
NOTE: The data set WORK.DATASETS_IN_LIB1 has 1 observations and 1 variables.
58
59 data _null_;
60 set Datasets_in_Lib1;
61 put _all_;
62 run;
MEMNAME=X123456789 _ERROR_=0 _N_=1
NOTE: There were 1 observations read from the data set WORK.DATASETS_IN_LIB1.
You can also query the sashelp.vtable to obtain the list of datasets:
proc sql;
create table mem_list as
select memname
from sashelp.vtable
where libname='LIB1' and memtype='DATA';
quit;