In the SAS data step, what is the difference between the following code:
DATA MK_RETURN_DATA;
SET MK_RETURN;
output;
RUN;
and
DATA MK_RETURN_DATA;
SET MK_RETURN;
RUN;
Is the output statement absolutely necessary here? (My understanding is, since there is no condition specified, even without the output statement, the output will still be automatically performed.)
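The output statement is not necessary here.
When a data step contains no explicit OUTPUT statement, SAS writes the current observation automatically at the end of each iteration, so with a single output dataset the explicit 'output' adds nothing. The statement becomes useful when there is more than one output dataset and you want to control which observations go where. Please see the example below: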
data MK_RETURN;
input name :$10. age;
datalines;
Hardik 23
Mishima 47
;
run;
DATA MK_RETURN_DATA MK_RETURN_DATA2;
SET MK_RETURN;
if age= 23 then output MK_RETURN_DATA;
if age= 47 then output MK_RETURN_DATA2;
RUN;
Here the observation with age 23 goes to the MK_RETURN_DATA dataset and the observation with age 47 goes to the MK_RETURN_DATA2 dataset.
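As a side note, here is a minimal sketch of how an explicit OUTPUT statement changes the default behaviour: once any OUTPUT statement appears in the step, the automatic output at the end of each iteration is no longer performed. The dataset names AGE_DEMO, ALL_ROWS and ONLY_23 are made up for this illustration.

```sas
/* Small demo input; the names here are illustrative only */
data age_demo;
  input name :$10. age;
  datalines;
Hardik 23
Mishima 47
;
run;

/* No OUTPUT statement: every observation is written automatically */
data all_rows;
  set age_demo;
run;

/* An explicit OUTPUT anywhere in the step turns the automatic output off,
   so only observations that reach the OUTPUT statement are written */
data only_23;
  set age_demo;
  if age = 23 then output;
run;
```

With this input, ALL_ROWS should end up with both observations and ONLY_23 with just the row for Hardik.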
Hope it helps
I want to use SAS, e.g. proc report, to produce a custom table within my workflow.
Why: Previously, I used proc export (dbms=excel), did some very basic stats by hand, and copy-pasted them into an Excel sheet to complete the report. Recently, I've started to use ODS Excel to print all the relevant data to Excel sheets, but since ODS Excel always overwrites the whole workbook (and hence also the handcrafted stats), I now want to streamline the process.
The task itself is actually very straightforward. We have some information about IDs, age, and registration, so something like this:
data test;
input ID $ AGE CENTER $;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
The goal is to produce a table report which should look like this structure-wise:
                    ID   NO-ID   Total
Count                3       2       5
Age (mean)          27    45.5    34.4
Count by Center:
  A                  2       1       3
  B                  0       1       1
  C                  1       0       1
It seems proc report only takes variables as columns, not subsets of a data set (ID NE .; ID =''). Of course I could just produce three reports from three subsetted data sets and print them all separately, but I hope there is a way to put this in one table.
Is proc report the right tool for this and if so how should I proceed? Or is it better to use proc tabulate or proc template or...?
I found a way to get a near match to what I wanted. First of all, I had to introduce a new variable vID (valid ID: 0 = not valid, 1 = valid) in the data set, like so:
data test;
input ID $ AGE CENTER $;
if ID = '' then vID = 0;
else vID = 1;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
After this I was able to use proc tabulate, as suggested by @Reeza in the comments, to build a table that closely resembles what I initially aimed for:
proc tabulate data = test;
class vID Center;
var age;
keylabel N = 'Count';
table N age*mean Center*N, vID ALL;
run;
Still, I wonder if there is a way to do this without introducing the new variable at all, using only the SAS counters for missing and non-missing observations.
UPDATE:
@Reeza pointed out that proc format can be used to assign a label to missing/non-missing ID values. In combination with the missing option in proc tabulate (which treats missing values as a valid class level), this delivers the output without introducing a new variable:
proc format;
value $ id_fmt
' ' = 'No-ID'
other = 'ID'
;
run;
proc tabulate data = test missing;
format ID $id_fmt.;
class ID Center;
var age;
keylabel N = 'Count';
table N age*(mean median) Center*N, (ID=' ') ALL;
run;
I created a format for a variable as follows
proc format;
value now 0=M
1=F
;
run;
and now I apply this to a dataset.
Data X;
set X2;
format Var1 now.;
run;
and I want to export this format using cntlout
proc format library=work cntlout=form; run;
This gives me the list of formats in the library catalog, but it does not give me the name of the variable each format is attached to.
How can I create a dataset that lists the formats together with the variables they are attached to,
so I can see which format is linked to which variable?
If you just want to look up the variables in a specific dataset, PROC CONTENTS is often faster than using SASHELP.VCOLUMN or DICTIONARY.COLUMNS, particularly when there are lots of libraries/datasets defined.
57 proc contents data=x out=myvars(keep=name format) noprint;
58 run;
NOTE: The data set WORK.MYVARS has 1 observations and 2 variables.
59
60 data _null_;
61 set myvars;
62 put _all_;
63 run;
NAME=Var1 FORMAT=NOW _ERROR_=0 _N_=1
NOTE: There were 1 observations read from the data set WORK.MYVARS.
Assuming you want this for a specific library you can use the SASHELP.VCOLUMN dataset. This dataset contains the formats for all variables and you can filter it as desired.
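A minimal sketch of that approach, assuming the library of interest is WORK and that only variables with a format attached are wanted. The output dataset name VAR_FORMATS is made up for illustration.

```sas
/* List every variable in the WORK library that has a format attached.
   LIBNAME values in SASHELP.VCOLUMN are stored in upper case. */
data var_formats;
  set sashelp.vcolumn(keep=libname memname name format);
  where libname = 'WORK' and format ne ' ';
run;

proc print data=var_formats noobs;
run;
```

Changing the WHERE condition (for example adding memname = 'X') narrows the result to a single dataset.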
This is the story
This is the input file
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
This is the import step that I am using:
data datetimedata;
infile fileref dlm=',';
input lastname$ datechkin mmddyy10. datechkout mmddyy10. room_rate equip_cost;
run;
Below is the log, which shows success:
NOTE: The infile FILEREF is:
Filename=\\VBOXSVR\win_7\SAS\DATA\datetime\datetimedata.csv,
RECFM=V,LRECL=256,File Size (bytes)=688,
Last Modified=13Jun2015:12:08:36,
Create Time=13Jun2015:09:13:09
NOTE: 17 records were read from the infile FILEREF.
The minimum record length was 34.
The maximum record length was 40.
NOTE: The data set WORK.DATETIMEDATA has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
I have posted only 3 observations here.
When I print the SAS dataset, everything looks fine except the room_rate variable.
The output should be 3-digit numbers, but I am getting only the last digit.
Where am I going wrong?
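You're mixing input styles. When you use list input, you can't put an informat directly after a variable name: without a colon, mmddyy10. is formatted input, which reads a fixed 10 columns regardless of the delimiter and so runs past the 8-character date into the next field. You either need modified list input (add a colon before the informat) or an informat statement earlier in the step. The following works.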
data datetimedata;
infile datalines dlm=',';
input lastname$ datechkin :mmddyy10. datechkout :mmddyy10. room_rate equip_cost;
datalines;
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
;;;;
run;
proc print data=datetimedata;
run;
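The other option mentioned above, an INFORMAT statement placed before the INPUT statement, could look roughly like this. The dataset name DATETIMEDATA2 and the FORMAT statement (date9., added only so the dates print readably) are assumptions, not part of the original question.

```sas
data datetimedata2;
  infile datalines dlm=',';
  /* Attach the informat up front so plain list input can be used below */
  informat datechkin datechkout mmddyy10.;
  /* Assumption: display the dates as e.g. 04APR2015 */
  format datechkin datechkout date9.;
  input lastname $ datechkin datechkout room_rate equip_cost;
  datalines;
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
;
run;

proc print data=datetimedata2;
run;
```

Because the INPUT statement now uses pure list input, each field is read up to the next comma, whatever its width.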
I would like to include the results of a macro function call in a data step. I can do this indirectly, by first assigning the macro function output to a macro variable and then using that macro variable within my data step, but this seems inelegant.
data dataset_employee;
input name $ dept $;
datalines;
John Sales
Mary Acctng
;
data dataset_manager;
input name $ dept $;
datalines;
Will Sales
Sue Acctng
;
It seems like SAS doesn't realize that the macro call is complete and I'm switching to regular SAS code.
/*this works*/
%let var = %DO_OVER(VALUES=employee, PHRASE=dataset_?) dataset_manager;
data combined1;
set &var dataset_manager;
run;
/*this fails*/
data combined;
set %DO_OVER(VALUES=employee manager, PHRASE=dataset_?);
dataset_manager;
run;
/*this works*/
data combined;
set dataset_manager %DO_OVER(VALUES=employee manager, PHRASE=dataset_?);
;
run;
Can anyone help me understand what is going on here?
The failing attempt is caused by the extra ; after the macro invocation: it ends the SET statement, so the following dataset_manager; is parsed as a separate, invalid statement. Try removing it.
A macro call doesn't require a semicolon.
The first example works without a semicolon after the macro call (note that you are using the dataset_manager dataset twice there, once in the %let and again in the set statement).
The third example would still work if you removed one of the two semicolons (one is required to end the set statement).
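To make the point concrete without depending on %DO_OVER, here is a small sketch using a hypothetical stand-in macro, %ds_list, which simply expands to a list of dataset names. The macro and the dataset COMBINED_DEMO are illustrative assumptions; the input datasets mirror the ones in the question.

```sas
data dataset_employee;
  input name $ dept $;
  datalines;
John Sales
Mary Acctng
;
data dataset_manager;
  input name $ dept $;
  datalines;
Will Sales
Sue Acctng
;

/* Hypothetical helper standing in for %DO_OVER:
   generates <prefix><value> for each word in VALUES */
%macro ds_list(values=, prefix=dataset_);
  %local i;
  %do i = 1 %to %sysfunc(countw(&values, %str( )));
    &prefix.%scan(&values, &i, %str( ))
  %end;
%mend ds_list;

/* No semicolon directly after the macro call: the generated text and
   dataset_manager belong to the same SET statement */
data combined_demo;
  set %ds_list(values=employee) dataset_manager;
run;
```

The single semicolon after dataset_manager is the one that ends the SET statement; the macro call itself contributes only text.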
I'm new to SAS and would greatly appreciate help writing some code. Can someone please help me format a varying number of columns based on the values in the first column?
So basically here's the original data:
Category        Name1   Name2   ... (the number of Name columns varies)
#ofpeople          20      30
#ofproviders       10       5
#ofclaims          40      25
AmountBilled       50     100
AmountPaid         11      35
AmountDed           5       6
I would like to reformat the values under Name1 through however many Name# columns there are to dollar10.2, but only for the rows whose Category is 'AmountBilled', 'AmountPaid' or 'AmountDed'.
Thank you so much for your help!
You can't conditionally format a column (like you might in Excel). A variable/column has one format for the entire column. There are tricks to get around this, but they're invariably more complex than should be considered useful.
You can store the formatted value in a character variable, but then you lose the ability to do math on it.
data have;
input category :$10. name1 name2;
datalines;
#ofpeople 20 30
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100
AmountPaid 11 35
AmountDed 5 6
;;;;
run;
data want;
set have;
array names name:; *colon is wildcard (starts with);
array newnames $10 newname1-newname10; *Arbitrarily 10, can be whatever;
if substr(category,1,6)='Amount' then do;
do _t = 1 to dim(names);
newnames[_t] = put(names[_t],dollar10.2);
end;
end;
run;
You could programmatically figure out the upper bound of the newname array (arbitrarily 10 above) using PROC CONTENTS or SQL's DICTIONARY.COLUMNS / SAS's SASHELP.VCOLUMN. Alternatively, you could write the original dataset out as a three-column dataset with many rows for each category (was it this way to begin with, prior to a PROC TRANSPOSE?) and put the character variable there (not needing an array). To me that's the cleanest option.
data have_t;
set have;
array names name:;
format nameval $10.;
do namenum = 1 to dim(names);
if substr(category,1,6)='Amount' then nameval = put(names[namenum],dollar10.2 -l);
else nameval=put(names[namenum],10. -l); *left aligning here, change this if you want otherwise;
output; *now we have (namenum) rows per line. Test for missing(name) if you want only nonmissing rows output (if not every row has same number of names);
end;
run;
proc transpose data=have_t out=want_T(drop=_name_) prefix=name;
by category notsorted;
var nameval;
run;
Finally, depending on what you're actually doing with this, you may have superior options in terms of the output method. If you're doing PROC REPORT for example, you can use compute blocks to set the style (format) of the column conditionally in the report output.
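A rough sketch of the PROC REPORT idea, assuming just two name columns and using CALL DEFINE in compute blocks to change the format of a cell only on the 'Amount' rows. The dataset name HAVE_DEMO is made up here, and the data is a shortened copy of the example above.

```sas
data have_demo;
  input category :$12. name1 name2;
  datalines;
#ofpeople 20 30
#ofclaims 40 25
AmountBilled 50 100
AmountPaid 11 35
;
run;

proc report data=have_demo;
  column category name1 name2;
  define category / display;
  define name1 / display;
  define name2 / display;
  /* CATEGORY is to the left of both columns, so its value is available here */
  compute name1;
    if substr(category, 1, 6) = 'Amount' then
      call define(_col_, 'format', 'dollar10.2');
  endcomp;
  compute name2;
    if substr(category, 1, 6) = 'Amount' then
      call define(_col_, 'format', 'dollar10.2');
  endcomp;
run;
```

The underlying variables stay numeric; only the displayed cell changes. With a varying number of name columns, the compute blocks would have to be generated, for example by a macro.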