Run Duration in SAS enterprise miner - sas

I have the following problem. We have several streams in Enterprise Miner and we would like to be able to tell how long was each run. I have tried to create a macro that would save the start and end time/date but the problem is that global variables defined in a node, are not seen anymore in a subsequent node (so are global only inside a node, but not between nodes). How people usually solve the problem? Any idea or suggestion?
Thanks, Umberto

Just write out timestamps to log (EM should produce a global log in the same fashion that EG and DI do)
Either use:
data _null_;
datetime = datetime();
put datetime= datetime20.;
run;
or macro language:
%put EM node started at at %sysfunc(time(),timeampm.) on %sysfunc(date(),worddate.).;
with a higly customised message you have read the log in SAS looking for those strings using regex.
Solution 2:
Other option is to created a table in a library that is visible from EM and EG for example and have sql inserts at the beginning/end of your process.
proc sql;
create table EM_logger
(jobcode char(100),
timestamp num informat=datetime20. format=datetime20.);
quit;
proc sql;
insert into EM_logger values('Begining Linear Reg',%sysfunc(datetime()));
quit;
data w;
do i=1 to 10000000;
output;
end;
run;
proc sql;
insert into EM_logger values('End Linear Reg',%sysfunc(datetime()));
quit;
Table layout can be as complex as you want and as long as you can access it you can get your statistics.
Hope it helps

Related

permanently save modified dataset

I know this is a very basic question but my code keeps failing when trying to run what I found through the help documentation.
Up to now I have been running an analysis project off of the .WORK directory which I understand gets wiped out every time a session ends. I have done a bunch of data cleaning and preparation and do not want to have to do that every time before I start my analysis.
So I understand, from reading this: https://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001310720.htm that I have to output the cleaned dataset to a non-temporary directory.
Steps I have taken so far:
1) created a new Library called "Project"
2) Saved it in a folder that I have under "my folders" in SAS
3) My code for saving the cleaned dataset to the "Project" library is as follows:
PROC SORT DATA=FAA_ALL NODUPKEY;
BY GROUND_SPEED;
DATA PROJECT.FAA_ALL;
RUN;
Then I run this code in a new program:
PROC PRINT DATA=PROJECT.FAA_ALL;
RUN;
It says there are no observations and that the dataset is essentially empty.
Can some tell me where I'm going wrong?
Your problem is the PROC SORT
PROC SORT DATA=FAA_ALL NODUPKEY;
BY GROUND_SPEED;
DATA PROJECT.FAA_ALL;
RUN;
Should be
PROC SORT DATA=FAA_ALL OUT= PROJECT.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
That DATA PROJECT.FAA_ALL was starting a Data Step creating a blank data set.
Something else worth mentioning: your data step didn't do what you might have expected because you had no set statement. Your code was equivalent to:
PROC SORT DATA=WORK.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
DATA PROJECT.FAA_ALL;
SET _NULL_;
RUN;
PROJECT.FAA_ALL is empty because nothing was read in.
The SORT procedure implicitly sorts a dataset in-place. You could have SAS move the sorted data by adding the set statement to your data step:
PROC SORT DATA=WORK.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
DATA PROJECT.FAA_ALL;
SET WORK.FAA_ALL;
RUN;
However, this still takes two steps, and requires extra disk I/O. Using the out option in a SAS procedure (as in DomPazz's answer) is almost always faster and more efficient than using a data step just to move data.

Adding a column is SAS using MODIFY (no sql)

I'm new to SAS and have some problems with adding a column to existing data set in SAS using MODIFY statement (without proc sql).
Let's say I have data like this
id name salary perks
1 John 2000 50
2 Mary 3000 120
What I need to get is a new column with the sum of salary and perks.
I tried to do it this way
data data1;
modify data1;
money=salary+perks;
run;
but apparently it doesn't work.
I would be grateful for any help!
As #Tom mentioned you use SET to access the dataset.
I generally don't recommend programming this way with the same name in set and data statements, especially as you're learning SAS. This is because it's harder to detect errors, since once run and encounter an error, you destroy your original dataset and have to recreate it before you start again.
If you want to work step by step, consider intermediary datasets and then clean up after you're done by using proc datasets to delete any unnecessary intermediary datasets. Use a naming conventions to be able to drop them all at once, i.e. data1, data2, data3 can be referenced as data1-data3 or data:.
data data2;
set data1;
money = salary + perks;
run;
You do now have two datasets but it's easy to drop datasets later on and you can now run your code in sections rather than running all at once.
Here's how you would drop intermediary datasets
proc datasets library=work nodetails holist;
delete data1-data3;
run;quit;
You can't add a column to an existing dataset. You can make a new dataset with the same name.
data data1;
set data1;
money=salary+perks;
run;
SAS will build it as a new physical file (with a temporary name) and when the step finishes without error it deletes the original and renames the new one.
If you want to use a data set you do it like this:
data dataset;
set dataset;
format new_column $12;
new_column = 'xxx';
run;
Or use Proc SQL and ALTER TABLE.
proc sql;
alter table dataset
add new_column char(8) format = $12.
;
quit;

Automating IF and then statement in sas using macro in SAS

I have a data where I have various types of loan descriptions, there are at least 100 of them.
I have to categorise them into various buckets using if and then function. Please have a look at the data for reference
data des;
set desc;
if loan_desc in ('home_loan','auto_loan')then product_summary ='Loan';
if loan_desc in ('Multi') then product_summary='Multi options';
run;
For illustration I have shown it just for two loan description, but i have around 1000 of different loan_descr that I need to categorise into different buckets.
How can I categorise these loan descriptions in different buckets without writing the product summary and the loan_desc again and again in the code which is making it very lengthy and time consuming
Please help!
Another option for categorizing is using a format. This example uses a manual statement, but you can also create a format from a dataset if you have the to/from values in a dataset. As indicated by #Tom this allows you to change only the table and the code stays the same for future changes.
One note regarding your current code, you're using If/Then rather than If/ElseIf. You should use If/ElseIf because then it terminates as soon as one condition is met, rather than running through all options.
proc format;
value $ loan_fmt
'home_loan', 'auto_loan' = 'Loan'
'Multi' = 'Multi options';
run;
data want;
set have;
loan_desc = put(loan, $loan_fmt.);
run;
For a mapping exercise like this, the best technique is to use a mapping table. This is so the mappings can be changed without changing code, among other reasons.
A simple example is shown below:
/* create test data */
data desc (drop=x);
do x=1 to 3;
loan_desc='home_loan'; output;
loan_desc='auto_loan'; output;
loan_desc='Multi'; output;
loan_desc=''; output;
end;
data map;
loan_desc='home_loan'; product_summary ='Loan '; output;
loan_desc='auto_loan'; product_summary ='Loan'; output;
loan_desc='Multi'; product_summary='Multi options'; output;
run;
/* perform join */
proc sql;
create table des as
select a.*
,coalescec(b.product_summary,'UNMAPPED') as product_summary
from desc a
left join map b
on a.loan_desc=b.loan_desc;
There is no need to use the macro language for this task (I have updated the question tag accordingly).
Already good solutions have been proposed (I like #Reeza's proc format solution), but here's another route which also minimizes coding.
Generate sample data
data have;
loan_desc="home_loan"; output;
loan_desc="auto_loan"; output;
loan_desc="Multi"; output;
loan_desc=""; output;
run;
Using PROC SQL's case expression
This way doesn't allow, to my knowledge, having several criteria on a single when line, but it really simplifies coding since the resulting variable's name needs to be written down only once.
proc sql;
create table want as
select
loan_desc,
case loan_desc
when "home_loan" then "Loan"
when "auto_loan" then "Loan"
when "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;
Otherwise, using the following syntax is also possible, giving the same results:
proc sql;
create table want as
select
loan_desc,
case
when loan_desc in ("home_loan", "auto_loan") then "Loan"
when loan_desc = "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;

How do I work with a SAS file that was created in a different format (Linux/Windows) if I don't have access to machine that created it?

I have numerous SAS datasets on my Windows 7 / SAS 9.4 machine:
data_19921.sas7bdat
data_19922.sas7bdat
data_19923.sas7bdat
....
data_200212.sas7bdat
etc. The format is data_YYYYM.sas7bdat or data_YYYYMM.sas7bdat (the latter for two digit months) and every dataset has identical variables and formatting. I'm trying to iterate over all of these files and append them into one big SAS dataset. The datasets were created on some Unix machine elsewhere in my company that I don't have access to. When I try to append the files:
%let root = C:\data;
libname in "&raw\in";
libname out "&raw\out";
/*****************************************************************************/
* initialize final data set but don't add any observations to it;
data out.alldata;
set in.data_19921;
stop;
run;
%macro append_files;
%do year=1992 %to 2002;
%do month=1 %to 12;
proc append data=out.alldata base=in.data_&year&month;
run;
%end;
%end;
%mend;
%append_files;
I get errors that say
File in.data_19921 cannot be updated because its encoding does not match the session encoding or the file is in a format native to another host, such as LINUX_32, INTEL_ABI
I don't have access to the Unix machine that created these datasets (or any Unix machine right now), so I don't think I can use proc cport and proc cimport.
How can I read/append these data sets on my Windows machine?
You can use the colon operator or dash to shortcut your process:
data out.alldata;
set in.data_:;
run;
OR
data out.alldata;
set in.data19921-in.data200212;
run;
If you have variables that are different lengths you may get truncation. SAS will warn you though:
WARNING: Multiple lengths were specified for the variable name by input data set(s). This may
cause truncation of data.
One way to avoid it is to create a false table first, using the original table. The following SQL code will generate the CREATE Table statements and you can modify the lengths manually as needed.
proc sql;
describe table in.data19921;
quit;
*run create table statement here with modified lengths;
data out.alldata;
set fake_data (obs=0)
in.data19921-in.data200212;
run;

SAS dynamically declaring macro variable

My company just switched from R to SAS and I am converting a lot of my R code to SAS. I am having a huge issue dynamically declaring variables (macro variables) in SAS.
For example one of my processes needs to take in the mean of a column and then apply it throughout the code in many steps.
%let numm =0;
I have tried the following with my numm variable but both methods do not work and I cannot seem to find anything online.
PROC MEANS DATA = ASSGN3.COMPLETE mean;
#does not work
&numm = VAR MNGPAY;
run;
Proc SQL;
#does not work
&numm =(Select avg(Payment) from CORP.INV);
quit;
I would highly recommend buying a book on SAS or taking a class from SAS Training. SAS Programming II is a good place to start (Programming I if you have not programmed anything else, but that doesn't sound like the case). The code you have shows you need it. It is a complete paradigm shift from R.
That said, try this:
proc sql noprint;
select mean(payment) into :numm from corp.inv;
quit;
%put The mean is: &numm;
Here's the proc summary / data step equivalent:
proc summary data = corp.inv;
var payment;
output out = inv_summary mean=;
run;
data _null_;
set inv_summary;
call symput('numm',payment);
run;
%put The mean is: &numm;
Proc sql is a more compact approach if you just want a simple arithmetic mean, but if you require more complex statistics it makes sense to use proc summary.