proc gchart stacked bar with attached table annotate months - sas

I'm trying to recreate a graph that looks like this:
It's a stacked bar graph with many types of visits, with the values shown in an attached data table and 2 types of goal lines.
My data looks like this (I wasn't sure how to create sample code):
I transformed the data so it's long:
I'm basing my method on this.
In the example, if I run the first annotate portion (anno_values) using the example data from the thread, everything runs fine. However, using a similar setup but accounting for more groups (Visit1, Visit2, etc.) I keep getting this error message:
NOTE: ERROR DETECTED IN ANNOTATE= DATASET WORK.ANNO_VALUES.
MINIMUM VARIABLES NOT MET - AMBIGUITY PREVENTS SELECTION
NOTE: ERROR LIMIT REACHED IN ANNOTATE PROCESS. PROCESSING IS TERMINATED.
NOTE: PROCESSING TERMINATED BY INDIVIDUAL ERROR COUNT.
NOTE: 1 TOTAL ERRORS.
data anno_values; set long2;
format xc monyy.; informat month monyy.;
xsys='2'; ysys='3'; hsys='3'; when='a';
function='label'; position='5';
xc=month;
if type='Total' then do;
y=15;
text=trim(left(value));
output;
end;
if type='Visit1' then do;
y=7;
text=trim(left(value));
output;
end;
if type='Visit2' then do;
y=0;
text=trim(left(value));
output;
end;
if type='Visit3' then do;
y=-7;
text=trim(left(value));
output;
end;
run;
proc gchart data=long2 anno=anno_values;
vbar month / type=sum sumvar=value discrete
subgroup=type nolegend
raxis=axis1 maxis=axis2
coutline=gray77;
run; quit;
I'm not sure whether the months are causing the issue, but I couldn't get past the first step.

There are macros installed with SAS/GRAPH that will help you construct a proper annotation data set. The one you need is named %DCLANNO, short for "declare annotation variables".
Add these lines to the top of your annotate DATA step:
%annomac /* compiles the SAS/Graph annotation macros */
data anno_values;
/* The dclanno macro, part of the annomac package, does code generation
* for defining the annotation variables in the PDV
*/
%dclanno;
set long2;
/* ... the rest of your annotate assignments ... */
run;
dclanno is part of the annomac package found in your installation at SASHOME\SASFoundation\9.4\core\sasmacro.
Here is a link to another example of a stacked VBAR chart annotated to display counts of another subgroup.
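A minimal sketch of the label data set, assuming long2 has a numeric SAS date MONTH, a character TYPE, and a numeric VALUE as in the question (the small long2 built here is made-up demo data). With XSYS='2' on a GCHART VBAR, positions along the midpoint axis are normally given through the MIDPOINT annotate variable rather than XC, so this sketch swaps XC for MIDPOINT. The Y positions are placeholder percentages of the data area (YSYS='1', where negative values fall below the axis) that you would tune for your own layout.

```sas
/* Made-up demo data in the long layout described in the question */
data long2;
  format month monyy7.;
  do month = '01JAN2024'd, '01FEB2024'd, '01MAR2024'd;
    type = 'Visit1'; value = 10; output;
    type = 'Visit2'; value = 20; output;
    type = 'Visit3'; value = 5;  output;
  end;
run;

/* One text label per (month, type) row, placed under the matching bar */
data anno_values;
  length function $8 xsys ysys hsys when position $1 text $20;
  set long2;
  xsys = '2';          /* midpoint axis in data values */
  ysys = '1';          /* percent of the data area; negative = below the axis */
  hsys = '3';
  when = 'a';
  function = 'label';
  position = '5';
  midpoint = month;    /* locate the bar by its MONTH value instead of XC */
  text = strip(put(value, comma12.));
  /* Placeholder row positions for the attached "table" */
  if type = 'Total'       then y = -4;
  else if type = 'Visit1' then y = -8;
  else if type = 'Visit2' then y = -12;
  else if type = 'Visit3' then y = -16;
  else delete;         /* skip any other types */
run;

proc gchart data=long2 anno=anno_values;
  vbar month / type=sum sumvar=value discrete subgroup=type nolegend;
run;
quit;
```

The IF/ELSE IF chain replaces the four repeated DO blocks from the question; only the Y position differs between types.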

Related

SAS data step with BY variable on unsorted data

I'm executing SAS data step with by variable. I understand the output when the data is sorted by key (X in my case). However, when the data is unsorted, I get the following output:
I'm using SAS ODA's AFRICA dataset from MAPS library which has 52824 rows. Here's the link to the CSV file.
data AFRICA_NEW12;
set Maps.AFRICA;
by X;
firstX = FIRST.X;
lastX = LAST.X;
run;
I don't understand how rows are selected when data is not sorted. Why does the output have 14 rows?
Your log contains an error because the data isn't sorted by the BY variable. Make sure to read your log.
This example likely reproduces the same issue for you:
data cars;
set sashelp.cars;
by model;
run;
proc print data=cars;
var make model origin;
run;
Output is:
Obs Make Model Origin
1 Acura MDX Asia
2 Acura RSX Type S 2dr Asia
And the log shows:
ERROR: BY variables are not properly sorted on data set SASHELP.CARS.
Make=Acura Model=TSX 4dr Type=Sedan Origin=Asia DriveTrain=Front MSRP=$26,990 Invoice=$24,647 EngineSize=2.4 Cylinders=4
Horsepower=200 MPG_City=22 MPG_Highway=29 Weight=3230 Wheelbase=105 Length=183 FIRST.Model=1 LAST.Model=1 _ERROR_=1 _N_=3
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 4 observations read from the data set SASHELP.CARS.
WARNING: The data set WORK.CARS may be incomplete. When this step was stopped there were 2 observations and 15 variables.
WARNING: Data set WORK.CARS was not replaced because this step was stopped.
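Note this portion specifically. The step stops at the first observation that is out of BY order, so the output data set holds only the rows written before that point (which is presumably why you see 14 rows):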
WARNING: The data set WORK.CARS may be incomplete. When this step was stopped there were 2 observations and 15 variables.
If you know the data are grouped in the order you want (which may not be the order SAS expects), you can add the NOTSORTED option to the BY statement. This is different functionality, though: FIRST. and LAST. then mark each run of consecutive identical values rather than each distinct value, so check your code thoroughly.
data cars;
set sashelp.cars;
by model notsorted;
run;
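A small sketch contrasting the two behaviours (the data set names have, flags, have_sorted, and flags_sorted are hypothetical): BY ... NOTSORTED on unsorted data versus a regular BY after PROC SORT.

```sas
/* Hypothetical demo data: X is grouped but not sorted (1 1 2 1) */
data have;
  input x @@;
  datalines;
1 1 2 1
;
run;

/* NOTSORTED: FIRST./LAST. mark each run of consecutive equal values */
data flags;
  set have;
  by x notsorted;
  first_x = first.x;
  last_x  = last.x;
run;

/* Regular BY processing needs the data sorted by X first */
proc sort data=have out=have_sorted;
  by x;
run;

data flags_sorted;
  set have_sorted;
  by x;
  first_x = first.x;
  last_x  = last.x;
run;
```

In FLAGS, FIRST_X is 1 on three rows because the trailing 1 starts a new run; in FLAGS_SORTED there are only two groups (all the 1s, then the 2).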

Run Duration in SAS enterprise miner

I have the following problem. We have several streams in Enterprise Miner and we would like to be able to tell how long each run took. I have tried to create a macro that would save the start and end time/date, but the problem is that global macro variables defined in one node are not visible in subsequent nodes (so they are global only inside a node, not between nodes). How do people usually solve this problem? Any ideas or suggestions?
Thanks, Umberto
Just write timestamps to the log (EM should produce a global log in the same fashion that EG and DI do).
Either use:
data _null_;
datetime = datetime();
put datetime= datetime20.;
run;
or macro language:
%put EM node started at %sysfunc(time(),timeampm.) on %sysfunc(date(),worddate.).;
With a highly customised message, you can then read the log in SAS and search for those strings with regular expressions.
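Within a single node, where global macro variables do persist (as the question notes), you can also log the elapsed time directly. A minimal sketch; the macro variable names node_start and node_elapsed are hypothetical, and the empty DATA step loop only stands in for the node's real work.

```sas
/* Record the start of the node's work as a SAS datetime value (seconds) */
%let node_start = %sysfunc(datetime(), 18.3);

/* Placeholder for the node's real processing */
data _null_;
  do i = 1 to 1000000;
  end;
run;

/* Elapsed seconds, using floating-point macro arithmetic */
%let node_elapsed = %sysevalf(%sysfunc(datetime(), 18.3) - &node_start);
%put NOTE: EM node finished at %sysfunc(datetime(), datetime20.) after &node_elapsed seconds.;
```

A fixed prefix such as "NOTE: EM node finished" also gives you a stable string to search for when you scan the log afterwards.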
Solution 2:
Another option is to create a table in a library that is visible from both EM and EG, for example, and run PROC SQL inserts at the beginning and end of your process.
proc sql;
create table EM_logger
(jobcode char(100),
timestamp num informat=datetime20. format=datetime20.);
quit;
proc sql;
insert into EM_logger values('Beginning Linear Reg',%sysfunc(datetime()));
quit;
data w;
do i=1 to 10000000;
output;
end;
run;
proc sql;
insert into EM_logger values('End Linear Reg',%sysfunc(datetime()));
quit;
Table layout can be as complex as you want and as long as you can access it you can get your statistics.
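For example, to turn such a log table into run durations, you can pair each Begin row with its End row. A sketch, assuming each step logs one row whose jobcode starts with "Begin" and one starting with "End", followed by the same step name; the table em_logger_demo and its rows are made-up stand-ins for EM_logger.

```sas
/* Made-up rows in the same layout as EM_logger */
proc sql;
  create table em_logger_demo
    (jobcode char(100), timestamp num format=datetime20.);
  insert into em_logger_demo
    values('Beginning Linear Reg', '01JAN2024:08:00:00'dt)
    values('End Linear Reg',       '01JAN2024:08:05:30'dt);
quit;

/* Self-join on the step name (the text after the first word) */
proc sql;
  create table run_durations as
  select substr(b.jobcode, index(b.jobcode, ' ') + 1) as step length=100,
         b.timestamp as started format=datetime20.,
         e.timestamp as ended format=datetime20.,
         e.timestamp - b.timestamp as elapsed format=time8.
    from em_logger_demo as b
         inner join em_logger_demo as e
         on substr(b.jobcode, index(b.jobcode, ' ') + 1) =
            substr(e.jobcode, index(e.jobcode, ' ') + 1)
   where b.jobcode like 'Begin%' and e.jobcode like 'End%';
quit;
```

ELAPSED holds seconds (330 here) and displays as 0:05:30 with the TIME8. format.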
Hope it helps

Automating IF and then statement in sas using macro in SAS

I have data with various types of loan descriptions; there are at least 100 of them.
I have to categorise them into various buckets using IF-THEN statements. Please have a look at the data for reference:
data des;
set desc;
if loan_desc in ('home_loan','auto_loan')then product_summary ='Loan';
if loan_desc in ('Multi') then product_summary='Multi options';
run;
For illustration I have shown only two loan descriptions, but I have around 1000 different loan_desc values that I need to categorise into different buckets.
How can I categorise these loan descriptions into buckets without writing product_summary and loan_desc again and again? Doing so makes the code very long and time-consuming to write.
Please help!
Another option for categorizing is a format. This example uses a manually written VALUE statement, but you can also create a format from a data set if you have the from/to values in a data set. As indicated by @Tom, this lets you change only the table while the code stays the same for future changes.
One note regarding your current code: you're using IF/THEN rather than IF/ELSE IF. You should use IF/ELSE IF, because then evaluation stops as soon as one condition is met, rather than testing every option.
proc format;
value $loan_fmt
'home_loan', 'auto_loan' = 'Loan'
'Multi' = 'Multi options';
run;
data want;
set have;
product_summary = put(loan_desc, $loan_fmt.);
run;
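For comparison, the IF/ELSE IF chain recommended above looks like this when applied to the question's two mappings. A sketch only: the demo rows and the 'Unknown' default are made up.

```sas
/* Made-up demo rows */
data desc;
  length loan_desc $20;
  input loan_desc $;
  datalines;
home_loan
auto_loan
Multi
Other
;
run;

data des;
  set desc;
  length product_summary $20;
  /* ELSE IF: once a condition is true, the remaining tests are skipped */
  if loan_desc in ('home_loan', 'auto_loan') then product_summary = 'Loan';
  else if loan_desc = 'Multi' then product_summary = 'Multi options';
  else product_summary = 'Unknown';
run;
```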
For a mapping exercise like this, the best technique is to use a mapping table. This is so the mappings can be changed without changing code, among other reasons.
A simple example is shown below:
/* create test data */
data desc (drop=x);
length loan_desc $20;
do x=1 to 3;
loan_desc='home_loan'; output;
loan_desc='auto_loan'; output;
loan_desc='Multi'; output;
loan_desc=''; output;
end;
run;
data map;
/* declare lengths up front; otherwise product_summary gets its length from the first value and 'Multi options' is truncated */
length loan_desc $20 product_summary $20;
loan_desc='home_loan'; product_summary ='Loan'; output;
loan_desc='auto_loan'; product_summary ='Loan'; output;
loan_desc='Multi'; product_summary='Multi options'; output;
run;
/* perform join */
proc sql;
create table des as
select a.*
,coalescec(b.product_summary,'UNMAPPED') as product_summary
from desc a
left join map b
on a.loan_desc=b.loan_desc;
quit;
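A mapping table like this can also drive a format through PROC FORMAT's CNTLIN= option, which combines the data-driven mapping with the PUT-based lookup. A sketch, assuming the mapping lives in a data set shaped like MAP; the names loan_map, loan_cntlin, have, and want and the 'UNMAPPED' default are hypothetical.

```sas
/* Hypothetical mapping table: one row per loan_desc value */
data loan_map;
  infile datalines dlm=',';
  length loan_desc product_summary $20;
  input loan_desc $ product_summary $;
  datalines;
home_loan,Loan
auto_loan,Loan
Multi,Multi options
;
run;

/* CNTLIN data set: FMTNAME, TYPE, START and LABEL are read by PROC FORMAT */
data loan_cntlin;
  set loan_map(rename=(loan_desc=start product_summary=label)) end=last;
  retain fmtname '$loan_fmt' type 'C';
  output;
  /* HLO='O' adds an OTHER range for values not in the table */
  if last then do;
    hlo = 'O';
    label = 'UNMAPPED';
    output;
  end;
run;

proc format cntlin=loan_cntlin;
run;

/* Demo input with one value that is not in the mapping table */
data have;
  length loan_desc $20;
  input loan_desc $;
  datalines;
home_loan
Multi
student_loan
;
run;

data want;
  set have;
  length product_summary $20;
  product_summary = put(loan_desc, $loan_fmt.);
run;
```

When the mappings change, you only update loan_map and rerun the CNTLIN step; the DATA step that applies the format stays the same.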
There is no need to use the macro language for this task (I have updated the question tag accordingly).
Good solutions have already been proposed (I like @Reeza's PROC FORMAT solution), but here's another route which also minimizes coding.
Generate sample data
data have;
loan_desc="home_loan"; output;
loan_desc="auto_loan"; output;
loan_desc="Multi"; output;
loan_desc=""; output;
run;
Using PROC SQL's case expression
To my knowledge, this form doesn't allow several values on a single WHEN line, but it really simplifies coding, since the resulting variable's name needs to be written only once.
proc sql;
create table want as
select
loan_desc,
case loan_desc
when "home_loan" then "Loan"
when "auto_loan" then "Loan"
when "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;
Alternatively, the following syntax gives the same results, and it does let one WHEN line test several values through IN:
proc sql;
create table want as
select
loan_desc,
case
when loan_desc in ("home_loan", "auto_loan") then "Loan"
when loan_desc = "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;

Perform Iterative Operations on OUTEST or OUTSTAT variables in SAS?

In SAS, how can I take a value from an OUTEST= or OUTSTAT= data set and use it in a loop?
For example, say I want to run some sort of iterative analysis until my mean (average) reaches a certain threshold. I know how to extract the mean using either OUTEST or OUTSTAT, but then how can I perform operations or blocks of code on it?
Thank you.
If you are interested in details, I am trying to perform backward selection of VIFs (to remove multicollinearity). Unfortunately, SAS doesn't seem to have a 'SELECTION=BACKWARD' feature for this...
EDIT: Updated with sample code:
%MACRO MULTICOLLINEARITY(TABLE_SUFFIX,YVAR,FIELDS,MAX_VIF);
/* PRELIMINARY PROC REG ON ALL FIELDS*/
PROC REG DATA=TABLE_&TABLE_SUFFIX. NOPRINT;
MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST1;
RUN;
/* RETAIN NON-NULL VIF FIELDS ONLY */
DATA NO_NULL_VIF;
SET PAREST1 (WHERE=(VarianceInflation <> .));
RUN;
/* CREATE VARIABLE LIST OF NON-NULL VIF FIELDS */
PROC SQL;
SELECT VARIABLE
INTO :NO_NULL_VIF_FIELDS SEPARATED BY ' '
FROM NO_NULL_VIF;
QUIT;
/* RE-RUN REGRESSION WITH NON-NULL VIF FIELDS ONLY */
PROC REG DATA=TABLE_&TABLE_SUFFIX. NOPRINT;
MODEL &YVAR = &NO_NULL_VIF_FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST2;
RUN;
/* START ITERATION OF DROPPING THE HIGHEST VIF UNTIL THE CRITERIA IS MET */
???
%MEND;
%MULTICOLLINEARITY(, RESPONSE, &INPUT_FIELDS,???)
By criteria I mean VIF_MAX < N, where N is some threshold specified in the macro. For example, if we want to retain only fields with VIF less than 5, it should drop the field with the highest VIF, re-run PROC REG, drop the highest, re-run, and so on until the highest one is less than 5.
First off - I'd verify that you can't do this using PROC MODEL. I'm not a regression guy so I don't know for sure. Might be worth posting on a more stat-focused site; CV isn't really appropriate since they're not generally trying to answer software questions, but maybe communities.sas.com . I would find it surprising if this wasn't directly possible in PROC MODEL and/or in one of the more complicated procs.
Second, the way I'd approach this is to write a recursive macro. Take out the first part (the non-null VIF fields) and either move that to an outer macro that just runs once, or make it an expectation of the programmer to do on his/her own (unless this is not feasible, and/or can change with iterations - not something I'm knowledgeable of). Then do something like this:
%MACRO MULTICOLLINEARITY(TABLE_SUFFIX,YVAR,FIELDS,MAX_VIF);
ods _all_ close;
%put Running with &fields; *note which fields currently running;
*also may want to include a run # counter as parameter;
PROC REG DATA=TABLE_&TABLE_SUFFIX.;
MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST2;
RUN;
quit;
*Data step to analyse PAREST2 and see if any of the fields can be dropped;
proc sort data=parest2;
by descending varianceinflation;
run;
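data _null_;
set parest2(obs=1);
length fields_run call_string $32767; *avoid truncating long field lists at the default 200 characters;
if varianceinflation > &max_vif then do;
*remove the whole variable name only, so dropping X1 does not also change X10;
fields_run = prxchange(cats('s/\b', variable, '\b/ /i'), -1, "&fields");
if not missing(fields_run) then do;
call_string = cats('%multicollinearity(',"&table_suffix.,&yvar.,",fields_run,",&max_vif.)");
call execute(call_string);
end;
end;
else put "Stopped with Max VIF:" variable "=" varianceinflation;
run;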
ods preferences;
%MEND MULTICOLLINEARITY;
Then you call it once with the full field list, and it calls itself in the CALL EXECUTE if there is still a parameter left. An incremented # of runs may be helpful (both to see how many times it ran in your log, and to be able to make sure that you don't end up in an infinite loop if you make a mistake with the fields variable deletion.)
I would run this with OPTIONS NONOTES NOSOURCE; and with none of the SYMBOLGEN/MPRINT options on, so that only the %put/put statements appear in your log.
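If you prefer an explicit loop to recursion, the same idea can be written with %DO %WHILE. A sketch, not the approach above: the macro name drop_max_vif, its parameters, the work table _vif_est, and the global macro variable final_fields are hypothetical. It relies on PROC REG's VarianceInflation column in the ParameterEstimates table and uses MAX_ITER as the infinite-loop guard mentioned above.

```sas
%macro drop_max_vif(data=, yvar=, fields=, max_vif=5, max_iter=50);
  %global final_fields;
  %local iter top_var top_vif new_fields i word;
  %let iter = 0;
  %do %while (&iter < &max_iter);
    %let iter = %eval(&iter + 1);

    ods exclude all;   /* hide displayed output; ODS OUTPUT still writes the table */
    proc reg data=&data;
      model &yvar = &fields / vif noint;
      ods output ParameterEstimates=_vif_est;
    run;
    quit;
    ods exclude none;

    /* Largest VIF first (missing values sort last) */
    proc sort data=_vif_est;
      by descending VarianceInflation;
    run;

    data _null_;
      set _vif_est(obs=1);
      call symputx('top_var', Variable);
      call symputx('top_vif', VarianceInflation);
    run;

    %put NOTE: Iteration &iter: largest VIF is &top_vif for &top_var;
    %if %sysevalf(&top_vif <= &max_vif) %then %goto done;

    /* Rebuild the list without TOP_VAR (whole words, case-insensitive) */
    %let new_fields = ;
    %let i = 1;
    %let word = %scan(&fields, 1, %str( ));
    %do %while (%length(&word) > 0);
      %if %upcase(&word) ne %upcase(&top_var) %then %let new_fields = &new_fields &word;
      %let i = %eval(&i + 1);
      %let word = %scan(&fields, &i, %str( ));
    %end;
    %let fields = &new_fields;
    %if %length(&fields) = 0 %then %goto done;
  %end;
  %done:
  %let final_fields = &fields;
  %put NOTE: Kept fields: &final_fields;
%mend drop_max_vif;

/* Example call on a SASHELP table */
%drop_max_vif(data=sashelp.class, yvar=weight, fields=height age, max_vif=5);
```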

Can I get some default/empty text to display if a PROC REPORT doesn't generate due to no valid data?

I have a SAS program that loops through certain sets of data and generates a bunch of reports to an ODS HTML destination.
Sometimes, because some of the data sets I run these reports for are small, a particular PROC REPORT does not generate: for that set of data there is nothing to report. I get this message in those instances:
WARNING: A GROUP, ORDER, or ACROSS variable is missing on every observation.
What I want in the HTML is to display some sort of message for these like "did not generate" or something.
I tried to use return/error codes or the warning text above to detect this, but the error code is 0 (no problem, really?) and the warning text doesn't reset if the next PROC REPORT generates OK.
If it is of any importance, I'm using a data step with CALL EXECUTE to get all this PROC REPORT code generated for these sets of data.
Is there any way to generate this "did not generate" message or at least to catch these warnings per PROC REPORT?
You can substitute in a value for the missing observations in your report.
First, redefine how missing numeric values are displayed with the MISSING= system option. It accepts a single character.
options missing='M';
Then make sure to use the MISSING option in your PROC REPORT statement, so that missing values are treated as valid GROUP, ORDER, or ACROSS values:
proc report data=somedata nowd headline missing;
....
run;
EDITS BASED ON COMMENTS
To get a message to show up, I see a few possibilities.
One, scan the data set and check for missing values. If any are present, write out a message.
Data _Null_;
Set dataset;
file print notitles;
if obs = . then do;
put #01 'DID NOT COMPUTE';
stop;
end;
run;
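Two, add a column with a compute block (the length must hold the whole message, or it is truncated):
define xx / computed "(Message)";
compute xx / char length=28;
if obs = . then xx = 'did not compute value in row';
endcomp;
Three, a conditional line using compute. PROC REPORT runs LINE statements regardless of an enclosing IF, so make the text conditional and give it zero length when it should not print:
compute after obs;
length msg $15;
if obs = . then msg = 'DID NOT COMPUTE';
else msg = ' ';
len = ifn(missing(msg), 0, length(msg));
line msg $varying15. len;
endcomp;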
See: http://www2.sas.com/proceedings/sugi26/p095-26.pdf
Look for the MTANYOBS macro and the section on printing a 'no observations' page.
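In the same spirit as that paper's "no observations" check, you can test the data before calling PROC REPORT. A sketch: the macro report_or_message, its parameters, the global macro variable report_status, and the simple one-column report are hypothetical. It counts non-missing values of the GROUP variable (the condition behind the warning) and writes a message instead of the report when there are none.

```sas
%macro report_or_message(data=, groupvar=, analysisvar=,
                         message=Report did not generate: no data);
  %global report_status;
  %local n_ok;
  %let n_ok = 0;

  /* COUNT(column) counts only non-missing values */
  proc sql noprint;
    select count(&groupvar) into :n_ok trimmed
      from &data;
  quit;

  %if &n_ok > 0 %then %do;
    %let report_status = REPORT;
    proc report data=&data nowd;
      column &groupvar &analysisvar;
      define &groupvar / group;
      define &analysisvar / analysis sum;
    run;
  %end;
  %else %do;
    %let report_status = MESSAGE;
    /* FILE PRINT output is routed to the open ODS destinations */
    data _null_;
      file print;
      put "&message";
    run;
  %end;
%mend report_or_message;

/* One call with data, one where the subset is empty */
%report_or_message(data=sashelp.class, groupvar=sex, analysisvar=height);
%report_or_message(data=sashelp.class(where=(age > 100)), groupvar=sex, analysisvar=height);
```

If you generate the calls with CALL EXECUTE, wrap each macro call in %NRSTR, for example call execute('%nrstr(%report_or_message(data=work.part1, groupvar=region, analysisvar=sales))'); with your own (here hypothetical) names. Otherwise the macro's %IF is evaluated when CALL EXECUTE runs, before the generated PROC SQL step has set N_OK.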