Trying to convert the tick value of Y-Axis Scale in SAS

I'm trying to change the tick values of the Y-axis scale from (0 .2 .4 .6 .8 1.0) to (0 .01 .02 .03 .04 .05), but the replacement fails. Changing viewmax in the same step works without any problem.
PROC TEMPLATE;
DELETE Stat.Lifetest.Graphics.ProductLimitFailure2;
SOURCE Stat.Lifetest.Graphics.ProductLimitFailure2 / FILE='C:\Users\Username\Documents\My SAS Files\9.4\tpl.tpl';
QUIT;
DATA _null_;
INFILE 'C:\Users\Username\Documents\My SAS Files\9.4\tpl.tpl' END=eof;
INPUT;
IF _n_ eq 1 THEN CALL execute('PROC TEMPLATE;');
_infile_ = tranwrd(_infile_, 'viewmax=1', 'viewmax=0.05'); /* tranwrd(var, from, to);*/
_infile_ = tranwrd(_infile_, 'tickvaluelist=(0 .2 .4 .6 .8 1.0)', 'tickvaluelist=(0 .01 .02 .03 .04 .05)');
CALL execute(_infile_);
IF eof THEN CALL execute('quit;');
RUN;
PROC LIFETEST DATA=for_analysis_1 PLOT=SURVIVAL (FAILURE TEST ATRISK(OUTSIDE(0.10) MAXLEN=26) NOCENSOR) NOTABLE;
TIME Days * Status(0);
STRATA group;
RUN;
**This code was adapted from: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_kaplan_sect012.htm
Thank you, Joe
I was running SAS in zt mode. It worked successfully after changing to EN mode. Besides, opening the .tpl file and editing it by hand is an easy way too! I really appreciate that.
**Change the default SAS® session encoding:
https://support.sas.com/kb/51/586.html

It works fine for me.
PROC TEMPLATE;
DELETE Stat.Lifetest.Graphics.ProductLimitFailure2;
SOURCE Stat.Lifetest.Graphics.ProductLimitFailure2 / FILE='....dir....\tpl.tpl';
QUIT;
DATA _null_;
INFILE '....dir....\tpl.tpl' END=eof;
INPUT;
IF _n_ eq 1 THEN CALL execute('PROC TEMPLATE;');
_infile_ = tranwrd(_infile_, 'viewmax=1', 'viewmax=0.05'); /* tranwrd(var, from, to);*/
_infile_ = tranwrd(_infile_, 'tickvaluelist=(0 .2 .4 .6 .8 1.0)', 'tickvaluelist=(0 .01 .02 .03 .04 .05)');
CALL execute(_infile_);
IF eof THEN CALL execute('quit;');
RUN;
proc lifetest data=sashelp.BMT
plots=survival(cb=hw failure test atrisk(outside maxlen=13));
time T * Status(0);
strata Group;
run;
What you can do to test is, first, run this exact code (using the built in dataset); if that works, then you know it's an issue with your data. Second, I wonder if you might have some issue with character encoding (I'm running in EN-US mode, are you in UTF-8? Are the spaces real spaces, or maybe 'A0'x web spaces?). TRANWRD is quite the "hammer" to use here. Maybe consider using a different way to change the values that's easier to debug. At least, step through it with some PUT statements to see if the TRANWRD is actually doing anything (before/after that line, PUT the infile value IF "tickvaluelist" is found).
Also consider that the tickvaluelist may be split across lines. Open the .tpl file and just see what you have! Nothing says you need to do it this way - you could just write the PROC TEMPLATE code directly. What I'd probably do is run that first PROC TEMPLATE that sources it to a file, then edit it by hand, then run it as part of your code. The only reason not to do that is if you want to programmatically change the tick value list based on what's in the data, but even that you could do differently - like with a macro variable.
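The PUT-statement check suggested above could look roughly like the following. This is only a diagnostic sketch: it reuses the elided path from the code above as-is, and the variable names before and after are made up for illustration. The hex dump is there to reveal non-standard blanks such as 'A0'x.

```sas
/* Diagnostic sketch: show each template line that mentions tickvaluelist, */
/* before and after TRANWRD, plus a hex dump of the original line.         */
data _null_;
  infile '....dir....\tpl.tpl';          /* same elided path as above */
  input;
  if find(_infile_, 'tickvaluelist', 'i') then do;
    length before after $ 256;           /* illustrative variable names */
    before = _infile_;
    after  = tranwrd(before, 'tickvaluelist=(0 .2 .4 .6 .8 1.0)',
                             'tickvaluelist=(0 .01 .02 .03 .04 .05)');
    put 'BEFORE: ' before;
    put 'AFTER:  ' after;
    put 'HEX:    ' before $hex120.;      /* first 60 bytes; a normal blank is 20 */
    if before = after then put 'NOTE: TRANWRD changed nothing on this line.';
  end;
run;
```

If no line is printed at all, the keyword is probably spelled differently in the file or split across lines; if the hex dump shows A0 where 20 is expected, the search string will never match.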


How to determine the frequency of a time series?

For an if-query I would like to create a macro variable giving the frequency of the underlying time series. I tried to get some descriptive statistics from PROC TIMESERIES. Unfortunately, they do not include a figure for the frequency.
The underlying time series does not necessarily include all periods of the frequency. From my point of view, that rules out a simple count with PROC SQL.
Does anyone know an efficient procedure to determine the frequency without computing the frequency on my own (in a data step or a proc sql code)?
You can use the OUTSPECTRA= option to help learn what kind of seasonality the series has. Based on the data, give PROC TIMESERIES your best guess of the interval (day, month, etc.). In the example below, we know we want to forecast by month but we do not know what seasonality the series has.
proc timeseries data=sashelp.air outspectra=spectra;
id date interval=month;
var air;
run;
Plot this spectra dataset in proc sgplot and you'll see something that looks like this:
proc sgplot data=spectra;
where NOT missing(period);
series x=period y=p;
run;
This line will naturally increase over time, but we're looking for bumps in the line. Notice the large bump somewhere between 0 and 24 months and the several smaller bumps before it. Let's zoom in on that by filtering out the longer periods.
proc sgplot data=spectra;
where period < 24 and NOT missing(period);
series x=period y=p;
run;
It's pretty clear that there is a strong seasonality of 12, with potentially smaller cycles at 3 and 6 months. Based on this spectra plot, we can conclude that our seasonality should be 12.
You can turn this into a macro to help identify the season if you'd like. Simply search for the largest bump within a reasonable timeframe. In our case we'll choose 36 because we do not suspect that we have any seasonality > 36 months.
proc sort data=spectra;
by period;
run;
data identify_period;
set spectra;
by period;
where NOT missing(period) AND period LE 36;
delta = abs(p - lag(p) );
run;
proc sql;
select period, max(delta) as max_delta
from identify_period
having delta = max(delta)
;
quit;
Output:
PERIOD max_delta
12 163712
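Since the question asks for a macro variable, the same query can store its result with INTO. This is a sketch that assumes the identify_period data set created above; the macro variable name season is made up for illustration.

```sas
/* Sketch: capture the period with the largest jump in a macro variable. */
proc sql noprint;
  select period
    into :season trimmed        /* "season" is an illustrative name */
  from identify_period
  having delta = max(delta);    /* same remerge-based filter as above */
quit;

%put &=season;
```

If several periods tie for the largest delta, INTO keeps only the first row returned, so the result is best treated as a starting guess rather than a definitive answer.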
I don't know how to do this without data step logic, but you could wrap the data step in a macro as follows:
%macro get_frequency(data,date_variable,output_variable);
proc sort data=&data (keep=&date_variable) out=__tempsorted;
by &date_variable;
run;
data _null_;
set __tempsorted end=lastobs;
prevdate=lag(&date_variable);
if _n_ > 1 then do;
interval_number+1;
interval_total + (&date_variable - prevdate);
end;
if lastobs then do;
average_interval = interval_total/interval_number;
frequency = round(365.25/average_interval);
/* 'G' stores the result in the global symbol table so it still exists after the macro ends */
call symputx("&output_variable", frequency, 'G');
end;
run;
proc datasets nolist;
delete __tempsorted;
run;
quit;
%mend get_frequency;
Then you can call the macro on your original data set timeseries to examine the variable date and create a new macro variable frequency1 with the required frequency.
data work.timeseries;
input date date. value;
format date date9.;
datalines;
01Oct18 3000
01Nov18 4000
01Dec18 6500
01Jan19 7000
01Feb19 4000
01Mar19 5000
01Apr19 7500
01May19 4800
01Jun19 4500
;
run;
%get_frequency(timeseries,date,frequency1)
%put &=frequency1;
This seems to work on your sample data where each date is the first of the month. If your dates are evenly distributed (e.g. always near month start/end, or always near mid-month etc.) then this macro should work ok. Obviously if you have multiple observations per date then it will give the completely incorrect frequency.
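One way to guard against the multiple-observations-per-date case is to reduce the data to distinct dates before calling the macro. This is a sketch that assumes the timeseries data set and the %get_frequency macro defined above; unique_dates is a made-up data set name.

```sas
/* Sketch: keep one row per date so repeated dates do not shrink the average interval. */
proc sort data=timeseries(keep=date) out=unique_dates nodupkey;
  by date;
run;

/* Same macro as above, run on the de-duplicated dates */
%get_frequency(unique_dates, date, frequency1)
```

This only removes exact duplicate dates. Irregular gaps, such as missing months, still distort the average interval.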

SAS Macro variable escaping apostrophe in variable name Proc Http

I have been working on this for 3 days now and have tried everything I can think of, including %str(), %bquote(), translate() and tranwrd(), to replace a single apostrophe with a double apostrophe or %'.
The data step and macro below work fine until I hit a last name that contains an apostrophe, e.g. O'Brien. I then get syntax errors due to an unclosed left parenthesis. In the code below I have left in what I thought was closest to working, with the tranwrd included.
Any assistance you can provide is greatly appreciated.
%macro put_data (object1,id);
Proc http
method="put"
url="https://myurl/submissionid/&id"
in=&object1;
Headers "content-type"="application/json";
Run;
%mend;
data _null_;
Set work.emp_lis;
call execute(catt('%put_data(','%quote(''{"data":{"employeeName":"',tranwrd(employeeName,"''","'"),'"}}''),',id,')'));
run;
Craig
There is a wide range of potential problems in constructing or passing a JSON string in SAS macro code. PROC JSON will produce valid JSON (in a file) from data, and that file in turn can be specified as the input to be consumed by your web service.
Example:
data have;
length id 8 name $25;
input id name &; /* & lets name contain single embedded blanks */
datalines;
1 Homer Simpson
2 Ned Flanders
3 Shaun O'Connor
4 Tom&Bob
5 'X Æ A-12'
6 Goofy "McDuck"
;
%macro put_data (data,id);
filename jsonfile temp;
proc json out=jsonfile;
export &data.(where=(id=&id));
write values "data";
write open object;
write values "name" name;
write close;
run;
proc http
method="put"
url="https://myurl/submissionid/&id"
in=jsonfile
;
headers "content-type"="application/json";
run;
%mend;
data _null_;
set have;
call execute(cats('%nrstr(%put_data(have,',id,'))'));
run;
I found the issue with my code: the tranwrd arguments were backwards, and the call needed to be moved to the proc sql create table statement. I also needed to wrap &object1 in %bquote. This was the final code that worked.
When creating the table, wrap the variables in tranwrd as below.
tranwrd(employeeName, "'", "''")
%macro put_data (object1,id);
Proc http
method="put"
url="https://myurl/submissionid/&id"
in=%bquote(&object1);
Headers "content-type"="application/json";
Run;
%mend;
data _null_;
Set work.emp_lis;
call execute(catt('%put_data(','%quote(''{"data":{"employeeName":"',employeeName,'"}}''),',id,')'));
run;
Just use actual quotes and you won't have to worry about macro quoting at all.
So if your macro looks like this:
%macro put_data(object1,id);
proc http method="put"
url="https://myurl/submissionid/&id"
in=&object1
;
headers "content-type"="application/json";
run;
%mend;
Then the value of OBJECT1 would usually be a quoted string literal or a fileref. (There are actually other forms.) Looks like you are trying to generate a quoted string. So just use the QUOTE() function.
So if your data looks like:
data emp_lis;
input id employeeName $50.;
cards;
1234 O'Brien
4567 Smith
;
Then you can use a data step like this to generate one macro call for each observation.
data _null_;
set emp_lis;
call execute(cats
('%nrstr(%put_data)('
,quote(cats('{"data":{"employeeName":',quote(trim(employeeName)),'}}'))
,',',id
,')'
));
run;
And your SAS log will look something like:
NOTE: CALL EXECUTE generated line.
1 + %put_data("{""data"":{""employeeName"":""O'Brien""}}",1234)
NOTE: PROCEDURE HTTP used (Total process time):
real time 2.46 seconds
cpu time 0.04 seconds
2 + %put_data("{""data"":{""employeeName"":""Smith""}}",4567)
NOTE: PROCEDURE HTTP used (Total process time):
real time 2.46 seconds
cpu time 0.04 seconds
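To see what the two nested QUOTE() calls produce before anything is handed to CALL EXECUTE, the string can be built and printed in isolation. This is a minimal sketch; the variable names json and literal are made up for illustration.

```sas
/* Sketch: build the JSON text and its quoted-literal form, then print both. */
data _null_;
  length name $ 50 json literal $ 200;
  name    = "O'Brien";
  /* Inner QUOTE() wraps the name in double quotes */
  json    = cats('{"data":{"employeeName":', quote(trim(name)), '}}');
  /* Outer QUOTE() doubles every embedded double quote */
  literal = quote(trim(json));
  put json=;     /* json={"data":{"employeeName":"O'Brien"}} */
  put literal=;  /* literal="{""data"":{""employeeName"":""O'Brien""}}" */
run;
```

Note that QUOTE() handles SAS quoting only. A name that itself contains a double quote or a backslash would still need JSON escaping, which is one reason the PROC JSON approach above is more robust.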

Merging the Not Missing Observation also in the final Table and transpose the final Output

I have 10 different variables in 10 different tables, each with the VARNAME and MISSING PERCENT.
Out of these 10, let's say 5 do not have a "MISSING PERCENT", and I want to include these observations with 0% missing. For now, they are dropped from the final output.
data Final_Output_All_Missing;
length VARNAME $ 30;
merge work.Final_Output_MOLD work.Final_output_tbm_stage2
work.final_output_article7
work.final_output_tbm_stage1 work.final_output_bladder
work.final_output_batch_id;
by varname;
keep VARNAME PERCENT;
run;
VARNAME MISSING PERCENT
BLADDER 0.10
MOLD 0.06
TBM_STAGE1 0.18
TBM_STAGE2 99.9
Secondly, I have already merged the different tables containing different variables(0% still needs to be merged) as shown below:
After merging, I want to see the output in this format. is it possible for me to get in this format?
BLADDER MOLD TBM_STAGE1 TBM_STAGE2
1. 0.10% 0.06% 0.18% 99.9%
Appreciate your help!
The code below will:
Replace missing values with 0 (I added missing values to the data)
Add the percent format xx.xx%
Transpose the data
Code:
/*Create input data*/
data have;
informat VARNAME $10.;
input VARNAME $ MISSING_PERCENT ;
length VARNAME $10;
datalines;
BLADDER 0.10
MOLD 0.06
TBM_STAGE1 0.18
TBM_STAGE2 99.9
TBM_STAGE3 .
TBM_STAGE4 .
;;;;;;
run;
/*Replace missing values with 0, and Add % format */
proc sql;
create table work.new as
select
VARNAME,
coalesce(MISSING_PERCENT,0)/100 as MISSING_PERCENT format=percent8.2
from work.have;
quit;
/*Transpose Data*/
proc transpose data=work.new
out=work.want name=VARNAME;
id VARNAME;
run;
Thank you for the answer. It gives exactly what I want.
But I have a list of 160 variables, and I think hard-coding them would take too much effort. What I did was exclude the variables that have 0% missing and extract only the ones that have a missing percent.
This is how I extracted the missing percent:
data Final_Output_&var;
set Final_Output_&var;
VARNAME = "&var";
%if &t=char %then
%do;
where put (&var., $missfmt.) in ("Missing");
%end;
%else
%do;
where put (&var., missfmt.) in ("Missing");
%end;
run;
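With 160 variables, one way to bring the 0% cases back without hard-coding them is to build the full list of variable names and left-join the merged results onto it. This is only a sketch under assumptions: work.source_table is a hypothetical name for the table that holds the 160 variables, all_vars and complete are made-up output names, and Final_Output_All_Missing with its PERCENT column is taken from the question.

```sas
proc sql;
  /* One row per column of the (hypothetical) source table */
  create table work.all_vars as
  select upcase(name) as VARNAME length=32
  from dictionary.columns
  where libname = 'WORK' and memname = 'SOURCE_TABLE';

  /* Variables absent from the merged table get 0 instead of being dropped */
  create table work.complete as
  select a.VARNAME,
         coalesce(m.PERCENT, 0) as PERCENT
  from work.all_vars as a
  left join work.Final_Output_All_Missing as m
    on a.VARNAME = upcase(m.VARNAME)
  order by a.VARNAME;
quit;
```

The resulting table can then be passed to PROC TRANSPOSE in the same way as work.new in the answer above.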
Thanks again for your quick reply !!!
Best Regards,
Pankaj

SAS Data Step | Between 2 Dates

Probably a simple question. I have a simple dataset with scheduled payment dates in it.
DATA INFORM2;
INFORMAT previous_pmt_date scheduled_pmt_date MMDDYY10.;
INPUT previous_pmt_date scheduled_pmt_date;
FORMAT previous_pmt_date scheduled_pmt_date MMDDYYS10.;
DATALINES;
11/16/2015 12/16/2015
12/17/2015 01/16/2016
01/17/2016 02/16/2016
;
What I'm trying to do is create a binary latest-row indicator. For example, if I wanted to know the latest row as of 1/31/2016, I'd want row 2 to be flagged as the latest row. What I had been doing before was finding where 1/31/2016 falls between the previous_pmt_date and the scheduled_pmt_date, but that isn't correct for my purposes. I'd like to do this in a data step as opposed to SQL subqueries. Any ideas?
Want:
previous_pmt_date scheduled_pmt_date latest_row_ind
11/16/2015 12/16/2015 0
12/17/2015 01/16/2016 1
01/17/2016 02/16/2016 0
Here's a solution that does it all in the single existing datastep without any additional sorting. First I'm going to modify your data slightly to include account as the solution really should take that into account as well:
DATA INFORM2;
INFORMAT previous_pmt_date scheduled_pmt_date MMDDYY10.;
INPUT account previous_pmt_date scheduled_pmt_date;
FORMAT previous_pmt_date scheduled_pmt_date MMDDYYS10.;
DATALINES;
1 11/16/2015 12/16/2015
1 12/17/2015 01/16/2016
1 01/17/2016 02/16/2016
2 11/16/2015 12/16/2015
2 12/17/2015 01/16/2016
2 01/17/2016 02/16/2016
;
run;
Specify a cutoff date:
%let cutoff_date = %sysfunc(mdy(1,31,2016));
This solution uses the approach from this question to save the variables in the next row of data, into the current row. You can drop the vars at the end if desired (I've commented out for the purposes of testing).
data want;
set inform2 end=eof;
by account scheduled_pmt_date;
recno = _n_ + 1;
if not eof then do;
set inform2 (keep=account previous_pmt_date scheduled_pmt_date
rename=(account = next_account
previous_pmt_date = next_previous_pmt_date
scheduled_pmt_date = next_scheduled_pmt_date)
) point=recno;
end;
else do;
call missing(next_account, next_previous_pmt_date, next_scheduled_pmt_date);
end;
select;
when ( next_account eq account and next_scheduled_pmt_date gt &cutoff_date ) flag='a';
when ( next_account ne account ) flag='b';
otherwise flag = 'z';
end;
*drop next:;
run;
This approach works by using the current observation in the dataset (obtained via _n_) and adding 1 to it to get the next observation. We then use a second set statement with the point= option to load in that next observation and rename the variables at the same time so that they don't overwrite the current variables.
We then use some logic to flag the necessary records. I'm not 100% sure of the logic you require for your purposes, so I've provided some sample logic and used different flags to show which logic is being triggered.
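If the goal is the 0/1 indicator from the question, the sample flags can be collapsed into it in a follow-up step. This is a sketch under one reading of the requirement: the latest row is the last one per account whose scheduled_pmt_date is on or before the cutoff. It assumes the want data set and the cutoff_date macro variable created above; want_ind is a made-up name.

```sas
/* Sketch: derive latest_row_ind from the sample flags. */
data want_ind;
  set want;
  /* 1 when this row is due on or before the cutoff and the next row is */
  /* either past the cutoff ('a') or belongs to another account ('b').  */
  latest_row_ind = (scheduled_pmt_date le &cutoff_date and flag in ('a', 'b'));
  drop next_: flag;
run;
```

For the sample data and a cutoff of 1/31/2016, this flags the second row of each account, matching the wanted output in the question.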
Some notes...
The by statement isn't strictly necessary but I'm including it to (a) ensure that the data is sorted correctly, and (b) help future readers understand the intent of the datastep as some of the logic requires this sort order.
The call missing statement is simply there to clean up the log. SAS doesn't like it when you have variables that don't get assigned values, and this will happen on the very last observation so this is why we include this. Comment it out to see what happens.
The end=eof syntax basically creates a temporary variable called eof that has a value of 1 when we get to the last observation on that set statement. We simply use this to determine if we're at the last row or not.
Finally, and very importantly, make sure you keep only the required variables when you load in the second dataset; otherwise you will overwrite existing variables from the original data.

How to use lag function to calculate next observation in SAS

Suppose the dataset has 3 columns
Obs Theo Cal
1 20 20
2 21 23
3 21 .
4 22 .
5 21 .
6 23 .
Theo is the theoretical value while Cal is the estimated value.
I need to calculate the missing Cal.
For each Obs, its Cal is a linear combination of previous two Cal values.
Cal(3) = Cal(2) * &coef1 + Cal(1) * &coef2.
Cal(4) = Cal(3) * &coef1 + Cal(2) * &coef2.
But Cal = lag1(Cal) * &coef1 + lag2(Cal) * &coef2 didn't work as I expected.
The problem with using lag is when you use lag1(Cal) you're not getting the last value of Cal that was written to the output dataset, you're getting the last value that was passed to the lag1 function.
It would probably be easier to use a retain as follows:
data want(drop=Cal_l:);
set have;
retain Cal_l1 Cal_l2;
if missing(Cal) then Cal = Cal_l1 * &coef1 + Cal_l2 * &coef2;
Cal_l2 = Cal_l1;
Cal_l1 = Cal;
run;
I would guess you wrote a datastep like so.
data want;
set have;
if missing(cal) then
cal = lag1(cal)*&coef1 + lag2(cal)*&coef2;
run;
LAG isn't grabbing a previous value; rather, it creates a queue that is N long and gives you the value at the end of it. If you have it behind an IF statement, then you will never put the useful values of CAL into that queue - you'll only be tossing missings into it. See it like so:
data have;
do x=1 to 10;
output;
end;
run;
data want;
set have;
real_lagx = lag(x);
if mod(x,2)=0 then do;
not_lagx = lag(x);
put real_lagx= not_lagx=;
end;
run;
The Real lags are the immediate last value, while the NOT lags are the last even value, because they're inside the IF.
You have two major options here. Use RETAIN to keep track of the last two observations, or use LAG like I did above before the IF statement and then use the lagged values inside the IF statement. There's nothing inherently better or worse with either method; LAG works for what it does as long as you understand it well. RETAIN is often considered 'safer' because it's harder to screw up; it's also easier to watch what you're doing.
data want;
set have;
retain cal1 cal2;
if missing(cal) then cal=cal1*&coef1+cal2*&coef2;
output;
cal2=cal1;
cal1=cal;
run;
or
data want;
set have;
cal1=lag1(cal);
cal2=lag2(cal);
if missing(cal) then cal=cal1*&coef1+cal2*&coef2;
run;
The latter method will only work if cal is infrequently missing - specifically, if it's never missing more than once from any three observations. In the initial example, the first cal (row 3) will be populated, but from there on out it will always be missing. This may or may not be desired; if it's not, use retain.
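To make the RETAIN version concrete, here is a self-contained run on the sample data from the question. The coefficient values (0.5 and 0.5) are purely illustrative assumptions, since the question does not state them.

```sas
%let coef1 = 0.5;   /* illustrative value, not from the question */
%let coef2 = 0.5;   /* illustrative value, not from the question */

data have;
  input Obs Theo Cal;
  datalines;
1 20 20
2 21 23
3 21 .
4 22 .
5 21 .
6 23 .
;
run;

data want(drop=cal1 cal2);
  set have;
  retain cal1 cal2;              /* last two Cal values, including computed ones */
  if missing(cal) then cal = cal1*&coef1 + cal2*&coef2;
  cal2 = cal1;                   /* shift the history by one row */
  cal1 = cal;
run;

proc print data=want noobs;
run;
/* With these coefficients Cal becomes 20, 23, 21.5, 22.25, 21.875, 22.0625 */
```

Each computed value feeds the next row, which is exactly what the LAG-only version cannot do.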
There might be a way to accomplish this in a DATA step, but when I want SAS to process iteratively, I use PROC IML and a do loop. I named your table SO and successfully ran the following:
PROC IML;
use SO; /* create a matrix from your table to be used in proc iml */
read all var _all_ into table;
close SO;
Cal=table[,3];
do i=3 to nrow(cal); /* process iteratively the calculations */
if cal[i]=. then cal[i]=&coef1.*cal[i-1]+&coef2.*cal[i-2];
end;
table[,3]=cal;
Varnames={"Obs" "Theo" "Cal"};
create SO_ok from table [colname=varnames]; /* outputs a new table */
append from table;
close SO_ok;
QUIT;
I'm not saying you couldn't use lag() and a DATA step to achieve what you want to do. But I find that PROC IML is useful and more intuitive when it comes to iterative process.