I have a numeric parameter passed to my macro and would like to convert it to a date, move it to the end of the month and apply a format.
The following code works for most dates, but not for March, where it throws 'Literal contains unmatched quote'.
proc format;
  picture mydatep
    low-high = "'%0d-%0b-%0Y'" (datatype = date);
run;
%macro test(cycle=);
%let enddate = %SYSFUNC(intnx(month, %SYSFUNC(inputn(&cycle., yymmn6.)), 0, e), mydatep.);
%put &enddate.;
%mend;
%test(cycle=201602); /* works --> 29-Feb-2016*/
%test(cycle=201603); /* fails */
%test(cycle=201604); /* works again --> 30-Apr-2016*/
%test(cycle=201402); /* works --> 28-Feb-2014*/
%test(cycle=201403); /* fails */
%test(cycle=201404); /* works again --> 30-Apr-2014*/
I have been using the code for some years now, and never had trouble with it. I am using SAS Analytics Pro 9.4
Solution: I was starting the SAS session via SAS (Unicode). Switching to SAS (Deutsch) [engl: SAS (German)], solved the issue.
I don't know why, though.
@Kenji: "Switching to SAS (Deutsch) [engl: SAS (German)], solved the issue. I don't know why, though."
The explanation is quite simple: German and English month abbreviations differ in only a few cases:
German      English     equal?
----------------------------------------
01Jan2022   01Jan2022   yes
01Feb2022   01Feb2022   yes
01Mär2022   01Mar2022   NO
01Apr2022   01Apr2022   yes
...
01Okt2022   01Oct2022   NO
01Nov2022   01Nov2022   yes
01Dez2022   01Dec2022   NO
So in a German environment, it is a common observation that your code might work in most cases but not for March, October and December.
If a non-ASCII character ends up in the generated string, for example an en dash or em dash typed instead of a hyphen in the PICTURE text, or the ä of the German month abbreviation Mär, the string grows from 13 bytes to 14 or more in a UTF-8 session. Non-ASCII characters require more than one byte of storage there.
So if the format is used with a width of 13, the value is truncated, which removes the closing quote and possibly the last digit of the year as well.
Tell PROC FORMAT that the default width for the format should be 14 rather than 13.
proc format;
picture mydatep (default=14)
low-high = "'%0d-%0b-%0Y'" (datatype = date);
run;
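To see why one extra byte matters, the following minimal sketch compares the byte length and the character length of the formatted string. It assumes a UTF-8 session (ENCODING=UTF-8), where LENGTH counts bytes and KLENGTH counts characters; the variable names are illustrative only.

```sas
/* Sketch, assuming a UTF-8 session: compare bytes and characters */
data _null_;
  length s $20;
  s = "'01-Mär-2000'";
  bytes = length(s);   /* LENGTH counts bytes */
  chars = klength(s);  /* KLENGTH counts characters */
  put bytes= chars=;
run;
```

If bytes comes out one larger than chars, a format width equal to the character count is one byte too short, which is what cuts off the closing quote.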
Example:
23 proc options option=ENCODING option=LOCALE option=DATESTYLE option=DFLANG ;
24 run;
SAS (r) Proprietary Software Release 9.4 TS1M5
ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.
LOCALE=EN_US Specifies a set of attributes in a SAS session that reflect the language, local conventions, and culture for a
geographical region.
DATESTYLE=MDY Specifies the sequence of month, day, and year when ANYDTDTE, ANYDTDTM, or ANYDTTME informat data is ambiguous.
DFLANG=GERMAN Specifies the language for international date informats and formats.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
25
26 options dflang=german locale=de_DE ;
27
28 data test;
29 do month=1 to 12;
30 length string $20 ;
31 string=put(mdy(month,1,2000),mydatep.);
32 put month= string=;
33 end;
34 run;
month=1 string='01-Jan-2000'
month=2 string='01-Feb-2000'
month=3 string='01-Mär-2000
month=4 string='01-Apr-2000'
month=5 string='01-Mai-2000'
month=6 string='01-Jun-2000'
month=7 string='01-Jul-2000'
month=8 string='01-Aug-2000'
month=9 string='01-Sep-2000'
month=10 string='01-Okt-2000'
month=11 string='01-Nov-2000'
month=12 string='01-Dez-2000'
NOTE: The data set WORK.TEST has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
35 proc format;
36 picture mydatep (default=14)
37 low-high = "'%0d-%0b-%0Y'" (datatype = date);
NOTE: Format MYDATEP is already on the library WORK.FORMATS.
NOTE: Format MYDATEP has been output.
38 run;
NOTE: PROZEDUR FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
39
40 data test;
41 do month=1 to 12;
42 length string $20 ;
43 string=put(mdy(month,1,2000),mydatep.);
44 put month= string=;
45 end;
46 run;
month=1 string='01-Jan-2000'
month=2 string='01-Feb-2000'
month=3 string='01-Mär-2000'
month=4 string='01-Apr-2000'
month=5 string='01-Mai-2000'
month=6 string='01-Jun-2000'
month=7 string='01-Jul-2000'
month=8 string='01-Aug-2000'
month=9 string='01-Sep-2000'
month=10 string='01-Okt-2000'
month=11 string='01-Nov-2000'
month=12 string='01-Dez-2000'
NOTE: The data set WORK.TEST has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Related
I want to store an instance of a data step variable in a macro-variable using call symput, then use that macro-variable in the same data step to populate a new field, assigning it a new value every 36 records.
I tried the following code:
data a;
set a;
if MOB = 1 then do;
MOB1_accounts = accounts;
call symput('MOB1_acct', MOB1_accounts);
end;
else if MOB > 1 then MOB1_accounts = &MOB1_acct.;
run;
I have a series of repeating MOB's (1-36). I want to create a field called MOB1_Accts, set it equal to the # of accounts for that cohort where MOB = 1, and keep that value when MOB = 2, 3, 4 etc. I basically want to "drag down" the MOB 1 value every 36 records.
For some reason this macro-variable is returning "1" instead of the correct # accounts. I think it might be a char/numeric issue but unsure. I've tried every possible permutation of single quotes, double quotes, symget, etc... no luck.
Thanks for the help!
You are misusing the macro system.
The ampersand (&) introducer in source code tells SAS to resolve the following symbol and place it into the code submission stream. Thus, the resolved &MOB1_acct. cannot be changed in the running DATA step. In other words, a running step cannot change its source code. The resolved macro variable will be the same for all implicit iterations of the step because its value became part of the source code of the step.
You can use SYMPUT() and SYMGET() functions to move strings out of and into a DATA Step. But that is still the wrong approach for your problem.
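The difference between compile-time resolution and run-time lookup can be sketched as follows. The macro variable demo and the two DATA step variables are hypothetical names chosen for this illustration.

```sas
%let demo = 1;  /* hypothetical macro variable, set before the step is compiled */

data _null_;
  call symputx('demo', 2);        /* changes the macro variable at run time */
  from_source = &demo.;           /* resolved while the step is compiled */
  from_symget = symgetn('demo');  /* looked up while the step runs */
  put from_source= from_symget=;
run;
```

The log should show from_source=1 and from_symget=2: the ampersand reference was replaced by the text 1 before the step started, while SYMGETN reads the value that CALL SYMPUTX stored during execution.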
The most straightforward technique could combine:
a retained variable
a mod(_n_, 36) computation to detect every 36th row (_n_ is a proxy for the row number in a simple step with a single SET).
Example:
data a;
set a;
retain mob1_accounts;
* every 36 rows change the value, otherwise the value is retained;
if mod(_n_,36) = 1 then mob1_accounts = accounts;
run;
You didn't show any data, so the actual program statements you need might be slightly different.
Contrasting SYMPUT/SYMGET with RETAIN
As stated, SYMPUT/SYMGET is a possible way to retain values by storing them off in the macro symbol table. There is a penalty, though: each SYM* call is a function call, with whatever black-box work is needed to store or retrieve a symbol value, and possibly additional conversions between character and numeric.
Example:
1,000,000 rows are read. DATA _null_ steps are used so that the cost of writing output is not part of the comparison.
data have;
do rownum = 1 to 1e6;
mob + 1;
accounts = sum(accounts, rand('integer', 1,50) - 10);
if mob > 36 then mob = 1;
output;
end;
run;
data _null_;
set have;
if mob = 1 then call symput ('mob1_accounts', cats(accounts));
mob1_accounts = symgetn('mob1_accounts');
run;
data _null_;
set have;
retain mob1_accounts;
if mob = 1 then mob1_accounts = accounts;
run;
On my system, the log shows:
142 data _null_;
143 set have;
144
145 if mob = 1 then call symput ('mob1_accounts', cats(accounts));
146
147 mob1_accounts = symgetn('mob1_accounts');
148 run;
NOTE: There were 1000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.34 seconds
cpu time 0.34 seconds
149
150 data _null_;
151 set have;
152 retain mob1_accounts;
153
154 if mob = 1 then mob1_accounts = accounts;
155 run;
NOTE: There were 1000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
Or, in summary:
way             real    cpu
-------------   ----    ----
SYMPUT/SYMGET   0.34    0.34
RETAIN          0.04    0.03
I am extracting data from a database that has all values posted in strings, in the format +000000xx.xxx or -00000xx.xxx . I need to convert these to numeric to operate on.
data want;
set have;
numeric_var = string_var*1;
run;
works fine, but, to save compute time and resources on the final running, which will be over a much larger dataset, and in the interest of doing things properly I'd rather do that with a format or informat statement.
data want;
set have;
numeric_var = input(string_var, best8.);
run;
seems to output wrong values and to round everything to 0.
Any ideas?
Using best8. tells SAS to consider only the first 8 characters of the string, so that's never going to work with 13-character values. Use an informat wide enough to cover the whole string, for example best32.; the default width of plain best. may also be too narrow.
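A small sketch of the width effect, using a made-up value in the +000000xx.xxx layout from the question (the variable names too_narrow and wide are illustrative):

```sas
data _null_;
  string_var = '+00000012.345';              /* made-up value, 13 characters */
  too_narrow = input(string_var, best8.);    /* reads only '+0000001' */
  wide       = input(string_var, best32.);   /* reads the whole string */
  put too_narrow= wide=;
run;
```

Here too_narrow should come out as 1 and wide as 12.345. For values below 10 the first eight characters contain only the sign and zeros, which would explain results that look rounded to 0.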
However, make sure you run some benchmarks before changing your current simple solution. SAS is already doing a character-to-numeric conversion as part of the numeric_var = string_var*1; statement, and is apparently doing it correctly; changing the code to use an informat will not automatically be any faster.
It would be cool if you benchmarked both methods and reported the results back here.
EDIT:
I did some benchmarking on this, out of curiosity. The code and log are below but TL;DR - the informat seems to be very slightly but consistently faster - 7.58 seconds vs 7.83 seconds in the run below on a 50 million observation data set. So the informat method is the way to go, but the 3% performance gain wouldn't be worth refactoring a large program, particularly if you don't have good test coverage to be sure of avoiding regressions.
483 * Set small for testing, big for benchmarking;
484 %let obs = 50000000;
485
486 * Generate test data;
487 data testdata;
488 do i = 1 to &obs;
489 numeric = round(ranuni(0)*100, 0.001);
490 char = '+' || put(numeric, z12.3-L);
491 output;
492 end;
493 run;
NOTE: The data set WORK.TESTDATA has 50000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 12.55 seconds
user cpu time 11.41 seconds
system cpu time 0.84 seconds
memory 4375.18k
OS Memory 20784.00k
Timestamp 12/10/2019 10:36:11 AM
Step Count 51 Switch Count 0
494
495 %macro charToNum(in=, method=, obs=);
496
497 * Convert back to numeric;
498 data converted;
499 set &in;
500 %if "&method" = "MULT-BY-ONE" %then %do;
501 converted = char * 1;
502 %end; %else %if "&method" = "INFORMAT" %then %do;
503 converted = input(char, 32.);
504 %end;
505 if converted ne numeric then do;
506 put "ERROR: Conversion failed: " numeric= char= converted=;
507 end;
508 run;
509
510 %mend;
511
512 %charToNum(in = testdata, method = MULT-BY-ONE, obs = &obs);
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
3:20
NOTE: There were 50000000 observations read from the data set WORK.TESTDATA.
NOTE: The data set WORK.CONVERTED has 50000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 7.83 seconds
user cpu time 5.92 seconds
system cpu time 1.88 seconds
memory 14642.84k
OS Memory 31036.00k
Timestamp 12/10/2019 10:36:18 AM
Step Count 52 Switch Count 0
513 %charToNum(in = testdata, method = INFORMAT, obs = &obs);
NOTE: There were 50000000 observations read from the data set WORK.TESTDATA.
NOTE: The data set WORK.CONVERTED has 50000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 7.58 seconds
user cpu time 5.36 seconds
system cpu time 2.15 seconds
memory 14646.18k
OS Memory 31036.00k
Timestamp 12/10/2019 10:36:26 AM
Step Count 53 Switch Count 0
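If you want to keep only the characters that make up the number, use the code below.
Used this way, compress keeps only the listed characters in the string.
The first argument is the name of the variable. The second is optional; here it lists the characters to be kept, and the 'd' modifier adds all digits to that list. The third argument contains 'k', which means keep.
data want;
  set have;
  /* keep the sign, the decimal point and the digits, then read with a wide informat */
  numeric_var = input(compress(string_var, "+-.", "kd"), 32.);
run;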
I created a day variable using the following code:
DAY=datepart(checkin_date_time); /*example of checkin_date_time 1/1/2014 4:44:00*/
format DAY DOWNAME.;
Sample Data:
ID  checkin_date_time  Admit_Type  BED_ORDERED_TO_DISPO
1   1/1/2014 4:40:00   ICU         456
2   1/1/2014 5:64:00   Psych       146
3   1/1/2014 14:48:00  Acute       57
4   1/1/2014 20:34:00  ICU         952
5   1/2/2014 10:00:00  Psych       234
6   1/2/2014 3:48:00   Psych       846
7   1/2/2014 10:14:00  ICU         90
8   1/2/2014 22:27:00  ICU         148
I want to analyze some data using PROC TABULATE where day is one of the class variables, and have the day of week appear in chronological order in the output; however, the output table begins with Tuesday. I would like it to start with Sunday. I've read over the following page http://support.sas.com/resources/papers/proceedings11/085-2011.pdf and tried the proc format invalue code, but it's producing a table where the "day of week" = "21". Not quite sure where to go from here.
Thanks!
proc format;
invalue day_name
'Sunday'=1
'Monday'=2
'Tuesday'=3
'Wednesday'=4
'Thursday'=5
'Friday'=6
'Saturday'=7;
value day_names
1='Sunday'
2='Monday'
3='Tuesday'
4='Wednesday'
5='Thursday'
6='Friday'
7='Saturday';
run;
data Combined_day;
set Combined;
day_of_week = input(day,day_name.);
run;
proc tabulate data = Combined_day;
class Day Admit_Type;
var BED_ORDERED_TO_DISPO ;
format day_of_week day_names.;
table Day*Admit_Type, BED_ORDERED_TO_DISPO * (N Median);
run;
Fundamentally, you are confusing actual values with displayed values (i.e., formats). Specifically, datepart extracts the date portion of a datetime value. Applying a format then only changes how the value is displayed, not the underlying value. So DAY below never contains the character values 'WEDNESDAY' or 'THURSDAY', only the original integer date values (19724 and 19725).
DAY = datepart(checkin_date_time); /* DATE VALUE */
format DAY DOWNAME.; /* FORMATTED DATE VALUE (SAME UNDERLYING DATE VALUE) */
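The distinction between the stored value and its formatted display can be checked with a short sketch. The date literal matches the first sample row; weekday_no is an illustrative name.

```sas
data _null_;
  day = '01JAN2014'd;          /* date literal for the first sample row */
  weekday_no = weekday(day);   /* 1 = Sunday ... 7 = Saturday */
  put day=;                    /* unformatted: the stored integer */
  put day= downame.;           /* same value, displayed as a day name */
  put weekday_no=;
run;
```

The log should show day=19724, then day=Wednesday, then weekday_no=4. Only the WEEKDAY result is a value that sorts from Sunday to Saturday.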
Consider assigning the weekday to its own column using the WEEKDAY function. Then apply your user-defined format in proc tabulate.
data Combined_day;
  set Combined;
  checkin_date = datepart(checkin_date_time); /* NEW DATE VALUE (NO TIME) */
  format checkin_date date9.;
  checkin_weekday = weekday(checkin_date); /* NEW INTEGER VALUE OF WEEKDAY */
run;
proc tabulate data = Combined_day;
  class checkin_weekday Admit_Type;
  var BED_ORDERED_TO_DISPO ;
  format checkin_weekday day_names.; /* APPLY USER DEFINED FORMAT */
  table checkin_weekday*Admit_Type, BED_ORDERED_TO_DISPO * (N Median);
run;
I have a dataset that looks like:
Month  Cost_Center  Account  Actual  Annual_Budget
June   53410        Postage  13      234
June   53420        Postage  0       432
June   53430        Postage  48      643
June   53440        Postage  0       917
June   53710        Postage  92      662
June   53410        Phone    73      267
June   53420        Phone    103     669
June   53430        Phone    90      763
...
I would like to first sum the Actual and Annual_Budget columns, and then create a variable that flags whether the Actual, extrapolated to the entire year, is greater than the Annual_Budget column.
I have the following code:
Data Test;
set Combined;
%All_CC; /*MACRO TO INCLUDE ALL COST CENTERS*/
%Total_Other_Expenses;/*MACRO TO INCLUDE SPECIFIC Account Descriptions*/
Sum_Actual = sum(Actual);
Sum_Annual = sum(Annual_Budget);
Run_Rate = Sum_Actual*12;
if Run_Rate > Sum_Annual then Over_Budget_Alarm = 1;
run;
However, when I run this code, it does not sum by group, for example, this is the output I get:
Account_Description  Sum_Actual  Sum_Annual  Run_Rate  Over_Budget_Alarm
Postage              13          234         146
Postage              0           432         0
Postage              48          643         963       1
Postage              0           917         0
Postage              92          662         634       1
I'm looking for output where all the 'postage' are summed for Actual and Annual, leaving just one row of data.
Use PROC MEANS to summarize the data.
Use a data step and an IF/THEN statement to create your flags.
proc means data=have N SUM NWAY STACKODS;
  class account;
  var actual annual_budget;
  ods output summary = summary_stats1;
  output out = summary_stats2 N = SUM= / AUTONAME;
run;
data want;
  set summary_stats2; /* AUTONAME names the sums actual_Sum and annual_budget_Sum */
  if actual_Sum > annual_budget_Sum then flag=1;
  else flag=0;
run;
SAS DATA step behavior is quite complex ("About DATA Step Execution" in SAS Language Reference: Concepts). The default behavior, that you're seeing, is: at the end of each iteration (i.e. for each input row) the row is written to the output data set, and the PDV - all data step variables - is reset.
You can't expect to write Base SAS "intuitively" without spending a few days learning it first, so I recommend using PROC SQL, unless you have a reason not to.
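A minimal PROC SQL sketch of that suggestion, assuming the input table is Combined with the columns shown in the question. The output table name want is hypothetical, and the macro-based filters from the question are left out.

```sas
proc sql;
  create table want as
  select Account,
         sum(Actual)        as Sum_Actual,
         sum(Annual_Budget) as Sum_Annual,
         sum(Actual) * 12   as Run_Rate,
         /* a comparison evaluates to 1 (true) or 0 (false) */
         (sum(Actual) * 12 > sum(Annual_Budget)) as Over_Budget_Alarm
  from Combined
  group by Account;
quit;
```

This returns one row per Account. Add Month to both the SELECT list and the GROUP BY clause if the totals should be kept separate by month.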
If you really want to aggregate in a data step, you have to use BY-group processing: after ensuring the input data set is sorted by the BY variables, you can use something like the following:
data Test (keep = Month Account Sum_Actual Sum_Annual /*...your Run_Rate and Over_Budget_Alarm...*/);
set Combined; /* the input table */
by Month Account; /* must be sorted by these */
retain Sum_Actual Sum_Annual; /* don't clobber for each input row */
if first.account then do; /* instead do it manually for each group */
Sum_Actual = 0;
Sum_Annual = 0;
end;
/* accumulate the values from each row */
Sum_Actual = sum(Sum_Actual, Actual);
Sum_Annual = sum(Sum_Annual, Annual_Budget);
/* Note that Sum_Actual = Sum_Actual+Actual; will not work if any of the input values is 'missing'. */
if last.account then do;
/* The group has been processed.
Do any additional processing for the group as a whole, e.g.
calculate Over_Budget_Alarm. */
output; /* write one output row per group */
end;
run;
Proc SQL can be very effective for understanding and examining aggregate data. Without seeing what the macros do, I would say perform the run rate checks after outputting data set test.
You don't show rows for other months, but I must presume the annual_budget values are constant across all months. If so, I don't see a reason to ever sum annual_budget; comparing anything to sum(annual_budget) is probably at the incorrect time scale and not useful.
From the data shown it's hard to tell whether you want to know either of these:
which (or if some) months had a run_rate that exceeded the annual_budget
which (or if some) months' run_rate exceeded the balance of annual_budget (i.e. the annual_budget less the prior months' expenditure)
Presume each row in test is for a single year/month/costCenter/account. If not, the underlying data would have to be aggregated to that level.
Proc SQL;
  * retrieve presumed constant annual_budget values from data;
  * this information might (should) already exist in another table;
  * presume constant annual budget value at each cost center | account combination;
  * distinct because there are multiple months with the same info;
  create table annual_budgets as
  select distinct Cost_Center, Account, Annual_Budget
  from test;
  create table account_budgets as
  select account, sum(annual_budget) as annual_budget
  from annual_budgets
  group by account;
  * flag for some run rate condition;
  create table annual_budget_mon_runrate_check as
  select
    2019 as year,
    account,
    sum(actual) as yr_actual, /* across all month/cost center */
    min ((
      select annual_budget from account_budgets as b
      where b.account = t.account
    )) as account_budget,
    max (
      case when actual * 12 > annual_budget then 1 else 0 end
    ) as
    excessive_runrate_flag label="At least one month had a cost center run rate that would exceed its annual_budget"
  from
    test as t
  group by
    year, account;
quit;
You can add a where clause to restrict the accounts processed.
Changing the max to sum in the flag computation would return the number of cost center months with excessive run rates.
I am trying to create a character informat from the range values given in a dataset.
Dataset : Grade
Start  End  Label  Fmtname  Type
0      20   A      $grad    I
21     40   B      $grad    I
41     60   C      $grad    I
61     80   D      $grad    I
81     100  E      $grad    I
And here is the code I wrote to create the informat:
proc format cntlin = grade;
run;
And now the code to create a temp dataset using the new informat:
data temp;
  input grade : $grad. @@ ;
datalines;
21 30 0 45 10
;
The output I wanted was a dataset Temp with values:
Grade
A
B
A
..
Whereas the dataset Temp has values :
Grade
21
30
0
...
SAS Log Entry :
1146 proc format cntlin = grade;
NOTE: Informat $GRAD has been output.
1147 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: There were 5 observations read from the data set WORK.GRADE.
1148
1149
1150 data temp;
1151 input grade : $grad. @@ ;
1152
1153 datalines;
NOTE: SAS went to a new line when INPUT statement reached past the end of a
line.
NOTE: The data set WORK.TEMP has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
I am not able to understand why the informat is not working. Can anyone please explain where I am making my mistake?
Informats convert characters to either characters or numbers, so the values they match on are always character strings. You can't use START/END as numeric ranges the way you are doing, since that only works with numbers.
See the following:
proc format;
invalue $grade
'0'-'20'="A"
'21'-'40'="B"
'41'-'60'="C"
'61'-'80'="D"
'81'-'100'="E";
quit;
proc format;
invalue $grade
'21'='A';
quit;
The latter works, the former gives you an error. So, you could write a dataset with all 101 values (each on a line with START), or just write a format and do it in a second step (read in as a number and then PUT to the format).
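The second option (read the value as a number, then PUT it through a format) could look like the sketch below. The format name gradefmt and the variable score are assumptions made for this illustration; the ranges are taken from the Grade dataset in the question.

```sas
/* Numeric format holding the ranges from the Grade dataset */
proc format;
  value gradefmt
     0 -  20 = 'A'
    21 -  40 = 'B'
    41 -  60 = 'C'
    61 -  80 = 'D'
    81 - 100 = 'E';
run;

data temp;
  input score @@;                  /* read the raw number first */
  length grade $1;
  grade = put(score, gradefmt.);   /* then map it to its label */
  datalines;
21 30 0 45 10
;
run;
```

Because the ranges are compared as numbers here, 21 and 30 fall into B and 0 and 10 into A, which a character range such as '0'-'20' cannot express.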